Audo data for DL

참고 : https://www.youtube.com/watch?v=fMqL5vckiU0&list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf

1. Waveform

Key concpets

Period ( = seconds / cycle )
- inverse: Frequency ( = cycle / second )
Amplitude

Mathematical expression

\(y(t)=A \sin (2 \pi f t+\varphi)\).

\(t\) : time index
\(A\) : amplitude
\(f\) : frequency
\(\varphi\) : phase

2. Frequency/pitch & Amplitude/loudness

LOW frequency / LOW amplitude

HIGH frequency / HIGH amplitude

HIGH frequency \(\rightarrow\) HIGH pitch

HIGH amplitude \(\rightarrow\) LOWD sound

3. Sampling

sampling period : \(T\)
- time index: \(t_n = n \cdot T\)
samplig rate : \(1/T\)

4. Aliasing vs. Quantization

(1) Aliasing ( = X-axis )

original signal (RED) : high frequency
reconstructed signal (BLUE) : low frequency

\(\rightarrow\) removing certain frequencyes ABOVE ceratin threshold

(2) Quantization ( = Y-axis )

5. Analiog Digital Conversion (ADC)

[X] sample signal at uniform time intervals

[Y] quantize with (limited number of) bits

ex) CD :

sample rate = 44100 Hz ( frequency )
Bit = 16 bits / channel

6. 1 min = xx Byte?

Sampling rate = 44100Hz

44100 points per second

Bit depth = 16 bit

amplitude is quantized into 16 bits ( \(2^{16}\) possibilities)

Total Memory of Sound in 1 minute ( in .wav file )

number of bits per second : \(16 \times 44,100\)
number of megabits per second : \((16 \times 44,100) / 1,048,576\)
number of megabytes per second : \((16 \times 44,100) / (1,048,576\times8)\)
number of megabytes per mintue : \((16 \times 44,100) / (1,048,576\times8)\) \(\times 60 = 5.49\text{MB}\)

\(\rightarrow\) to shrink memory, we use .mp3 file!

4. Fourier Transform

*from TIME domain to FREQUENCY domain

( but time information is lost )

decompse sound into sum of sine waves ( oscillating at different frequencies )

ex) decompose into 2 sine waves

\(s=A_1 \sin \left(2 \pi f_1 t+\varphi_1\right)+A_2 \sin \left(2 \pi f_2 t+\varphi_2\right)\).

\(A_1=0.5, f_1=4, \varphi_1=0\\\).
\(A_2=1.5, f_2=1.5, \varphi_2=0\).

decompose into mulitple waves

5. Short Time Fourier Transform (STFT)

problem: TIME INFORMATION is lost due to FT

solution : Short Time Fourier Transform (STFT)

(1) compute multiple FFT at different intervals
- able to preserve TIME info
(2) FIXED frame size
- ex) 2048 samples per interval
(3) output = SPECTOGRAM

( = time + frequency + magnitude )

6. Pre-processing pipeline for Audio

(1) DL

(2) (Traditional) ML

\(\rightarrow\) requires much feature engineering!

7. Mel frequency Cepstral Coefficients (MFCCs)

MFCCs

Frequency domain feature
Capture timbral/textural aspects of sound
Approximate human auditory system
13 to 40 coefficient
Calculated at each frame
- need to perform SFTF first!

Applications:

speech recognition
music genre classificaiton

Twitter Facebook LinkedIn

Audo data for DL

Seunghan Lee

Audo data for DL

1. Waveform

2. Frequency/pitch & Amplitude/loudness

3. Sampling

4. Aliasing vs. Quantization

(1) Aliasing ( = X-axis )

(2) Quantization ( = Y-axis )

5. Analiog Digital Conversion (ADC)

6. 1 min = xx Byte?

4. Fourier Transform

5. Short Time Fourier Transform (STFT)

6. Pre-processing pipeline for Audio

(1) DL

(2) (Traditional) ML

7. Mel frequency Cepstral Coefficients (MFCCs)

You May Also Enjoy