Audio data for DL
Reference : https://www.youtube.com/watch?v=fMqL5vckiU0&list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf
1. Waveform
Key concepts
- Period ( = seconds / cycle )
- inverse: Frequency ( = cycle / second )
- Amplitude
Mathematical expression
$y(t) = A \sin(2\pi f t + \varphi)$
- t : time index
- A : amplitude
- f : frequency
- φ : phase
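As a minimal sketch of the expression above, the snippet below evaluates the sine wave at a handful of time points. The function name `sine_wave` and the example values (a 2 Hz wave with amplitude 0.5 and zero phase) are illustrative assumptions, not part of the original notes.

```python
import math

def sine_wave(amplitude, frequency, phase, times):
    """Evaluate y(t) = A * sin(2*pi*f*t + phi) at each time in `times`."""
    return [amplitude * math.sin(2 * math.pi * frequency * t + phase) for t in times]

# Hypothetical example values: a 2 Hz wave with amplitude 0.5 and zero phase.
times = [n / 8 for n in range(9)]      # 0.0, 0.125, ..., 1.0 seconds
y = sine_wave(0.5, 2.0, 0.0, times)
period = 1 / 2.0                       # period is the inverse of frequency
```

The wave never exceeds the amplitude A, and it repeats every `period` seconds.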
2. Frequency/pitch & Amplitude/loudness
LOW frequency / LOW amplitude
HIGH frequency / HIGH amplitude
HIGH frequency → HIGH pitch
HIGH amplitude → LOUD sound
3. Sampling
- sampling period : T
- time index : $t_n = n \cdot T$
- sampling rate : 1/T
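The relation between sampling period, time index, and sampling rate can be sketched as follows. The helper `sample_times` and the chosen values (a 5 Hz sine sampled at 100 Hz for 1 second) are assumptions made for illustration.

```python
import math

def sample_times(sampling_rate, duration):
    """Return the time indices t_n = n * T for a signal of `duration` seconds."""
    T = 1.0 / sampling_rate                          # sampling period
    n_samples = int(round(duration * sampling_rate))
    return [n * T for n in range(n_samples)]

# Illustrative values: sample a 5 Hz sine at 100 Hz for 1 second.
sr = 100
t = sample_times(sr, 1.0)
x = [math.sin(2 * math.pi * 5 * tn) for tn in t]
```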
4. Aliasing vs. Quantization
(1) Aliasing ( = X-axis )
- original signal (RED) : high frequency
- reconstructed signal (BLUE) : low frequency
→ to prevent aliasing, remove frequencies ABOVE a certain threshold (the Nyquist frequency = half the sampling rate) before sampling
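A small numerical check can make the aliasing effect concrete. In this sketch (the 10 Hz sampling rate and 7 Hz tone are assumed values), a sine above half the sampling rate yields exactly the same samples as a lower-frequency sine, so the two cannot be told apart after sampling.

```python
import math

sr = 10                    # sampling rate in Hz (half of it is 5 Hz)
f_high = 7                 # above sr / 2, so this tone will alias
f_alias = sr - f_high      # 3 Hz: the frequency the samples appear to have

n = range(20)
high = [math.sin(2 * math.pi * f_high * k / sr) for k in n]
# A 3 Hz sine with inverted sign produces the same sample values.
low = [-math.sin(2 * math.pi * f_alias * k / sr) for k in n]

max_diff = max(abs(a - b) for a, b in zip(high, low))   # numerically zero
```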
(2) Quantization ( = Y-axis )
5. Analog-to-Digital Conversion (ADC)
[X] sample signal at uniform time intervals
[Y] quantize with (limited number of) bits
ex) CD :
- sample rate = 44100 Hz ( frequency )
- Bit depth = 16 bits / channel
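The two ADC steps can be sketched in a few lines. This is a simplified uniform quantizer under stated assumptions: input samples lie in [-1.0, 1.0], codes are signed integers, and the 440 Hz test tone and the name `quantize` are hypothetical. Real converters differ in detail.

```python
import math

def quantize(x, bits):
    """Map a sample in [-1.0, 1.0] to a signed integer code with `bits` bits."""
    max_code = 2 ** (bits - 1) - 1      # e.g. 32767 for 16 bits
    min_code = -(2 ** (bits - 1))       # e.g. -32768 for 16 bits
    code = int(round(x * max_code))
    return max(min_code, min(max_code, code))

sr = 44100                              # CD sample rate from the text
bits = 16                               # CD bit depth from the text
# [X] sample a 440 Hz tone (hypothetical test signal) at uniform intervals
samples = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr // 100)]
# [Y] quantize each sample to a 16-bit integer code
codes = [quantize(s, bits) for s in samples]
```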
6. 1 min = xx Byte?
Sampling rate = 44100Hz
- 44100 points per second
Bit depth = 16 bit
- amplitude is quantized into 16 bits ( $2^{16}$ = 65,536 possibilities )
Total Memory of Sound in 1 minute ( in .wav file )
- number of bits per second : 16×44,100
- number of megabits per second : (16×44,100)/1,048,576
- number of megabytes per second : (16×44,100)/(1,048,576×8)
- number of megabytes per minute : (16×44,100)/(1,048,576×8) × 60 ≈ 5.05 MB ( single channel )
→ to shrink memory, we use .mp3 file!
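The arithmetic above can be checked with a short calculation. The function name `wav_size_megabytes` and the `channels` parameter are assumptions added for illustration. The result covers raw PCM samples only and ignores the small .wav header.

```python
def wav_size_megabytes(sample_rate, bit_depth, seconds, channels=1):
    """Uncompressed PCM size in MB (1 MB = 1,048,576 bytes), header ignored."""
    total_bits = sample_rate * bit_depth * channels * seconds
    total_bytes = total_bits / 8
    return total_bytes / 1_048_576

mono = wav_size_megabytes(44100, 16, 60)                 # about 5.05 MB
stereo = wav_size_megabytes(44100, 16, 60, channels=2)   # about 10.09 MB
```

One minute of 16-bit, 44.1 kHz audio is 5,292,000 bytes per channel, which is where the figure of roughly 5 MB comes from.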
4. Fourier Transform
from TIME domain to FREQUENCY domain
( but time information is lost )
decompose sound into a sum of sine waves ( oscillating at different frequencies )
ex) decompose into 2 sine waves
$s = A_1 \sin(2\pi f_1 t + \varphi_1) + A_2 \sin(2\pi f_2 t + \varphi_2)$
- $A_1 = 0.5$, $f_1 = 4$, $\varphi_1 = 0$
- $A_2 = 1.5$, $f_2 = 1.5$, $\varphi_2 = 0$
- decompose into multiple waves
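The two-sine example can be reproduced with NumPy's FFT. This is a sketch under assumed settings: a 32 Hz sampling rate and a 2-second signal, chosen so that both frequencies fall exactly on FFT bins. The amplitudes and frequencies are the ones from the example.

```python
import numpy as np

sr = 32                               # sampling rate in Hz (illustrative choice)
t = np.arange(0, 2, 1 / sr)           # 2 seconds -> 64 samples, 0.5 Hz resolution
s = 0.5 * np.sin(2 * np.pi * 4 * t) + 1.5 * np.sin(2 * np.pi * 1.5 * t)

spectrum = np.fft.rfft(s)
freqs = np.fft.rfftfreq(len(s), d=1 / sr)
amplitudes = 2 * np.abs(spectrum) / len(s)   # rescale so peaks equal the sine amplitudes

# Keep only the frequencies whose amplitude is clearly non-zero.
peaks = {float(f): round(float(a), 3) for f, a in zip(freqs, amplitudes) if a > 0.1}
```

Only two frequencies survive in `peaks`, 1.5 Hz and 4 Hz, with the amplitudes of the original sines. Nothing in `peaks` says when each component was present, which is the loss of time information mentioned above.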
5. Short Time Fourier Transform (STFT)
problem: TIME INFORMATION is lost due to FT
solution : Short Time Fourier Transform (STFT)
- (1) compute multiple FFTs at different intervals
  - able to preserve TIME info
- (2) FIXED frame size
  - ex) 2048 samples per interval
- (3) output = SPECTROGRAM ( = time + frequency + magnitude )
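A bare-bones version of these steps can be written with NumPy alone. This is a sketch, not a reference implementation: the hop size of 512 samples, the Hann window, the function name `stft_magnitude`, and the two-tone test signal are assumptions. Only the frame size of 2048 comes from the notes.

```python
import numpy as np

def stft_magnitude(signal, frame_size=2048, hop_size=512):
    """Return a (n_frames, frame_size // 2 + 1) array of FFT magnitudes."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([
        signal[i * hop_size : i * hop_size + frame_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))   # one FFT per fixed-size frame

# Hypothetical test signal: 1 s at 440 Hz followed by 1 s at 880 Hz.
sr = 22050
t = np.arange(sr) / sr
signal = np.concatenate([np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t)])

spec = stft_magnitude(signal)                     # rows = time, columns = frequency
freqs = np.fft.rfftfreq(2048, d=1 / sr)
first_peak = freqs[spec[0].argmax()]              # dominant frequency in the first frame
last_peak = freqs[spec[-1].argmax()]              # dominant frequency in the last frame
```

The first frame peaks near 440 Hz and the last near 880 Hz, so the change over time is visible, unlike with a single FFT over the whole signal.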
6. Pre-processing pipeline for Audio
(1) DL
(2) (Traditional) ML
→ requires much feature engineering!
7. Mel-Frequency Cepstral Coefficients (MFCCs)
MFCCs
- Frequency domain feature
- Capture timbral/textural aspects of sound
- Approximate human auditory system
- 13 to 40 coefficients
- Calculated at each frame
- need to perform STFT first!
Applications:
- speech recognition
- music genre classification
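To show how the STFT feeds into MFCCs, here is a simplified from-scratch sketch in NumPy: STFT power spectrum, triangular mel filterbank, log, then an unnormalized DCT-II. All function names, the 2595/700 mel formula variant, the 40 mel bands, the hop size of 512, and the test tone are assumptions for illustration. Library implementations such as `librosa.feature.mfcc` differ in scaling and defaults.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, frame_size, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, frame_size // 2 + 1)."""
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fft_freqs = np.fft.rfftfreq(frame_size, d=1 / sr)
    bank = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mfcc(signal, sr, n_mfcc=13, n_mels=40, frame_size=2048, hop_size=512):
    """Return an (n_frames, n_mfcc) array: one coefficient vector per frame."""
    # 1. STFT power spectrum (the STFT has to come first)
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([signal[i * hop_size : i * hop_size + frame_size] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # 2. Mel filterbank energies, then log (small offset avoids log(0))
    log_mel = np.log(power @ mel_filterbank(n_mels, frame_size, sr).T + 1e-10)
    # 3. Unnormalized DCT-II along the mel axis; keep the first n_mfcc coefficients
    k = np.arange(n_mfcc)[:, None]
    m = np.arange(n_mels)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n_mels))
    return log_mel @ dct_basis.T

sr = 22050
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)     # hypothetical 1-second test tone
coeffs = mfcc(tone, sr)                # 13 coefficients for each frame
```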