Audio data for DL

Reference: https://www.youtube.com/watch?v=fMqL5vckiU0&list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf


figure2


1. Waveform

Key concepts

  • Period ( = seconds / cycle )
    • inverse: Frequency ( = cycles / second, measured in Hz )
  • Amplitude

figure2

figure2


Mathematical expression

y(t) = A sin(2πft + φ)

  • t : time index
  • A : amplitude
  • f : frequency
  • φ : phase
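
As a minimal sketch of the expression above, the function below evaluates y(t) at a single time t. The function name `sine` and the example values (A = 2, f = 1 Hz) are assumptions chosen for illustration, not values from the notes.

```python
import math

def sine(t, amplitude=1.0, frequency=1.0, phase=0.0):
    """Evaluate y(t) = A * sin(2*pi*f*t + phi) at time t (in seconds)."""
    return amplitude * math.sin(2 * math.pi * frequency * t + phase)

# Hypothetical values: A = 2, f = 1 Hz, so one cycle lasts 1 second.
quarter = sine(0.25, amplitude=2.0, frequency=1.0)                    # a quarter of a cycle in: the peak
shifted = sine(0.0, amplitude=2.0, frequency=1.0, phase=math.pi / 2)  # a phase of pi/2 moves the peak to t = 0
```

Both calls return the amplitude A (up to floating-point error), which shows that A sets the height of the wave while φ only shifts it along the time axis.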


figure2


2. Frequency/pitch & Amplitude/loudness

LOW frequency / LOW amplitude

HIGH frequency / HIGH amplitude

figure2


HIGH frequency → HIGH pitch

HIGH amplitude → LOUD sound


3. Sampling

figure2

  • sampling period : T
    • time index: t_n = nT
  • sampling rate : 1/T
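
The relation between the sampling period and the time index can be sketched as follows. The helper name `sample_signal` and the input (a 1 Hz sine sampled 8 times per second) are hypothetical and only serve as an example.

```python
import math

def sample_signal(signal, sampling_rate, duration):
    """Sample a continuous-time function at t_n = n * T, where T = 1 / sampling_rate."""
    period = 1.0 / sampling_rate                      # sampling period T (seconds)
    n_samples = int(round(duration * sampling_rate))  # number of points in the interval
    times = [n * period for n in range(n_samples)]
    return times, [signal(t) for t in times]

# Hypothetical input: a 1 Hz sine sampled 8 times per second for one second.
times, samples = sample_signal(lambda t: math.sin(2 * math.pi * t), sampling_rate=8, duration=1.0)
```

With a sampling rate of 8 Hz the period is T = 0.125 s, so the samples sit at 0, 0.125, 0.25, ... seconds.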


figure2


4. Aliasing vs. Quantization

(1) Aliasing ( = X-axis )

  • original signal (RED) : high frequency
  • reconstructed signal (BLUE) : low frequency

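Frequencies ABOVE a certain threshold (half the sampling rate, the Nyquist frequency) cannot be captured: they are lost and show up as lower frequencies in the reconstructed signal.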
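
A small numeric sketch of this effect, assuming a hypothetical sampling rate of 10 Hz: a 7 Hz sine lies above half the sampling rate (5 Hz), and at the sample points it cannot be told apart from an inverted 3 Hz sine.

```python
import math

SAMPLING_RATE = 10           # samples per second (hypothetical)
NYQUIST = SAMPLING_RATE / 2  # highest frequency that can be represented: 5 Hz

def sample_sine(frequency, n_samples=20):
    """Sample sin(2*pi*f*t) at t_n = n / SAMPLING_RATE."""
    return [math.sin(2 * math.pi * frequency * n / SAMPLING_RATE) for n in range(n_samples)]

high = sample_sine(7)  # original signal: 7 Hz, above the 5 Hz limit
low = sample_sine(3)   # 3 Hz = 10 Hz - 7 Hz

# At every sample point the 7 Hz wave equals the 3 Hz wave with its sign flipped.
max_diff = max(abs(h + l) for h, l in zip(high, low))
```

`max_diff` is zero up to floating-point error, so any reconstruction from these samples yields the low-frequency wave rather than the original one.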

figure2


(2) Quantization ( = Y-axis )

figure2
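
Quantization along the Y-axis can be sketched with a simplified uniform quantizer. The assumptions here are that amplitudes lie in [-1.0, 1.0] and that the levels are evenly spaced; real audio formats store integer codes and may lay out their levels slightly differently.

```python
def quantize(sample, bit_depth):
    """Map a sample in [-1.0, 1.0] to the nearest of 2**bit_depth evenly spaced levels."""
    levels = 2 ** bit_depth
    step = 2.0 / (levels - 1)             # spacing between adjacent levels
    index = round((sample + 1.0) / step)  # integer level index in 0 .. levels - 1
    return -1.0 + index * step

# With 2 bits there are only 4 levels: -1, -1/3, 1/3 and 1.
coarse = [quantize(x, bit_depth=2) for x in (-1.0, -0.2, 0.1, 0.9)]
```

Increasing `bit_depth` shrinks `step`, so the quantized value stays closer to the original amplitude.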


5. Analog-to-Digital Conversion (ADC)

[X] sample signal at uniform time intervals

[Y] quantize with (limited number of) bits

figure2


ex) CD :

  • sample rate = 44100 Hz ( frequency )
  • bit depth = 16 bits / channel


6. 1 min = xx Bytes?

Sampling rate = 44100Hz

  • 44100 points per second

Bit depth = 16 bit

  • amplitude is quantized into 16 bits ( 2^16 = 65,536 possibilities )


Total memory for 1 minute of sound ( in an uncompressed .wav file )

  • number of bits per second : 16 × 44,100 = 705,600
  • number of megabits per second : (16 × 44,100) / 1,048,576 ≈ 0.67
  • number of megabytes per second : (16 × 44,100) / (1,048,576 × 8) ≈ 0.084
  • number of megabytes per minute : (16 × 44,100) / (1,048,576 × 8) × 60 ≈ 5.05 MB ( per channel )
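
The arithmetic above can be checked with a few lines of Python. The function name and the `channels` parameter are assumptions added for illustration; the steps above correspond to a single channel. For one minute of mono audio the expression evaluates to roughly 5.05 MB.

```python
SAMPLE_RATE = 44_100      # samples per second
BIT_DEPTH = 16            # bits per sample
SECONDS = 60
BYTES_PER_MB = 1_048_576  # 2**20, the same divisor as in the steps above

def uncompressed_size_mb(sample_rate, bit_depth, seconds, channels=1):
    """Size of raw audio samples in megabytes (file headers are ignored)."""
    bits = sample_rate * bit_depth * seconds * channels
    return bits / 8 / BYTES_PER_MB

mono_mb = uncompressed_size_mb(SAMPLE_RATE, BIT_DEPTH, SECONDS)
stereo_mb = uncompressed_size_mb(SAMPLE_RATE, BIT_DEPTH, SECONDS, channels=2)
```

A stereo recording stores two channels, so `stereo_mb` is simply twice `mono_mb`.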

To reduce the file size, a compressed format such as .mp3 is used!


7. Fourier Transform

from TIME domain to FREQUENCY domain

( but time information is lost )


decompose a sound into a sum of sine waves ( oscillating at different frequencies )

figure2


ex) decompose into 2 sine waves

s(t) = A_1 sin(2πf_1 t + φ_1) + A_2 sin(2πf_2 t + φ_2)

  • A_1 = 0.5, f_1 = 4, φ_1 = 0
  • A_2 = 1.5, f_2 = 1.5, φ_2 = 0
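
To illustrate the decomposition, the sketch below builds the two-sine signal from the example and recovers both components with a naive discrete Fourier transform. The sampling rate (32 Hz) and duration (2 s) are assumptions chosen so that 1.5 Hz and 4 Hz fall exactly on frequency bins; an FFT routine would return the same values faster.

```python
import cmath
import math

def dft(samples):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]

SAMPLE_RATE = 32  # Hz (hypothetical)
DURATION = 2      # seconds, so the frequency resolution is 1 / DURATION = 0.5 Hz
n_samples = SAMPLE_RATE * DURATION

signal = [
    0.5 * math.sin(2 * math.pi * 4.0 * i / SAMPLE_RATE)
    + 1.5 * math.sin(2 * math.pi * 1.5 * i / SAMPLE_RATE)
    for i in range(n_samples)
]

spectrum = dft(signal)
# Keep the non-negative frequencies and rescale so each value is a sine amplitude
# (this scaling is meant for the non-zero frequency bins).
amplitudes = {k / DURATION: 2 * abs(spectrum[k]) / n_samples for k in range(n_samples // 2)}
peaks = {f: round(a, 3) for f, a in amplitudes.items() if a > 0.01}
```

`peaks` contains only the two frequencies present in the signal, 1.5 Hz with amplitude 1.5 and 4 Hz with amplitude 0.5. Nothing in the result says when each frequency occurs, which is the loss of time information noted above.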


figure2

  • decompose into multiple waves


8. Short Time Fourier Transform (STFT)

problem: TIME INFORMATION is lost due to FT

solution : Short Time Fourier Transform (STFT)

  • (1) compute multiple FFTs over different time intervals

    • able to preserve TIME info
  • (2) FIXED frame size

    • ex) 2048 samples per interval
  • (3) output = SPECTROGRAM

    ( = time + frequency + magnitude )
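
The three steps can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: a naive DFT stands in for the FFT, the frames do not overlap and no window function is applied (practical implementations usually do both), and the frame size is 64 samples instead of 2048 to keep the example fast. The test signal and all names are hypothetical.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency bins of a naive DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n // 2)
    ]

def stft(samples, frame_size, hop_size):
    """Split the signal into fixed-size frames and transform each one.

    Returns spectrogram[time_index][frequency_bin] = magnitude.
    """
    return [
        dft_magnitudes(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop_size)
    ]

SAMPLE_RATE = 64  # Hz (hypothetical; small so the naive DFT stays fast)
# Test signal: 4 Hz for the first two seconds, then 10 Hz for the next two.
samples = [
    math.sin(2 * math.pi * (4 if i < 2 * SAMPLE_RATE else 10) * i / SAMPLE_RATE)
    for i in range(4 * SAMPLE_RATE)
]

spectrogram = stft(samples, frame_size=64, hop_size=64)
# Bin k corresponds to k * SAMPLE_RATE / frame_size Hz, which is k Hz here.
dominant_hz = [frame.index(max(frame)) for frame in spectrogram]
```

`dominant_hz` comes out as `[4, 4, 10, 10]`: the change of frequency halfway through the signal is visible because each frame is analysed separately.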


figure2


9. Pre-processing pipeline for Audio

(1) DL

figure2


(2) (Traditional) ML

figure2

requires a lot of feature engineering!


10. Mel-Frequency Cepstral Coefficients (MFCCs)

figure2


MFCCs

  • Frequency domain feature
  • Capture timbral/textural aspects of sound
  • Approximate human auditory system
  • 13 to 40 coefficients
  • Calculated at each frame
    • need to perform STFT first!
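
The notes do not spell out how MFCCs approximate human hearing. One ingredient is the mel scale, on which equal steps are meant to sound equally far apart. The sketch below uses one common Hz-to-mel convention (2595 · log10(1 + f / 700)); the choice of formula, the function names and the 0 to 8000 Hz range are assumptions, and this is only the frequency-warping step, not a full MFCC computation.

```python
import math

def hz_to_mel(frequency_hz):
    """Convert Hz to mel using the 2595 * log10(1 + f / 700) convention."""
    return 2595.0 * math.log10(1.0 + frequency_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_frequencies(low_hz, high_hz, n_points):
    """Frequencies (in Hz) that are evenly spaced on the mel scale."""
    low_mel, high_mel = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_points - 1)
    return [mel_to_hz(low_mel + i * step) for i in range(n_points)]

centers = mel_spaced_frequencies(0.0, 8000.0, n_points=10)
gaps = [b - a for a, b in zip(centers, centers[1:])]
```

The entries of `gaps` grow from one to the next: points that are evenly spaced in mel are dense at low frequencies and sparse at high frequencies, mirroring the finer resolution of human hearing at low pitch.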


Applications:

  • speech recognition
  • music genre classification

figure2
