Mel-Frequency Cepstral Coefficients (MFCCs)

Reference: https://www.youtube.com/watch?v=fMqL5vckiU0&list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf


1. Introduction

Mel-Frequency Cepstral Coefficients

  • Cepstral: Cepstrum \(\leftrightarrow\) Spectrum ( “ceps” is “spec” spelled backwards )

figure2


How to compute the cepstrum?

\(C(x(t))=F^{-1}[\log (F[x(t)])]\).
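
The formula above can be sketched in a few lines of NumPy. This is a minimal sketch, not a reference implementation: it assumes the log is taken of the magnitude spectrum (the "real cepstrum", a common practical variant), and the function name `real_cepstrum`, the small epsilon, and the synthetic echo signal are all choices made here for illustration.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum."""
    spectrum = np.fft.fft(x)
    # Small epsilon (an assumption of this sketch) avoids log(0).
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.ifft(log_magnitude).real

# Synthetic signal (illustrative only): a decaying sinusoid plus an echo
# of itself delayed by 50 samples.
n = np.arange(1024)
clean = np.sin(2 * np.pi * 0.05 * n) * np.exp(-n / 100.0)
echo_delay = 50
x = clean.copy()
x[echo_delay:] += 0.5 * clean[:-echo_delay]

cepstrum = real_cepstrum(x)

# Search away from the lowest quefrencies, which are dominated by the
# overall spectral shape rather than by the periodic structure.
peak = 20 + int(np.argmax(cepstrum[20:200]))
print(peak)
```

A periodic structure in the spectrum, here caused by the echo, appears as a peak in the cepstrum at the quefrency that matches the delay.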


figure2

figure2


2. Vocal tract

The vocal tract acts as a filter on the speech signal

  • vocal tract : the path that the sound travels through on its way out

figure2


3. Decomposing Speech

figure2

\(\rightarrow\) The peaks of the spectral envelope, called formants, carry the identity of the sound!


We can see “speech” as a “convolution of (1) with (2)”

  • (1) vocal tract frequency response
  • (2) glottal pulse


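In the frequency domain, this convolution becomes a multiplication, where \(X\) is the spectrum of the speech, \(E\) the spectrum of the glottal pulse, and \(H\) the vocal tract frequency response:

\(X(f)=E(f) \cdot H(f)\).

\(\log (|X(f)|)=\log (|E(f)| \cdot |H(f)|)\).

\(\log (|X(f)|)=\log (|E(f)|)+\log (|H(f)|)\).

Taking the logarithm turns the product into a sum. Since the inverse Fourier transform is linear, the two components also add up in the cepstrum, where they can be separated.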

figure2


figure2


4. Liftering

Liftering removes the high-quefrency values ( the “glottal pulse” ) and keeps the low-quefrency values ( the spectral envelope ).
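
A rough sketch of liftering, assuming the real cepstrum (log of the magnitude spectrum) and a simple rectangular lifter. The function name `lifter_envelope`, the cutoff of 30 samples, and the synthetic "pulse train through a resonance" signal are hypothetical choices for illustration only.

```python
import numpy as np

def lifter_envelope(x, cutoff):
    """Return (log-magnitude spectrum, liftered spectral envelope)."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)
    cepstrum = np.fft.ifft(log_mag).real

    # Rectangular low-quefrency lifter. The real cepstrum of a real signal
    # is symmetric, so the mirrored indices are kept as well.
    lifter = np.zeros(len(x))
    lifter[:cutoff] = 1.0
    if cutoff > 1:
        lifter[-(cutoff - 1):] = 1.0

    # Back to the frequency domain: only the slowly varying part remains.
    envelope = np.fft.fft(cepstrum * lifter).real
    return log_mag, envelope

# Toy "speech": a pulse train (period 80 samples) filtered by a decaying
# resonance that stands in for the vocal tract.
n = np.arange(1024)
pulses = np.zeros(1024)
pulses[::80] = 1.0
tract = np.exp(-n / 30.0) * np.cos(2 * np.pi * 0.1 * n)
x = np.convolve(pulses, tract)[:1024]

log_mag, envelope = lifter_envelope(x, cutoff=30)
```

The cutoff sits below the pulse period, so the fast ripples caused by the pulse train are dropped and `envelope` is a smoothed version of `log_mag`.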

figure2


5. Calculating MFCCs

Waveform \(\rightarrow\) DFT \(\rightarrow\) Log-amplitude Spectrum \(\rightarrow\) Mel-scaling \(\rightarrow\) Discrete cosine transform
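
The pipeline above can be sketched for a single frame with NumPy only. Several details are assumptions of this sketch and are not taken from the notes: the Hann window, the mel formula `2595 * log10(1 + f / 700)`, the 26 triangular filters, the unnormalized DCT-II, and all function names. The sketch also applies the mel filterbank to the power spectrum before taking the logarithm, which is a common ordering in practice.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, len(fft_freqs)))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def dct2(values, n_coeffs):
    """Unnormalized type-II DCT, keeping the first n_coeffs outputs."""
    n = values.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return values @ basis.T

def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    windowed = frame * np.hanning(len(frame))            # reduce leakage
    power = np.abs(np.fft.rfft(windowed)) ** 2           # DFT
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_mel = np.log(bank @ power + 1e-10)               # mel-scaling + log
    return dct2(log_mel, n_coeffs)                       # DCT

# Illustrative input: one 512-sample frame of a 440 Hz tone at 16 kHz.
sample_rate = 16000
t = np.arange(512) / sample_rate
frame = np.sin(2 * np.pi * 440.0 * t)
coeffs = mfcc_frame(frame, sample_rate)
```

For real audio, the same function would be applied to each overlapping frame of the waveform, giving one row of 13 coefficients per frame.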

But why use discrete cosine transform?

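( = it plays the role of the inverse Fourier transform in the cepstrum formula )

  • simplified version of the Fourier transform
  • gives real-valued coefficients
  • decorrelates the energy in different mel bands
  • reduces the number of dimensions needed to represent the spectrum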


How many coefficients to use?

  • First 12~13 coefficients ( the low-order ones )

    • First coefficients : Most information
      • correspond to “formants”, “spectral envelope”
    • Last coefficients : Least information
  • Also use \(\Delta\) and \(\Delta \Delta\) MFCCs ( first and second derivatives over time )

    \(\rightarrow\) about 39 coefficients per frame ( 13 \(\times\) 3 )
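
To make the count of 39 concrete, here is a small sketch that stacks 13 MFCCs with their \(\Delta\) and \(\Delta \Delta\) features. The regression-style delta over a window of ±2 frames with edge padding is one common choice and an assumption of this sketch; the function name `delta` and the random placeholder matrix are illustrative, not real MFCC data.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based time derivative.

    features: array of shape (n_frames, n_coeffs).
    """
    n_frames = features.shape[0]
    # Repeat the first and last frames so every frame has neighbours.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(i * i for i in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for i in range(1, width + 1):
        ahead = padded[width + i : width + i + n_frames]    # frame t + i
        behind = padded[width - i : width - i + n_frames]   # frame t - i
        out += i * (ahead - behind)
    return out / denom

# Placeholder matrix standing in for 100 frames of 13 MFCCs.
rng = np.random.default_rng(0)
mfccs = rng.normal(size=(100, 13))

d1 = delta(mfccs)        # delta
d2 = delta(d1)           # delta-delta
stacked = np.hstack([mfccs, d1, d2])
print(stacked.shape)     # (100, 39)
```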


Visualization

figure2

