Mel-Frequency Cepstral Coefficients (MFCCs)

Reference : https://www.youtube.com/watch?v=fMqL5vckiU0&list=PL-wATfeyAMNrtbkCNsLcpoAyBBRJZVlnf

1. Introduction

Mel-Frequency Cepstral Coefficients

How to compute Cepstrum?

The vocal tract acts as a filter on the speech signal

\(\rightarrow\) The peaks of the spectral envelope, called formants, carry the identity of the sound!

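We can see “speech” as the “convolution of (1) the glottal pulse \(E\) with (2) the vocal tract response \(H\)”. In the spectrum, this convolution becomes a product: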

\(X(t)=E(t) \cdot H(t)\).

\(\log (X(t))=\log (E(t) \cdot H(t))\).

\(\log (X(t))=\log (E(t))+\log (H(t))\).

Remove the high-quefrency values ( i.e. the “glottal pulse” ), so that only the spectral envelope remains!
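The cepstrum and the removal of high quefrencies can be sketched numerically. This is a minimal sketch under stated assumptions, not a reference implementation: the toy "speech" (an impulse train convolved with one decaying resonance), the names `real_cepstrum` and `low_time_lifter`, the 8 kHz sampling rate, and the cutoff of 30 samples are all made up for illustration.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-10)  # epsilon avoids log(0)
    return np.fft.ifft(log_mag).real

def low_time_lifter(cepstrum, cutoff):
    """Zero every quefrency >= cutoff, keeping the symmetric low-quefrency part."""
    kept = np.zeros_like(cepstrum)
    kept[:cutoff] = cepstrum[:cutoff]
    if cutoff > 1:
        kept[-(cutoff - 1):] = cepstrum[-(cutoff - 1):]  # mirrored half
    return kept

# Assumed toy signal: glottal pulses every 64 samples, filtered by one resonance.
sr, n_fft, period = 8000, 1024, 64
excitation = np.zeros(4 * n_fft)
excitation[::period] = 1.0
n = np.arange(200)
vocal_tract = np.exp(-n / 20.0) * np.cos(2 * np.pi * 1000 * n / sr)
speech = np.convolve(excitation, vocal_tract)

# One windowed frame taken from the steady-state part of the signal.
frame = speech[n_fft:2 * n_fft] * np.hamming(n_fft)
ceps = real_cepstrum(frame)

# The glottal pulse shows up as a peak at a high quefrency (the pitch period).
pitch_quefrency = 40 + int(np.argmax(ceps[40:100]))

# Dropping the high quefrencies and transforming back gives a smoothed
# log-magnitude spectrum, i.e. an estimate of the spectral envelope.
envelope_log = np.fft.fft(low_time_lifter(ceps, 30)).real[: n_fft // 2 + 1]
```

In this toy setup the peak should land at the pulse spacing of 64 samples, while `envelope_log` keeps only the slowly varying part of the log spectrum.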

Waveform \(\rightarrow\) DFT \(\rightarrow\) Log-amplitude Spectrum \(\rightarrow\) Mel-scaling \(\rightarrow\) Discrete cosine transform
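The pipeline above can be sketched for a single frame with NumPy only. Several things here are assumptions rather than part of these notes: the mel formula \(m = 2595 \log_{10}(1 + f/700)\), the triangular filter shape, the 40 mel bands, the Hamming window, and all function names. The sketch also takes the log after the mel filterbank, which is a common ordering in practice and differs slightly from the order listed above.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    bank = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def dct_ii(x, n_out):
    """Unnormalized DCT-II of a 1-D array, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)  # shape (n_out, n)
    return x @ basis.T

def mfcc_frame(frame, sr, n_mels=40, n_mfcc=13):
    windowed = frame * np.hamming(frame.size)
    power = np.abs(np.fft.rfft(windowed)) ** 2                 # DFT
    mel_energy = mel_filterbank(n_mels, frame.size, sr) @ power  # mel scaling
    log_mel = np.log(mel_energy + 1e-10)                       # log amplitude
    return dct_ii(log_mel, n_mfcc)                             # DCT

# Usage on a synthetic 440 Hz tone (a stand-in for a real speech frame).
sr = 16000
t = np.arange(512) / sr
coeffs = mfcc_frame(np.sin(2 * np.pi * 440 * t), sr)
bank = mel_filterbank(40, 512, sr)
```

A full extractor would slide this over overlapping frames of the waveform; that framing step is left out to keep the sketch short.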

But why use the discrete cosine transform?

( = it plays a role similar to the inverse transform in the cepstrum, and it gives real-valued coefficients )
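One way to see the link between the DCT and a Fourier-type transform is that a DCT-II equals, up to a phase factor, the DFT of the input mirrored onto itself, so the result is purely real. The check below is a small sketch of that identity; the function names and the random stand-in for a log-mel vector are assumptions for illustration.

```python
import numpy as np

def dct_ii_direct(x):
    """Unnormalized DCT-II computed straight from its cosine definition."""
    n = x.size
    k = np.arange(n)[:, None]
    return (np.cos(np.pi * k * (np.arange(n) + 0.5) / n) * x).sum(axis=1)

def dct_ii_via_fft(x):
    """Same DCT-II obtained from the DFT of the mirrored sequence [x, reversed x]."""
    n = x.size
    mirrored = np.concatenate([x, x[::-1]])
    spectrum = np.fft.fft(mirrored)[:n]
    phase = np.exp(-1j * np.pi * np.arange(n) / (2 * n))
    rotated = 0.5 * spectrum * phase
    return rotated.real, rotated.imag

rng = np.random.default_rng(0)
log_mel = rng.normal(size=40)  # placeholder for one frame of log-mel energies

direct = dct_ii_direct(log_mel)
via_fft_real, via_fft_imag = dct_ii_via_fft(log_mel)
same = np.allclose(direct, via_fft_real)          # both routes agree
imag_is_zero = np.allclose(via_fft_imag, 0.0)     # nothing left in the imaginary part
```

Because the mirrored input is symmetric, the imaginary part vanishes after the phase rotation, which is why the DCT can stand in for the inverse transform while keeping the coefficients real.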

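Keep the first 12~13 coefficients ( the low-quefrency ones )
- First coefficients : most information
  - correspond to “formants”, “spectral envelope”
- Last coefficients : least information

Use \(\Delta\) and \(\Delta \Delta\) MFCCs ( first and second differences over time )

\(\rightarrow\) about 39 coefficients per frame ( 13 MFCCs + 13 \(\Delta\) + 13 \(\Delta \Delta\) )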
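How the count of 39 arises can be sketched as follows. This is an illustration only: `add_deltas` is a hypothetical helper, the input is random placeholder data rather than real MFCCs, and `np.gradient` is used as a simple finite difference, whereas many toolkits estimate deltas with a regression over a window of several frames.

```python
import numpy as np

def add_deltas(mfcc):
    """Stack MFCCs with their first and second differences along the frame axis.

    mfcc has shape (n_coefficients, n_frames).
    """
    delta = np.gradient(mfcc, axis=1)    # approximates the first time derivative
    delta2 = np.gradient(delta, axis=1)  # approximates the second time derivative
    return np.vstack([mfcc, delta, delta2])

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(13, 100))  # placeholder: 13 coefficients, 100 frames
features = add_deltas(mfcc)
print(features.shape)  # (39, 100)
```

Each frame then carries the static coefficients plus a description of how they change from frame to frame.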