(paper) Wavenet ; A generative model for raw audio

2016, Wavenet, TCN

less than 1 minute read

Seunghan Lee

Seunghan Lee

Deep Learning, Data Science, Statistics

Wavenet : A generative model for raw audio (2016)

Contents

Abstract
WaveNet
1. Dilated Causal Convolutions

0. Abstract

Wavenet : DNN for generating raw audio waveforms

fully probabilistic
autoregressive

1. WaveNet

introduce new generative model
joint probability of waveform \(\mathbf{x}=\left\{x_{1}, \ldots, x_{T}\right\}\) :

factorized as …

\(p(\mathbf{x})=\prod_{t=1}^{T} p\left(x_{t} \mid x_{1}, \ldots, x_{t-1}\right)\).
similar to PixelCNNs, the conditional prob dist is modelled by a stack of CONVOLUTIONAL layers
outputs a categorical distn over the next \(x_t\) with softmax layer
optimized to maximize log likelihood

(1) Dilated Causal Convolutions

key of WaveNet : “casual convolutions”

ensure “time ordering” ( = no cheating )

( \(\approx\) masked convolution for CNN )
for 1D data (ex. audio), easy to implement

( just shift the output of normal convolution by a few timestep )

Train & Inference

[ Train ] : can be made in parallel ( all KNOWN )

\(\rightarrow\) faster than RNN
[ INFERENCE ] : made sequentially

Problem : require many layers

\(\rightarrow\) use dilated convolutions to increase receptive field! ( w.o increasing computational cost )

Dilated Convolution

use “skipping” for “efficieincy”
similar to pooling / strided convolutions

( BUT, difference : output has the SAME size as the input )

Twitter Facebook LinkedIn

You May Also Enjoy

8 minute read

2 minute read

5 minute read

14 minute read