VQ-VAE

Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." Advances in neural information processing systems 30 (2017).

1. Abstract

Vector Quantised VAE

Simple yet powerful generative model that learns discrete representations
Differs from VAEs in two key ways:
- (1) Encoder: Outputs discrete codes
- (2) Prior: Learnt rather than static
Key idea: vector quantisation (VQ)

\(\rightarrow\) Allows the model to circumvent issues of “posterior collapse”

( where the latents are ignored when they are paired with a powerful autoregressive decoder )

Latent embedding space \(e \in R^{K \times D}\)

\(\rightarrow\) There are \(K\) embedding vectors \(e_i \in R^D, i \in 1,2, \ldots, K\).

Step 1) Takes an input \(x\)
Step 2) Encoding
- Produce output \(z_e(x)\).
Step 3) Find the nearest neighbor
- \(z\) are then calculated by a NN look-up using the shared embedding space \(e\)
- \(q(z=k \mid x)= \begin{cases}1 & \text { for } \mathrm{k}=\operatorname{argmin}_j \mid \mid z_e(x)-e_j \mid \mid _2 \\ 0 & \text { otherwise }\end{cases}\).
Step 4) Decoding
- Input to the decoder: NN
  - \(z_q(x)=e_k, \quad \text { where } \quad k=\operatorname{argmin}_j \mid \mid z_e(x)-e_j \mid \mid _2\)…. Eq (2)

No real gradient defined for Eq (2)

\(\rightarrow\) Solve with detach (stop gradient)

Loss function:

\(L=\log p\left(x \mid z_q(x)\right)+ \mid \mid \operatorname{sg}\left[z_e(x)\right]-e \mid \mid _2^2+\beta \mid \mid z_e(x)-\operatorname{sg}[e] \mid \mid _2^2\).

Log-likelihood of the model \(\log p(x)\):

Prior distribution over the discrete latents \(p(z)\) : Categorical distribution

Whilst training the VQ-VAE

\(\rightarrow\)the prior is kept constant and uniform

After training,

\(\rightarrow\) Fit an autoregressive distribution over \(z, p(z)\), so that we can generate \(x\) via ancestral sampling