1. Variational Inference Intro(1)
(1) Mathematical Expression
a. Introduction
Before introducing VI (Variational Inference), let's look at several cases where we cannot find the posterior probability easily:
- 1 ) When we have difficulty computing the marginal probability p(x), the denominator of the posterior probability
- 2 ) When we want a more complex likelihood p(x \mid z)
- 3 ) When we want a more complex prior p(z)
Variational Inference approximates p(z \mid x) (the posterior probability) with a distribution q(z) that we can handle more easily.
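To see where the trouble comes from, write the posterior with Bayes' rule (standard notation, spelled out here for reference):

$$p(z \mid x) = \frac{p(x \mid z)\,p(z)}{p(x)}, \qquad p(x) = \int p(x \mid z)\,p(z)\,dz$$

For most interesting models the integral in the denominator has no closed form, which is exactly case 1 above, and the reason we settle for an approximation q(z) instead of computing p(z \mid x) exactly.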
Look at the picture below.
(Figure: https://4.bp.blogspot.com/ )
P(Z \mid X) is the posterior probability, which has no special (closed) form. We want to approximate it with Q(Z), here a normal distribution, so that the calculations afterwards become much more convenient. So, how can we approximate it?
KL-divergence
We have learned about KL-divergence, which measures the difference between two distributions. Using it, we can turn a problem of statistical inference into an 'optimization' problem.
Minimizing the KL-divergence is the same as making the two distributions similar, and that is how we find (approximate) the posterior probability!
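As a reminder, these are the standard definitions behind that sentence: the KL-divergence between our approximation q(z) and the posterior p(z \mid x), and the optimization problem VI actually solves:

$$\mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big) = \int q(z)\,\log\frac{q(z)}{p(z \mid x)}\,dz, \qquad q^*(z) = \operatorname*{arg\,min}_{q \in Q}\ \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big)$$

Note that KL-divergence is not symmetric; VI conventionally minimizes KL(q || p) (the 'reverse' direction), because the expectation is then taken under q, which is the distribution we chose to be easy to handle.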
b. Mean Field Approximation
This is how it works.
[STEP 1] Select a family of distributions Q to use as the variational family.
[STEP 2] Find the best approximation q(z) within Q of the target p*(z) (the posterior we want to approximate).
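In the standard formulation of the mean field approximation (made explicit here for reference), the family Q from STEP 1 consists of fully factorized distributions, and STEP 2 becomes an optimization over the individual factors:

$$Q = \Big\{\, q : q(z) = \prod_{k=1}^{m} q_k(z_k) \,\Big\}, \qquad q^*(z) = \operatorname*{arg\,min}_{q \in Q}\ \mathrm{KL}\big(q(z)\,\|\,p^*(z)\big)$$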
c. Optimization (details about STEP 2 of the Mean Field Approximation)
First, optimize with respect to q_1 and get a new distribution.
Then optimize with respect to q_2 and get a new distribution.
It keeps going like this, cycling through all of the factors q_1, ..., q_m until the approximation stops changing.
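Written in symbols (a restatement of the loop above), each update re-optimizes one factor while all the other factors stay fixed:

$$q_k^{\text{new}} = \operatorname*{arg\,min}_{q_k}\ \mathrm{KL}\Big(q_k(z_k)\prod_{j \neq k} q_j(z_j)\;\Big\|\;p^*(z)\Big), \qquad k = 1, \dots, m$$

and the passes over k are repeated until the objective stops decreasing.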
[ Mathematical Expression ]
Let's say we want to minimize the KL-divergence with respect to q_k, keeping the other factors q_j (j ≠ k) fixed. Writing p*(z) for the target (which may be known only up to its normalizing constant), and dropping terms that do not depend on q_k:

$$\mathrm{KL}\big(q \,\|\, p^*\big) = \mathbb{E}_{q_k}\!\big[\log q_k(z_k)\big] - \mathbb{E}_{q_k}\!\Big[\mathbb{E}_{q_{-k}}\big[\log p^*(z)\big]\Big] + \text{const}$$

In the equation above, let

$$t(z_k) = \mathbb{E}_{q_{-k}}\big[\log p^*(z)\big]$$

and

$$h(z_k) = \frac{1}{Z_k}\exp\big(t(z_k)\big), \qquad Z_k = \int \exp\big(t(z_k)\big)\,dz_k$$

Then we can get the following equation:

$$\mathrm{KL}\big(q \,\|\, p^*\big) = \mathrm{KL}\big(q_k \,\|\, h\big) + \text{const}$$

which is minimized exactly when q_k = h. As a result, our final formula becomes:

$$q_k(z_k) = \frac{1}{Z_k}\exp\Big(\mathbb{E}_{q_{-k}}\big[\log p^*(z)\big]\Big), \qquad \text{equivalently} \qquad \log q_k(z_k) = \mathbb{E}_{q_{-k}}\big[\log p^*(z)\big] + \text{const}$$
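To make the coordinate updates concrete, here is a minimal Python sketch (my own illustration, not from the original material) of the classic toy problem from Bishop's PRML, Section 10.1.2: approximating a correlated 2-D Gaussian p(z_1, z_2) by a fully factorized q_1(z_1) q_2(z_2). Plugging this model into the final formula gives Gaussian factors with fixed precisions Λ_kk and means updated as below; the variable names (mu, Lam, m1, m2) are illustrative choices.

```python
import numpy as np

# Mean-field coordinate updates, toy example (Bishop, PRML 10.1.2):
# approximate p(z) = N(mu, Lam^{-1}) over z = (z1, z2)
# by a fully factorized q(z) = q1(z1) * q2(z2).

mu = np.array([1.0, -1.0])            # true mean of p(z)
Lam = np.array([[ 2.0, -1.2],         # true precision matrix of p(z)
                [-1.2,  2.0]])

m1, m2 = 5.0, -5.0                    # initial means of the factors q1, q2

for it in range(20):
    # Coordinate update for q1: the optimal factor is Gaussian with
    # precision Lam[0,0] and mean mu1 - (Lam12 / Lam11) * (E[z2] - mu2).
    m1 = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m2 - mu[1])
    # Coordinate update for q2 (same formula with the indices swapped),
    # using the freshly updated mean of q1.
    m2 = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m1 - mu[0])

print("q1: mean =", m1, ", variance =", 1.0 / Lam[0, 0])
print("q2: mean =", m2, ", variance =", 1.0 / Lam[1, 1])
```

After a few passes both factor means converge to the true mean, while the factor variances stay at 1/Λ_kk, which is smaller than the true marginal variances; this 'over-confident' behaviour is a well-known property of the mean field approximation with the reverse KL-divergence.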