1. Variational Inference Intro (1)

(1) Mathematical Expression

a. Introduction

Before introducing VI (Variational Inference), let's look at several cases where we cannot find the posterior probability easily.

  • 1 ) When we have difficulty finding the marginal probability ( the denominator of the posterior probability, p(x) ), as written out below this list
  • 2 ) When we want a more complex likelihood ( = p(x|z) )
  • 3 ) When we want a more complex prior ( = p(z) )
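For reference, here is the expression referred to in case 1, which is just Bayes' rule written in the same notation as above:

p(z|x) = p(x|z) p(z) / p(x), where p(x) = ∫ p(x|z) p(z) dz

The marginal p(x) requires integrating (or summing) over every possible value of the latent variable z, which is why it is usually intractable for high-dimensional or non-conjugate models.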

Variational Inference approximates p(z|x) ( the posterior probability ) with q(z), a distribution that we can handle much more easily. Look at the picture below.


[Figure: an irregular, non-standard posterior p(z|x) approximated by a Gaussian q(z)]

p(z|x) is the posterior probability, which has no special form. We want to approximate it with q(z), here a normal distribution, so that the calculations afterwards become much more convenient. So, how can we approximate it?

KL-divergence

We have already learned about the KL-divergence, a measure of the difference between two distributions. Using it, we can turn a statistical inference problem into an 'optimization' problem: minimizing the KL-divergence is the same as making the two distributions similar, and that is how we find (approximate) the posterior probability!
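In symbols ( writing q ∈ Q to mean that q is picked from the chosen family of tractable distributions ), the optimization problem is:

q*(z) = argmin_{q ∈ Q} KL( q(z) || p(z|x) ), where KL( q || p ) = ∫ q(z) log [ q(z) / p(z|x) ] dz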



b. Mean Field Approximation

This is how it works.

[STEP 1] Select a family of distributions Q to use as the variational family

  • each q(z) in this family will be a product of all the q_i ( the distribution of the i-th latent variable ), as written out below
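Written out, with m denoting the number of latent variables:

q(z) = q_1(z_1) q_2(z_2) ⋯ q_m(z_m) = ∏_{i=1}^{m} q_i(z_i)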

[STEP 2] Find the best approximation q(z) of p*(z) ( the true posterior p(z|x) )


c. Optimization (details about STEP 2 of the Mean Field Approximation)


First, optimize with respect to q_1 ( with all the other factors held fixed ) and get a new distribution q_1.

Then, optimize with respect to q_2 and get a new distribution q_2.

… and it keeps going like this, cycling through every factor until the approximation stops changing.
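As a concrete illustration of this coordinate-wise loop, here is a minimal sketch (my own example, not from the original post) for a 2-D Gaussian target p*(z) = N(μ, Λ^{-1}). In this special case each coordinate update has a well-known closed form; it follows from the general formula derived in the next part.

```python
import numpy as np

# Minimal mean-field sketch for a 2-D Gaussian target p*(z) = N(mu, inv(Lam)).
# Each factor q_k(z_k) stays Gaussian with fixed precision Lam[k, k];
# only its mean m[k] changes during the coordinate-wise loop.
mu = np.array([1.0, -1.0])            # target mean (illustrative values)
Lam = np.array([[2.0, 0.8],
                [0.8, 1.5]])          # target precision matrix

m = np.zeros(2)                        # initial guesses for E[z_1], E[z_2]
for sweep in range(20):
    # optimize with respect to q_1 (q_2 fixed) ...
    m[0] = mu[0] - Lam[0, 1] / Lam[0, 0] * (m[1] - mu[1])
    # ... then with respect to q_2 (q_1 fixed), and keep alternating
    m[1] = mu[1] - Lam[1, 0] / Lam[1, 1] * (m[0] - mu[0])

print(m)  # converges to the exact posterior mean [1.0, -1.0]
```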

[ Mathematical Expression ]

Let's say we want to minimize the KL-divergence with respect to q_k, while all the other factors q_i ( i ≠ k ) are held fixed. Writing q(z) = ∏_i q_i(z_i) and keeping only the terms that depend on q_k ( E_{q_{-k}}[·] is the expectation over every factor except q_k ),

KL( q || p* ) = E_{q_k}[ log q_k(z_k) ] - E_{q_k}[ E_{q_{-k}}[ log p*(z) ] ] + const

In the equation above, let log t(z_k) = E_{q_{-k}}[ log p*(z) ] and let r(z_k) = t(z_k) / Z be its normalized version ( Z is the normalizing constant ).

Then we can get the following equation.

KL( q || p* ) = KL( q_k || r ) + const

This is minimized ( KL( q_k || r ) = 0 ) exactly when q_k = r. As a result, our final formula will become like this!

q_k(z_k) ∝ exp( E_{q_{-k}}[ log p*(z) ] )
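As a sanity check (my own worked example, not part of the original derivation), apply this formula to the 2-D Gaussian target p*(z) = N(z | μ, Λ^{-1}) used in the code sketch above. Keeping only the terms that involve z_1,

E_{q_2}[ log p*(z) ] = - (1/2) Λ_11 z_1^2 + z_1 ( Λ_11 μ_1 - Λ_12 ( E[z_2] - μ_2 ) ) + const

so q_1(z_1) is a Gaussian with precision Λ_11 and mean m_1 = μ_1 - Λ_11^{-1} Λ_12 ( E[z_2] - μ_2 ), which is exactly the coordinate update used in the code above.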