1. Variational Inference Intro(1)
(1) Mathematical Expression
a. Introduction
Before introducing VI (Variational Inference), let's look at several cases where we cannot find the posterior probability easily:
- 1 ) When we have difficulty computing the marginal probability p(x), the denominator of the posterior probability
- 2 ) When we want a more complex likelihood p(x \mid z)
- 3 ) When we want a more complex prior p(z)
Variational Inference approximates p(z \mid x) (the posterior probability) with a distribution q(z) that we can handle more easily.
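To see where the trouble comes from, write the posterior with Bayes' rule (standard notation, spelled out here for reference):

$$p(z \mid x) = \frac{p(x \mid z)\,p(z)}{p(x)}, \qquad p(x) = \int p(x \mid z)\,p(z)\,dz$$

For most interesting models the integral in the denominator has no closed form, which is exactly case 1 above, and the reason we settle for an approximation q(z) instead of computing p(z \mid x) exactly.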
Look at the picture below.
(Figure: https://4.bp.blogspot.com/ )
P(Z \mid X) is the posterior probability, which has no special (closed) form. We want to approximate it with Q(Z), here a normal distribution, so that the calculations afterwards become much more convenient. So, how can we approximate it?
KL-divergence
We have learned about KL-divergence, which measures the difference between two distributions. Using it, we can turn a problem of statistical inference into an 'optimization' problem.
Minimizing the KL-divergence is the same as making the two distributions similar, and that is how we find (approximate) the posterior probability!
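As a reminder, these are the standard definitions behind that sentence: the KL-divergence between our approximation q(z) and the posterior p(z \mid x), and the optimization problem VI actually solves:

$$\mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big) = \int q(z)\,\log\frac{q(z)}{p(z \mid x)}\,dz, \qquad q^*(z) = \operatorname*{arg\,min}_{q \in Q}\ \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big)$$

Note that KL-divergence is not symmetric; VI conventionally minimizes KL(q || p) (the 'reverse' direction), because the expectation is then taken under q, which is the distribution we chose to be easy to handle.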
b. Mean Field Approximation
This is how it works.
[STEP 1] Select a family of distributions Q to use as the variational family.
[STEP 2] Find the best approximation q(z) within Q of the target p*(z) (the posterior we want to approximate).
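In the standard formulation of the mean field approximation (made explicit here for reference), the family Q from STEP 1 consists of fully factorized distributions, and STEP 2 becomes an optimization over the individual factors:

$$Q = \Big\{\, q : q(z) = \prod_{k=1}^{m} q_k(z_k) \,\Big\}, \qquad q^*(z) = \operatorname*{arg\,min}_{q \in Q}\ \mathrm{KL}\big(q(z)\,\|\,p^*(z)\big)$$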
c. Optimization (details about STEP 2 of the Mean Field Approximation)
First, optimize with respect to q_1 and get a new distribution.
Then optimize with respect to q_2 and get a new distribution.
It keeps going like this, cycling through all of the factors q_1, ..., q_m until the approximation stops changing.
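Written in symbols (a restatement of the loop above), each update re-optimizes one factor while all the other factors stay fixed:

$$q_k^{\text{new}} = \operatorname*{arg\,min}_{q_k}\ \mathrm{KL}\Big(q_k(z_k)\prod_{j \neq k} q_j(z_j)\;\Big\|\;p^*(z)\Big), \qquad k = 1, \dots, m$$

and the passes over k are repeated until the objective stops decreasing.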
[ Mathematical Expression ]
Let's say we want to minimize the KL-divergence with respect to q_k, keeping the other factors q_j (j ≠ k) fixed. Writing p*(z) for the target (which may be known only up to its normalizing constant), and dropping terms that do not depend on q_k:

$$\mathrm{KL}\big(q \,\|\, p^*\big) = \mathbb{E}_{q_k}\!\big[\log q_k(z_k)\big] - \mathbb{E}_{q_k}\!\Big[\mathbb{E}_{q_{-k}}\big[\log p^*(z)\big]\Big] + \text{const}$$

In the equation above, let

$$t(z_k) = \mathbb{E}_{q_{-k}}\big[\log p^*(z)\big]$$

and

$$h(z_k) = \frac{1}{Z_k}\exp\big(t(z_k)\big), \qquad Z_k = \int \exp\big(t(z_k)\big)\,dz_k$$

Then we can get the following equation:

$$\mathrm{KL}\big(q \,\|\, p^*\big) = \mathrm{KL}\big(q_k \,\|\, h\big) + \text{const}$$

which is minimized exactly when q_k = h. As a result, our final formula becomes:

$$q_k(z_k) = \frac{1}{Z_k}\exp\Big(\mathbb{E}_{q_{-k}}\big[\log p^*(z)\big]\Big), \qquad \text{equivalently} \qquad \log q_k(z_k) = \mathbb{E}_{q_{-k}}\big[\log p^*(z)\big] + \text{const}$$
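To make the coordinate updates concrete, here is a minimal Python sketch (my own illustration, not from the original material) of the classic toy problem from Bishop's PRML, Section 10.1.2: approximating a correlated 2-D Gaussian p(z_1, z_2) by a fully factorized q_1(z_1) q_2(z_2). Plugging this model into the final formula gives Gaussian factors with fixed precisions Λ_kk and means updated as below; the variable names (mu, Lam, m1, m2) are illustrative choices.

```python
import numpy as np

# Mean-field coordinate updates, toy example (Bishop, PRML 10.1.2):
# approximate p(z) = N(mu, Lam^{-1}) over z = (z1, z2)
# by a fully factorized q(z) = q1(z1) * q2(z2).

mu = np.array([1.0, -1.0])            # true mean of p(z)
Lam = np.array([[ 2.0, -1.2],         # true precision matrix of p(z)
                [-1.2,  2.0]])

m1, m2 = 5.0, -5.0                    # initial means of the factors q1, q2

for it in range(20):
    # Coordinate update for q1: the optimal factor is Gaussian with
    # precision Lam[0,0] and mean mu1 - (Lam12 / Lam11) * (E[z2] - mu2).
    m1 = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m2 - mu[1])
    # Coordinate update for q2 (same formula with the indices swapped),
    # using the freshly updated mean of q1.
    m2 = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m1 - mu[0])

print("q1: mean =", m1, ", variance =", 1.0 / Lam[0, 0])
print("q2: mean =", m2, ", variance =", 1.0 / Lam[1, 1])
```

After a few passes both factor means converge to the true mean, while the factor variances stay at 1/Λ_kk, which is smaller than the true marginal variances; this 'over-confident' behaviour is a well-known property of the mean field approximation with the reverse KL-divergence.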