5. LDA (Latent Dirichlet Allocation) Model
(1) LDA Model
Goal : find a probabilistic model of a corpus that assigns high probability to members of the corpus ( & to other “similar” documents )
This is what the model looks like.
[ Interpretation ]
- for each document d = 1, ..., D ( ex. d=3 : document 3 )
- draw theta_d ~ Dirichlet(alpha) : generate topic probabilities ( ex. (0.5,0.2,0.3) )
- for each word n = 1, ..., N_d ( ex. n=4 : word 4 )
- draw z_dn ~ Categorical(theta_d) : select a topic ( with the probability vector theta )
- draw w_dn ~ Categorical(Phi_{z_dn}) : select a word from that topic
Very Intuitive!
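Here is a minimal sketch of this generative process in NumPy. The function name `generate_corpus` and all of the sizes ( D documents, N words each, K topics, vocabulary V ) are illustrative choices, not from the notes:

```python
import numpy as np

def generate_corpus(D, N, K, V, alpha, seed=0):
    """Sample a toy corpus from the LDA generative process (illustrative)."""
    rng = np.random.default_rng(seed)
    # Phi: K x V matrix; row t is the word distribution of topic t
    Phi = rng.dirichlet(np.ones(V), size=K)
    corpus = []
    for d in range(D):
        theta = rng.dirichlet(alpha)        # topic probabilities of document d, ex. (0.5, 0.2, 0.3)
        z = rng.choice(K, size=N, p=theta)  # topic of each word
        w = [rng.choice(V, p=Phi[t]) for t in z]  # each word drawn from its topic's row of Phi
        corpus.append(w)
    return corpus, Phi

docs, Phi = generate_corpus(D=5, N=20, K=3, V=50, alpha=np.ones(3))
```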
Let’s look at the distributions more carefully.
The prior on theta is a Dirichlet distribution with parameter alpha.
Given theta, the probability of picking topic t is simply the t-th component of theta : p(z_dn = t | theta_d) = theta_dt.
To select the words, we need the probability of each word under each topic. ( we can find it in the matrix Phi! row : the topic z_dn & column : the word w_dn, i.e. p(w_dn | z_dn, Phi) = Phi[z_dn, w_dn] )
We have to find the matrix Phi in the expression above. There are two constraints : every entry of Phi must be non-negative, and every row must sum to 1 ( each row is a distribution over the vocabulary ).
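Putting the pieces together in formulas ( a standard way to write them, with the same symbols as above ):

$$
p(\theta_d \mid \alpha) = \mathrm{Dirichlet}(\theta_d \mid \alpha)
= \frac{\Gamma\!\left(\sum_{t} \alpha_t\right)}{\prod_{t} \Gamma(\alpha_t)} \prod_{t=1}^{K} \theta_{dt}^{\,\alpha_t - 1}
$$

$$
p(z_{dn} = t \mid \theta_d) = \theta_{dt}, \qquad
p(w_{dn} \mid z_{dn}, \Phi) = \Phi_{z_{dn},\, w_{dn}},
\qquad \text{subject to } \Phi_{tw} \ge 0, \;\; \sum_{w=1}^{V} \Phi_{tw} = 1
$$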
[ Summary ]
Known
- W ( data, the observed words )
Unknown
- Phi ( parameters, distribution over words for each topic )
- Z ( latent variables, topic of each word )
- Theta ( latent variables, distribution over topics for each document )
(2) E-step & M-step Overview
Goal : train the model by finding the optimal values of Phi! ( by maximizing the likelihood of the observed words W )
If we take the logarithm of the likelihood, it looks like this.
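The equation itself did not survive in these notes; a plausible reconstruction of the marginal log-likelihood, using the definitions above, is:

$$
\log p(W \mid \Phi, \alpha)
= \sum_{d=1}^{D} \log \int p(\theta_d \mid \alpha)
\prod_{n=1}^{N_d} \left( \sum_{t=1}^{K} \theta_{dt}\, \Phi_{t,\, w_{dn}} \right) d\theta_d
\;\longrightarrow\; \max_{\Phi}
$$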
Since Z and theta are latent, we will use the EM algorithm to maximize it.
E step : fix Phi and compute ( an approximation of ) the posterior over the latent variables, q(Theta, Z) ≈ p(Theta, Z | W, Phi)
M step : fix q and maximize the expected complete-data log-likelihood E_q[ log p(W, Z, Theta | Phi) ] with respect to Phi
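As a reminder of why this alternation works ( this is standard EM theory, not specific to these notes ), the log-likelihood decomposes into a lower bound plus a KL term:

$$
\log p(W \mid \Phi) = \mathcal{L}(q, \Phi) + \mathrm{KL}\!\left( q(\Theta, Z) \,\|\, p(\Theta, Z \mid W, \Phi) \right)
$$

The E-step maximizes the lower bound over q ( pushing the KL term toward zero ), and the M-step maximizes it over Phi.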
(3) E-step
As a result, we can express q(theta) as shown below.
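The resulting formula is also missing from these notes; the standard mean-field result for LDA ( assuming a factorized q(Theta) q(Z) ) is that q(theta_d) is again a Dirichlet distribution, with updated parameters:

$$
q(\theta_d) = \mathrm{Dirichlet}(\theta_d \mid \gamma_d),
\qquad
\gamma_{dt} = \alpha_t + \sum_{n=1}^{N_d} q(z_{dn} = t)
$$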