3. Gibbs Sampling

(1) A special case of the Metropolis-Hastings algorithm

Gibbs sampling is a special case of the Metropolis-Hastings algorithm. We use it when there are multiple parameters to sample.

Let’s suppose we update one component at a time: the current state \(z^{t}\) is split into the component being updated, \(z_k^t\), and all the others, \(z_{-k}^t\):

  • \(z^{t} = (z_k^t, z_{-k}^t) \rightarrow z^{*} =\) \((z_k^{*},z_{-k}^t)\)

then, the proposal distribution will become

  • \(q(z^{*}\mid z^t) = p(z_k^{*},z_{-k}^{t}\mid z_{-k}^{t}) = p(z_k^{*}\mid z_{-k}^{t})\)

In this case, the acceptance probability (refer to the post “2. Metropolis-Hastings”) becomes 1! Every proposal is accepted, and the detailed balance equation always holds.
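To see why, write the target as \(p(z) = p(z_k\mid z_{-k})\,p(z_{-k})\) and note that \(z_{-k}^{*} = z_{-k}^{t}\). Plugging the Gibbs proposal into the Metropolis-Hastings acceptance probability gives

\(\begin{align*} \alpha &= \min\left(1,\ \frac{p(z^{*})\,q(z^{t}\mid z^{*})}{p(z^{t})\,q(z^{*}\mid z^{t})}\right)\\ &= \min\left(1,\ \frac{p(z_k^{*}\mid z_{-k}^{t})\,p(z_{-k}^{t})\,p(z_k^{t}\mid z_{-k}^{t})}{p(z_k^{t}\mid z_{-k}^{t})\,p(z_{-k}^{t})\,p(z_k^{*}\mid z_{-k}^{t})}\right) = \min(1,\ 1) = 1 \end{align*}\)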

(2) Gibbs Sampling

a. Multiple parameter sampling & Full conditional distributions

The algorithm is as below. (Let’s assume a case with two parameters, \(\theta\) and \(\phi\); a minimal code sketch follows the steps.)

[Step 1] Initialize \(\phi_0\)

[Step 2] for i = 1, …, m repeat:
a) using \(p(\theta \mid \phi_{i-1}, y)\), draw \(\theta_i\)
b) using \(p(\phi \mid \theta_i, y)\), draw \(\phi_i\)

With these two draws, one cycle of the Gibbs sampler is completed! Quite Simple!
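Here is a minimal sketch of that cycle in R. The functions draw_theta and draw_phi, the initial value phi_init, and the number of cycles m are hypothetical placeholders standing in for samplers from the two full conditionals (a concrete, runnable version appears in section (3)):

# hypothetical samplers: draw_theta(phi, y) samples p(theta | phi, y),
#                        draw_phi(theta, y) samples p(phi | theta, y)
phi_now = phi_init                       # [Step 1] initialize
for (i in 1:m){                          # [Step 2] repeat m cycles
  theta_now = draw_theta(phi_now, y)     # a) draw theta_i given phi_{i-1}
  phi_now   = draw_phi(theta_now, y)     # b) draw phi_i given theta_i
}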


https://miro.medium.com/max/2060/0*6QwmVaLCHiJQjbqs.png

b. Conditionally conjugate prior example with Normal Likelihood

Let’s take an example: a normal likelihood with unknown mean and variance, where the parameters (\(\mu\) and \(\sigma^2\)) follow the priors below.

\(y_i \mid \mu,\sigma^2 \sim N(\mu, \sigma^2)\)

\(\mu \sim N(\mu_0,\sigma_0^2)\)

\(\sigma^2 \sim InvGamma(\alpha_0, \beta_0)\)

Assuming independent priors, \(p(\mu,\sigma^2)=p(\mu)\,p(\sigma^2)\), the joint posterior can be expressed like this:

\(\begin{align*} p(\mu,\sigma^2\mid y_1,...,y_n) &\propto p(y_1,...,y_n\mid \mu,\sigma^2)\,p(\mu)\,p(\sigma^2)\\ &\propto (\sigma^2)^{-n/2} exp[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2]\ exp[-\frac{1}{2\sigma_0^2}(\mu-\mu_0)^2]\ (\sigma^2)^{-(\alpha_0+1)}exp[\frac{-\beta_0}{\sigma^2}] \end{align*}\)


First, we’ll look at \(\mu\), so we assume \(\sigma^2\) is a known constant. Then the full conditional is as below.

\(\begin{align*} p(\mu\mid \sigma^2, y_1,...,y_n) &\propto p(\mu,\sigma^2\mid y_1,...,y_n)\\ &\propto exp[-\frac{1}{2}(\frac{\sum_{i=1}^{n}(y_i-\mu)^2}{\sigma^2}+\frac{(\mu-\mu_0)^2}{\sigma_0^2})]\\ &\propto N(\mu\mid \frac{n\bar{y}/\sigma^2 +\mu_0/\sigma_0^2}{n/\sigma^2+1/\sigma_0^2}, \frac{1}{n/\sigma^2 + 1/\sigma_0^2}) \end{align*}\)
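The last line follows from completing the square in \(\mu\): collecting the quadratic and linear terms gives

\(\begin{align*} \frac{\sum_{i=1}^{n}(y_i-\mu)^2}{\sigma^2}+\frac{(\mu-\mu_0)^2}{\sigma_0^2} = \left(\frac{n}{\sigma^2}+\frac{1}{\sigma_0^2}\right)\mu^2 - 2\left(\frac{n\bar{y}}{\sigma^2}+\frac{\mu_0}{\sigma_0^2}\right)\mu + \mathrm{const} \end{align*}\)

which is the kernel of a normal distribution with precision \(n/\sigma^2 + 1/\sigma_0^2\) and the mean shown above.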

Next, we’ll look at \(\sigma^2\), so we assume \(\mu\) is a known constant. Then the full conditional is as below.

\(\begin{align*} p(\sigma^2\mid \mu, y_1,...,y_n) &\propto p(\mu,\sigma^2\mid y_1,...,y_n)\\ &\propto InvGamma(\sigma^2\mid \alpha_0 + \frac{n}{2}, \beta_0+\frac{\sum_{i=1}^{n}(y_i-\mu)^2}{2}) \end{align*}\)
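The second line follows by collecting the powers of \(\sigma^2\) and the terms inside the exponentials:

\(\begin{align*} p(\sigma^2\mid \mu, y_1,...,y_n) &\propto (\sigma^2)^{-n/2}exp[-\frac{\sum_{i=1}^{n}(y_i-\mu)^2}{2\sigma^2}]\,(\sigma^2)^{-(\alpha_0+1)}exp[-\frac{\beta_0}{\sigma^2}]\\ &= (\sigma^2)^{-(\alpha_0+\frac{n}{2}+1)}exp[-\frac{1}{\sigma^2}(\beta_0+\frac{\sum_{i=1}^{n}(y_i-\mu)^2}{2})] \end{align*}\)

which is exactly the \(InvGamma(\alpha_0+\frac{n}{2},\ \beta_0+\frac{\sum_{i=1}^{n}(y_i-\mu)^2}{2})\) kernel.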

The two full conditional distributions above provide the basis of a Gibbs sampler that simulates from a Markov chain whose stationary distribution is the full posterior of \(\mu\) and \(\sigma^2\). The only thing we have to do is alternate draws between \(\mu\) and \(\sigma^2\), using the most recent draw of one parameter to update the other.

(3) Gibbs Sampling with R Code

### Conditionally conjugate prior example with Normal Likelihood

# [1] Full Conditional update for mean (mu)
update_mu = function(n,ybar,sig2,mu_0,sig2_0){
  sig2_1 = 1 / (n/sig2 + 1/sig2_0)                # full conditional (posterior) variance
  mu_1 = sig2_1 * (n*ybar / sig2 + mu_0 / sig2_0) # full conditional (posterior) mean
  rnorm(n=1, mean=mu_1, sd=sqrt(sig2_1))
}

# [2] Full Conditional update for variance (sigma^2)
update_sig2 = function(n,y,mu,alpha_0, beta_0){
  alpha_1 = alpha_0 + n/2
  sumsq = sum((y-mu)^2)
  beta_1 = beta_0 + sumsq/2
  out_gamma = rgamma(n=1, shape=alpha_1, rate=beta_1)
  1/out_gamma   # the reciprocal of a Gamma draw is an Inverse-Gamma draw
}

# [3] Gibbs sampling
gibbs = function(y,n_iter,init,prior){
  ybar = mean(y)
  n = length(y)
  
  mu_out = numeric(n_iter)
  sig2_out = numeric(n_iter)
  
  mu_now = init$mu
  
  ## Gibbs sampler: alternate draws from the two full conditionals,
  ## each conditioning on the most recent value of the other parameter
  for (i in 1:n_iter){
    sig2_now = update_sig2(n=n, y=y, mu=mu_now, alpha_0 = prior$alpha_0, beta_0 = prior$beta_0)
    mu_now = update_mu(n=n,ybar=ybar, sig2=sig2_now, mu_0=prior$mu_0, sig2_0=prior$sig2_0)
    sig2_out[i] = sig2_now
    mu_out[i] = mu_now
  }
  cbind(mu=mu_out,sig2=sig2_out)
}

y = c(1.2,1.4,-0.5,0.3,0.9,2.3,1.0,0.1,1.3,1.9)
ybar = mean(y)
n = length(y)

# prior
prior = list()
prior$mu_0 = 0      # prior mean for mu
prior$sig2_0 = 1    # prior variance for mu
prior$n_0 = 2       # prior effective sample size for sigma^2
prior$s2_0 = 1      # prior point estimate of sigma^2
prior$alpha_0 = prior$n_0/2                 # Inverse-Gamma shape
prior$beta_0 = prior$n_0 * prior$s2_0 / 2   # Inverse-Gamma scale

# histogram of the data, with the prior for mu overlaid (dashed curve)
hist(y,freq=FALSE, xlim=c(-1.0,3.0))
curve(dnorm(x=x, mean=prior$mu_0, sd=sqrt(prior$sig2_0)),
      lty=2, add=TRUE)
points(y,rep(0,n),pch=1)
points(ybar,0,pch=19)

# implementation
set.seed(53)
init = list()
init$mu = 0

post = gibbs(y=y,n_iter=1e3, init=init, prior=prior)

head(post)

library(coda)
plot(as.mcmc(post))
summary(as.mcmc(post))
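A common follow-up (a sketch; the burn-in length of 200 iterations here is an arbitrary choice, which in practice you would pick after inspecting the trace plots) is to discard the early draws and summarize the rest:

# discard a burn-in period (200 iterations is an arbitrary choice here)
post_keep = post[-(1:200), ]
colMeans(post_keep)                                   # posterior means of mu and sig2
apply(post_keep, 2, quantile, probs=c(0.025, 0.975))  # 95% credible intervals
effectiveSize(as.mcmc(post_keep))                     # effective sample size (from coda)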

The various kinds of Gibbs sampling are nicely summarized in the blog below! :)

https://niceguy1575.tistory.com/34?category=738476