2. Metropolis-Hastings

Background on Markov chains: https://d3c33hcgiwev3.cloudfront.net/adadc80290e52a99b282ca9d7c1a41ee_background_MarkovChains.html?Expires=1583020800&Signature=UJdh6PpuE3m5EvICzH476NP5PxgoQ81DO~rCGk7a7OQcAQ-gnEjYFVSNyYoFJP2427rmBkXVLCiPdzzOWDKToMHFkzMjICyFz2QIOL0Jw0qXS-4NDXiTyeFPU~RfVeM347ZuYEkhgUqpJgMsjclK11baUhZYtMmH2g97mdMki~E&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A

(1) Metropolis-Hastings

a. Algorithm

Metropolis-Hastings is one of the most general MCMC algorithms.

Our goal is to find a ‘transition rule’ that determines where the current value \(z^t\) should move next. This rule is built from a ‘proposal distribution’, \(q\).

So, how do we turn a proposal \(q\) into a valid transition rule?

[Step 1] Let \(z^t\) be the current value. Draw a candidate \(z^{*} \sim q(z^{*}\mid z^t)\).

  • Of course, the proposal distribution by itself does not match the target distribution; the acceptance step below corrects for this mismatch.

[Step 2] Set \(z^{t+1}\) according to an acceptance probability \(\alpha\):

  • ACCEPT (\(z^{t+1} = z^{*}\)) or REJECT (\(z^{t+1} = z^{t}\))

b. acceptance probability, \(\alpha\)

acceptance probability \(\alpha(z^{*}\mid z^t) = \min\{1,\, r(z^{*}\mid z^t)\}\)

so, what is \(r\) ?

It is the ratio \(r(z^{*}\mid z^t) = \frac {q(z^t \mid z^{*}) \cdot P(z^{*})} {q(z^{*} \mid z^{t}) \cdot P(z^{t})}\), which compares the probability flow in the two directions \(z^t \to z^{*}\) and \(z^{*} \to z^t\).

Accepting with probability \(\min\{1, r\}\) balances these two flows: whichever direction is proposed too often gets thinned down, so that \(P(z^t)\,q(z^{*}\mid z^t)\,\alpha(z^{*}\mid z^t) = P(z^{*})\,q(z^t\mid z^{*})\,\alpha(z^t\mid z^{*})\). This detailed-balance condition makes the Markov chain ‘reversible’, which in turn makes the target \(P\) its stationary distribution. If we iterate the chain many times, the samples \(z^t\) become (approximately) draws from \(P\).
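The two steps above can be sketched in code. Below is a minimal sketch in Python (the R example later in these notes is the fuller version), using an arbitrary standard-normal target and a deliberately asymmetric proposal \(z^{*} \sim N(z^t + 0.5,\ 1)\) so that the full Hastings ratio \(r\) matters; all names and constants here are illustrative assumptions:

```python
import math
import random

random.seed(0)

def log_p(z):
    # unnormalized log target density: standard normal (assumed for illustration)
    return -0.5 * z * z

def log_q(z_to, z_from):
    # asymmetric proposal log-density (up to a constant): z_to ~ N(z_from + 0.5, 1)
    d = z_to - z_from - 0.5
    return -0.5 * d * d

def metropolis_hastings(n_iter, z0=0.0):
    z, draws = z0, []
    for _ in range(n_iter):
        # Step 1: draw a candidate from the proposal
        z_star = random.gauss(z + 0.5, 1.0)
        # Step 2: accept with probability min{1, r}, computed on the log scale
        log_r = (log_q(z, z_star) + log_p(z_star)) - (log_q(z_star, z) + log_p(z))
        if math.log(random.random()) < log_r:
            z = z_star          # ACCEPT
        draws.append(z)         # on REJECT we keep (repeat) the current value
    return draws

draws = metropolis_hastings(50_000)
print(sum(draws) / len(draws))  # sample mean, close to the target mean 0
```

With this asymmetric proposal, dropping the \(q\) terms from `log_r` would bias the chain toward larger values; the Hastings correction is what removes that bias.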

(2) Random Walk Metropolis-Hastings

This variant takes a random walk, with transition probability

\(T_{t,*}^{MH} = q(z^{*}\mid z^t)\,\alpha(z^{*} \mid z^t)\)

The choice of \(q(z^{*}\mid z^t)\) determines which type of M-H algorithm we get. The random-walk M-H algorithm uses a Gaussian proposal centered at the current value:

  • \(z^{*} \sim N(z^{t},\sigma^2)\)
  • \(q(z^{*}\mid z^t) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left(-\frac{(z^{*}-z^t)^2}{2\sigma^2}\right)\)
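Note that this Gaussian proposal is symmetric, \(q(z^{*}\mid z^t) = q(z^t\mid z^{*})\), since the density depends only on \((z^{*}-z^t)^2\); the \(q\) terms therefore cancel in \(r\), leaving \(r = P(z^{*})/P(z^t)\). A quick numeric check in Python (\(\sigma\) and the two points are arbitrary):

```python
import math

def q(z_new, z_old, sigma=0.7):
    # Gaussian random-walk proposal density N(z_old, sigma^2); sigma chosen arbitrarily
    return math.exp(-(z_new - z_old) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

a, b = 1.3, -0.4            # two arbitrary points
print(q(a, b) == q(b, a))   # True: the q terms cancel in r
```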




(3) Example of the M-H Algorithm

Let’s understand the Metropolis Hastings algorithm with the example below.

Assume our prior probability that the coin is loaded is 60%. Suppose the loaded coin comes up heads with probability 0.7 and the fair coin with probability 0.5 (the values consistent with the calculations below). We can use observed data to infer the posterior probability: say we toss this coin 5 times, observe X = 2 heads, and compute the posterior. In this case theta has only two possible values, ‘fair’ or ‘loaded’.

The joint probability of the data and each value of theta is:

\(f(X{=}2 \mid \text{loaded})\,f(\text{loaded}) = \binom{5}{2}(0.7)^2(0.3)^3 \times 0.6 \approx 0.0794\)

\(f(X{=}2 \mid \text{fair})\,f(\text{fair}) = \binom{5}{2}(0.5)^5 \times 0.4 = 0.125\)
Let’s apply the Metropolis-Hastings algorithm to this example.

[Step 1] Start at either theta_0 = fair or theta_0 = loaded.

[Step 2] For i = 1, …, m:
a) Propose the candidate theta* to be the other state than theta_(i-1).

b) Compute the acceptance probability (the proposal is a deterministic flip, hence symmetric, so r is just a ratio of joint probabilities):

if theta* is loaded, alpha = (0.0794/0.125) ≈ 0.635
if theta* is fair, alpha = (0.125/0.0794) ≈ 1.574

[Step 3] If theta* is fair, then alpha > 1, so accept theta* and set theta_i = fair.
If theta* is loaded, then alpha < 1, so set theta_i = loaded with probability 0.635; otherwise set theta_i = fair (with probability 0.365).
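These numbers can be reproduced with a short Python check; the heads probabilities 0.7 (loaded) and 0.5 (fair) are assumed values not stated explicitly above:

```python
from math import comb

prior_loaded, prior_fair = 0.6, 0.4
p_loaded, p_fair = 0.7, 0.5        # assumed heads probabilities
n_tosses, heads = 5, 2

def joint(p, prior):
    # joint probability of the data and a state: f(X=2 | theta) * f(theta)
    return comb(n_tosses, heads) * p**heads * (1 - p)**(n_tosses - heads) * prior

joint_loaded = joint(p_loaded, prior_loaded)
joint_fair = joint(p_fair, prior_fair)

print(round(joint_loaded, 4))               # 0.0794
print(round(joint_fair, 4))                 # 0.125
print(round(joint_loaded / joint_fair, 3))  # 0.635 -> alpha when proposing 'loaded'
print(round(joint_fair / joint_loaded, 2))  # 1.57  -> r when proposing 'fair' (alpha = 1)
```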

Then the transition probability matrix P (rows = current state, columns = next state, ordered fair, loaded) looks like this:

P = | 0.365  0.635 |
    | 1.000  0.000 |

From ‘fair’ we propose ‘loaded’ and accept with probability 0.635; from ‘loaded’ we propose ‘fair’ and always accept.
Let’s check whether this chain gives a good estimate.

We found that the posterior probability is

P(theta = loaded | X = 2) = 0.0794 / (0.0794 + 0.125) ≈ 0.388

Solving \(\pi = \pi P\) for the stationary distribution of the transition matrix gives \(\pi \approx (0.612,\ 0.388)\), so the chain’s long-run frequency of the ‘loaded’ state matches the posterior probability!
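This match can be verified numerically: repeatedly applying the transition matrix built from the acceptance probabilities above drives any starting distribution to its stationary distribution, which equals the posterior (using the joint probabilities 0.0794 for loaded and 0.125 for fair). A small Python check:

```python
# two-state chain: index 0 = 'fair', 1 = 'loaded'
P = [[0.365, 0.635],   # from fair: stay w.p. 0.365, move to loaded w.p. 0.635
     [1.0,   0.0]]     # from loaded: always move to fair

def step(dist, P):
    # one application of dist <- dist . P (row vector times matrix)
    return [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]

dist = [1.0, 0.0]        # start at 'fair' with certainty
for _ in range(200):     # iterate until (numerically) stationary
    dist = step(dist, P)

posterior_loaded = 0.0794 / (0.0794 + 0.125)
print(round(dist[1], 3))            # 0.388
print(round(posterior_loaded, 3))   # 0.388
```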



(4) Random-walk example (with R)

## Metropolis-Hastings
### Random Walk

log_g = function(mu,n,ybar){
  # log of the unnormalized posterior for mu:
  # likelihood y_i ~ N(mu, 1), prior mu ~ t with 1 df (Cauchy)
  mu2 = mu^2
  n*(ybar*mu - mu2/2) - log(1.0 + mu2)
}

met_hast = function(n,ybar,n_iter,mu_init,cand_sd){
  # step 1
  mu_out = numeric(n_iter)
  accpt = 0
  mu_now = mu_init
  lg_now = log_g(mu=mu_now,n=n,ybar=ybar)
  
  # step 2
  for (i in 1:n_iter){
    mu_cand = rnorm(1,mean=mu_now, sd=cand_sd)
    
    lg_cand = log_g(mu=mu_cand, n=n, ybar=ybar)
    lalpha = lg_cand - lg_now # log acceptance ratio; the symmetric q cancels
    alpha = exp(lalpha)
    
    u = runif(1)
    if (u<alpha){
      mu_now = mu_cand
      accpt = accpt + 1
      lg_now = lg_cand
    }
    mu_out[i] = mu_now
  }
  list(mu=mu_out,accpt=accpt/n_iter)
}

y = c(1.2, 1.4, -0.5, 0.3, 0.9, 2.3, 1.0, 0.1, 1.3, 1.9)
ybar = mean(y)
n = length(y)
hist(y, freq=FALSE, xlim=c(-1.0,3.0))
points(y,rep(0,n))
points(ybar, 0.0, pch=19)
curve(dt(x,df=1),lty=2,add=TRUE) # prior distribution


## posterior sampling
set.seed(43)
post = met_hast(n=n, ybar=ybar, n_iter=1e3, mu_init=0.0,cand_sd=3.0)
str(post)

library(coda)
traceplot(as.mcmc(post$mu))

# increase in candidate standard deviation -> decrease the acceptance rate

# case 2
post2 = met_hast(n=n, ybar=ybar, n_iter=1e3, mu_init=0.0,cand_sd=1)
str(post2)

traceplot(as.mcmc(post2$mu))

# case 3 
post3 = met_hast(n=n, ybar=ybar, n_iter=1e3, mu_init=30.0,cand_sd=1)
str(post3)

traceplot(as.mcmc(post3$mu))

# post analysis: discard the first 100 iterations as burn-in
post3$mu_keep = post3$mu[-c(1:100)]
plot(density(post3$mu_keep),xlim=c(-1.0,3.0)) # posterior distribution
curve(dt(x,df=1),lty=2,add=TRUE) # prior distribution