2. Metropolis-Hastings
about Markov Chains : https://d3c33hcgiwev3.cloudfront.net/adadc80290e52a99b282ca9d7c1a41ee_background_MarkovChains.html?Expires=1583020800&Signature=UJdh6PpuE3m5EvICzH476NP5PxgoQ81DO~rCGk7a7OQcAQ-gnEjYFVSNyYoFJP2427rmBkXVLCiPdzzOWDKToMHFkzMjICyFz2QIOL0Jw0qXS-4NDXiTyeFPU~RfVeM347ZuYEkhgUqpJgMsjclK11baUhZYtMmH2g97mdMki~E&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A
(1) Metropolis-Hastings
a. Algorithm
Metropolis-Hastings is one of the most general MCMC algorithms.
Our goal is to find a 'transition rule', which tells us where the current value \(z^t\) should move next. The distribution we draw candidate moves from is called the 'proposal distribution' \(q\).
So, how do we turn an arbitrary proposal \(q\) into samples from the target distribution?
[Step 1] Let \(z^t\) be the current value. Draw a candidate \(z^{*} \sim q(z^{*}\mid z^t)\).
- The proposal alone will not sample from the target; the accept/reject step below corrects for the mismatch.
[Step 2] Set \(z^{t+1}\) according to an acceptance probability \(\alpha\):
- ACCEPT (\(z^{t+1} = z^{*}\)) with probability \(\alpha\), or REJECT (\(z^{t+1} = z^{t}\)) otherwise.
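The two steps above can be sketched in a few lines. This is an illustrative Python sketch, not code from the original notes; the standard-normal target and Gaussian random-walk proposal are placeholder choices:

```python
import math
import random

def metropolis_hastings(p_target, z0, n_iter, cand_sd=1.0):
    """Generic M-H loop with a symmetric Gaussian proposal (placeholder choice)."""
    z = z0
    samples = []
    for _ in range(n_iter):
        z_cand = random.gauss(z, cand_sd)      # Step 1: draw z* ~ q(z*|z)
        r = p_target(z_cand) / p_target(z)     # ratio (q terms cancel for a symmetric q)
        alpha = min(1.0, r)                    # acceptance probability
        if random.random() < alpha:            # Step 2: accept or reject
            z = z_cand
        samples.append(z)
    return samples

# Example: sample from a standard normal; an unnormalized density is enough
random.seed(1)
draws = metropolis_hastings(lambda z: math.exp(-z * z / 2), 0.0, 5000)
```

Note that `p_target` only ever appears in a ratio, so the normalizing constant of the target can be unknown.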
b. acceptance probability, \(\alpha\)
acceptance probability \(\alpha(z^{*}\mid z^t) = \min\{1, r(z^{*}\mid z^t)\}\)
So, what is \(r\)?
It is the ratio \(r(z^{*}\mid z^t)\) = \(\frac {q(z^t \mid z^{*}) \cdot P(z^{*})} {q(z^{*} \mid z^{t}) \cdot P(z^{t})}\).
Accepting with probability \(\min\{1, r\}\) balances the probability flow between \(z^t\) and \(z^{*}\): moves with \(r \geq 1\) are always taken, while moves with \(r < 1\) are taken only a fraction \(r\) of the time. This makes the Markov chain 'reversible' (it satisfies detailed balance), which in turn makes the target \(\pi\) (here \(P\)) the chain's stationary distribution. So if we iterate many times, the values \(z^t\) become (approximately) samples from the target distribution.
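To see why this acceptance rule makes the chain reversible, we can check detailed balance directly. Suppose \(r(z^{*}\mid z^t) \le 1\) (the other case is symmetric), so \(\alpha(z^{*}\mid z^t) = r(z^{*}\mid z^t)\) and \(\alpha(z^t\mid z^{*}) = 1\):

\[
P(z^t)\, q(z^{*}\mid z^t)\, \alpha(z^{*}\mid z^t)
= P(z^t)\, q(z^{*}\mid z^t)\, \frac{q(z^t \mid z^{*})\, P(z^{*})}{q(z^{*} \mid z^{t})\, P(z^{t})}
= P(z^{*})\, q(z^t\mid z^{*})
= P(z^{*})\, q(z^t\mid z^{*})\, \alpha(z^t\mid z^{*})
\]

Both sides are the probability flow between the two points, so the flow from \(z^t\) to \(z^{*}\) equals the flow back, and \(P\) is stationary.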
(2) Random Walk Metropolis-Hastings
In M-H, the probability of actually moving from \(z^t\) to \(z^{*}\) is
\(T_{t,*}^{MH}\) = \(q(z^{*}\mid z^t)\) \(\alpha(z^{*} \mid z^t)\)
The choice of \(q(z^{*}\mid z^t)\) determines which type of M-H algorithm we have. The random walk M-H algorithm takes a normal distribution centered at the current value:
- \(z^{*} \sim\) \(N(z^{t},\sigma^2)\)
- \(q(z^{*}\mid z^t) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(z^{*}-z^t)^2}{2\sigma^2}\right)\)
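Because this proposal is symmetric (\(q(z^{*}\mid z^t) = q(z^t\mid z^{*})\), since the density depends only on \((z^{*}-z^t)^2\)), the \(q\) terms cancel in the acceptance ratio:

\[
r(z^{*}\mid z^t) = \frac{q(z^t \mid z^{*})\, P(z^{*})}{q(z^{*} \mid z^{t})\, P(z^{t})} = \frac{P(z^{*})}{P(z^{t})}
\]

This is why the R code in section (4) only ever evaluates the (log) target density and never the proposal density.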
(3) Example of the M-H Algorithm
Let’s understand the Metropolis-Hastings algorithm with the example below.
Assume the prior probability that the coin is loaded is 60%, i.e. \(f(\text{loaded}) = 0.6\). (Consistent with the numbers below, the loaded coin lands heads with probability 0.7, while the fair coin lands heads with probability 0.5.) We toss this coin 5 times, observe \(X = 2\) heads, and want to compute the posterior probability that the coin is loaded. In this case there are only two possible values of \(\theta\): ‘fair’ or ‘loaded’.
Let’s apply this example with the Metropolis Hastings algorithm
[Step 1] Start at either \(\theta_0 = \text{fair}\) or \(\theta_0 = \text{loaded}\).
[Step 2] For \(i = 1, \dots, m\):
a) Propose the candidate \(\theta^{*}\) to be the other state from \(\theta_{i-1}\)
b) Compute \(\alpha\) from the unnormalized posterior \(g(\theta) = f(X{=}2 \mid \theta)\, f(\theta)\), where \(g(\text{loaded}) = \binom{5}{2}(0.7)^2(0.3)^3(0.6) \approx 0.0794\) and \(g(\text{fair}) = \binom{5}{2}(0.5)^5(0.4) = 0.125\):
if \(\theta^{*}\) is loaded, \(\alpha = 0.0794/0.125 = 0.635\)
if \(\theta^{*}\) is fair, \(\alpha = 0.125/0.0794 = 1.574\)
[Step 3]
If \(\theta^{*}\) is fair, then \(\alpha > 1\) -> accept \(\theta^{*}\) and set \(\theta_i\) = fair
If \(\theta^{*}\) is loaded, then \(\alpha < 1\) -> accept \(\theta^{*}\) and set \(\theta_i\) = loaded (w.p. 0.635), otherwise set \(\theta_i\) = fair (w.p. 0.365)
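The two acceptance probabilities can be reproduced numerically. This is an illustrative Python check; the 0.7 heads probability for the loaded coin is the assumption stated above:

```python
from math import comb

# Unnormalized posterior g(theta) = likelihood * prior, for X = 2 heads in 5 tosses
g_loaded = comb(5, 2) * 0.7**2 * 0.3**3 * 0.6   # loaded coin: P(heads) = 0.7, prior 0.6
g_fair   = comb(5, 2) * 0.5**5 * 0.4            # fair coin:   P(heads) = 0.5, prior 0.4

alpha_to_loaded = min(1.0, g_loaded / g_fair)   # proposing 'loaded' from 'fair'
alpha_to_fair   = min(1.0, g_fair / g_loaded)   # proposing 'fair' from 'loaded' (ratio > 1)

print(round(g_loaded, 4), round(g_fair, 4))     # 0.0794 0.125
print(round(alpha_to_loaded, 3))                # 0.635
```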
Then the transition probability matrix \(P\) (rows = current state, columns = next state, in the order fair, loaded) is
\(P = \begin{pmatrix} 0.365 & 0.635 \\ 1 & 0 \end{pmatrix}\)
Let’s check that this gives a good estimate. The posterior probability that the coin is loaded is \(f(\text{loaded} \mid X{=}2) = 0.0794 / (0.0794 + 0.125) \approx 0.388\).
By solving \(\pi P = \pi\), we find the stationary distribution \(\pi = (0.612,\ 0.388)\). The long-run fraction of time the chain spends in the ‘loaded’ state matches the posterior probability, as it should!
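As a sanity check, the stationary distribution of this two-state chain can be computed by simply iterating the transition matrix (an illustrative Python snippet, states ordered fair then loaded):

```python
# Transition matrix: rows = current state (fair, loaded), columns = next state
P = [[0.365, 0.635],
     [1.0,   0.0]]

pi = [1.0, 0.0]                      # start the chain in the 'fair' state
for _ in range(200):                 # iterate pi <- pi P until it converges
    pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
          pi[0] * P[0][1] + pi[1] * P[1][1]]

posterior_loaded = 0.0794 / (0.0794 + 0.125)
print(round(pi[1], 3), round(posterior_loaded, 3))   # both ≈ 0.388
```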
(4) Random walk example (with R)
## Metropolis-Hastings
### Random Walk
# log of the unnormalized posterior g(mu): normal likelihood for n observations
# with mean ybar, times a t(0, 1, df = 1) prior on mu (log scale avoids underflow)
log_g = function(mu, n, ybar) {
  mu2 = mu^2
  n * (ybar * mu - mu2 / 2) - log(1.0 + mu2)
}
met_hast = function(n, ybar, n_iter, mu_init, cand_sd) {
  # step 1: initialize
  mu_out = numeric(n_iter)
  accpt = 0
  mu_now = mu_init
  lg_now = log_g(mu = mu_now, n = n, ybar = ybar)
  # step 2: iterate
  for (i in 1:n_iter) {
    mu_cand = rnorm(1, mean = mu_now, sd = cand_sd)  # candidate from random-walk proposal
    lg_cand = log_g(mu = mu_cand, n = n, ybar = ybar)
    lalpha = lg_cand - lg_now  # log ratio; q cancels because the proposal is symmetric
    alpha = exp(lalpha)
    u = runif(1)
    if (u < alpha) {           # accept with probability min(1, alpha)
      mu_now = mu_cand
      accpt = accpt + 1
      lg_now = lg_cand
    }
    mu_out[i] = mu_now
  }
  list(mu = mu_out, accpt = accpt / n_iter)
}
y = c(1.2, 1.4, -0.5, 0.3, 0.9, 2.3, 1.0, 0.1, 1.3, 1.9)
ybar = mean(y)
n = length(y)
hist(y, freq = FALSE, xlim = c(-1.0, 3.0))  # 'xlist' was a typo; xlim sets the x-axis range
points(y, rep(0, n))
points(ybar, 0.0, pch = 19)
curve(dt(x, df = 1), lty = 2, add = TRUE)   # prior distribution
## posterior sampling
set.seed(43)
post = met_hast(n = n, ybar = ybar, n_iter = 1e3, mu_init = 0.0, cand_sd = 3.0)
str(post)
library(coda)
traceplot(as.mcmc(post$mu))
# increasing the candidate standard deviation decreases the acceptance rate
# case 2
post2 = met_hast(n = n, ybar = ybar, n_iter = 1e3, mu_init = 0.0, cand_sd = 1)
str(post2)
traceplot(as.mcmc(post2$mu))
# case 3: start far from the posterior mass to see the burn-in period
post3 = met_hast(n = n, ybar = ybar, n_iter = 1e3, mu_init = 30.0, cand_sd = 1)
str(post3)
traceplot(as.mcmc(post3$mu))
# post analysis: discard the first 100 iterations as burn-in
post3$mu_keep = post3$mu[-c(1:100)]
plot(density(post3$mu_keep), xlim = c(-1.0, 3.0))  # posterior distribution
curve(dt(x, df = 1), lty = 2, add = TRUE)          # prior distribution