( Skipping the basic parts and less important content )
11. Sampling Methods
For most probabilistic models, exact inference is INTRACTABLE!
→ Ch11 : approximate inference based on numerical sampling, known as Monte Carlo techniques
Goal : evaluate $\mathbb{E}[f] = \int f(z)\, p(z)\, dz$
- using a finite sum : $\hat{f} = \frac{1}{L} \sum_{l=1}^{L} f(z^{(l)})$
- variance of the estimator : $\operatorname{var}[\hat{f}] = \frac{1}{L} \mathbb{E}\left[(f - \mathbb{E}[f])^2\right]$
However, the samples $z^{(l)}$ may not be independent!
( thus, effective sample size might be much smaller than the apparent sample size )
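As a quick illustration, here is a minimal sketch of this estimator, assuming a standard Gaussian $p(z)$ and $f(z) = z^2$ (both choices are mine, so the true value of $\mathbb{E}[f]$ is 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(z):
    return z ** 2                  # assumed test function f(z) = z^2

L = 10_000
z = rng.standard_normal(L)         # z^(l) ~ p(z), assumed N(0, 1)
f_hat = f(z).mean()                # f_hat = (1/L) sum_l f(z^(l))
var_f_hat = f(z).var() / L         # var[f_hat] = (1/L) E[(f - E[f])^2]

print(f"f_hat = {f_hat:.3f} (true value 1), std error ~ {var_f_hat ** 0.5:.3f}")
```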
11-1. Basic Sampling Algorithms
11-1-1. Standard distributions
skip
11-1-2. Rejection Sampling
sample from a complex distribution
wish to sample from $p(z)$, where $p(z) = \frac{1}{Z_p}\tilde{p}(z)$
- sampling from $p(z)$ directly is hard
- evaluating $\tilde{p}(z)$ is easy
1) use a “proposal distribution” $q(z)$
- a simpler distribution, easy to sample from
2) introduce a constant $k$
- where $kq(z) \geq \tilde{p}(z)$ for all $z$
- $kq(z)$ : comparison function
Algorithm
- [step 1] generate $z_0$ from $q(z)$
- [step 2] generate $u_0$ from $\mathrm{Unif}(0, kq(z_0))$
- [step 3] if $u_0 > \tilde{p}(z_0)$ : reject ( o.w : accept )
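A minimal sketch of the three steps, assuming a bimodal unnormalized target $\tilde{p}(z)$ and a Gaussian proposal $q(z) = \mathcal{N}(0, 2^2)$ of my own choosing; $k$ is found by a coarse grid search so that $kq(z) \geq \tilde{p}(z)$:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):                    # assumed unnormalized target density
    return np.exp(-0.5 * (z - 2) ** 2) + np.exp(-0.5 * (z + 2) ** 2)

def q_pdf(z, scale=2.0):           # proposal density q(z) = N(0, scale^2)
    return np.exp(-0.5 * (z / scale) ** 2) / (scale * np.sqrt(2 * np.pi))

grid = np.linspace(-10, 10, 2001)
k = 1.1 * np.max(p_tilde(grid) / q_pdf(grid))   # comparison function k q(z)

samples = []
while len(samples) < 5000:
    z0 = rng.normal(0.0, 2.0)                  # [step 1] z0 ~ q(z)
    u0 = rng.uniform(0.0, k * q_pdf(z0))       # [step 2] u0 ~ Unif(0, k q(z0))
    if u0 <= p_tilde(z0):                      # [step 3] accept, else reject
        samples.append(z0)
```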
11-1-3. Adaptive rejection sampling
skip
11-1-4. Importance sampling
Finite sum approximation to expectation
(1) discretize the $z$ space into a uniform grid
(2) evaluate the integrand as a sum : $\mathbb{E}[f] \simeq \sum_{l=1}^{L} p(z^{(l)})\, f(z^{(l)})$
Problem : the number of terms in the summation grows exponentially with the dimensionality of $z$
Normalized case
finite sum over samples {z(l)} drawn from q(z) :
$\mathbb{E}[f] = \int f(z)\, p(z)\, dz = \int f(z) \frac{p(z)}{q(z)}\, q(z)\, dz \simeq \frac{1}{L} \sum_{l=1}^{L} \frac{p(z^{(l)})}{q(z^{(l)})}\, f(z^{(l)})$
we call $r_l = p(z^{(l)})/q(z^{(l)})$ the importance weights
Unnormalized case
p(z) can only be evaluated up to a normalization constant
( $p(z) = \tilde{p}(z)/Z_p$ ; similarly, write $q(z) = \tilde{q}(z)/Z_q$ )
$\mathbb{E}[f] = \int f(z)\, p(z)\, dz = \frac{Z_q}{Z_p} \int f(z) \frac{\tilde{p}(z)}{\tilde{q}(z)}\, q(z)\, dz \simeq \frac{Z_q}{Z_p} \frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l\, f(z^{(l)})$
where $\tilde{r}_l = \tilde{p}(z^{(l)})/\tilde{q}(z^{(l)})$
since $\frac{Z_p}{Z_q} = \frac{1}{Z_q} \int \tilde{p}(z)\, dz = \int \frac{\tilde{p}(z)}{\tilde{q}(z)}\, q(z)\, dz \simeq \frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l$
$\mathbb{E}[f] \simeq \sum_{l=1}^{L} w_l\, f(z^{(l)})$, where $w_l = \frac{\tilde{r}_l}{\sum_m \tilde{r}_m} = \frac{\tilde{p}(z^{(l)})/q(z^{(l)})}{\sum_m \tilde{p}(z^{(m)})/q(z^{(m)})}$
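A sketch of this self-normalized case, reusing the assumed bimodal $\tilde{p}(z)$ from above with a wider Gaussian proposal and $f(z) = z^2$ (for this target $\mathbb{E}[z^2] = 5$, which the weighted sum should recover):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):                    # assumed unnormalized target
    return np.exp(-0.5 * (z - 2) ** 2) + np.exp(-0.5 * (z + 2) ** 2)

def q_pdf(z, scale=3.0):           # proposal density N(0, scale^2)
    return np.exp(-0.5 * (z / scale) ** 2) / (scale * np.sqrt(2 * np.pi))

L = 50_000
z = rng.normal(0.0, 3.0, size=L)   # z^(l) ~ q(z)
r = p_tilde(z) / q_pdf(z)          # unnormalized weights r_tilde_l
w = r / r.sum()                    # normalized weights w_l
E_f = np.sum(w * z ** 2)           # E[f] ~= sum_l w_l f(z^(l))
print(E_f)                         # should be close to 5
```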
11-1-5. Sampling-importance-resampling
skip
11-1-6. Sampling and the EM algorithm
skip
11-2. MCMC
Metropolis algorithm
- when the proposal distribution is symmetric
- acceptance probability : $A(z^\star, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^\star)}{\tilde{p}(z^{(\tau)})}\right)$
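A random-walk Metropolis sketch with a symmetric Gaussian proposal (my choice), again for the assumed bimodal $\tilde{p}(z)$; note that a rejected proposal re-appends the current state, which is essential for correctness:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):                    # assumed unnormalized target
    return np.exp(-0.5 * (z - 2) ** 2) + np.exp(-0.5 * (z + 2) ** 2)

z = 0.0                            # initial state z^(0)
chain = []
for tau in range(20_000):
    z_star = z + rng.normal(0.0, 1.0)             # symmetric proposal q(z*|z)
    A = min(1.0, p_tilde(z_star) / p_tilde(z))    # acceptance probability
    if rng.uniform() < A:
        z = z_star                 # accept; otherwise keep the old state
    chain.append(z)
```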
Metropolis-Hastings algorithm
- the proposal distribution need not be symmetric
- acceptance probability : $A_k(z^\star, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^\star)\, q_k(z^{(\tau)} \mid z^\star)}{\tilde{p}(z^{(\tau)})\, q_k(z^\star \mid z^{(\tau)})}\right)$
- pf) detailed balance is satisfied!
$p(z)\, q_k(z' \mid z)\, A_k(z', z) = \min\left(p(z)\, q_k(z' \mid z),\; p(z')\, q_k(z \mid z')\right) = \min\left(p(z')\, q_k(z \mid z'),\; p(z)\, q_k(z' \mid z)\right) = p(z')\, q_k(z \mid z')\, A_k(z, z')$
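The detailed-balance identity can also be checked numerically; below is a sketch on a small assumed discrete state space with an arbitrary asymmetric proposal matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.2, 0.5, 0.3])           # assumed target distribution
q = rng.uniform(size=(3, 3))            # q[i, j] = q_k(z_j | z_i)
q /= q.sum(axis=1, keepdims=True)       # normalize each proposal row

def A(i, j):                            # acceptance probability for move i -> j
    return min(1.0, (p[j] * q[j, i]) / (p[i] * q[i, j]))

for i in range(3):
    for j in range(3):
        lhs = p[i] * q[i, j] * A(i, j)  # p(z) q_k(z'|z) A_k(z', z)
        rhs = p[j] * q[j, i] * A(j, i)  # p(z') q_k(z|z') A_k(z, z')
        assert np.isclose(lhs, rhs)     # detailed balance holds
```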
11-3. Gibbs sampling
- Initialize $\{z_i : i = 1, \dots, M\}$
- For $\tau = 1, \dots, T$ :
  - Sample $z_1^{(\tau+1)} \sim p(z_1 \mid z_2^{(\tau)}, z_3^{(\tau)}, \dots, z_M^{(\tau)})$
  - Sample $z_2^{(\tau+1)} \sim p(z_2 \mid z_1^{(\tau+1)}, z_3^{(\tau)}, \dots, z_M^{(\tau)})$
  - $\vdots$
  - Sample $z_j^{(\tau+1)} \sim p(z_j \mid z_1^{(\tau+1)}, \dots, z_{j-1}^{(\tau+1)}, z_{j+1}^{(\tau)}, \dots, z_M^{(\tau)})$
  - $\vdots$
  - Sample $z_M^{(\tau+1)} \sim p(z_M \mid z_1^{(\tau+1)}, z_2^{(\tau+1)}, \dots, z_{M-1}^{(\tau+1)})$
Acceptance probability = (always) 1
pf) $A(z^\star, z) = \frac{p(z^\star)\, q_k(z \mid z^\star)}{p(z)\, q_k(z^\star \mid z)} = \frac{p(z_k^\star \mid z_{\setminus k}^\star)\, p(z_{\setminus k}^\star)\, p(z_k \mid z_{\setminus k}^\star)}{p(z_k \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k^\star \mid z_{\setminus k})} = 1$
( using $z_{\setminus k}^\star = z_{\setminus k}$, since only component $k$ is updated, so all factors cancel )
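A Gibbs-sampling sketch for an assumed bivariate Gaussian target with zero means, unit variances, and correlation $\rho$, where each full conditional is $\mathcal{N}(\rho\, z_{\text{other}},\, 1 - \rho^2)$:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.8
sd = np.sqrt(1.0 - rho ** 2)       # std dev of each full conditional

z1, z2 = 0.0, 0.0                  # initialize {z_i}
chain = []
for tau in range(10_000):
    z1 = rng.normal(rho * z2, sd)  # z1 ~ p(z1 | z2)
    z2 = rng.normal(rho * z1, sd)  # z2 ~ p(z2 | z1), using the new z1
    chain.append((z1, z2))
```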
11-4. Slice Sampling
difficulties of Metropolis : “sensitive to step size”
- too small : slow decorrelation
- too large : inefficiency due to high rejection rate
Slice sampling : provides an “adaptive step size”
Involves augmenting $z$ with an additional variable $u$
then, draw samples from the joint $(z, u)$ space
Goal : sample from $\hat{p}(z, u) = \begin{cases} 1/Z_p & \text{if } 0 \leq u \leq \tilde{p}(z) \\ 0 & \text{otherwise} \end{cases}$
where $Z_p = \int \tilde{p}(z)\, dz$
It is okay to sample from $\hat{p}(z, u)$ and then just discard $u$ :
$\int \hat{p}(z, u)\, du = \int_0^{\tilde{p}(z)} \frac{1}{Z_p}\, du = \frac{\tilde{p}(z)}{Z_p} = p(z)$
Algorithm : alternately sample $z$ and $u$
- step 1) given $z$, evaluate $\tilde{p}(z)$
- step 2) sample $u$ from $\mathrm{Unif}(0, \tilde{p}(z))$
- step 3) fix $u$ and sample $z$ uniformly from the slice $\{z : \tilde{p}(z) > u\}$
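A sketch of these steps, where step 3 is implemented by the shrinkage procedure of Neal (2003) inside an assumed bounding interval $[-10, 10]$; the bounding interval and the bimodal target are my choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):                    # assumed unnormalized target
    return np.exp(-0.5 * (z - 2) ** 2) + np.exp(-0.5 * (z + 2) ** 2)

z = 0.0
chain = []
for tau in range(10_000):
    u = rng.uniform(0.0, p_tilde(z))   # step 2: u ~ Unif(0, p_tilde(z))
    lo, hi = -10.0, 10.0               # assumed bracket containing the slice
    while True:                        # step 3: sample z from the slice
        z_new = rng.uniform(lo, hi)
        if p_tilde(z_new) > u:
            z = z_new
            break
        elif z_new < z:                # shrink the bracket toward current z
            lo = z_new
        else:
            hi = z_new
    chain.append(z)
```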
11-5. The Hybrid MC algorithm
limitation of Metropolis : “random walk”
introduce a sophisticated class of transitions, based on an analogy with physical systems
Applicable to..
- distributions over continuous variables
- for which we can readily evaluate the gradient of the log probability w.r.t. the state variables
11-5-1. Dynamic Systems
time variable of the dynamics : $\tau$
state variable : $z$
momentum variable : $r$
$r_i = \frac{dz_i}{d\tau}$
Hamiltonian function $H$ : $H(z, r) = E(z) + K(r)$
- $p(z) = \frac{1}{Z_p} \exp(-E(z))$
- $E(z)$ : potential energy
- $K(r) = \frac{1}{2}\lVert r \rVert^2 = \frac{1}{2} \sum_i r_i^2$ : kinetic energy
Hamiltonian Equations
- $\frac{dz_i}{d\tau} = \frac{\partial H}{\partial r_i}, \qquad \frac{dr_i}{d\tau} = -\frac{\partial H}{\partial z_i}$
Properties
- 1) during the evolution of this dynamical system, $H$ is constant
$\frac{dH}{d\tau} = \sum_i \left\{ \frac{\partial H}{\partial z_i} \frac{dz_i}{d\tau} + \frac{\partial H}{\partial r_i} \frac{dr_i}{d\tau} \right\} = \sum_i \left\{ \frac{\partial H}{\partial z_i} \frac{\partial H}{\partial r_i} - \frac{\partial H}{\partial r_i} \frac{\partial H}{\partial z_i} \right\} = 0$
- 2) Liouville's Theorem : the flow preserves volume in phase space
$V = \left( \frac{dz}{d\tau}, \frac{dr}{d\tau} \right), \qquad \operatorname{div} V = \sum_i \left\{ \frac{\partial}{\partial z_i} \frac{dz_i}{d\tau} + \frac{\partial}{\partial r_i} \frac{dr_i}{d\tau} \right\} = \sum_i \left\{ \frac{\partial}{\partial z_i} \frac{\partial H}{\partial r_i} - \frac{\partial}{\partial r_i} \frac{\partial H}{\partial z_i} \right\} = 0$
Joint pdf
- total energy = Hamiltonian
- $p(z, r) = \frac{1}{Z_H} \exp(-H(z, r))$
Due to these two properties, Hamiltonian dynamics will leave $p(z, r)$ invariant
- the volume of a region of phase space remains unchanged
- thus $H$, and hence the probability density, remain unchanged
Although $H$ is invariant, $z$ and $r$ will vary
In order to arrive at an ergodic sampling…
→ introduce additional moves
→ change the value of H while leaving p(z,r) unchanged!
By “replacing $r$ with one drawn from its distribution conditioned on $z$”
( $z$ and $r$ are independent in $p(z, r)$, so the conditional $p(r \mid z)$ is a Gaussian )
Leapfrog discretization
successive updates to position (z) and momentum (r)
$\hat{r}_i(\tau + \epsilon/2) = \hat{r}_i(\tau) - \frac{\epsilon}{2} \frac{\partial E}{\partial z_i}(\hat{z}(\tau))$
$\hat{z}_i(\tau + \epsilon) = \hat{z}_i(\tau) + \epsilon\, \hat{r}_i(\tau + \epsilon/2)$
$\hat{r}_i(\tau + \epsilon) = \hat{r}_i(\tau + \epsilon/2) - \frac{\epsilon}{2} \frac{\partial E}{\partial z_i}(\hat{z}(\tau + \epsilon))$
- to advance by a time interval $\tau$, we need $\tau / \epsilon$ steps
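A sketch of the leapfrog integrator, assuming a standard Gaussian target so that $E(z) = \frac{1}{2}\lVert z \rVert^2$ and $\partial E / \partial z = z$:

```python
import numpy as np

def grad_E(z):
    return z                           # dE/dz for the assumed E(z) = 0.5 ||z||^2

def leapfrog(z, r, eps, n_steps):
    """Advance (z, r) by n_steps leapfrog steps of size eps."""
    z, r = z.copy(), r.copy()
    for _ in range(n_steps):           # n_steps = tau / eps for time interval tau
        r = r - 0.5 * eps * grad_E(z)  # half-step momentum: r(t + eps/2)
        z = z + eps * r                # full-step position: z(t + eps)
        r = r - 0.5 * eps * grad_E(z)  # half-step momentum: r(t + eps)
    return z, r
```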
Summary
The Hamiltonian dynamical approach involves alternating between
- a series of leapfrog updates
- re-sampling of the momentum variables
Unlike Metropolis, it is able to make use of the gradient of the log probability
11-5-2. Hybrid MC
Hybrid MC = Hamiltonian dynamics + Metropolis
After each application of the leapfrog equations…
- the resulting candidate state is ACCEPTED/REJECTED according to the Metropolis criterion ( based on the value of the Hamiltonian $H$ ) :
$\min\left(1, \exp\{H(z, r) - H(z^\star, r^\star)\}\right)$
- if the leapfrog integration were perfect : $H$ unchanged → 100% accept
- with numerical errors : $H$ may change, so candidates that increase $H$ are sometimes rejected
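Putting the pieces together, a minimal hybrid MC sketch for the same assumed Gaussian target, reusing the `leapfrog` and `grad_E` functions from the sketch above: resample the momentum, run a leapfrog trajectory, then apply the Metropolis criterion:

```python
import numpy as np

rng = np.random.default_rng(0)

def H(z, r):                           # H(z, r) = E(z) + K(r)
    return 0.5 * np.dot(z, z) + 0.5 * np.dot(r, r)

def hmc(n_samples, dim=2, eps=0.1, n_steps=20):
    z = np.zeros(dim)
    chain = []
    for _ in range(n_samples):
        r = rng.standard_normal(dim)   # resample momentum from p(r|z) = N(0, I)
        z_star, r_star = leapfrog(z, r, eps, n_steps)   # candidate state
        # Metropolis criterion based on the change in H
        if rng.uniform() < min(1.0, np.exp(H(z, r) - H(z_star, r_star))):
            z = z_star                 # accept; otherwise keep the old state
        chain.append(z)
    return np.array(chain)

samples = hmc(5_000)                   # draws from the assumed Gaussian target
```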