( Skipping the basic parts and less important content )

11. Sampling Methods

For most probabilistic models, exact inference is INTRACTABLE!

Ch11 : approximate inference based on numerical sampling, known as Monte Carlo techniques


Goal : evaluate $\mathbb{E}[f] = \int f(z)\, p(z)\, dz$

  • using a finite sum : $\hat{f} = \frac{1}{L} \sum_{l=1}^{L} f(z^{(l)})$
  • variance of the estimator : $\operatorname{var}[\hat{f}] = \frac{1}{L}\, \mathbb{E}\left[ (f - \mathbb{E}[f])^2 \right]$


However, the samples $z^{(l)}$ may not be independent!

( thus, the effective sample size might be much smaller than the apparent sample size )
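
A minimal sketch of this estimator, assuming a made-up example where $p(z)$ is a standard Gaussian and $f(z) = z^2$ (so the true value of $\mathbb{E}[f]$ is 1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: estimate E[f] for f(z) = z**2 under p(z) = N(0, 1);
# the true value is 1.
L = 10_000
z = rng.standard_normal(L)               # L i.i.d. samples z^(l) ~ p(z)
f_hat = np.mean(z**2)                    # f_hat = (1/L) * sum_l f(z^(l))
se = np.std(z**2, ddof=1) / np.sqrt(L)   # estimated Monte Carlo standard error
print(f"estimate = {f_hat:.4f} +/- {se:.4f}")
```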


11-1. Basic Sampling Algorithms

11-1-1. Standard distributions

skip


11-1-2. Rejection Sampling

sampling from a complex distribution

we wish to sample from $p(z)$, where $p(z) = \frac{1}{Z_p} \tilde{p}(z)$

  • sampling from $p(z)$ directly is hard
  • evaluating $\tilde{p}(z)$ (up to the constant $Z_p$) is easy


1) use a “proposal distribution” $q(z)$

  • a simpler distribution that we can sample from directly

2) introduce a constant $k$

  • where $kq(z) \geq \tilde{p}(z)$ for all $z$
  • $kq(z)$ : comparison function


Algorithm

  • [step 1] generate $z_0$ from $q(z)$

  • [step 2] generate $u_0$ from $\text{Unif}(0, kq(z_0))$

  • [step 3] if $u_0 > \tilde{p}(z_0)$ : reject

    ( otherwise : accept $z_0$ as a sample from $p(z)$ )
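
A minimal sketch of rejection sampling, with a made-up bimodal $\tilde{p}(z)$ and a Gaussian proposal; here $k$ is found numerically on a grid just for illustration (in practice it should be derived analytically, so that $kq(z) \geq \tilde{p}(z)$ is guaranteed everywhere):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unnormalized target ~p(z): a bimodal density on the real line.
def p_tilde(z):
    return np.exp(-0.5 * (z - 2) ** 2) + 0.5 * np.exp(-0.5 * (z + 2) ** 2)

# Proposal q(z) = N(0, 3^2): easy to sample from and to evaluate.
def q_pdf(z):
    return np.exp(-0.5 * (z / 3) ** 2) / (3 * np.sqrt(2 * np.pi))

grid = np.linspace(-10, 10, 10_001)
k = 1.1 * np.max(p_tilde(grid) / q_pdf(grid))  # comparison function k * q(z)

samples = []
while len(samples) < 1_000:
    z0 = rng.normal(0, 3)                # step 1: z0 ~ q(z)
    u0 = rng.uniform(0, k * q_pdf(z0))   # step 2: u0 ~ Unif(0, k q(z0))
    if u0 <= p_tilde(z0):                # step 3: accept iff u0 <= ~p(z0)
        samples.append(z0)
```

Note that the overall acceptance probability is $\frac{1}{k} \int \tilde{p}(z)\, dz$, so the smallest valid $k$ gives the most efficient sampler.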


11-1-3. Adaptive rejection sampling

skip


11-1-4. Importance sampling

Finite sum approximation to an expectation :

(1) discretize the $z$ space into a uniform grid

(2) evaluate the integrand as a sum : $\mathbb{E}[f] \simeq \sum_{l=1}^{L} p(z^{(l)})\, f(z^{(l)})$


Problem : the number of terms in the summation grows exponentially with the dimensionality of $z$


Normalized case

finite sum over samples $\{z^{(l)}\}$ drawn from $q(z)$ :

$$\mathbb{E}[f] = \int f(z)\, p(z)\, dz = \int f(z)\, \frac{p(z)}{q(z)}\, q(z)\, dz \simeq \frac{1}{L} \sum_{l=1}^{L} \frac{p(z^{(l)})}{q(z^{(l)})}\, f(z^{(l)})$$

we call $r_l = p(z^{(l)}) / q(z^{(l)})$ the importance weights


Unnormalized case

$p(z)$ can only be evaluated up to a normalization constant

( $p(z) = \tilde{p}(z) / Z_p$, and similarly $q(z) = \tilde{q}(z) / Z_q$ )

$$\mathbb{E}[f] = \int f(z)\, p(z)\, dz = \frac{Z_q}{Z_p} \int f(z)\, \frac{\tilde{p}(z)}{\tilde{q}(z)}\, q(z)\, dz \simeq \frac{Z_q}{Z_p} \cdot \frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l\, f(z^{(l)})$$

where $\tilde{r}_l = \tilde{p}(z^{(l)}) / \tilde{q}(z^{(l)})$


since $\frac{Z_p}{Z_q} = \frac{1}{Z_q} \int \tilde{p}(z)\, dz = \int \frac{\tilde{p}(z)}{\tilde{q}(z)}\, q(z)\, dz \simeq \frac{1}{L} \sum_{l=1}^{L} \tilde{r}_l$ ,

$$\mathbb{E}[f] \simeq \sum_{l=1}^{L} w_l\, f(z^{(l)}), \quad \text{where } w_l = \frac{\tilde{r}_l}{\sum_m \tilde{r}_m} = \frac{\tilde{p}(z^{(l)}) / q(z^{(l)})}{\sum_m \tilde{p}(z^{(m)}) / q(z^{(m)})}$$
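
A minimal sketch of self-normalized importance sampling under an assumed setup: the unnormalized target $\tilde{p}(z)$ is a scaled Gaussian centered at 1 (so the true $\mathbb{E}[z]$ is 1) and the proposal is $N(0, 2^2)$; the proposal here is normalized, so $\tilde{q} = q$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unnormalized target: a constant times the N(1, 1) shape.
p_tilde = lambda z: 3.0 * np.exp(-0.5 * (z - 1.0) ** 2)
# Normalized Gaussian proposal q(z) = N(0, 2^2).
q_pdf = lambda z: np.exp(-0.5 * (z / 2) ** 2) / (2 * np.sqrt(2 * np.pi))
f = lambda z: z                          # estimate E[z] (true value: 1)

L = 50_000
z = rng.normal(0, 2, size=L)             # z^(l) ~ q(z)
r_tilde = p_tilde(z) / q_pdf(z)          # unnormalized weights ~r_l
w = r_tilde / r_tilde.sum()              # self-normalized weights w_l
print("estimate:", np.sum(w * f(z)))     # E[f] ~= sum_l w_l f(z^(l))
```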


11-1-5. Sampling-importance-resampling

skip


11-1-6. Sampling and the EM algorithm

skip


11-2. MCMC

Metropolis algorithm

  • when the proposal distribution is symmetric
  • acceptance probability : $A(z^\star, z^{(\tau)}) = \min\left(1,\, \frac{\tilde{p}(z^\star)}{\tilde{p}(z^{(\tau)})}\right)$


Metropolis-Hastings algorithm

  • when the proposal distribution need not be symmetric

  • acceptance probability : $A_k(z^\star, z^{(\tau)}) = \min\left(1,\, \frac{\tilde{p}(z^\star)\, q_k(z^{(\tau)} \mid z^\star)}{\tilde{p}(z^{(\tau)})\, q_k(z^\star \mid z^{(\tau)})}\right)$

  • pf) detailed balance is satisfied!

    $$p(z)\, q_k(z' \mid z)\, A_k(z', z) = \min\left(p(z)\, q_k(z' \mid z),\, p(z')\, q_k(z \mid z')\right) = \min\left(p(z')\, q_k(z \mid z'),\, p(z)\, q_k(z' \mid z)\right) = p(z')\, q_k(z \mid z')\, A_k(z, z')$$
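
A minimal sketch of the Metropolis algorithm (a Gaussian random-walk proposal is symmetric, so the $q_k$ factors cancel and the acceptance probability reduces to the Metropolis form above), for a made-up bimodal target:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unnormalized target ~p(z): a 1-d bimodal density.
def p_tilde(z):
    return np.exp(-0.5 * (z - 2) ** 2) + np.exp(-0.5 * (z + 2) ** 2)

def metropolis(n_steps, step_size=1.0, z0=0.0):
    z, chain = z0, []
    for _ in range(n_steps):
        z_star = z + step_size * rng.standard_normal()  # z* ~ q(z* | z), symmetric
        if rng.uniform() < min(1.0, p_tilde(z_star) / p_tilde(z)):
            z = z_star                                  # accept; otherwise keep z
        chain.append(z)                                 # rejected steps also count
    return np.array(chain)

chain = metropolis(10_000)
```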


11-3. Gibbs sampling

  1. Initialize $\{z_i : i = 1, \ldots, M\}$
  2. For $\tau = 1, \ldots, T$ :
     • Sample $z_1^{(\tau+1)} \sim p(z_1 \mid z_2^{(\tau)}, z_3^{(\tau)}, \ldots, z_M^{(\tau)})$
     • Sample $z_2^{(\tau+1)} \sim p(z_2 \mid z_1^{(\tau+1)}, z_3^{(\tau)}, \ldots, z_M^{(\tau)})$
     • $\vdots$
     • Sample $z_j^{(\tau+1)} \sim p(z_j \mid z_1^{(\tau+1)}, \ldots, z_{j-1}^{(\tau+1)}, z_{j+1}^{(\tau)}, \ldots, z_M^{(\tau)})$
     • $\vdots$
     • Sample $z_M^{(\tau+1)} \sim p(z_M \mid z_1^{(\tau+1)}, z_2^{(\tau+1)}, \ldots, z_{M-1}^{(\tau+1)})$


Acceptance ratio = 1 (always)

pf) viewing each Gibbs step as a Metropolis-Hastings step with proposal $q_k(z^\star \mid z) = p(z_k^\star \mid z_{\setminus k})$, and noting $z_{\setminus k}^\star = z_{\setminus k}$ :

$$A(z^\star, z) = \frac{p(z^\star)\, q_k(z \mid z^\star)}{p(z)\, q_k(z^\star \mid z)} = \frac{p(z_k^\star \mid z_{\setminus k}^\star)\, p(z_{\setminus k}^\star)\, p(z_k \mid z_{\setminus k}^\star)}{p(z_k \mid z_{\setminus k})\, p(z_{\setminus k})\, p(z_k^\star \mid z_{\setminus k})} = 1$$
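
A minimal sketch of Gibbs sampling for an assumed toy model: a bivariate Gaussian with unit variances and correlation $\rho$, whose full conditionals are known in closed form ($z_1 \mid z_2 \sim N(\rho z_2,\, 1 - \rho^2)$, and symmetrically for $z_2$):

```python
import numpy as np

rng = np.random.default_rng(0)

rho = 0.8                            # assumed correlation of the toy target
sd = np.sqrt(1 - rho**2)             # conditional standard deviation

T = 5_000
z1, z2 = 0.0, 0.0                    # step 1: initialize
chain = np.empty((T, 2))
for t in range(T):                   # step 2: sweep the full conditionals
    z1 = rng.normal(rho * z2, sd)    # z1 ~ p(z1 | z2)
    z2 = rng.normal(rho * z1, sd)    # z2 ~ p(z2 | z1), using the NEW z1
    chain[t] = z1, z2
```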


11-4. Slice Sampling

difficulties of Metropolis : “sensitive to step size”

  • too small : slow decorrelation
  • too large : inefficiency due to high rejection rate


Slice sampling : provides an “adaptive step size”


Involves augmenting $z$ with an additional variable $u$

then, draw samples from the joint $(z, u)$ space



Goal : sample from

$$\hat{p}(z, u) = \begin{cases} 1/Z_p & \text{if } 0 \leq u \leq \tilde{p}(z) \\ 0 & \text{otherwise} \end{cases}$$

where $Z_p = \int \tilde{p}(z)\, dz$


It is okay to sample from $\hat{p}(z, u)$ and then just discard $u$ :

$$\int \hat{p}(z, u)\, du = \int_0^{\tilde{p}(z)} \frac{1}{Z_p}\, du = \frac{\tilde{p}(z)}{Z_p} = p(z)$$


Algorithm : alternately sample $z$ and $u$

  • step 1) given $z$, evaluate $\tilde{p}(z)$
  • step 2) sample $u$ from $\text{Unif}(0, \tilde{p}(z))$
  • step 3) fix $u$ and sample $z$ uniformly from the “slice” $\{z : \tilde{p}(z) > u\}$
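
A minimal sketch of one slice-sampling update; since the slice $\{z : \tilde{p}(z) > u\}$ is rarely available in closed form, this sketch uses the stepping-out / shrinkage procedure (Neal, 2003) with an assumed width parameter `w`, on a made-up target:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unnormalized target.
def p_tilde(z):
    return np.exp(-0.5 * z**2) * (1 + np.sin(3 * z) ** 2)

def slice_sample_step(z, w=1.0):
    u = rng.uniform(0, p_tilde(z))       # steps 1-2: u ~ Unif(0, ~p(z))
    # stepping out: grow [lo, hi] until both ends leave the slice
    lo = z - w * rng.uniform()
    hi = lo + w
    while p_tilde(lo) > u:
        lo -= w
    while p_tilde(hi) > u:
        hi += w
    # step 3: sample uniformly on the slice, shrinking the interval on rejection
    while True:
        z_new = rng.uniform(lo, hi)
        if p_tilde(z_new) > u:
            return z_new
        if z_new < z:
            lo = z_new
        else:
            hi = z_new

chain = [0.0]
for _ in range(5_000):
    chain.append(slice_sample_step(chain[-1]))
```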


11-5. The Hybrid MC algorithm

limitation of Metropolis : “random walk”

introduce a more sophisticated class of transitions, based on an analogy with physical systems

Applicable to..

  • distributions over continuous variables
  • for which we can readily evaluate the gradient of the log probability w.r.t. the state variables


11-5-1. Dynamic Systems

the dynamics evolve in a (fictitious) continuous time $\tau$

state variable : $z$

momentum variable : $r$

$$r_i = \frac{dz_i}{d\tau}$$


Hamiltonian function $H$ : $H(z, r) = E(z) + K(r)$

  • $p(z) = \frac{1}{Z_p} \exp(-E(z))$
    • $E(z)$ : potential energy
  • $K(r) = \frac{1}{2} \|r\|^2 = \frac{1}{2} \sum_i r_i^2$
    • $K(r)$ : kinetic energy


Hamiltonian Equations

  • $\dfrac{dz_i}{d\tau} = \dfrac{\partial H}{\partial r_i}, \qquad \dfrac{dr_i}{d\tau} = -\dfrac{\partial H}{\partial z_i}$


Properties

  • 1) during the evolution of this dynamical system, $H$ is constant

    $$\frac{dH}{d\tau} = \sum_i \left\{ \frac{\partial H}{\partial z_i} \frac{dz_i}{d\tau} + \frac{\partial H}{\partial r_i} \frac{dr_i}{d\tau} \right\} = \sum_i \left\{ \frac{\partial H}{\partial z_i} \frac{\partial H}{\partial r_i} - \frac{\partial H}{\partial r_i} \frac{\partial H}{\partial z_i} \right\} = 0$$

  • 2) Liouville’s Theorem : the dynamics preserve volume in $(z, r)$ phase space

    $$V = \left( \frac{dz}{d\tau}, \frac{dr}{d\tau} \right)$$

    $$\operatorname{div} V = \sum_i \left\{ \frac{\partial}{\partial z_i} \frac{dz_i}{d\tau} + \frac{\partial}{\partial r_i} \frac{dr_i}{d\tau} \right\} = \sum_i \left\{ \frac{\partial}{\partial z_i} \frac{\partial H}{\partial r_i} - \frac{\partial}{\partial r_i} \frac{\partial H}{\partial z_i} \right\} = 0$$


Joint pdf

  • total energy = Hamiltonian
  • $p(z, r) = \frac{1}{Z_H} \exp(-H(z, r))$


Due to these two properties, Hamiltonian dynamics will leave $p(z, r)$ invariant

  • the volume of a region of phase space remains unchanged as it evolves
  • thus $H$, and hence the probability density, remain unchanged within that region


Although $H$ is invariant, $z$ and $r$ will vary


In order to arrive at an ergodic sampling scheme…

introduce additional moves that change the value of $H$ while leaving $p(z, r)$ unchanged!


This is done by replacing $r$ with a value drawn from its distribution conditioned on $z$

( $z$ and $r$ are independent in $p(z, r)$, so the conditional $p(r \mid z)$ is a Gaussian )


Leapfrog discretization

successive updates to the position ($z$) and momentum ($r$) variables :

$$\hat{r}_i(\tau + \epsilon/2) = \hat{r}_i(\tau) - \frac{\epsilon}{2} \frac{\partial E}{\partial z_i}\big(\hat{z}(\tau)\big)$$

$$\hat{z}_i(\tau + \epsilon) = \hat{z}_i(\tau) + \epsilon\, \hat{r}_i(\tau + \epsilon/2)$$

$$\hat{r}_i(\tau + \epsilon) = \hat{r}_i(\tau + \epsilon/2) - \frac{\epsilon}{2} \frac{\partial E}{\partial z_i}\big(\hat{z}(\tau + \epsilon)\big)$$

  • to advance the dynamics by a time interval $\tau$, we need $\tau / \epsilon$ leapfrog steps
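
A minimal sketch of the leapfrog integrator, assuming `grad_E` computes $\partial E / \partial z$ and `z`, `r` are NumPy arrays; successive half-steps in $r$ are merged into full steps inside the loop, a standard implementation trick:

```python
import numpy as np

def leapfrog(z, r, grad_E, eps, n_steps):
    """Integrate Hamiltonian dynamics for time n_steps * eps."""
    z, r = np.copy(z), np.copy(r)
    r -= 0.5 * eps * grad_E(z)           # initial half-step for momentum
    for _ in range(n_steps - 1):
        z += eps * r                     # full step for position
        r -= eps * grad_E(z)             # two merged half-steps for momentum
    z += eps * r                         # final full step for position
    r -= 0.5 * eps * grad_E(z)           # final half-step for momentum
    return z, r
```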


Summary

The Hamiltonian dynamical approach involves alternating between

  • series of leapfrog updates
  • re-sampling of the momentum variables

Unlike simple Metropolis, it is able to make use of the gradient of the log probability


11-5-2. Hybrid MC

Hybrid MC = Hamiltonian dynamics + Metropolis


After each application of the leapfrog integration…

  • the resulting candidate state is ACCEPTED/REJECTED according to the Metropolis criterion

    ( based on the value of the Hamiltonian $H$ )

    $$\min\left(1,\, \exp\left\{ H(z, r) - H(z^\star, r^\star) \right\}\right)$$

  • if the leapfrog integration were perfect : $H$ would be unchanged, and every candidate would be accepted

  • due to numerical errors : $H$ may sometimes decrease, in which case the candidate is accepted with probability 1
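
Putting the pieces together, a minimal sketch of hybrid Monte Carlo for an assumed toy target, a standard 2-d Gaussian ($E(z) = \frac{1}{2}\|z\|^2$, so $\nabla E(z) = z$), reusing the `leapfrog` function sketched above:

```python
import numpy as np

rng = np.random.default_rng(0)

def hmc(E, grad_E, z0, n_samples, eps=0.1, n_leapfrog=20):
    H = lambda z, r: E(z) + 0.5 * np.dot(r, r)    # H(z, r) = E(z) + K(r)
    z, chain = np.asarray(z0, dtype=float), []
    for _ in range(n_samples):
        r = rng.standard_normal(z.shape)          # resample momentum r ~ N(0, I)
        z_star, r_star = leapfrog(z, r, grad_E, eps, n_leapfrog)
        # Metropolis criterion: accept with prob min(1, exp(H(z,r) - H(z*,r*)))
        if rng.uniform() < np.exp(H(z, r) - H(z_star, r_star)):
            z = z_star
        chain.append(z)
    return np.array(chain)

# Assumed toy target: standard 2-d Gaussian.
E = lambda z: 0.5 * np.dot(z, z)
grad_E = lambda z: z
samples = hmc(E, grad_E, z0=np.zeros(2), n_samples=2_000)
```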