Neural Adaptive Sequential Monte Carlo


Abstract

SMC (Sequential Monte Carlo) = particle filtering

  • sampling from an intractable target
  • use sequence of simpler intermediate distn
  • dependent on proposal distn


Presents a new method for “automatically adapting the proposal”,

using an approximation of the KL-div between the “true posterior” & the “proposal distn”

Neural Adaptive SMC ( NASMC )


1. Introduction

SMC : the sequence of intermediate distns is used to construct proposals for importance sampling


Importance Sampling : dependent on the choice of proposal

  • bad proposal → low effective sample size & high variance


The SMC literature has developed approaches to solve this!

  • ex) resampling to improve particle diversity when the effective sample size is low
  • ex) applying MCMC transition kernels


This paper suggests…“new gradient-based black-box adaptive SMC method”

  • that automatically tunes flexible proposal distn
  • can be assessed using KL-div between (1) target & (2) parameterized proposal
  • very general & tractably handles complex proposal distn


2. Sequential Monte Carlo

2 fundamental SMC algorithms :

  • **Sequential Importance Sampling (SIS)**
  • **Sequential Importance Resampling (SIR)**

  • probabilistic model containing hidden states $z_{1:T}$ & observed states $x_{1:T}$

    joint distn : $p(z_{1:T}, x_{1:T}) = p(z_1)\, p(x_1 \mid z_1) \prod_{t=2}^{T} p(z_t \mid z_{1:t-1})\, p(x_t \mid z_{1:t}, x_{1:t-1})$ ( see the sketch below )
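A minimal sketch of what this factorization means in practice : ancestral sampling draws $z_t$ and then $x_t$ one step at a time. The 1-D linear-Gaussian model below is an illustrative assumption, not the model used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_joint(T):
    """Ancestral sampling from the factorized joint p(z_{1:T}, x_{1:T})."""
    z, x = np.empty(T), np.empty(T)
    z[0] = rng.normal(0.0, 1.0)                  # z_1 ~ p(z_1)
    x[0] = rng.normal(z[0], 0.5)                 # x_1 ~ p(x_1 | z_1)
    for t in range(1, T):
        z[t] = rng.normal(0.9 * z[t - 1], 1.0)   # z_t ~ p(z_t | z_{1:t-1}), Markov toy case
        x[t] = rng.normal(z[t], 0.5)             # x_t ~ p(x_t | z_{1:t}, x_{1:t-1})
    return z, x

z_true, x_obs = sample_joint(T=50)
```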


(1) Sequential Importance Sampling (SIS)

Goal : approximate the posterior over the hidden state sequence with a weighted set of $N$ sampled trajectories drawn from a simpler proposal distribution, $\{z_{1:T}^{(n)}\}_{n=1:N} \sim q(z_{1:T} \mid x_{1:T})$

  • $p(z_{1:T} \mid x_{1:T}) \approx \sum_{n=1}^{N} \tilde{w}_T^{(n)}\, \delta(z_{1:T} - z_{1:T}^{(n)})$


Factorize proposal distribution :

  • $q(z_{1:T} \mid x_{1:T}) = q(z_1 \mid x_1) \prod_{t=2}^{T} q(z_t \mid z_{1:t-1}, x_{1:t})$


Normalized Importance Weights ( defined by a recursion ; see the sketch below )

  • $w(z_{1:T}^{(n)}) = \dfrac{p(z_{1:T}^{(n)}, x_{1:T})}{q(z_{1:T}^{(n)} \mid x_{1:T})}$
  • $\tilde{w}(z_{1:T}^{(n)}) = \dfrac{w(z_{1:T}^{(n)})}{\sum_{n'} w(z_{1:T}^{(n')})} \,\propto\, \tilde{w}(z_{1:T-1}^{(n)})\, \dfrac{p(z_T^{(n)} \mid z_{1:T-1}^{(n)})\, p(x_T \mid z_{1:T}^{(n)}, x_{1:T-1})}{q(z_T^{(n)} \mid z_{1:T-1}^{(n)}, x_{1:T})}$
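A minimal SIS sketch for the toy model above, assuming a hand-picked Gaussian proposal ( the `0.5 * (prior_mean + x[t])` mean is an arbitrary illustrative choice, not from the paper ) :

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def sis(x, N=100):
    """Sequential Importance Sampling for the toy model above."""
    T = len(x)
    z = np.empty((N, T))                          # N particle trajectories
    logw = np.zeros(N)                            # running log-weights
    for t in range(T):
        prior_mean = np.zeros(N) if t == 0 else 0.9 * z[:, t - 1]
        prop_mean = 0.5 * (prior_mean + x[t])     # hand-picked proposal q
        z[:, t] = rng.normal(prop_mean, 1.0)
        # weight recursion: multiply by p(z_t | .) p(x_t | .) / q(z_t | .)
        logw += (norm.logpdf(z[:, t], prior_mean, 1.0)
                 + norm.logpdf(x[t], z[:, t], 0.5)
                 - norm.logpdf(z[:, t], prop_mean, 1.0))
    w = np.exp(logw - logw.max())
    return z, w / w.sum()                         # trajectories + normalized weights
```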


Problem?

  • the weights become highly skewed as $t$ increases

→ use Sequential Importance Resampling (SIR)


(2) Sequential Importance Resampling (SIR)

  • adds an additional step that resamples $z_t^{(n)}$ at time $t$

    from a Multinomial distribution given by $\tilde{w}(z_{1:t}^{(n)})$

  • requires knowledge of the full trajectory of previous samples

    → each new particle needs to update its ancestry information ( see the sketch below )

    ( $a_{\tau,t}^{(n)}$ : ancestral index of particle $n$ at time $t$ for state $z_\tau$ )
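A minimal SIR sketch under the same toy model and proposal. Here resampling is triggered by a low effective sample size (ESS), as mentioned in the introduction, and the `ancestors` array records the ancestral indices ; the threshold `ess_frac` is an assumption for illustration :

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def sir(x, N=100, ess_frac=0.5):
    """SIS plus multinomial resampling whenever the ESS drops too low."""
    T = len(x)
    z = np.empty((N, T))
    ancestors = np.tile(np.arange(N), (T, 1)).T    # ancestry bookkeeping a_{tau,t}
    logw = np.zeros(N)
    for t in range(T):
        prior_mean = np.zeros(N) if t == 0 else 0.9 * z[:, t - 1]
        prop_mean = 0.5 * (prior_mean + x[t])
        z[:, t] = rng.normal(prop_mean, 1.0)
        logw += (norm.logpdf(z[:, t], prior_mean, 1.0)
                 + norm.logpdf(x[t], z[:, t], 0.5)
                 - norm.logpdf(z[:, t], prop_mean, 1.0))
        w = np.exp(logw - logw.max()); w /= w.sum()
        if 1.0 / np.sum(w ** 2) < ess_frac * N:    # effective sample size check
            idx = rng.choice(N, size=N, p=w)       # resample ~ Multinomial(w)
            z[:, : t + 1] = z[idx, : t + 1]        # copy the whole ancestral paths
            ancestors[:, t] = idx
            logw = np.zeros(N)                     # weights reset to uniform
    return z, ancestors
```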


2-1. The Critical Role of Proposal Distributions in SMC

Optimal choice : the “intractable posterior” $q_\phi(z_{1:T} \mid x_{1:T}) = p_\theta(z_{1:T} \mid x_{1:T})$

  • ( factorization ) $q(z_t \mid z_{1:t-1}, x_{1:t}) = p(z_t \mid z_{1:t-1}, x_{1:t})$


Bootstrap filter

  • use the prior as the proposal distn! ( see the sketch below )
  • $q(z_t \mid z_{1:t-1}, x_{1:t}) = p(z_t \mid z_{1:t-1}, x_{1:t-1})$
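A minimal sketch of why the bootstrap choice is convenient : with $q$ equal to the transition prior, the prior and proposal terms cancel in the weight recursion, leaving only the likelihood ( toy model as above ) :

```python
import numpy as np
from scipy.stats import norm

def bootstrap_step(z_prev, x_t, rng):
    """One bootstrap-filter step for the toy model: q = transition prior."""
    z_t = rng.normal(0.9 * z_prev, 1.0)    # sample from p(z_t | z_{1:t-1})
    # prior and proposal cancel in the weight, so only the likelihood remains
    logw_inc = norm.logpdf(x_t, z_t, 0.5)
    return z_t, logw_inc
```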


3. Adapting Proposals by Descending the Inclusive KL-divergence

“Proposal distn will be optimized using the inclusive KL-div”

  • $\mathrm{KL}\left[\, p_\theta(z_{1:T} \mid x_{1:T}) \;\|\; q_\phi(z_{1:T} \mid x_{1:T}) \,\right]$

Why this objective function?

  • 1) direct measure of quality of the proposal

  • 2) ( if true posterior lies in proposal family )

    it has a global optimum at this point

  • 3) ( if true posterior does not lie in proposal family )

    tends to find proposal distns that have higher entropy than the true posterior ( mass-covering behavior, which is safer for importance sampling than an over-confident proposal )

  • 4) derivative can be approximated efficiently


Gradient of the negative KL-div ( approximated using samples from SMC ) :

$$-\nabla_\phi \mathrm{KL}\left[ p_\theta(z_{1:T} \mid x_{1:T}) \,\|\, q_\phi(z_{1:T} \mid x_{1:T}) \right] = \int p_\theta(z_{1:T} \mid x_{1:T})\, \nabla_\phi \log q_\phi(z_{1:T} \mid x_{1:T})\, \mathrm{d}z_{1:T} \approx \sum_t \sum_n \tilde{w}_t^{(n)}\, \nabla_\phi \log q_\phi\!\left(z_t^{(n)} \,\middle|\, x_{1:t}, z_{1:t-1}^{A_{t-1}^{(n)}}\right)$$
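The paper parameterizes $q_\phi$ with an RNN ; the sketch below swaps in a much simpler assumption, a Gaussian proposal whose mean is linear in (prior mean, $x_t$), so that the weighted score $\sum_{t,n} \tilde{w}_t^{(n)} \nabla_\phi \log q_\phi$ has a closed form. All names and constants here are illustrative, not from the paper.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
phi = np.zeros(3)          # proposal params: [bias, weight on prior mean, weight on x_t]
SIGMA_Q = 1.0              # fixed proposal std (an assumption)

def adapt_step(x, N=100, lr=0.05):
    """One SMC sweep + one ascent step on the approximate -grad_phi KL[p || q_phi]."""
    global phi
    T = len(x)
    z = np.empty((N, T))
    grad = np.zeros_like(phi)
    logw = np.zeros(N)
    for t in range(T):
        prior_mean = np.zeros(N) if t == 0 else 0.9 * z[:, t - 1]
        feats = np.stack([np.ones(N), prior_mean, np.full(N, x[t])])   # shape (3, N)
        mu = phi @ feats                          # proposal mean, linear in the features
        z[:, t] = rng.normal(mu, SIGMA_Q)
        logw += (norm.logpdf(z[:, t], prior_mean, 1.0)
                 + norm.logpdf(x[t], z[:, t], 0.5)
                 - norm.logpdf(z[:, t], mu, SIGMA_Q))
        w = np.exp(logw - logw.max()); w /= w.sum()
        # score of a Gaussian wrt its mean params: (z - mu)/sigma^2 * dmu/dphi
        grad += feats @ (w * (z[:, t] - mu) / SIGMA_Q**2)
        idx = rng.choice(N, size=N, p=w)          # SIR resampling step
        z[:, : t + 1] = z[idx, : t + 1]
        logw = np.zeros(N)
    phi += lr * grad       # ascend sum_{t,n} w_t^(n) grad_phi log q_phi(z_t^(n) | ...)
```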


Online & Batch variants

  • gradient update for the model params θ
  • $\nabla_\theta \log p_\theta(x_{1:T}) \approx \sum_t \sum_n \tilde{w}_t^{(n)}\, \nabla_\theta \log p_\theta\!\left(x_t, z_t^{(n)} \,\middle|\, x_{1:t-1}, z_{1:t-1}^{A_{t-1}^{(n)}}\right)$ ( see the sketch below )
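A minimal sketch of the matching model update, assuming the only model parameter is a transition coefficient $\theta$ in $z_t \sim \mathcal{N}(\theta z_{t-1}, 1)$ ; the same SMC weights $\tilde{w}_t^{(n)}$ reweight the per-step score :

```python
import numpy as np

def model_grad_increment(theta, z_t, z_prev, w):
    """One time-step's term of the weighted estimate of grad_theta log p_theta(x_{1:T})."""
    # p(x_t | z_t) does not depend on theta here, so the per-particle score is
    # grad_theta log N(z_t; theta * z_prev, 1) = (z_t - theta * z_prev) * z_prev
    return np.sum(w * (z_t - theta * z_prev) * z_prev)
```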


Algorithm Summary

( Figure 2 : algorithm summary )
