Neural Adaptive Sequential Monte Carlo
Abstract
SMC (Sequential Monte Carlo) = particle filtering
- sampling from an intractable target
- uses a sequence of simpler intermediate distns
- performance depends on the proposal distn
Presents a new method for “automatically adapting the proposal”,
using an approximation of the KL-div between the “true posterior” & the “proposal distn”
→ Neural Adaptive SMC ( NASMC )
1. Introduction
SMC : the sequence of intermediate distns constructs a proposal for importance sampling
Importance Sampling : dependent on the choice of proposal
- bad proposal → low effective sample size & high variance
SMC has developed approaches to mitigate this!
- ex) resampling to improve particle diversity when the effective sample size is low
- ex) applying MCMC transition kernels
This paper suggests…“new gradient-based black-box adaptive SMC method”
- that automatically tunes flexible proposal distn
- can be assessed using KL-div between (1) target & (2) parameterized proposal
- very general & tractably handles complex proposal distn
2. Sequential Monte Carlo
2 fundamental SMC algorithms :
- **Sequential Importance Sampling (SIS)**
- **Sequential Importance Resampling (SIR)**
probabilistic model containing hidden states $z_{1:T}$ & observed states $x_{1:T}$
→ joint distn : $p(z_{1:T}, x_{1:T}) = p(z_1)\, p(x_1 \mid z_1) \prod_{t=2}^{T} p(z_t \mid z_{1:t-1})\, p(x_t \mid z_{1:t}, x_{1:t-1})$
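As a concrete toy instance of this factorization (my own illustration, not from the paper), a 1-D linear-Gaussian state-space model can be simulated in a few lines of Python; the coefficient 0.9 and the noise scales are arbitrary choices:

```python
# Minimal sketch: a 1-D linear-Gaussian state-space model whose joint density
# factorizes as p(z_1) p(x_1|z_1) * prod_t p(z_t|z_{t-1}) p(x_t|z_t),
# a special (Markovian) case of the general factorization above.
import numpy as np

def simulate(T, rng=np.random.default_rng(0)):
    """Draw one trajectory (z_{1:T}, x_{1:T}) from the generative model."""
    z = np.zeros(T)
    x = np.zeros(T)
    z[0] = rng.normal(0.0, 1.0)                   # p(z_1) = N(0, 1)
    x[0] = rng.normal(z[0], 0.5)                  # p(x_1|z_1) = N(z_1, 0.5^2)
    for t in range(1, T):
        z[t] = rng.normal(0.9 * z[t - 1], 1.0)    # p(z_t|z_{t-1}) = N(0.9 z_{t-1}, 1)
        x[t] = rng.normal(z[t], 0.5)              # p(x_t|z_t) = N(z_t, 0.5^2)
    return z, x
```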
(1) Sequential Importance Sampling (SIS)
Goal : approximate the posterior over the hidden state sequence through a weighted set of N sampled trajectories drawn from a simpler proposal distribution $\{z_{1:T}^{(n)}\}_{n=1:N} \sim q(z_{1:T} \mid x_{1:T})$
- $p(z_{1:T} \mid x_{1:T}) \approx \sum_{n=1}^{N} \tilde{w}_t^{(n)} \, \delta(z_{1:T} - z_{1:T}^{(n)})$
Factorize the proposal distribution :
- $q(z_{1:T} \mid x_{1:T}) = q(z_1 \mid x_1) \prod_{t=2}^{T} q(z_t \mid z_{1:t-1}, x_{1:t})$
Normalized importance weights ( defined by a recursion )
- $w(z_{1:T}^{(n)}) = \dfrac{p(z_{1:T}^{(n)}, x_{1:T})}{q(z_{1:T}^{(n)} \mid x_{1:T})}$
- $\tilde{w}(z_{1:T}^{(n)}) = \dfrac{w(z_{1:T}^{(n)})}{\sum_n w(z_{1:T}^{(n)})} \propto \tilde{w}(z_{1:T-1}^{(n)}) \, \dfrac{p(z_T^{(n)} \mid z_{1:T-1}^{(n)}) \, p(x_T \mid z_{1:T}^{(n)}, x_{1:T-1})}{q(z_T^{(n)} \mid z_{1:T-1}^{(n)}, x_{1:T})}$
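A minimal SIS sketch for the toy model above, assuming a simple Gaussian random-walk proposal $q(z_t \mid z_{1:t-1}, x_{1:t}) = \mathcal{N}(z_{t-1}, 1)$ (my choice for illustration, not the paper's proposal); it runs the weight recursion in log-space:

```python
# Sequential Importance Sampling sketch for the toy linear-Gaussian model above.
import numpy as np
from scipy.stats import norm

def sis(x, N=1000, rng=np.random.default_rng(0)):
    T = len(x)
    z = np.zeros((N, T))      # particle trajectories z_{1:T}^{(n)}
    logw = np.zeros(N)        # unnormalized log-weights

    # t = 1: propose z_1 from the prior N(0,1), so the weight is just p(x_1|z_1)
    z[:, 0] = rng.normal(0.0, 1.0, size=N)
    logw += norm.logpdf(x[0], z[:, 0], 0.5)

    for t in range(1, T):
        z[:, t] = rng.normal(z[:, t - 1], 1.0)              # z_t ~ q(z_t|z_{1:t-1}, x_{1:t})
        # incremental weight: p(z_t|z_{t-1}) p(x_t|z_t) / q(z_t|z_{t-1})
        logw += (norm.logpdf(z[:, t], 0.9 * z[:, t - 1], 1.0)
                 + norm.logpdf(x[t], z[:, t], 0.5)
                 - norm.logpdf(z[:, t], z[:, t - 1], 1.0))

    w = np.exp(logw - logw.max())                           # stabilize before normalizing
    return z, w / w.sum()                                   # normalized weights \tilde{w}
```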
Problem?
- the weights become highly skewed as t increases
→ use Sequential Importance Resampling (SIR)
(2) Sequential Importance Resampling (SIR)
- adds an “additional step that resamples $z_t^{(n)}$ at time $t$ from a multinomial distribution given by $\tilde{w}(z_{1:t}^{(n)})$”
- resampling requires knowledge of the full trajectory of previous samples
→ each new particle needs to update its ancestry information
( $a_{\tau,t}^{(n)}$ : ancestral index of particle $n$ at time $t$ for state $z_\tau$ )
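A sketch of the extra SIR step under the same toy setup, assuming multinomial resampling; the `ancestors` array plays the role of the ancestral indices $a_{\tau,t}^{(n)}$ above (the bookkeeping here is simplified to copying whole trajectories):

```python
# Multinomial resampling step added on top of SIS.
import numpy as np

def resample(z_hist, w_tilde, rng=np.random.default_rng(0)):
    """Resample whole trajectories z_{1:t}^{(n)} according to \\tilde{w}(z_{1:t}^{(n)})."""
    N = len(w_tilde)
    ancestors = rng.choice(N, size=N, p=w_tilde)   # a_t^{(n)}: index of the parent particle
    z_hist = z_hist[ancestors]                     # each new particle inherits its ancestor's path
    w_tilde = np.full(N, 1.0 / N)                  # weights reset to uniform after resampling
    return z_hist, w_tilde, ancestors
```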
2-1. The Critical Role of Proposal Distributions in SMC
Optimal choice : the (intractable) true posterior, $q_\phi(z_{1:T} \mid x_{1:T}) = p_\theta(z_{1:T} \mid x_{1:T})$
- ( factorization ) $q(z_t \mid z_{1:t-1}, x_{1:t}) = p(z_t \mid z_{1:t-1}, x_{1:t})$
Bootstrap filter
- uses the prior as the proposal distn!
- $q(z_t \mid z_{1:t-1}, x_{1:t}) = p(z_t \mid z_{1:t-1}, x_{1:t-1})$
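Under the toy model above, a bootstrap step would look like this sketch (an illustration, not the paper's code); because the proposal equals the transition prior, the incremental weight collapses to the likelihood $p(x_t \mid z_t)$:

```python
# One bootstrap-filter step: propose from the prior transition, weight by the likelihood.
import numpy as np
from scipy.stats import norm

def bootstrap_step(z_prev, logw, x_t, rng=np.random.default_rng(0)):
    z_t = rng.normal(0.9 * z_prev, 1.0)        # q(z_t|z_{1:t-1}, x_{1:t}) = p(z_t|z_{t-1})
    logw = logw + norm.logpdf(x_t, z_t, 0.5)   # p * p / q collapses to p(x_t|z_t)
    return z_t, logw
```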
3. Adapting Proposals by Descending the Inclusive KL-divergence
“Proposal distn will be optimized using inclusive KL-div”
- $\mathrm{KL}\left[\, p_\theta(z_{1:T} \mid x_{1:T}) \,\|\, q_\phi(z_{1:T} \mid x_{1:T}) \,\right]$
Why this objective function?
- 1) direct measure of the quality of the proposal
- 2) ( if the true posterior lies in the proposal family )
→ the objective has its global optimum at this point
- 3) ( if the true posterior does not lie in the proposal family )
→ tends to find proposal distns that have higher entropy than the true posterior
( mass-covering behaviour, which is safer for importance sampling than light tails )
- 4) the derivative can be approximated efficiently
Gradient of the negative KL-div ( approximated using samples & weights from SMC ) :
$-\dfrac{\partial}{\partial \phi} \mathrm{KL}\left[ p_\theta(z_{1:T} \mid x_{1:T}) \,\|\, q_\phi(z_{1:T} \mid x_{1:T}) \right] = \int p_\theta(z_{1:T} \mid x_{1:T}) \, \dfrac{\partial}{\partial \phi} \log q_\phi(z_{1:T} \mid x_{1:T}) \, dz_{1:T} \approx \sum_t \sum_n \tilde{w}_t^{(n)} \, \dfrac{\partial}{\partial \phi} \log q_\phi\!\left(z_t^{(n)} \mid x_{1:t}, z_{1:t-1}^{A_{t-1}^{(n)}}\right)$
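A sketch of this update for a hand-rolled linear-Gaussian proposal $q_\phi(z_t \mid z_{t-1}, x_t) = \mathcal{N}(\phi_1 z_{t-1} + \phi_2 x_t, \sigma^2)$; the functional form, the names `phi` and `sigma`, and the plain gradient-ascent step are assumptions for illustration (the paper uses an RNN proposal, and any stochastic optimizer could be substituted):

```python
# Weighted score-function update for the proposal parameters phi,
# i.e. ascend  sum_t sum_n  w_tilde[t,n] * d/d(phi) log q_phi(z_t^{(n)} | ...).
import numpy as np

def proposal_grad_step(phi, sigma, z, z_prev_anc, x, w_tilde, lr=1e-2):
    """
    phi:        array of shape (2,), proposal parameters (assumed form, see lead-in)
    z:          (T, N) particles z_t^{(n)} drawn from the proposal
    z_prev_anc: (T, N) ancestral previous states z_{t-1}^{A^{(n)}_{t-1}}
    x:          (T,) observations
    w_tilde:    (T, N) normalized weights \\tilde{w}_t^{(n)}
    """
    grad = np.zeros(2)
    T, _ = z.shape
    for t in range(T):
        mean = phi[0] * z_prev_anc[t] + phi[1] * x[t]
        resid = (z[t] - mean) / sigma ** 2          # d/d(mean) log N(z; mean, sigma^2)
        # chain rule: d/d(phi) log q = resid * [z_{t-1}, x_t], weighted by \tilde{w}_t^{(n)}
        grad[0] += np.sum(w_tilde[t] * resid * z_prev_anc[t])
        grad[1] += np.sum(w_tilde[t] * resid * x[t])
    return phi + lr * grad                          # ascend the negative inclusive KL
```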
Online & Batch variants
- gradient update for the model params θ
- $\dfrac{\partial}{\partial \theta} \log p_\theta(x_{1:T}) \approx \sum_t \sum_n \tilde{w}_t^{(n)} \, \dfrac{\partial}{\partial \theta} \log p_\theta\!\left(x_t, z_t^{(n)} \mid x_{1:t-1}, z_{1:t-1}^{A_{t-1}^{(n)}}\right)$
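A matching sketch for the θ-update, assuming θ is just the transition coefficient $a$ in $p(z_t \mid z_{t-1}) = \mathcal{N}(a z_{t-1}, 1)$ of the toy model (again my illustration, not the paper's model class):

```python
# Approximate d/d(theta) log p_theta(x_{1:T}) with SMC particles and weights.
import numpy as np

def model_grad_step(a, z, z_prev_anc, w_tilde, lr=1e-2):
    """a: transition coefficient; z, z_prev_anc, w_tilde: (T, N) arrays as above."""
    T, _ = z.shape
    grad = 0.0
    for t in range(T):
        # d/da log N(z_t; a*z_{t-1}, 1) = (z_t - a*z_{t-1}) * z_{t-1};
        # the emission p(x_t|z_t) does not depend on a, so its score term is zero here.
        grad += np.sum(w_tilde[t] * (z[t] - a * z_prev_anc[t]) * z_prev_anc[t])
    return a + lr * grad
```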