Divide and Couple : Using Monte Carlo Variational Objectives for Posterior Approximation ( NeurIPS 2019 )


Abstract

Recent VI : uses ideas from MC estimation to construct tighter bounds.

Given a VI objective (defined by an MC estimator of the likelihood), use Divide and Couple

  • to identify augmented proposal & target distn
  • so that the gap between
    • “VI objective” & “log-likelihood” is equal to
    • the divergence between “augmented proposal” & “target distn”


1. Introduction

VI : \(\log p(x)=\underset{q(\mathbf{z})}{\mathbb{E}}\left[\log \frac{p(\mathbf{z}, x)}{q(\mathbf{z})}\right]+\mathrm{KL}[q(\mathbf{z}) \mid \mid p(\mathbf{z} \mid x)]\).

Tighter objectives :

  • let \(R\) be the estimator of likelihood
    • \(\mathbb{E} R=p(x)\).
    • \(\log p(x) \geq \mathbb{E} \log R\) ( by Jensen’s Inequality )
  • Standard \(\mathrm{VI}\) : \(R=p(z, x) / q(z)\)
  • Importance-weighted autoencoders (IWAEs) : \(R=\frac{1}{M} \sum_{m=1}^{M} p\left(\mathbf{z}_{m}, x\right) / q\left(\mathbf{z}_{m}\right)\)
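
A minimal numerical sketch of the two bounds, on a hypothetical 1-D Gaussian model chosen for illustration (not from the paper); the exact value here is \(\log p(x)=\log \mathcal{N}(1 ; 0,2) \approx -1.516\):

```python
import numpy as np

# Toy model (an assumption for illustration): prior N(0, 1), likelihood N(x; z, 1),
# observation x = 1, and a deliberately offset proposal q = N(1.5, 1).
rng = np.random.default_rng(0)
mu = 1.5

def log_p_joint(z, x=1.0):
    # log p(z, x) = log N(z; 0, 1) + log N(x; z, 1)
    return -0.5 * (z**2 + (x - z)**2) - np.log(2 * np.pi)

def log_q(z):
    # log q(z) = log N(z; mu, 1)
    return -0.5 * (z - mu)**2 - 0.5 * np.log(2 * np.pi)

# Standard VI: R = p(z, x) / q(z), one sample per estimate.
z = mu + rng.standard_normal(100_000)
elbo = np.mean(log_p_joint(z) - log_q(z))

# IWAE: R = (1/M) * sum_m p(z_m, x) / q(z_m), M samples per estimate.
M = 10
z = mu + rng.standard_normal((100_000, M))
log_w = log_p_joint(z) - log_q(z)
iwae = np.mean(np.logaddexp.reduce(log_w, axis=1) - np.log(M))

# Both are lower bounds on log p(x) = log N(1; 0, 2) ~ -1.516; IWAE is tighter.
print(f"ELBO ~ {elbo:.3f}   IWAE(M={M}) ~ {iwae:.3f}")
```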


The paper shows how to find a distn \(Q(z)\) such that

  • divergence between \(Q(z)\) and \(p(z \mid x)\) is at most the gap between \(\mathbb{E} \log R\) and \(\log p(x)\).
  • Thus, “better estimator” = “better posterior approximation”
  • how to find \(Q(z)\) ? by divide and couple


Divide and Couple

  • Divide : interpret \(\mathbb{E} \log R\) as an ELBO, so that maximizing \(\mathbb{E} \log R\) minimizes the gap between \(\mathbb{E} \log R\) and \(\log p(x)\).
  • Couple : that gap upper-bounds \(\mathrm{KL}[Q(\mathrm{z}) \mid \mid p(\mathrm{z} \mid x)]\) for a coupled posterior approximation \(Q(\mathrm{z})\).


2. Setup and Motivation

ELBO decomposition & Jensen’s Inequality

  • \(\log p(x) \geq \mathbb{E} \log R\).
  • traditional VI : \(R=p(\mathrm{z}, x) / q(\mathrm{z})\)
  • many other unbiased estimators \(R\) of \(p(x)\) exist


2-1. Example

Target distn \(p(z,x)\) & Gaussian \(q(z)\)

Tightening the likelihood bound has made \(q\) close to \(p(z \mid x)\).


**Antithetic sampling**

  • \(R^{\prime}=\frac{1}{2}\left(\frac{p(z, x)+p(T(z), x)}{q(z)}\right), z \sim q\).

    where \(T(z) = \mu - (z - \mu)\).

    • \(z\) “reflected” around the mean \(\mu\) of \(q\)
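
The same toy-model sketch as above (again an assumption for illustration), now with the antithetic estimator; note \(q(T(z))=q(z)\) because \(T\) reflects around the mean of \(q\):

```python
import numpy as np

# Toy model as before: prior N(0, 1), likelihood N(x; z, 1), x = 1, q = N(1.5, 1).
rng = np.random.default_rng(0)
mu = 1.5

def log_p_joint(z, x=1.0):
    return -0.5 * (z**2 + (x - z)**2) - np.log(2 * np.pi)

def log_q(z):
    return -0.5 * (z - mu)**2 - 0.5 * np.log(2 * np.pi)

z = mu + rng.standard_normal(100_000)
Tz = mu - (z - mu)  # T(z): z reflected around the mean of q
# log R' = log[ (p(z, x) + p(T(z), x)) / (2 q(z)) ]
log_R = np.logaddexp(log_p_joint(z), log_p_joint(Tz)) - np.log(2.0) - log_q(z)
print(f"antithetic bound E[log R'] ~ {log_R.mean():.3f}")
```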


3. The Divide-and-Couple Framework

Posterior inference for general non-negative estimators, using divide & couple


3-1. Divide

\(\mathbb{E}_{Q(\boldsymbol{\omega})} R(\boldsymbol{\omega})=p(x)\), where \(\boldsymbol{\omega}\) collects all the random variables drawn by the estimator \(R\).

Divide step

  • interpret \(\mathbb{E}_{Q(\boldsymbol{\omega})} R(\boldsymbol{\omega})\) as an ELBO, by defining \(P^{MC}\)

    so that \(R(\omega)=P^{\mathrm{MC}}(\omega, x) / Q(\omega)\).

  • \(P^{MC}\) and \(Q\) divide to produce \(R\)
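
Concretely, the divide step is just a rearrangement — \(R(\omega)=P^{\mathrm{MC}}(\omega, x) / Q(\omega)\) means \(P^{\mathrm{MC}}(\omega, x)=Q(\omega) R(\omega)\). For example :

  • standard VI ( \(\omega=z\) ) : \(P^{\mathrm{MC}}(z, x)=q(z) \cdot \frac{p(z, x)}{q(z)}=p(z, x)\)
  • IWAE ( \(\omega=z_{1: M}\), \(Q(\omega)=\prod_{m} q(z_{m})\) ) : \(P^{\mathrm{MC}}(z_{1: M}, x)=\frac{1}{M} \sum_{m=1}^{M} p(z_{m}, x) \prod_{n \neq m} q(z_{n})\)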

( Figure 2 )


3-2. Couple

Couple \(P^{\mathrm{MC}}(\omega, x)\) and \(p(z, x)\)

into a new augmented distribution \(P^{\mathrm{MC}}(\omega, z, x)\) with marginal \(P^{\mathrm{MC}}(z, x)=p(z, x)\).


For \(P^{\mathrm{MC}}(z, \omega, x)=P^{\mathrm{MC}}(\omega, x) a(z \mid \omega)\) to be a valid coupling,

we require that \(\int P^{\mathrm{MC}}(\omega, x) a(z \mid \omega) d \omega=p(z, x)\).
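
Why a valid coupling helps : since \(a(z \mid \omega)\) is shared between \(Q(\omega, z)=Q(\omega) a(z \mid \omega)\) and \(P^{\mathrm{MC}}(\omega, z, x)=P^{\mathrm{MC}}(\omega, x) a(z \mid \omega)\), the chain rule of KL divergence gives

\(\log p(x)-\mathbb{E} \log R=\mathrm{KL}[Q(\omega, z) \mid \mid P^{\mathrm{MC}}(\omega, z \mid x)] \geq \mathrm{KL}[Q(z) \mid \mid p(z \mid x)]\),

where \(Q(z)=\int Q(\omega) a(z \mid \omega) d \omega\) is the marginal, and the inequality is monotonicity of KL under marginalization.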


( Figure 2 )

  • Key Point : if \(R\) is a good estimator, it means that…
    • \(\mathbb{E} \log R\) is close to \(\log p(x)\)
    • so \(Q(z)\) must be close to \(p(z \mid x)\)


3-3. Example

\(R^{\prime}=\frac{1}{2}\left(\frac{p(z, x)+p(T(z), x)}{q(z)}\right), z \sim q\).

  • a tighter VI objective
  • but the optimal \(q\) is less similar to the target! ( check the figure below )

( Figure 2 )


Since \(Q(\omega)\) alone is a poor approximation, how can we fix this?

Consider the coupling distn

  • \(a(z \mid \omega)=\pi(\omega) \delta(z-\omega)+(1-\pi(\omega)) \delta(z-T(\omega))\).

  • \(\pi(\omega)=\frac{p(\omega, x)}{p(\omega, x)+p(T(\omega), x)}\).

Thus, the augmented variational distn is \(Q(\omega, z)=Q(\omega) a(z \mid \omega)\).

  • to sample : draw \(\omega \sim Q\) and select \(z=\omega\) with probability \(\pi(\omega)\), or \(z=T(\omega)\) otherwise ( see the sketch below )
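
A minimal sketch of this coupled sampler on the toy model from above (hypothetical numbers; there the exact posterior is \(\mathcal{N}(0.5, 0.5)\) while \(q=\mathcal{N}(1.5, 1)\)):

```python
import numpy as np

# Toy model as before: prior N(0, 1), likelihood N(x; z, 1), x = 1,
# exact posterior N(0.5, 0.5), deliberately offset proposal q = N(1.5, 1).
rng = np.random.default_rng(0)
mu = 1.5

def log_p_joint(z, x=1.0):
    return -0.5 * (z**2 + (x - z)**2) - np.log(2 * np.pi)

omega = mu + rng.standard_normal(100_000)   # omega ~ Q(omega) = q
T_omega = mu - (omega - mu)                 # T(omega): reflection around mu
# pi(omega) = p(omega, x) / (p(omega, x) + p(T(omega), x)), computed in log space
log_po, log_pT = log_p_joint(omega), log_p_joint(T_omega)
pi = 1.0 / (1.0 + np.exp(log_pT - log_po))
z = np.where(rng.random(omega.shape) < pi, omega, T_omega)   # z ~ Q(z)

# Q(z) should sit closer to the posterior than q itself.
print(f"q:    mean {omega.mean():.3f}, std {omega.std():.3f}")
print(f"Q(z): mean {z.mean():.3f}, std {z.std():.3f}   (exact posterior: 0.5, {0.5**0.5:.3f})")
```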


4. Conclusion

Central Insight :

  • an approximate posterior can be constructed from a likelihood estimator using “coupling”

    ( this posterior’s divergence is bounded by the looseness of the likelihood bound )
