Robust Inference with Variational Bayes (2015)


1. Introduction

The posterior should be robust to variation in the prior & likelihood ( if the posterior changes substantially under such variation, the analysis lacks objectivity! )

Measuring the sensitivity of the posterior to variation in the likelihood & prior is a central concern of the field of robust Bayes ( but the tools of robust Bayes are not commonly used in practice, due to the difficulty of calculating robustness measures from MCMC draws )

In contrast to MCMC, VB (Variational Bayes) approximations are readily amenable to robustness analysis.

\(\rightarrow\) the derivative of a posterior expectation w.r.t. prior & data perturbations is a measure of local robustness to the prior & likelihood


This paper develops local prior robustness measures for MFVB (Mean Field Variational Bayes)


2. Robustness Measures

Notation

  • data : \(x=\left(x_{1}, \ldots, x_{N}\right)\) with \(x_{n} \in \mathbb{R}^{D}\)
  • parameter : \(\theta \in \mathbb{R}^{K}\)
  • prior parameters : \(\alpha\) where \(\alpha \in \mathbb{R}^{M}\)


Posterior distribution of \(\theta\) :

  • \(p_{x}^{\alpha}(\theta):=p(\theta \mid x, \alpha)=\frac{p(x \mid \theta) p(\theta \mid \alpha)}{p(x)}\).
  • Bayesian analysis = posterior expectation of some function \(g(\theta)\) ( e.g., a mean or variance ) : \(\mathbb{E}_{p_{x}^{\alpha}}[g(\theta)]\)


How much \(\mathbb{E}_{p_{x}^{\alpha}}[g(\theta)]\) changes locally in response to a small perturbation \(\Delta \alpha\) in the value of \(\alpha\) : \(\left.\frac{d \mathbb{E}_{p_{x}^{\alpha}}[g(\theta)]}{d \alpha^{T}}\right|_{\alpha} \Delta \alpha\)
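
A quick closed-form illustration (my own example, not from the paper): in a conjugate normal model \(x_{n} \mid \theta \sim \mathcal{N}(\theta, \sigma^{2})\) with prior \(\theta \sim \mathcal{N}(\mu_{0}, \sigma_{0}^{2})\) and hyperparameter \(\alpha=\mu_{0}\),

\(\mathbb{E}_{p_{x}^{\alpha}}[\theta]=\frac{N \bar{x} / \sigma^{2}+\mu_{0} / \sigma_{0}^{2}}{N / \sigma^{2}+1 / \sigma_{0}^{2}} \quad \Rightarrow \quad \frac{d \mathbb{E}_{p_{x}^{\alpha}}[\theta]}{d \mu_{0}}=\frac{\sigma^{2}}{\sigma^{2}+N \sigma_{0}^{2}}\),

so the local sensitivity to the prior mean decays like \(1/N\) as data accumulate.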


3. Linear Response Variational Bayes and extensions

[ MFVB posterior approximation ]

\(q_x^{\alpha}\) : variational approximation to posterior \(p_x^{\alpha}\)

( variational family \(Q\) is a class of products of exponential family distributions )


\(\begin{aligned} q_{x}^{\alpha} &:=\operatorname{argmin}_{q \in \mathcal{Q}}\{S-L\} \quad \\ &\text { for } \quad \mathcal{Q}=\left\{q: q(\theta)=\prod_{k=1}^{K} q\left(\theta_{k}\right) ; \quad \forall k, q\left(\theta_{k}\right) \propto \exp \left(\eta_{k}^{T} \theta_{k}\right)\right\} \\\\ L &:=\mathbb{E}_{q}[\log p(x \mid \theta)]+\mathbb{E}_{q}[\log p(\theta \mid \alpha)], \quad S:=\mathbb{E}_{q}[\log q(\theta)] \end{aligned}\).


Assume that \(q_x^{\alpha}\), the solution to the above, has exponential family parameters \(\eta_k\) in the interior of the feasible parameter space

\(\rightarrow\) \(q_x^{\alpha}\) can be completely characterized by its mean parameters, \(m:=\mathbb{E}_{q_{x}^{\alpha}}[\theta]\)


Perturb the objective in the direction of a function \(f\) of the mean parameter \(m\), by some amount \(t\)

\(\rightarrow\) \(q_{t}:=\operatorname{argmin}_{q \in \mathcal{Q}}\left\{S-L+f(m)^{T} t\right\}\).

\(\rightarrow\) Solution :

\(\left.\frac{d \mathbb{E}_{q_{t}}[\theta]}{d t^{T}}\right|_{t=0}=(I-V H)^{-1} V=: \hat{\Sigma}, \quad \text { where } V:=\operatorname{Cov}_{q_{x}^{\alpha}}(\theta) \text { and } H:=\frac{\partial^{2} L}{\partial m \partial m^{T}}\).
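
A minimal numerical sketch of this correction (my own illustration in numpy, not the paper's code): for a toy multivariate Gaussian target, MFVB with a factorized Gaussian family recovers the means exactly but reports marginal variances of \(1/\Lambda_{kk}\) (where \(\Lambda\) is the target precision); applying \(\hat{\Sigma}=(I-VH)^{-1}V\) over the full mean parameterization \(\left(\mathbb{E}[\theta_{k}], \mathbb{E}[\theta_{k}^{2}]\right)\) recovers the exact covariance. All names below are illustrative assumptions.

```python
# Sketch: LRVB covariance correction for a 2-D Gaussian "posterior"
# approximated by a factorized (mean field) Gaussian q.
import numpy as np

# Target: N(mu, Sigma) with precision Lambda = Sigma^{-1}.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
mu = np.array([1.0, -0.5])
Lambda = np.linalg.inv(Sigma)
K = len(mu)

# MFVB optimum for a Gaussian target: exact means, variances 1 / Lambda_kk.
m = mu.copy()
v = 1.0 / np.diag(Lambda)

# Mean parameters of each factor are (E[theta_k], E[theta_k^2]).
# V = Cov_q of the sufficient statistics (theta_k, theta_k^2), block-diagonal over k.
V = np.zeros((2 * K, 2 * K))
for k in range(K):
    V[k, k] = v[k]                                          # Var(theta_k)
    V[k, K + k] = 2 * m[k] * v[k]                           # Cov(theta_k, theta_k^2)
    V[K + k, k] = 2 * m[k] * v[k]
    V[K + k, K + k] = 2 * v[k] ** 2 + 4 * m[k] ** 2 * v[k]  # Var(theta_k^2)

# L = E_q[log p(theta | x)] (up to constants) is quadratic in theta, so its
# Hessian in the mean parameters only couples the first moments through the
# off-diagonal precision entries.
H = np.zeros((2 * K, 2 * K))
H[:K, :K] = -(Lambda - np.diag(np.diag(Lambda)))

# LRVB: Sigma_hat = (I - V H)^{-1} V; its top-left K x K block is the
# corrected covariance of theta, exact for a Gaussian target.
Sigma_hat = np.linalg.solve(np.eye(2 * K) - V @ H, V)
print(np.round(Sigma_hat[:K, :K], 6))  # matches Sigma
print(np.round(v, 6))                  # raw MFVB variances: too small
```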


General Form ( by the chain rule, since the perturbation \(f(m)^{T} t\) shifts the optimal mean parameters by \(\left.\frac{d m_{t}}{d t}\right|_{t=0}=\hat{\Sigma} \nabla f\), with \(m_{t}:=\mathbb{E}_{q_{t}}[\theta]\) ) : \(\frac{d h\left(m_{t}\right)}{d t}=\nabla h^{T} \hat{\Sigma} \nabla f\).


Taylor expansion of the prior term at \(\alpha_{t}:=\alpha+\Delta \alpha \, t\), to first order in \(t\) :

\(\begin{aligned} \mathbb{E}_{q}\left[\log \left(p\left(\theta \mid \alpha_{t}\right)\right)\right] &=\mathbb{E}_{q}[\log (p(\theta \mid \alpha))]+\frac{d}{d \alpha^{T}} \mathbb{E}_{q}[\log (p(\theta \mid \alpha))] \Delta \alpha t+O\left(t^{2}\right) \Rightarrow \\ f(m) &:=\frac{d}{d \alpha^{T}} \mathbb{E}_{q}[\log (p(\theta \mid \alpha))] \Delta \alpha \quad \text { and } \quad h(m):=\mathbb{E}_{q_{x}^{\alpha}}[g(\theta)] \end{aligned}\).

  • with \(f(m)\) and \(h(m)\) defined as above…

    \(\frac{d h\left(m_{t}\right)}{d t}=\nabla h^{T} \hat{\Sigma} \nabla f\) gives the robustness measure!
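
As a sanity check (my own worked example, not in the paper), apply this to the conjugate normal model from Section 2, where MFVB is exact. Perturbing the prior mean by \(\Delta \mu_{0}\),

\(f(m)=\frac{m-\mu_{0}}{\sigma_{0}^{2}} \Delta \mu_{0}, \quad \nabla f=\frac{\Delta \mu_{0}}{\sigma_{0}^{2}}, \quad h(m)=m, \quad \nabla h=1, \quad \hat{\Sigma}=\left(N / \sigma^{2}+1 / \sigma_{0}^{2}\right)^{-1}\)

(only the first-moment coordinate of the mean parameters enters, so \(\hat{\Sigma}\) reduces to the corrected variance of \(\theta\)), and

\(\frac{d h\left(m_{t}\right)}{d t}=\nabla h^{T} \hat{\Sigma} \nabla f=\frac{\sigma^{2}}{\sigma^{2}+N \sigma_{0}^{2}} \Delta \mu_{0}\),

which matches the exact sensitivity computed in Section 2.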


4. Robustness measures from LRVB

First, calculate \(f(m)\) from \(f(m) :=\frac{d}{d \alpha^{T}} \mathbb{E}_{q}[\log (p(\theta \mid \alpha))] \Delta \alpha\) for a perturbation \(\Delta \alpha\) of the prior hyperparameters

  • let \(g(\theta)=\theta\) and assume an exponential family prior with natural parameter \(\alpha\)

    Then, \(\log p(\theta \mid \alpha)=\alpha^{T} \pi(\theta)\) up to a term that does not depend on \(\theta\)

    So, \(f(m)=\mathbb{E}_{q_{x}^{\alpha}}[\pi(\theta)] \Delta \alpha\).
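
For instance (an illustration not spelled out in the note), a normal prior \(\mathcal{N}(\mu_{0}, \sigma_{0}^{2})\) fits this template with

\(\pi(\theta)=\left(\theta, \theta^{2}\right)^{T}, \quad \alpha=\left(\frac{\mu_{0}}{\sigma_{0}^{2}},\,-\frac{1}{2 \sigma_{0}^{2}}\right)^{T}, \quad \mathbb{E}_{q_{x}^{\alpha}}[\pi(\theta)]=\left(m, \mathbb{E}_{q_{x}^{\alpha}}\left[\theta^{2}\right]\right)^{T}\),

so \(f(m)\) is available directly from the variational moments of \(q_{x}^{\alpha}\).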


Second, consider changing the functional form of \(p(\theta \mid \alpha)\)

  • Assume \(q_{x}^{\alpha}(\theta)=q\left(\theta_{i}\right) q\left(\theta_{-i}\right) \quad \text { and } \quad p(\theta \mid \alpha)=p\left(\theta_{i} \mid \alpha_{i}\right) p\left(\theta_{-i} \mid \alpha_{-i}\right)\).

  • In order to ensure that the perturbed prior is properly normalized, use the mixture

    \(p\left(\theta_{i} \mid \alpha_{i}, \epsilon\right)=(1-\epsilon) p\left(\theta_{i} \mid \alpha_{i}\right)+\epsilon p_{c}\left(\theta_{i}\right)\) ( called \(\epsilon\)-contamination )

  • Influence function ( take the contaminating distribution \(p_{c}\) to be a point mass at \(\theta_{i 0}\) ) :

    \(\frac{d \mathbb{E}_{q}[\theta]}{d \epsilon}=\frac{q_{x}^{\alpha}\left(\theta_{i 0}\right)}{p\left(\theta_{i 0} \mid \alpha\right)}(I-V H)^{-1}\left(\begin{array}{c} \theta_{i 0}-m_{i} \\ 0 \end{array}\right)\).

    • \(p\left(\theta_{i 0} \mid \alpha\right)\) is known a priori
    • \(q_{x}^{\alpha}\left(\theta_{i 0}\right)\) is a function of moment parameters \(m\)
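
A minimal sketch (my own illustration, not the paper's code) evaluating this influence function on a grid for the conjugate normal model: there MFVB is exact and \(L\) is linear in the mean parameters, so \(H=0\), \((I-VH)^{-1}=I\), and the formula reduces to \(\frac{q_{x}^{\alpha}\left(\theta_{0}\right)}{p\left(\theta_{0} \mid \alpha\right)}\left(\theta_{0}-m\right)\). The data and hyperparameters below are hypothetical.

```python
# Sketch: epsilon-contamination influence function for a conjugate
# normal-normal model (1-D theta), where the MFVB optimum equals the
# exact posterior and H = 0.
import numpy as np
from scipy.stats import norm

np.random.seed(0)
sigma, mu0, sd0 = 1.0, 0.0, 2.0            # likelihood sd, prior N(mu0, sd0^2)
x = np.random.normal(1.0, sigma, size=20)  # hypothetical data
N = len(x)

# Posterior (= MFVB optimum q) is N(m, post_sd^2).
post_prec = N / sigma**2 + 1.0 / sd0**2
m = (x.sum() / sigma**2 + mu0 / sd0**2) / post_prec
post_sd = post_prec ** -0.5

# Influence of a point-mass contamination of the prior at theta0:
# d E_q[theta] / d eps = q(theta0) / p(theta0 | alpha) * (theta0 - m).
theta0_grid = np.linspace(-4.0, 4.0, 9)
influence = (norm.pdf(theta0_grid, m, post_sd)
             / norm.pdf(theta0_grid, mu0, sd0)) * (theta0_grid - m)
for t0, infl in zip(theta0_grid, influence):
    print(f"theta0 = {t0:+.1f}   influence = {infl:+.4f}")
```

In the far tails the ratio \(q_{x}^{\alpha}(\theta_{0}) / p(\theta_{0} \mid \alpha)\) vanishes, so contamination placed where the approximate posterior carries no mass has essentially no local effect on the posterior mean.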
