Robust Inference with Variational Bayes (2015)


1. Introduction

The posterior should be robust to variation in the prior & likelihood ( if the posterior changes substantially under such variation, the analysis lacks objectivity! )

Measuring the sensitivity of the posterior to variation in the likelihood & prior is a central concern of the field of robust Bayes ( but the tools of robust Bayes are not commonly used in practice, due to the difficulty of calculating robustness measures from MCMC draws )

In contrast to MCMC, VB (Variational Bayes) approximations are readily amenable to robustness analysis.

\(\rightarrow\) the derivative of a posterior expectation w.r.t. prior & data perturbations is a measure of local robustness to the prior & likelihood


This paper develops local prior robustness measures for MFVB (Mean Field Variational Bayes)


2. Robustness Measures

Notation

  • data : \(x=\left(x_{1}, \ldots, x_{N}\right)\) with \(x_{n} \in \mathbb{R}^{D}\)
  • parameter : \(\theta \in \mathbb{R}^{K}\)
  • prior parameters : \(\alpha\) where \(\alpha \in \mathbb{R}^{M}\)


Posterior distribution of \(\theta\) :

  • \(p_{x}^{\alpha}(\theta):=p(\theta \mid x, \alpha)=\frac{p(x \mid \theta) p(\theta \mid \alpha)}{p(x)}\).
  • Bayesian analysis = posterior expectation of some function \(g(\theta)\) ( e.g., a mean or variance ) : \(\mathbb{E}_{p_{x}^{\alpha}}[g(\theta)]\)


How much \(\mathbb{E}_{p_{x}^{\alpha}}[g(\theta)]\) changes locally in response to a small perturbation \(\Delta \alpha\) in the value of \(\alpha\) : \(\left.\frac{d \mathbb{E}_{p_{x}^{\alpha}}[g(\theta)]}{d \alpha^{T}}\right|_{\alpha} \Delta \alpha\)
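
A quick closed-form illustration (my own example, not from the paper): in a conjugate normal model \(x_{n} \mid \theta \sim \mathcal{N}(\theta, \sigma^{2})\) with prior \(\theta \sim \mathcal{N}(\mu_{0}, \sigma_{0}^{2})\) and hyperparameter \(\alpha=\mu_{0}\),

\(\mathbb{E}_{p_{x}^{\alpha}}[\theta]=\frac{N \bar{x} / \sigma^{2}+\mu_{0} / \sigma_{0}^{2}}{N / \sigma^{2}+1 / \sigma_{0}^{2}} \quad \Rightarrow \quad \frac{d \mathbb{E}_{p_{x}^{\alpha}}[\theta]}{d \mu_{0}}=\frac{\sigma^{2}}{\sigma^{2}+N \sigma_{0}^{2}}\),

so the local sensitivity to the prior mean decays like \(1/N\) as data accumulate.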


3. Linear Response Variational Bayes and extensions

[ MFVB posterior approximation ]

\(q_x^{\alpha}\) : variational approximation to posterior \(p_x^{\alpha}\)

( variational family \(Q\) is a class of products of exponential family distributions )


\(\begin{aligned} q_{x}^{\alpha} &:=\operatorname{argmin}_{q \in \mathcal{Q}}\{S-L\} \quad \\ &\text { for } \quad \mathcal{Q}=\left\{q: q(\theta)=\prod_{k=1}^{K} q\left(\theta_{k}\right) ; \quad \forall k, q\left(\theta_{k}\right) \propto \exp \left(\eta_{k}^{T} \theta_{k}\right)\right\} \\\\ L &:=\mathbb{E}_{q}[\log p(x \mid \theta)]+\mathbb{E}_{q}[\log p(\theta \mid \alpha)], \quad S:=\mathbb{E}_{q}[\log q(\theta)] \end{aligned}\).


Assume that \(q_x^{\alpha}\), the solution to the above, has exponential family parameters \(\eta_k\) in the interior of the feasible parameter space

\(\rightarrow\) \(q_x^{\alpha}\) can be completely characterized by its mean parameters, \(m:=\mathbb{E}_{q_{x}^{\alpha}}[\theta]\)


Perturb the objective in the direction of a function \(f\) of the mean parameter \(m\), by some amount \(t\)

\(\rightarrow\) \(q_{t}:=\operatorname{argmin}_{q \in \mathcal{Q}}\left\{S-L+f(m)^{T} t\right\}\).

\(\rightarrow\) Solution :

\(\left.\frac{d \mathbb{E}_{q_{t}}[\theta]}{d t^{T}}\right|_{t=0}=(I-V H)^{-1} V=: \hat{\Sigma}, \quad \text { where } V:=\operatorname{Cov}_{q_{x}^{\alpha}}(\theta) \text { and } H:=\frac{\partial^{2} L}{\partial m \partial m^{T}}\).
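
A minimal numerical sketch of this correction (my own illustration in numpy, not the paper's code): for a toy multivariate Gaussian target, MFVB with a factorized Gaussian family recovers the means exactly but reports marginal variances of \(1/\Lambda_{kk}\) (where \(\Lambda\) is the target precision); applying \(\hat{\Sigma}=(I-VH)^{-1}V\) over the full mean parameterization \(\left(\mathbb{E}[\theta_{k}], \mathbb{E}[\theta_{k}^{2}]\right)\) recovers the exact covariance. All names below are illustrative assumptions.

```python
# Sketch: LRVB covariance correction for a 2-D Gaussian "posterior"
# approximated by a factorized (mean field) Gaussian q.
import numpy as np

# Target: N(mu, Sigma) with precision Lambda = Sigma^{-1}.
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
mu = np.array([1.0, -0.5])
Lambda = np.linalg.inv(Sigma)
K = len(mu)

# MFVB optimum for a Gaussian target: exact means, variances 1 / Lambda_kk.
m = mu.copy()
v = 1.0 / np.diag(Lambda)

# Mean parameters of each factor are (E[theta_k], E[theta_k^2]).
# V = Cov_q of the sufficient statistics (theta_k, theta_k^2), block-diagonal over k.
V = np.zeros((2 * K, 2 * K))
for k in range(K):
    V[k, k] = v[k]                                          # Var(theta_k)
    V[k, K + k] = 2 * m[k] * v[k]                           # Cov(theta_k, theta_k^2)
    V[K + k, k] = 2 * m[k] * v[k]
    V[K + k, K + k] = 2 * v[k] ** 2 + 4 * m[k] ** 2 * v[k]  # Var(theta_k^2)

# L = E_q[log p(theta | x)] (up to constants) is quadratic in theta, so its
# Hessian in the mean parameters only couples the first moments through the
# off-diagonal precision entries.
H = np.zeros((2 * K, 2 * K))
H[:K, :K] = -(Lambda - np.diag(np.diag(Lambda)))

# LRVB: Sigma_hat = (I - V H)^{-1} V; its top-left K x K block is the
# corrected covariance of theta, exact for a Gaussian target.
Sigma_hat = np.linalg.solve(np.eye(2 * K) - V @ H, V)
print(np.round(Sigma_hat[:K, :K], 6))  # matches Sigma
print(np.round(v, 6))                  # raw MFVB variances: too small
```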


General Form ( by the chain rule, since the perturbation \(f(m)^{T} t\) shifts the optimal mean parameters by \(\left.\frac{d m_{t}}{d t}\right|_{t=0}=\hat{\Sigma} \nabla f\), with \(m_{t}:=\mathbb{E}_{q_{t}}[\theta]\) ) : \(\frac{d h\left(m_{t}\right)}{d t}=\nabla h^{T} \hat{\Sigma} \nabla f\).


Taylor expansion of the prior term at \(\alpha_{t}:=\alpha+\Delta \alpha \, t\), to first order in \(t\) :

\(\begin{aligned} \mathbb{E}_{q}\left[\log \left(p\left(\theta \mid \alpha_{t}\right)\right)\right] &=\mathbb{E}_{q}[\log (p(\theta \mid \alpha))]+\frac{d}{d \alpha^{T}} \mathbb{E}_{q}[\log (p(\theta \mid \alpha))] \Delta \alpha t+O\left(t^{2}\right) \Rightarrow \\ f(m) &:=\frac{d}{d \alpha^{T}} \mathbb{E}_{q}[\log (p(\theta \mid \alpha))] \Delta \alpha \quad \text { and } \quad h(m):=\mathbb{E}_{q_{x}^{\alpha}}[g(\theta)] \end{aligned}\).

  • with \(f(m)\) and \(h(m)\) defined as above…

    \(\frac{d h\left(m_{t}\right)}{d t}=\nabla h^{T} \hat{\Sigma} \nabla f\) gives the robustness measure!
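
As a sanity check (my own worked example, not in the paper), apply this to the conjugate normal model from Section 2, where MFVB is exact. Perturbing the prior mean by \(\Delta \mu_{0}\),

\(f(m)=\frac{m-\mu_{0}}{\sigma_{0}^{2}} \Delta \mu_{0}, \quad \nabla f=\frac{\Delta \mu_{0}}{\sigma_{0}^{2}}, \quad h(m)=m, \quad \nabla h=1, \quad \hat{\Sigma}=\left(N / \sigma^{2}+1 / \sigma_{0}^{2}\right)^{-1}\)

(only the first-moment coordinate of the mean parameters enters, so \(\hat{\Sigma}\) reduces to the corrected variance of \(\theta\)), and

\(\frac{d h\left(m_{t}\right)}{d t}=\nabla h^{T} \hat{\Sigma} \nabla f=\frac{\sigma^{2}}{\sigma^{2}+N \sigma_{0}^{2}} \Delta \mu_{0}\),

which matches the exact sensitivity computed in Section 2.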


4. Robustness measures from LRVB

First, calculate \(f(m)\) from \(f(m) :=\frac{d}{d \alpha^{T}} \mathbb{E}_{q}[\log (p(\theta \mid \alpha))] \Delta \alpha\) for a perturbation \(\Delta \alpha\) of the prior hyperparameters

  • let \(g(\theta)=\theta\) and assume an exponential family prior with natural parameter \(\alpha\)

    Then, \(\log p(\theta \mid \alpha)=\alpha^{T} \pi(\theta)\) up to a term that does not depend on \(\theta\)

    So, \(f(m)=\mathbb{E}_{q_{x}^{\alpha}}[\pi(\theta)] \Delta \alpha\).
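
For instance (an illustration not spelled out in the note), a normal prior \(\mathcal{N}(\mu_{0}, \sigma_{0}^{2})\) fits this template with

\(\pi(\theta)=\left(\theta, \theta^{2}\right)^{T}, \quad \alpha=\left(\frac{\mu_{0}}{\sigma_{0}^{2}},\,-\frac{1}{2 \sigma_{0}^{2}}\right)^{T}, \quad \mathbb{E}_{q_{x}^{\alpha}}[\pi(\theta)]=\left(m, \mathbb{E}_{q_{x}^{\alpha}}\left[\theta^{2}\right]\right)^{T}\),

so \(f(m)\) is available directly from the variational moments of \(q_{x}^{\alpha}\).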


Second, consider changing the functional form of \(p(\theta \mid \alpha)\)

  • Assume \(q_{x}^{\alpha}(\theta)=q\left(\theta_{i}\right) q\left(\theta_{-i}\right) \quad \text { and } \quad p(\theta \mid \alpha)=p\left(\theta_{i} \mid \alpha_{i}\right) p\left(\theta_{-i} \mid \alpha_{-i}\right)\).

  • In order to ensure that the perturbed prior is properly normalized, use the mixture

    \(p\left(\theta_{i} \mid \alpha_{i}, \epsilon\right)=(1-\epsilon) p\left(\theta_{i} \mid \alpha_{i}\right)+\epsilon p_{c}\left(\theta_{i}\right)\) ( called \(\epsilon\)-contamination )

  • Influence function ( take the contaminating distribution \(p_{c}\) to be a point mass at \(\theta_{i 0}\) ) :

    \(\frac{d \mathbb{E}_{q}[\theta]}{d \epsilon}=\frac{q_{x}^{\alpha}\left(\theta_{i 0}\right)}{p\left(\theta_{i 0} \mid \alpha\right)}(I-V H)^{-1}\left(\begin{array}{c} \theta_{i 0}-m_{i} \\ 0 \end{array}\right)\).

    • \(p\left(\theta_{i 0} \mid \alpha\right)\) is known a priori
    • \(q_{x}^{\alpha}\left(\theta_{i 0}\right)\) is a function of moment parameters \(m\)
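
A minimal sketch (my own illustration, not the paper's code) evaluating this influence function on a grid for the conjugate normal model: there MFVB is exact and \(L\) is linear in the mean parameters, so \(H=0\), \((I-VH)^{-1}=I\), and the formula reduces to \(\frac{q_{x}^{\alpha}\left(\theta_{0}\right)}{p\left(\theta_{0} \mid \alpha\right)}\left(\theta_{0}-m\right)\). The data and hyperparameters below are hypothetical.

```python
# Sketch: epsilon-contamination influence function for a conjugate
# normal-normal model (1-D theta), where the MFVB optimum equals the
# exact posterior and H = 0.
import numpy as np
from scipy.stats import norm

np.random.seed(0)
sigma, mu0, sd0 = 1.0, 0.0, 2.0            # likelihood sd, prior N(mu0, sd0^2)
x = np.random.normal(1.0, sigma, size=20)  # hypothetical data
N = len(x)

# Posterior (= MFVB optimum q) is N(m, post_sd^2).
post_prec = N / sigma**2 + 1.0 / sd0**2
m = (x.sum() / sigma**2 + mu0 / sd0**2) / post_prec
post_sd = post_prec ** -0.5

# Influence of a point-mass contamination of the prior at theta0:
# d E_q[theta] / d eps = q(theta0) / p(theta0 | alpha) * (theta0 - m).
theta0_grid = np.linspace(-4.0, 4.0, 9)
influence = (norm.pdf(theta0_grid, m, post_sd)
             / norm.pdf(theta0_grid, mu0, sd0)) * (theta0_grid - m)
for t0, infl in zip(theta0_grid, influence):
    print(f"theta0 = {t0:+.1f}   influence = {infl:+.4f}")
```

In the far tails the ratio \(q_{x}^{\alpha}(\theta_{0}) / p(\theta_{0} \mid \alpha)\) vanishes, so contamination placed where the approximate posterior carries no mass has essentially no local effect on the posterior mean.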
