Robust Inference with Variational Bayes (2015)
1. Introduction
Posterior inference should be robust to variation in the prior & likelihood ( if the posterior changes substantially under such variation, the analysis lacks objectivity! )
Measuring the sensitivity of the posterior to variation in the likelihood & prior is a central concern of the field of robust Bayes ( but the tools of robust Bayes are not commonly used in practice, largely because the robustness measures are difficult to calculate from MCMC draws )
In contrast to MCMC, VB (Variational Bayes) approximations are readily amenable to robustness analysis.
\(\rightarrow\) the derivative of a posterior expectation w.r.t. perturbations of the prior & data gives a measure of local robustness to the prior & likelihood
This paper develops local prior robustness measures for MFVB (Mean Field Variational Bayes)
2. Robustness Measures
Notation
- \(x=\left(x_{1}, \ldots, x_{N}\right)\) with \(x_{n} \in \mathbb{R}^{D}\)
- parameter : \(\theta \in \mathbb{R}^{K}\)
- prior parameters : \(\alpha \in \mathbb{R}^{M}\)
Posterior distn of \(\theta\) :
- \(p_{x}^{\alpha}(\theta):=p(\theta \mid x, \alpha)=\frac{p(x \mid \theta) p(\theta \mid \alpha)}{p(x)}\).
- A Bayesian analysis typically reports the posterior expectation of some function \(g(\theta)\) ( e.g. a mean or variance ) : \(\mathbb{E}_{p_{x}^{\alpha}}[g(\theta)]\)
How much \(\mathbb{E}_{p_{x}^{\alpha}}[g(\theta)]\) changes locally in response to a small perturbation \(\Delta \alpha\) in the value of \(\alpha\) : \(\left.\frac{d \mathbb{E}_{p_{x}^{\alpha}}[g(\theta)]}{d \alpha^{T}}\right|_{\alpha} \Delta \alpha\)
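As a quick illustration ( a worked example, not taken from the paper ) : for \(x_{n} \mid \theta \sim \mathcal{N}(\theta, \sigma^{2})\) with \(\sigma^{2}\) known and prior \(\theta \sim \mathcal{N}(\mu_{0}, \tau^{2})\), taking \(\alpha=\mu_{0}\), the posterior mean is
\(\mathbb{E}[\theta \mid x]=\frac{\tau^{-2} \mu_{0}+\sigma^{-2} \sum_{n} x_{n}}{\tau^{-2}+N \sigma^{-2}} \quad \Rightarrow \quad \frac{d \mathbb{E}[\theta \mid x]}{d \mu_{0}}=\frac{\tau^{-2}}{\tau^{-2}+N \sigma^{-2}}\),
which shrinks toward zero as \(N\) grows : this posterior expectation becomes locally robust to the prior mean for large \(N\).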
3. Linear Response Variational Bayes and extensions
[ MFVB posterior approximation ]
\(q_x^{\alpha}\) : variational approximation to posterior \(p_x^{\alpha}\)
( variational family \(Q\) is a class of products of exponential family distributions )
\(q_{x}^{\alpha}:=\operatorname{argmin}_{q \in \mathcal{Q}}\{S-L\} \quad \text{for} \quad \mathcal{Q}=\left\{q: q(\theta)=\prod_{k=1}^{K} q\left(\theta_{k}\right) ; \ \forall k,\ q\left(\theta_{k}\right) \propto \exp \left(\eta_{k}^{T} \theta_{k}\right)\right\}\)
\(L:=\mathbb{E}_{q}[\log p(x \mid \theta)]+\mathbb{E}_{q}[\log p(\theta \mid \alpha)], \quad S:=\mathbb{E}_{q}[\log q(\theta)]\)
Assume \(q_x^{\alpha}\), the solution to the above, lies in the interior of the feasible set, i.e. has interior exponential family parameters \(\eta_k\)
\(\rightarrow\) \(q_x^{\alpha}\) can be completely characterized by its mean param, \(m:=\mathbb{E}_{q_{x}^{\alpha}}[\theta]\)
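For instance ( an illustrative example, not from the paper ) : if \(\theta_{k} \geq 0\) and \(q\left(\theta_{k}\right) \propto \exp \left(\eta_{k} \theta_{k}\right)\) with \(\eta_{k}<0\), then \(q\left(\theta_{k}\right)\) is an exponential distribution with rate \(-\eta_{k}\) and mean \(m_{k}=-1 / \eta_{k}\), so the natural parameter \(\eta_{k}\) and the mean parameter \(m_{k}\) determine each other.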
Perturb the objective in the direction of a function \(f\) of the mean param \(m\), by some amount \(t\)
\(\rightarrow\) \(q_{t}:=\operatorname{argmin}_{q \in \mathcal{Q}}\left\{S-L-f(m)^{T} t\right\}\) ( the perturbation enters through \(L\), so it is subtracted from \(S-L\) )
\(\rightarrow\) Solution :
\(\left.\frac{d \mathbb{E}_{q_{t}}[\theta]}{d t^{T}}\right|_{t=0}=(I-V H)^{-1} V=:\hat{\Sigma}, \quad \text{where } V:=\operatorname{Cov}_{q_{x}^{\alpha}}(\theta) \text{ and } H:=\frac{\partial^{2} L}{\partial m \partial m^{T}}\).
General Form : \(\frac{d h\left(m_{t}\right)}{d t}=\nabla h^{T} \hat{\Sigma} \nabla f\).
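A minimal numerical sketch of the two formulas above ( my own illustration, not code from the paper ), assuming the MFVB covariance \(V\), the Hessian \(H\), and the gradients \(\nabla h\), \(\nabla f\) are already available as NumPy arrays :

```python
import numpy as np

def lrvb_sensitivity(V, H, grad_h, grad_f):
    """Return dh(m_t)/dt = grad_h^T Sigma_hat grad_f, where
    Sigma_hat = (I - V H)^{-1} V is the LRVB covariance estimate.

    V      : MFVB covariance Cov_q(theta), shape (K, K)
    H      : Hessian d^2 L / (dm dm^T), shape (K, K)
    grad_h : dh/dm, shape (K,)
    grad_f : df/dm, shape (K,)
    """
    K = V.shape[0]
    # Solve (I - V H) Sigma_hat = V instead of forming the inverse explicitly.
    sigma_hat = np.linalg.solve(np.eye(K) - V @ H, V)
    return grad_h @ sigma_hat @ grad_f

# Toy usage with arbitrary placeholder values:
V = np.array([[0.5, 0.1], [0.1, 0.3]])
H = np.array([[-0.2, 0.0], [0.0, -0.1]])
print(lrvb_sensitivity(V, H, grad_h=np.array([1.0, 0.0]), grad_f=np.array([0.0, 1.0])))
```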
Taylor expansion in \(t\) of the prior term in \(L\), with perturbed hyperparameter \(\alpha_{t}:=\alpha+\Delta \alpha\, t\) :
\(\begin{aligned} \mathbb{E}_{q}\left[\log \left(p\left(\theta \mid \alpha_{t}\right)\right)\right] &=\mathbb{E}_{q}[\log (p(\theta \mid \alpha))]+\frac{d}{d \alpha^{T}} \mathbb{E}_{q}[\log (p(\theta \mid \alpha))] \Delta \alpha t+O\left(t^{2}\right) \Rightarrow \\ f(m) &:=\frac{d}{d \alpha^{T}} \mathbb{E}_{q}[\log (p(\theta \mid \alpha))] \Delta \alpha \quad \text { and } \quad h(m):=\mathbb{E}_{q_{x}^{\alpha}}[g(\theta)] \end{aligned}\).
with \(f(m)\) and \(h(m)\) defined as above, \(\frac{d h\left(m_{t}\right)}{d t}=\nabla h^{T} \hat{\Sigma} \nabla f\) gives the local prior robustness measure!
4. Robustness measures from LRVB
First, consider sensitivity to the prior hyperparameters \(\alpha\) : calculate \(f(m)\) from \(f(m):=\frac{d}{d \alpha^{T}} \mathbb{E}_{q}[\log p(\theta \mid \alpha)]\, \Delta \alpha\)
let \(g(\theta)=\theta\), and assume the prior is an exponential family with sufficient statistic \(\pi(\theta)\), so that \(\log p(\theta \mid \alpha)=\alpha^{T} \pi(\theta)\) ( up to a log-normalizer that does not depend on \(\theta\) )
So, \(f(m)=\mathbb{E}_{q_{x}^{\alpha}}[\pi(\theta)]\, \Delta \alpha\) ( the log-normalizer contributes only a term that does not depend on \(m\), so it drops out of \(\nabla f\) ).
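In particular ( a special case of the formulas above ) : if the sufficient statistic is \(\pi(\theta)=\theta\), then \(\mathbb{E}_{q_{x}^{\alpha}}[\pi(\theta)]=m\) and \(\nabla f=\Delta \alpha\) ; with \(g(\theta)=\theta\) ( so \(h(m)=m\) and \(\nabla h=I\) ), the local sensitivity of the posterior mean is \(\frac{d m_{t}}{d t}=\hat{\Sigma}\, \Delta \alpha\).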
Second, consider changing the functional form of \(p(\theta \mid \alpha)\)
Assume the variational approximation and the prior factorize as \(q_{x}^{\alpha}(\theta)=q\left(\theta_{i}\right) q\left(\theta_{-i}\right)\) and \(p(\theta \mid \alpha)=p\left(\theta_{i} \mid \alpha_{i}\right) p\left(\theta_{-i} \mid \alpha_{-i}\right)\).
In order to ensure that the perturbed prior remains properly normalized, perturb \(p\left(\theta_{i} \mid \alpha_{i}\right)\) by \(\epsilon\)-contamination :
\(p\left(\theta_{i} \mid \alpha_{i}, \epsilon\right)=(1-\epsilon) p\left(\theta_{i} \mid \alpha_{i}\right)+\epsilon\, p_{c}\left(\theta_{i}\right)\) for some contaminating distribution \(p_{c}\)
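Plugging this \(\epsilon\)-contaminated prior into the Taylor-expansion recipe above, with \(\epsilon\) playing the role of \(t\) ( a brief derivation sketch, not spelled out in these notes ) :
\(f(m)=\left.\frac{d}{d \epsilon} \mathbb{E}_{q}\left[\log p\left(\theta_{i} \mid \alpha_{i}, \epsilon\right)\right]\right|_{\epsilon=0}=\mathbb{E}_{q}\left[\frac{p_{c}\left(\theta_{i}\right)}{p\left(\theta_{i} \mid \alpha_{i}\right)}\right]-1\).
Choosing a point-mass contamination \(p_{c}=\delta_{\theta_{i 0}}\) gives \(f(m)=\frac{q\left(\theta_{i 0}\right)}{p\left(\theta_{i 0} \mid \alpha_{i}\right)}-1\), whose dependence on \(m\) is entirely through \(q\left(\theta_{i 0}\right)\) ; applying \(\frac{d h\left(m_{t}\right)}{d t}=\nabla h^{T} \hat{\Sigma} \nabla f\) then yields the influence function below.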
Influence function ( sensitivity to the location \(\theta_{i 0}\) of a point-mass contamination ) :
\(\frac{d \mathbb{E}_{q}[\theta]}{d \epsilon}=\frac{q_{x}^{\alpha}\left(\theta_{i 0}\right)}{p\left(\theta_{i 0} \mid \alpha\right)}(I-V H)^{-1}\left(\begin{array}{c} \theta_{i 0}-m_{i} \\ 0 \end{array}\right)\).
- \(p\left(\theta_{i 0} \mid \alpha\right)\) is known a priori
- \(q_{x}^{\alpha}\left(\theta_{i 0}\right)\) is a function of moment parameters \(m\)
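A minimal numerical sketch of evaluating this influence function on a grid of contamination points \(\theta_{i 0}\) ( my own illustration ; the Gaussian forms for \(q\left(\theta_{i}\right)\) and \(p\left(\theta_{i} \mid \alpha_{i}\right)\) are assumptions for the sketch, not the paper's example ) :

```python
import numpy as np
from scipy.stats import norm

def influence_function(theta_grid, m_i, q_sd, prior_mean, prior_sd, V, H, i=0):
    """Evaluate d E_q[theta] / d epsilon for a point-mass contamination at each
    theta_i0 in theta_grid, following
        (q(theta_i0) / p(theta_i0 | alpha)) * (I - V H)^{-1} e,
    where e has (theta_i0 - m_i) in coordinate i and zeros elsewhere.
    The Gaussian q(theta_i) and Gaussian prior are illustrative assumptions."""
    K = V.shape[0]
    A = np.linalg.inv(np.eye(K) - V @ H)   # (I - V H)^{-1}
    rows = []
    for theta0 in theta_grid:
        ratio = norm.pdf(theta0, m_i, q_sd) / norm.pdf(theta0, prior_mean, prior_sd)
        e = np.zeros(K)
        e[i] = theta0 - m_i
        rows.append(ratio * (A @ e))
    return np.array(rows)                  # shape (len(theta_grid), K)

# Toy usage with arbitrary placeholder values:
V = np.array([[0.5, 0.1], [0.1, 0.3]])
H = np.array([[-0.2, 0.0], [0.0, -0.1]])
grid = np.linspace(-3.0, 3.0, 7)
print(influence_function(grid, m_i=0.0, q_sd=0.7, prior_mean=0.0, prior_sd=2.0, V=V, H=H))
```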