[Paper Review] VI/BNN/NF papers 21~30
I have summarized must-read and advanced papers on the following topics:
- various methods using Variational Inference
- Bayesian Neural Networks
- Probabilistic Deep Learning
- Normalizing Flows
21. Variational Inference using Implicit Distributions
( Ferenc Huszar, 2017 )
Variational Inference = use \(q\) to approximate \(p\)
- ex) MFVI ( Mean-Field VI ) : simple and fast, but may be inaccurate!
Key Point : Expand the Variational Family \(q_{\theta}(z)\)
Implicit distribution : (1) easy to sample (2) hard to evaluate
\(\rightarrow\) can make more expressive distribution :)
but using an Implicit Distribution in Variational Inference is hard
( \(\because\) entropy term in ELBO is intractable )
\(\rightarrow\) thus, use “Density Ratio Estimation”
Density Ratio Estimation
- by training a classifier \(D(z)\)
- \(y=1\) : sample from \(q_{\theta}(z)\)
- \(y=0\) : sample from \(p(z)\)
- Algorithm summary
- ELBO : \(\mathbb{E}_{q_{\theta}(z)}[\log p(x \mid z)]-\mathbb{E}_{q_{\theta}(z)}[\log D(z)-\log (1-D(z))]\)
- step 1) follow gradient estimate of the ELBO w.r.t \(\theta\) ( with reparam trick )
- step 2) for each \(\theta\), fit \(D(z)\) so that \(D(z) \approx D^{*}(z)=\frac{q_{\theta}(z)}{q_{\theta}(z)+p(z)}\) ( at this optimum, \(\log D(z)-\log (1-D(z)) \approx \log q_{\theta}(z)-\log p(z)\) )
- limitations :
- unstable learning when the discriminator does not catch up
- overfits in high dimensions
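Below is a minimal PyTorch sketch of the algorithm ( my own toy construction, not code from the paper ) : the implicit \(q_{\theta}(z)\) is a generator network, the prior \(p(z)\) is a standard normal, and `log_lik` is a hypothetical placeholder for \(\log p(x \mid z)\). At the discriminator's optimum, \(\log D(z)-\log(1-D(z))=\log q_{\theta}(z)-\log p(z)\), which is exactly the ratio plugged into the ELBO above.

```python
import torch
import torch.nn as nn

dim_z, dim_eps = 2, 8
generator = nn.Sequential(nn.Linear(dim_eps, 64), nn.ReLU(), nn.Linear(64, dim_z))
disc = nn.Sequential(nn.Linear(dim_z, 64), nn.ReLU(), nn.Linear(64, 1))  # outputs a logit

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def log_lik(z):
    # hypothetical placeholder for log p(x | z) with a fixed observation x
    return -0.5 * ((z - 1.0) ** 2).sum(dim=1)

for step in range(1000):
    # step 2) fit D(z) : y = 1 for samples from q_theta(z), y = 0 for prior samples
    z_q = generator(torch.randn(128, dim_eps)).detach()
    z_p = torch.randn(128, dim_z)
    d_loss = bce(disc(z_q), torch.ones(128, 1)) + bce(disc(z_p), torch.zeros(128, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # step 1) gradient of the ELBO w.r.t. theta ( reparameterized through the generator );
    # the logit of D(z) estimates log q_theta(z) - log p(z)
    z = generator(torch.randn(128, dim_eps))
    elbo = (log_lik(z) - disc(z).squeeze(1)).mean()
    opt_g.zero_grad(); (-elbo).backward(); opt_g.step()
```

In practice, the discriminator updates must keep pace with the generator, which is exactly the instability noted in the limitations above.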
22. Semi-Implicit Variational Inference
( Mingzhang Yin, Mingyuan Zhou, 2018 )
using Implicit Distribution is hard in Variational Inference
( \(\because\) entropy term in ELBO is intractable )
instead of Density Ratio Estimation, use a
\(\rightarrow\) “Semi-Implicit Distribution” ( SIVI = Semi-Implicit Variational Inference )
SIVI : “optimize a LOWER BOUND of the ELBO” ( see below )
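In brief ( notation mine, following the paper's construction ) : a semi-implicit distribution mixes an explicit conditional \(q(z \mid \psi)\) with an implicit mixing distribution \(q_{\theta}(\psi)\), and averaging the conditional ELBOs gives a tractable lower bound on the true ELBO :

\[q_{\theta}(z)=\int q(z \mid \psi) \, q_{\theta}(\psi) \, d \psi\]

\[\underline{\mathcal{L}}=\mathbb{E}_{q_{\theta}(\psi)} \mathbb{E}_{q(z \mid \psi)}\left[\log p(x, z)-\log q(z \mid \psi)\right] \leq \mathrm{ELBO}\]

only the tractable \(\log q(z \mid \psi)\) appears, never the intractable marginal \(\log q_{\theta}(z)\).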
23. Unbiased Implicit Variational Inference
( Michalis K. Titsias, Francisco J. R. Ruiz, 2019 )
using Implicit Distribution is hard in Variational Inference
( \(\because\) entropy term in ELBO is intractable )
instead of Density Ratio Estimation, use
\(\rightarrow\) “Unbiased Implicit Variational Inference” ( UIVI )
UIVI : “DIRECTLY optimizes the ELBO” ( see the identity below )
( better performance than SIVI )
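The key identity behind the unbiased gradient ( sketched here; see the paper for the full derivation ) : for a semi-implicit \(q_{\theta}(z)=\int q_{\theta}(z \mid \varepsilon) q(\varepsilon) d \varepsilon\),

\[\nabla_{z} \log q_{\theta}(z)=\mathbb{E}_{q_{\theta}(\varepsilon \mid z)}\left[\nabla_{z} \log q_{\theta}(z \mid \varepsilon)\right]\]

so the intractable entropy-gradient term of the ELBO can be estimated without bias by sampling the reverse conditional \(q_{\theta}(\varepsilon \mid z)\) with a few MCMC steps.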
24. A Contrastive Divergence for Combining Variational Inference and MCMC
( Francisco J. R. Ruiz, Michalis K. Titsias, 2019 )
challenges of using MCMC within VI
- 1) the density of the MCMC-improved distribution is intractable
- 2) the objective depends only weakly on \(\theta\)
\(\rightarrow\) use an alternative divergence, “Variational Contrastive Divergence” (VCD)
\(\mathcal{L}_{\mathrm{VCD}}(\theta)=\underbrace{\mathrm{KL}\left(q_{\theta}^{(0)}(z) \| p(z \mid x)\right)-\mathrm{KL}\left(q_{\theta}(z) \| p(z \mid x)\right)}_{\geq 0}+\underbrace{\mathrm{KL}\left(q_{\theta}(z) \| q_{\theta}^{(0)}(z)\right)}_{\geq 0}\)
( here \(q_{\theta}(z)\) denotes the improved distribution obtained by running \(t\) MCMC steps from \(q_{\theta}^{(0)}(z)\) )
can also be written as…
\(\mathcal{L}_{\mathrm{VCD}}(\theta)=-\mathbb{E}_{q_{\theta}^{(0)}(z)}\left[\log p(x, z)-\log q_{\theta}^{(0)}(z)\right]+\mathbb{E}_{q_{\theta}(z)}\left[\log p(x, z)-\log q_{\theta}^{(0)}(z)\right]\).
problem #1 ) ( intractability )
- solution : the intractable term \(\log q_{\theta}(z)\) cancels out in the second form above ( only the tractable \(\log q_{\theta}^{(0)}(z)\) remains )
problem #2 ) (weak dependence)
- solution : \(\mathcal{L}_{\mathrm{VCD}}(\theta) \stackrel{t \rightarrow \infty}{\longrightarrow} \mathrm{KL}\left(q_{\theta}^{(0)}(z) \| p(z \mid x)\right)+\mathrm{KL}\left(p(z \mid x) \| q_{\theta}^{(0)}(z)\right)\)
Steps
- 1) Sample \(z_{0} \sim q_{\theta}^{(0)}(z)\) ( reparameterization )
- 2) Sample \(z \sim Q^{(t)}\left(z \mid z_{0}\right)\) ( run \(t\) MCMC steps )
- 3) Estimate the gradient \(\nabla_{\theta} \mathcal{L}_{\mathrm{VCD}}(\theta)\)
- 4) Take a gradient step w.r.t. \(\theta\)
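A minimal sketch of the loop above for a toy diagonal-Gaussian \(q_{\theta}^{(0)}\), minimizing the second form of \(\mathcal{L}_{\mathrm{VCD}}\) ( illustrative only : `log_joint` is a hypothetical stand-in for \(\log p(x, z)\), the MCMC kernel is plain random-walk Metropolis, and the score-function correction the paper derives for gradients through the MCMC samples is omitted ) :

```python
import torch

def log_joint(z):
    # hypothetical stand-in for log p(x, z) with a fixed observation x
    return -0.5 * ((z - 2.0) ** 2).sum(dim=1)

mu = torch.zeros(2, requires_grad=True)          # theta = (mu, log_sigma)
log_sigma = torch.zeros(2, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

def log_q0(z):
    # log q_theta^{(0)}(z) for a diagonal Gaussian (additive constants dropped)
    return (-0.5 * ((z - mu) / log_sigma.exp()) ** 2 - log_sigma).sum(dim=1)

for it in range(500):
    # step 1) z0 ~ q0 via reparameterization
    z0 = mu + log_sigma.exp() * torch.randn(64, 2)

    # step 2) improve z0 with t = 5 random-walk Metropolis steps toward p(z|x)
    z = z0.detach()
    for _ in range(5):
        prop = z + 0.5 * torch.randn_like(z)
        accept = (log_joint(prop) - log_joint(z)).exp() > torch.rand(64)
        z = torch.where(accept.unsqueeze(1), prop, z)

    # steps 3-4) L_VCD = -E_{q0}[log p - log q0] + E_{qt}[log p - log q0]
    loss = -(log_joint(z0) - log_q0(z0)).mean() + (log_joint(z) - log_q0(z)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```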
25. Non-linear Independent Components Estimation (NICE)
( Laurent Dinh et al., 2014 )
( need to know about “Variable Transformation & determinant of Jacobian” )
contribution of NICE :
- 1) computing the determinant of the Jacobian & the inverse transformation is trivial
- 2) still learn complex non-linear transformations ( with composition of simple blocks )
Coupling layer
- (1) bijective transformation (2) triangular Jacobian ( makes it tractable! )
- additive coupling layer ( see the sketch below )
Combining coupling layers
- allows “Rescaling”
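A minimal sketch of an additive coupling layer ( my own illustrative code, following the paper's construction ) : split \(x\) into \((x_{1}, x_{2})\), keep \(x_{1}\), and shift \(x_{2}\) by an arbitrary network \(m(x_{1})\). The Jacobian is unit-triangular, so \(\log |\operatorname{det} J| = 0\) and inversion is just a subtraction :

```python
import torch
import torch.nn as nn

class AdditiveCoupling(nn.Module):
    """y1 = x1, y2 = x2 + m(x1); unit-triangular Jacobian => log|det J| = 0."""
    def __init__(self, dim):
        super().__init__()
        self.d = dim // 2
        self.m = nn.Sequential(nn.Linear(self.d, 64), nn.ReLU(),
                               nn.Linear(64, dim - self.d))

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        return torch.cat([x1, x2 + self.m(x1)], dim=1)

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        return torch.cat([y1, y2 - self.m(y1)], dim=1)

x = torch.randn(8, 4)
layer = AdditiveCoupling(4)
assert torch.allclose(layer.inverse(layer(x)), x, atol=1e-6)
```

Since each layer leaves one partition unchanged, NICE alternates the roles of the two partitions across layers and adds a final diagonal rescaling layer ( this is the “Rescaling” above ).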
26. Variational Inference with Normalizing Flows
( Danilo Jimenez Rezende, Shakir Mohamed, 2016 )
limitations of variational methods : the choice of posterior approximation is often limited
\(\rightarrow\) richer approximation is needed!
“Amortized Variational Inference” = (1) + (2)
- (1) MC gradient estimation
- (2) Inference network
For successful VI… 2 requirements are needed
- 1) efficient computation of derivatives of the expected log-likelihood in the ELBO
\(\rightarrow\) by Amortized Variational Inference
- 2) a rich approximating distribution
\(\rightarrow\) by “NORMALIZING FLOWS”
Formula of NF ( Successive application )
- \[\mathbf{z}_{K}=f_{K} \circ \ldots \circ f_{2} \circ f_{1}\left(\mathbf{z}_{0}\right)\]
- \[\ln q_{K}\left(\mathbf{z}_{K}\right)=\ln q_{0}\left(\mathbf{z}_{0}\right)-\sum_{k=1}^{K} \ln \left|\operatorname{det} \frac{\partial f_{k}}{\partial \mathbf{z}_{k-1}}\right|\]
- \[\mathbb{E}_{q_{K}}[h(\mathbf{z})]=\mathbb{E}_{q_{0}}\left[h\left(f_{K} \circ f_{K-1} \circ \ldots \circ f_{1}\left(\mathbf{z}_{0}\right)\right)\right]\]
\(\rightarrow\) does not depend on \(q_{K}\) explicitly
for a successful NF, we must
- 1) specify a class of invertible transformations
- 2) provide an efficient mechanism for computing the determinant of the Jacobian
\(\rightarrow\) the determinant should be low-cost to compute ( or the Jacobian not needed at all )
Invertible Linear-time Transformations
- some types of flows can be invertible + calculated in linear time
- ex) Planar Flows, Radial Flows
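A minimal sketch of a planar flow ( illustrative; the constraint on \(\mathbf{u}\) that guarantees invertibility, given in the paper's appendix, is omitted ) : \(f(\mathbf{z})=\mathbf{z}+\mathbf{u}\, h\left(\mathbf{w}^{\top} \mathbf{z}+b\right)\) with \(h=\tanh\), whose log-det-Jacobian is \(\ln \left|1+\mathbf{u}^{\top} \psi(\mathbf{z})\right|\) with \(\psi(\mathbf{z})=h^{\prime}\left(\mathbf{w}^{\top} \mathbf{z}+b\right) \mathbf{w}\) ( a rank-one update, hence linear time ) :

```python
import torch
import torch.nn as nn

class PlanarFlow(nn.Module):
    """f(z) = z + u * tanh(w^T z + b); log|det J| = log|1 + u^T psi(z)|."""
    def __init__(self, dim):
        super().__init__()
        self.u = nn.Parameter(torch.randn(dim) * 0.1)
        self.w = nn.Parameter(torch.randn(dim) * 0.1)
        self.b = nn.Parameter(torch.zeros(1))

    def forward(self, z):
        a = z @ self.w + self.b                               # (batch,)
        f = z + self.u * torch.tanh(a).unsqueeze(1)
        psi = (1 - torch.tanh(a) ** 2).unsqueeze(1) * self.w  # h'(a) w
        logdet = torch.log(torch.abs(1 + psi @ self.u) + 1e-8)
        return f, logdet

z0 = torch.randn(16, 2)
flow = PlanarFlow(2)
z1, logdet = flow(z0)
# ln q1(z1) = ln q0(z0) - logdet, per the successive-application formula above
```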
27. Density Estimation using Real NVP
( Laurent Dinh et al., 2017 )
Real NVP
- real-valued non-volume-preserving transformation
- “Powerful, Invertible, Learnable” transformation
- tractable yet expressive approach to model high-dimensional data!
bijection : Coupling Layer
- \[\begin{aligned} y_{1: d} &= x_{1: d} \\ y_{d+1: D} &= x_{d+1: D} \odot \exp \left(s\left(x_{1: d}\right)\right)+t\left(x_{1: d}\right) \end{aligned}\]
- 1) easy calculation of the Jacobian
- 2) invertible ( see below )
Masked convolution ( with a binary mask )
Combining coupling layers
Batch Normalization
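Both properties of the coupling layer follow directly ( standard identities, in the paper's notation ) : the inverse never needs to invert \(s\) or \(t\), and the Jacobian is triangular, so

\[x_{1: d}=y_{1: d}, \quad x_{d+1: D}=\left(y_{d+1: D}-t\left(y_{1: d}\right)\right) \odot \exp \left(-s\left(y_{1: d}\right)\right)\]

\[\log \left|\operatorname{det} \frac{\partial y}{\partial x}\right|=\sum_{j} s\left(x_{1: d}\right)_{j}\]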
28. Glow : Generative Flow with Invertible 1x1 Convolutions
( Diederik P. Kingma, Prafulla Dhariwal, 2018 )
Glow
- simple type of generative flow, using “invertible 1 x 1 convolution”
- significant improvement in log-likelihood on standard benchmarks
Generative Modeling has advanced with likelihood-based methods
Likelihood-based methods : three categories
- 1) Autoregressive models
- 2) VAEs
- 3) Flow-based generative models ( ex. NICE, RealNVP )
Proposed Generative Flow
- built on NICE and RealNVP
- consists of a series of steps of flows
- combined with multi-scale architecture
Architecture
- 1) actnorm (using scale & bias)
- 2) Invertible 1 x 1 convolution ( see the sketch below )
- 3) Affine Coupling Layers
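A minimal sketch of an invertible 1 x 1 convolution ( illustrative; Glow additionally proposes an LU parameterization of \(W\) to make the determinant cheaper ) : it mixes channels with a learned \(W \in \mathbb{R}^{c \times c}\), and the log-determinant scales with the spatial size :

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Inv1x1Conv(nn.Module):
    """Per-pixel channel mixing y = W x; log|det| = h * w * log|det W|."""
    def __init__(self, channels):
        super().__init__()
        # orthogonal init keeps W well-conditioned (and |det W| = 1 at start)
        w_init = torch.linalg.qr(torch.randn(channels, channels))[0]
        self.W = nn.Parameter(w_init)

    def forward(self, x):                       # x: (batch, c, h, w)
        b, c, h, w = x.shape
        y = F.conv2d(x, self.W.view(c, c, 1, 1))
        logdet = h * w * torch.slogdet(self.W)[1]
        return y, logdet

    def inverse(self, y):
        c = y.shape[1]
        return F.conv2d(y, torch.inverse(self.W).view(c, c, 1, 1))

x = torch.randn(2, 8, 4, 4)
layer = Inv1x1Conv(8)
y, logdet = layer(x)
assert torch.allclose(layer.inverse(y), x, atol=1e-4)
```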
29. What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision?
( Alex Kendall, Yarin Gal, 2017 )
30. Uncertainty quantification using Bayesian neural networks in classification : Application to ischemic stroke lesion segmentation