Flexible MFVI using mixtures of non-overlapping exponential families (NeurIPS 2020)
Abstract
Sparse models:
- 1) perform automatic variable selection
- 2) aid interpretability
- 3) provide regularization
Analytically obtaining a posterior distribution over the parameters of interest is intractable
MFVI (Mean Field Variational Inference)
- simple and popular framework
- often amenable to “analytically” deriving closed-form updates
- but… fails to produce sensible results for models with sparsity-inducing priors (e.g., spike-and-slab; see the sketch below)
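As a concrete example of such a prior (my notation, not the paper's): the spike-and-slab prior mixes a point mass at zero with a continuous Gaussian “slab”,
\[
\beta \sim \pi\, \delta_0 + (1 - \pi)\, \mathcal{N}(0, \tau^2),
\]
and it is exactly this point-mass-plus-continuous structure that standard MFVI families handle poorly.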
Key idea: mixtures of exponential-family distributions with non-overlapping support again form an exponential family
1. Introduction
Variational Inference
- avoids sampling; instead fits an approximate posterior
- usually uses the reverse KL divergence \(KL(Q \,\|\, P)\) (see the sketch after this list)
- simple variational families
- (pros) more computationally tractable
- (cons) large approximation error
- MFVI: computationally efficient
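A minimal sketch of the standard setup, in textbook notation (my summary, not the paper's): MFVI restricts \(Q\) to fully factorized distributions
\[
q(\theta) = \prod_i q_i(\theta_i),
\]
and minimizing the reverse KL is equivalent to maximizing the ELBO, since
\[
KL(Q \,\|\, P) = \log p(x) - \underbrace{\mathbb{E}_q\!\left[\log \frac{p(x, \theta)}{q(\theta)}\right]}_{\text{ELBO}}.
\]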
Many approaches have been developed to improve on MFVI:
- coupled hidden Markov models
- more expressive variational families
- structured VI
- black-box VI
    - seeks to automatically derive gradient estimators
- combining DL with Bayesian graphical models
2. More flexible exponential families
In general, mixtures of distributions from an exponential family no longer form an exponential family.
BUT not when the components have distinct (non-overlapping) supports! (see the derivation sketch below)
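A derivation sketch of the disjoint-support fact, with two components for simplicity (my notation). Let \(f_k(x) = h_k(x)\exp\{\eta_k^\top T_k(x) - A_k(\eta_k)\}\) be exponential-family densities on disjoint supports \(S_1, S_2\). The mixture \(p = \pi f_1 + (1-\pi) f_2\) has log-density
\[
\log p(x) = \mathbf{1}_{S_1}(x)\big[\log\pi + \log f_1(x)\big] + \mathbf{1}_{S_2}(x)\big[\log(1-\pi) + \log f_2(x)\big],
\]
because exactly one indicator is nonzero at each \(x\), so the log-of-a-sum collapses. This is affine in the sufficient statistics \(\big(\mathbf{1}_{S_1}(x)\,T_1(x),\; \mathbf{1}_{S_2}(x)\,T_2(x),\; \mathbf{1}_{S_1}(x)\big)\), hence the mixtures form a single exponential family with parameters \((\eta_1, \eta_2, \pi)\). With overlapping supports the log of the sum does not decompose this way.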
But the above is of little use in MFVI unless these families are conjugate to widely used likelihood models.
\(\rightarrow\) how to create a model with a **(1) more flexible** prior while maintaining **(2) conjugacy**?
Meaning?
- [ Interpretation 1 ] given a conjugate prior (say \(P\)), we can create a more flexible prior that maintains conjugacy by combining component distributions with non-overlapping support
- [ Interpretation 2 ] if each component exponential family is conjugate to a likelihood & distributions in different component families have non-overlapping support, then the mixture distributions form a conjugate exponential family (see the numeric sketch below)
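A minimal numeric sketch of Interpretation 2 (a toy example in my own notation, not the paper's code): take a single Gaussian observation \(y \mid \beta \sim \mathcal{N}(\beta, \sigma^2)\) and the spike-and-slab prior \(\beta \sim \pi_0 \delta_0 + (1-\pi_0)\mathcal{N}(0, \tau^2)\). The exact posterior is again a spike-and-slab, so the mixture family is conjugate; the update below is standard Gaussian algebra.

```python
import numpy as np
from scipy.stats import norm

def spike_slab_posterior(y, sigma2=1.0, tau2=4.0, pi0=0.5):
    """Exact posterior for beta under y | beta ~ N(beta, sigma2),
    beta ~ pi0 * delta_0 + (1 - pi0) * N(0, tau2).

    The posterior is again spike-and-slab (conjugacy): a point mass
    at 0 with weight pi_post, plus a slab N(mu_post, v_post)."""
    # Marginal likelihood of y under each prior component
    m_spike = norm.pdf(y, loc=0.0, scale=np.sqrt(sigma2))        # beta = 0
    m_slab = norm.pdf(y, loc=0.0, scale=np.sqrt(sigma2 + tau2))  # beta ~ N(0, tau2)
    # Posterior mixture weight on the spike (Bayes' rule over components)
    pi_post = pi0 * m_spike / (pi0 * m_spike + (1 - pi0) * m_slab)
    # Conjugate Gaussian update for the slab component
    v_post = 1.0 / (1.0 / tau2 + 1.0 / sigma2)
    mu_post = v_post * y / sigma2
    return pi_post, mu_post, v_post

# Usage: y near 0 keeps most mass on the spike; large y shifts it to the slab
print(spike_slab_posterior(0.1))  # most mass on the spike -> beta pulled to exactly 0
print(spike_slab_posterior(5.0))  # little mass on the spike -> slab centered near y
```

The spike component is what delivers the automatic variable selection from the abstract: when the data do not support a nonzero \(\beta\), the posterior puts most of its mass on an exact zero rather than merely shrinking toward it.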