[Paper Review] VI/BNN/NF papers 11~20

I have summarized the must-read and advanced papers regarding….

  • various methods using Variational Inference

  • Bayesian Neural Network

  • Probabilistic Deep Learning

  • Normalizing Flows

11. Deep Neural Networks as Gaussian Processes

Lee, J., et al. (2017)

( download paper here : Download )

  • Neal (1994) : infinitely wide 1-layer NN = GP

    this paper : infinitely wide DNN = GP

  • recursive, deterministic computation of the kernel function ( see the sketch after this list )

  • summary : Download
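A minimal sketch of that recursion, assuming a ReLU nonlinearity (for which the layer-wise expectation has a closed arc-cosine form); the variance hyperparameters and function names are illustrative, not the paper's code.

```python
import numpy as np

def nngp_kernel(x1, x2, depth, sigma_w2=1.6, sigma_b2=0.1):
    """Recursive, deterministic NNGP kernel for a ReLU network (sketch)."""
    d = len(x1)
    # base case: K^0(x, x') = sigma_b^2 + sigma_w^2 * <x, x'> / d
    k12 = sigma_b2 + sigma_w2 * np.dot(x1, x2) / d
    k11 = sigma_b2 + sigma_w2 * np.dot(x1, x1) / d
    k22 = sigma_b2 + sigma_w2 * np.dot(x2, x2) / d
    for _ in range(depth):
        # closed-form E[ReLU(u) ReLU(v)] under the previous layer's GP (arc-cosine kernel)
        theta = np.arccos(np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0))
        f12 = np.sqrt(k11 * k22) / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))
        f11, f22 = k11 / 2.0, k22 / 2.0   # theta = 0 on the diagonal
        k12 = sigma_b2 + sigma_w2 * f12
        k11 = sigma_b2 + sigma_w2 * f11
        k22 = sigma_b2 + sigma_w2 * f22
    return k12
```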



12. Representing Inferential Uncertainty in Deep Neural Networks through Sampling

McClure, P., & Kriegeskorte, N. (2016)

( download paper here : Download )

  • Bayesian models capture model uncertainty
  • Bayesian DNNs can be trained with the following noise schemes ( see the sketch after this list ) :
    • 1) Bernoulli drop out
    • 2) Bernoulli drop connect
    • 3) Gaussian drop out
    • 4) Gaussian drop connect
    • 5) Spike-and-Slab Dropout
  • summary : Download
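A minimal sketch of the first four schemes applied to one linear layer (Spike-and-Slab dropout combines the Bernoulli and Gaussian schemes and is omitted); the keep probability and names are illustrative. Sampling stays on at test time, so repeated calls give a distribution over outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_linear(x, W, mode, p=0.5):
    """One linear layer under the dropout / dropconnect variants compared in the paper."""
    if mode == "bernoulli_dropout":        # drop whole input units
        mask = rng.binomial(1, 1 - p, size=x.shape) / (1 - p)
        return (x * mask) @ W
    if mode == "gaussian_dropout":         # multiplicative Gaussian noise on units
        noise = rng.normal(1.0, np.sqrt(p / (1 - p)), size=x.shape)
        return (x * noise) @ W
    if mode == "bernoulli_dropconnect":    # drop individual weights
        mask = rng.binomial(1, 1 - p, size=W.shape) / (1 - p)
        return x @ (W * mask)
    if mode == "gaussian_dropconnect":     # multiplicative Gaussian noise on weights
        noise = rng.normal(1.0, np.sqrt(p / (1 - p)), size=W.shape)
        return x @ (W * noise)
    raise ValueError(mode)
```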



13. Bayesian Uncertainty Estimation for Batch Normalized Deep Networks

Teye, M., Azizpour, H., & Smith, K. (2018)

( download paper here : Download )

  • BN (Batch Normalization) = approximate inference in Bayesian models

    \(\rightarrow\) allows us to estimate “model uncertainty” with a conventional architecture !!

    ( previous works mostly required modifying the architecture )

  • BN

    • training ) use mini-batch statistics ( mean & variance estimated from each mini-batch )
    • evaluation ) use population statistics estimated from all the training data

  • (1) Bayesian Modeling : VA (Variational Approximation)

    (2) DNN with Batch Normalization

    \(\rightarrow\) result : (1) = (2)

  • predictive uncertainty in Batch Normalized Deep Nets!

  • the network is trained just as a regular BN network!

    ( but at prediction time, instead of replacing \(w=\{\mu_B, \sigma_B\}\) with population values from \(D\),

    these parameters are sampled stochastically from random training mini-batches ; see the sketch after this list )

  • summary : Download
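A minimal PyTorch-style sketch of that prediction-time sampling, assuming a network regularized only with BN (any dropout layers would also be toggled by `train()`/`eval()`); `train_loader`, `T`, and the momentum trick are illustrative choices, not the paper's exact code.

```python
import torch

@torch.no_grad()
def mc_batchnorm_predict(model, x_test, train_loader, T=50):
    """MC Batch Norm (sketch): T stochastic passes, each time re-estimating the BN
    statistics w = {mu_B, sigma_B} from one randomly drawn training mini-batch."""
    for m in model.modules():
        if isinstance(m, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)):
            m.momentum = 1.0          # running stats <- stats of the last seen batch
    preds, it = [], iter(train_loader)
    for _ in range(T):
        try:
            xb, _ = next(it)
        except StopIteration:
            it = iter(train_loader)
            xb, _ = next(it)
        model.train()                 # BN computes (and stores) this mini-batch's stats
        model(xb)
        model.eval()                  # BN now uses the stats stored above
        preds.append(model(x_test))
    preds = torch.stack(preds)        # (T, N, ...)
    return preds.mean(0), preds.var(0)
```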



14. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles

Lakshminarayanan, B., Pritzel, A., & Blundell, C. (2017)

( download paper here : Download )

  • propose an alternative to BNN

    ( not a Bayesian Method )

  • advantages

    • 1) simple to implement, 2) parallelizable, 3) very little hyperparameter tuning,

      4) yields high-quality predictive uncertainty estimates

  • 2 evaluation measures

    • (1) calibration ( measured by proper scoring rules )
    • (2) generalization to unknown classes ( OOD examples )
  • two modifications ( see the sketch after this list )

    • Ensembles

    • Adversarial training ( adversarial examples : close to the original training data, but misclassified by the NN )

  • summary : Download
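A minimal PyTorch-style sketch of one ensemble member's training step and of the ensemble's prediction, assuming a classification setting; `eps` and the function names are illustrative (the paper trains several such members independently from random initializations).

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x, y, eps=0.01, adversarial=True):
    """One step for a single ensemble member: NLL loss (a proper scoring rule),
    optionally augmented with fast-gradient-sign adversarial examples."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    if adversarial:
        grad = torch.autograd.grad(loss, x, retain_graph=True)[0]
        x_adv = (x + eps * grad.sign()).detach()   # close to x, but pushes the loss up
        loss = loss + F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

@torch.no_grad()
def ensemble_predict(models, x):
    """Average the members' predictive distributions (the ensemble's uncertainty)."""
    probs = torch.stack([F.softmax(m(x), dim=-1) for m in models])
    return probs.mean(0)
```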



15. Fast Dropout Training

Wang, S., & Manning, C. (2013)

( download paper here : Download )

  • Dropout : repeatedly sampling dropout masks makes training slow

    \(\rightarrow\) use a Gaussian Approximation to make it faster!

  • problems with dropout

    • 1) slow training
    • 2) loss of information
  • with the Gaussian Approximation \(Y \rightarrow S\) ( see the sketch after this list ) :

    • (1) faster ( without actually sampling the masks )

      (2) efficient ( uses all the data )

    • ( \(m\) : number of dimensions , \(K\) : number of samples )

      original dropout ) sampling from \(Y\) costs \(O(mK)\)

      with GA ) sampling from \(S\) costs \(O(K)\)

  • summary : Download
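A minimal sketch of the idea for a single unit with Bernoulli dropout, assuming the CLT-based Gaussian fit used in the paper; `p_keep` and the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def fast_dropout_samples(x, w, p_keep=0.5, K=100):
    """Gaussian approximation of the dropout pre-activation Y = sum_i z_i * w_i * x_i,
    z_i ~ Bernoulli(p_keep). By the CLT, Y is approximately Gaussian, so the K samples
    are drawn directly from S ~ N(mu, s2) instead of sampling m-dimensional masks."""
    wx = w * x
    mu = p_keep * wx.sum()                        # E[Y]
    s2 = p_keep * (1 - p_keep) * (wx ** 2).sum()  # Var[Y]
    return rng.normal(mu, np.sqrt(s2), size=K)    # O(K) instead of O(mK)

def naive_dropout_samples(x, w, p_keep=0.5, K=100):
    """Baseline: actually sample the Bernoulli masks (O(mK))."""
    z = rng.binomial(1, p_keep, size=(K, x.shape[0]))
    return (z * (w * x)).sum(axis=1)
```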



16. Variational Dropout and Local Reparameterization Trick

Kingma, D. P., Salimans, T., & Welling, M. (2015)

( download paper here : Download )

  • propose LRT for reducing variance of SGVB
    • LRT : Local Reparameterization Trick
    • SGVB : Stochastic Gradient Variational Bayes
  • LRT ( see the sketch after this list ) :

    • translates uncertainty about global parameters into local noise ( which is independent across examples in the mini-batch )
    • can be parallelized
    • yields a gradient estimator whose variance is inversely proportional to the mini-batch size \(M\)
  • connection with dropout

    • Gaussian dropout = SGVB with LRT

    • propose “Variational Dropout”

      ( = generalization of Gaussian dropout )

  • summary : Download
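A minimal sketch of the LRT for one fully connected layer with a factorized Gaussian posterior; `w_mu`, `w_logvar` and the function name are illustrative.

```python
import torch

def lrt_linear(a, w_mu, w_logvar):
    """Local reparameterization trick: for q(W) = N(w_mu, diag(exp(w_logvar))),
    sample the pre-activations directly instead of sampling the full weight matrix.
    The noise is local to each example in the mini-batch, which lowers the variance
    of the SGVB gradient estimator."""
    gamma = a @ w_mu                         # mean of the pre-activation
    delta = (a ** 2) @ torch.exp(w_logvar)   # variance of the pre-activation
    eps = torch.randn_like(gamma)            # one noise draw per activation
    return gamma + torch.sqrt(delta) * eps
```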



17. Dropout as a Bayesian Approximation : Representing Model Uncertainty in Deep Learning

Gal, Y., & Ghahramani, Z. (2016)

( download paper here : Download )

  • Model Uncertainty with Dropout NNs

    ( Dropout in NNs = approximate Bayesian inference in deep Gaussian processes )

  • Dropout

    • can be interpreted as a “Bayesian Approximation” of a GP
    • avoids overfitting
    • approximately integrates over the model’s weights
  • obtaining model uncertainty ( see the sketch after this list )

    • [step 1] sample \(T\) sets of dropout mask vectors
    • [step 2] form the corresponding weight realizations \(\widehat{W}_t\)
    • [step 3] MC Dropout : run \(T\) stochastic forward passes and estimate the predictive mean and variance
  • summary : Download
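A minimal sketch of MC dropout at test time, assuming a dropout-regularized network (in practice only the dropout modules should be put in training mode; `model.train()` is used here for brevity).

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, T=100):
    """MC dropout (sketch): keep dropout active at test time, run T stochastic forward
    passes and use their sample mean / variance as the predictive mean / uncertainty.
    For regression the paper also adds the inverse model precision to the variance;
    that term is omitted here for brevity."""
    model.train()                                        # keeps the dropout masks stochastic
    preds = torch.stack([model(x) for _ in range(T)])    # (T, N, out_dim)
    return preds.mean(0), preds.var(0)
```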



18. Variational Dropout Sparsifies Deep Neural Networks

Molchanov, D., Ashukha, A., & Vetrov, D. (2017)

( download paper here : Download )

  • Key point

    • 1) Sparse Variational Dropout

      \(\rightarrow\) extend VD(Variational Dropout) to the case where “dropout rates are unbounded”

    • 2) reduce the variance of the gradient estimator

      \(\rightarrow\) leads to faster convergence

  • the injected per-weight noise drives the dropout rates of many weights towards infinity \(\rightarrow\) those weights can be pruned, giving “Sparsity” ( see the sketch after this list )

  • summary : Download
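A minimal sketch of the resulting pruning rule, assuming the usual parameterization with per-weight mean \(\theta\) and variance \(\sigma^2\) so that \(\alpha = \sigma^2 / \theta^2\); the threshold value is illustrative.

```python
import torch

def sparsify(theta, log_sigma2, log_alpha_thresh=3.0):
    """Sparse Variational Dropout pruning rule (sketch): each weight has its own
    dropout rate alpha = sigma^2 / theta^2. When log(alpha) grows past a threshold,
    the multiplicative noise dominates and the weight is effectively pure noise,
    so it is set to zero."""
    log_alpha = log_sigma2 - 2.0 * torch.log(torch.abs(theta) + 1e-8)
    mask = (log_alpha < log_alpha_thresh).float()   # keep only "informative" weights
    return theta * mask, mask.mean()                # pruned weights, fraction kept
```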



19. Relevance Vector Machine Explained

Fletcher, T. (2010)

( download paper here : Download )

  • problems with SVM (Support Vector Machine)

    • no probabilistic prediction
    • only binary decisions
    • the hyperparameter \(C\) has to be tuned
  • SVM vs RVM

    • Sparsity : RVM > SVM

    • Generalization : RVM > SVM

    • Need to estimate a hyperparameter : only SVM

    • Training time : RVM > SVM ( i.e. RVM is slower to train )

      ( can be mitigated by its sparsity )

  • summary : Download



20. Uncertainty in Deep Learning (1)

Gal, Y. (2016)

( download paper here : Download )

  • Importance of knowing what we don’t know

  • Types of Uncertainty

    • 1) Aleatoric Uncertainty ( = Data Uncertainty )
      • noisy data
    • 2) Epistemic Uncertainty ( = Model Uncertainty )
      • uncertainty in model parameters
    • 3) Out of Distribution
      • the point lies outside the data distribution
  • BNN (Bayesian Neural Network)

    • a GP can be recovered in the limit of infinitely many weights
    • model uncertainty can be obtained by placing “distribution over weights”
  • models which give uncertainty usually do not scale well

    \(\rightarrow\) need practical techniques (ex. SRT)

  • SRT (Stochastic Regularization Techniques)

    • adapt the model output “stochastically” as a way of model regularization
    • predictive mean/variance & random output
  • summary : Download



20. Uncertainty in Deep Learning (2)

  • review of Variational Inference
    • ELBO = (1) + (2) ( see the equation after this list )
      • (1) encourages \(q\) to explain the data well
      • (2) encourages \(q\) to be close to the prior
    • replaces “marginalization” with “optimization”
    • does not scale to large data & complex models
  • previous histories of BNN
  • 3 Modern Approximate Inference
    • 1) Variational Inference ( above )
    • 2) Sampling based techniques ( HMC, Langevin method , SGLD …)
    • 3) Ensemble methods ( produce point estimate many times )
  • summary : Download
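In symbols, with data \(D\), prior \(p(w)\) and approximate posterior \(q_\theta(w)\), the two terms above are:

\[
\log p(D) \;\geq\; \mathcal{L}(\theta)
= \underbrace{\mathbb{E}_{q_\theta(w)}\!\left[\log p(D \mid w)\right]}_{\text{(1) explain the data well}}
\;-\;
\underbrace{\mathrm{KL}\!\left(q_\theta(w)\,\|\,p(w)\right)}_{\text{(2) stay close to the prior}}
\]

Maximizing \(\mathcal{L}(\theta)\) over \(\theta\) is the “optimization” that replaces the intractable “marginalization” \(\log p(D)\).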



20. Uncertainty in Deep Learning (3)

  • Bayesian Deep Learning : based on 2 works

    • (1) MC estimation ( Graves, 2011 )
    • (2) VI ( Hinton and Van Camp, 1993 )
  • BNN inference + SRTs

  • Steps

    • [step 1] analyze variance of several stochastic estimators (used in VI)
    • [step 2] tie these derivations to SRTs
    • [step 3] propose practical techniques to obtain “model uncertainty”
  • “expected log likelihood” in ELBO

    • problem 1) high computational cost

      \(\rightarrow\) solved by data sub-sampling ( mini-batch optimization )

    • problem 2) intractable integral

      \(\rightarrow\) MC integration

  • MC estimators

    • 1) score function estimator ( = likelihood ratio estimator, Reinforce)
    • 2) path-wise derivative estimator ( = reparameterization trick )
    • 3) characteristic function estimator
  • SRT (Stochastic Regularization Techniques)

    • regularize model through injection of stochastic noise
    • Dropout, multiplicative Gaussian Noise….
  • alternatives to Dropout…

    • 1) Additive Gaussian Noise
    • 2) Multiplicative Gaussian Noise
    • 3) Drop connect

    algorithm 1 (VI) = algorithm 2 (DO), provided the “KL condition” holds

    • VI ) minimizing the divergence

    • DO ) optimizing the NN with Dropout

  • Model uncertainty in BNN

  • Uncertainty in Classification

    • variation ratios / predictive entropy / mutual information ( see the sketch after this list )
  • Bayesian CNN/RNN

  • summary : Download
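A minimal sketch of the three classification uncertainty measures, computed from \(T\) stochastic forward passes (e.g. MC dropout); the array shapes and names are illustrative.

```python
import numpy as np

def classification_uncertainty(probs):
    """Uncertainty measures from T stochastic forward passes.
    probs: array of shape (T, C) with the softmax output of each pass."""
    T, C = probs.shape
    mean_p = probs.mean(axis=0)                                   # predictive distribution
    # variation ratio: 1 - frequency of the modal class among the T sampled predictions
    modes = probs.argmax(axis=1)
    modal_class = np.bincount(modes, minlength=C).argmax()
    variation_ratio = 1.0 - (modes == modal_class).mean()
    # predictive entropy: H[ mean_p ]  (total uncertainty)
    predictive_entropy = -(mean_p * np.log(mean_p + 1e-12)).sum()
    # mutual information: predictive entropy minus expected entropy (epistemic part)
    expected_entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1).mean()
    mutual_information = predictive_entropy - expected_entropy
    return variation_ratio, predictive_entropy, mutual_information
```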