Variational Denoising Network : Toward Blind Noise Modeling and Removal ( NeurIPS 2019 )


Abstract

goal : Blind Image Denoising

method : new VI method

  • integrates both (1) noise estimation & (2) image denoising into a unique Bayesian framework

  • approximate posterior : parameterized by DNN

  • intrinsic clean image & noise variances as latent variables,

    conditioned on noisy input


Variational Denoising Network ( VDN )

  • performs denoising efficiently thanks to the explicit form of the approximate posterior


1. Introduction

Image denoising?

  • recover the underlying clean image from its noisy observation


2 main methods

  • (1) MAP ( with fidelity/loss & regularization terms )
    • limited by the hand-crafted assumptions on the image prior & noise
  • (2) Deep Learning
    • first, collect a large amount of “noisy-clean image pairs”
    • prone to overfitting the noise seen in training


Propose a new VI method!

  • directly infer both (1) the underlying clean image & (2) the noise distribution from a noisy image


2. Related Works

(1) model-driven MAP based

(2) data-driven Deep Learning based


Model-driven MAP based methods

  • most classical image denoising methods
  • fidelity/loss term & regularization term
  • rely on a pre-specified (hand-crafted) image prior


Data-driven Deep Learning based Methods

  • instead of pre-specifying an image prior, directly learn a denoiser
  • input : a large collection of noisy-clean image pairs


3. Variational Denoising Network for Blind Noise Modeling

Training data \(D=\left\{\boldsymbol{y}_{j}, \boldsymbol{x}_{j}\right\}_{j=1}^{n}\).

  • \(\boldsymbol{x}_{j}\) : clean image
  • \(\boldsymbol{y}_{j}\) : noisy image


3-1. Constructing Full Bayesian Model based on training data

notation

  • \(\boldsymbol{x}=\left[x_{1}, \cdots, x_{d}\right]^{T}\).
  • \(\boldsymbol{y}=\left[y_{1}, \cdots, y_{d}\right]^{T}\).
  • \(\boldsymbol{z} \in \mathbb{R}^{d}\) : latent clean image


(1) generation process of noisy image :

  • \(y_{i} \sim \mathcal{N}\left(y_{i} \mid z_{i}, \sigma_{i}^{2}\right), i=1,2, \cdots, d\).

(2) conjugate Gaussian prior on \(z\) :

  • \(z_{i} \sim \mathcal{N}\left(z_{i} \mid x_{i}, \varepsilon_{0}^{2}\right), i=1,2, \cdots, d\).

(3) conjugate Inverse Gamma prior on \(\sigma^2\) :

  • \(\sigma_{i}^{2} \sim \operatorname{IG}\left(\sigma_{i}^{2} \mid \frac{p^{2}}{2}-1, \frac{p^{2} \xi_{i}}{2}\right), i=1,2, \cdots, d\).


With (1) ~ (3), the full Bayesian model can be obtained ( a small sampling sketch is given below ).
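To make (1) ~ (3) concrete, here is a minimal NumPy sketch (my own illustration, not the paper's code) that samples a noisy image \(\boldsymbol{y}\) from a clean one \(\boldsymbol{x}\) under this model. The values of \(\varepsilon_0\), \(p\) and \(\xi_i\) are hypothetical placeholders; in the paper, \(\xi_i\) is a rough estimate of the local noise variance around pixel \(i\) (taken from a \(p \times p\) window), and the IG prior is chosen so that its mode \(\beta/(\alpha+1)\) is exactly \(\xi_i\).

```python
# Minimal sketch (NumPy) of the assumed generative process.
# eps0, p and xi are hypothetical values, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)

d = 16 * 16                      # number of pixels (toy size)
x = rng.uniform(0.0, 1.0, d)     # "clean" image x (stand-in data)
eps0 = 0.05                      # prior std of z around x
p = 7                            # window size that sets the IG prior strength
xi = np.full(d, 0.01)            # xi_i : prior guess of each pixel's noise variance

# (3) sigma_i^2 ~ IG(p^2/2 - 1, p^2 * xi_i / 2),
#     sampled via 1/sigma^2 ~ Gamma(shape, rate)  <=>  sigma^2 ~ Inverse-Gamma
alpha0 = p**2 / 2 - 1
beta0 = p**2 * xi / 2
sigma2 = 1.0 / rng.gamma(shape=alpha0, scale=1.0 / beta0)

# (2) z_i ~ N(x_i, eps0^2)        latent clean image
z = rng.normal(x, eps0)

# (1) y_i ~ N(z_i, sigma_i^2)     observed noisy image
y = rng.normal(z, np.sqrt(sigma2))
```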

Goal : Infer the posterior of latent variables \(z\) and \(\sigma^2\) from noisy image \(y\)


3-2. Variational Form of Posterior

Assume conditional independence between \(\boldsymbol{z}\) and \(\boldsymbol{\sigma}^2\) given \(\boldsymbol{y}\) :

\(q\left(\boldsymbol{z}, \boldsymbol{\sigma}^{2} \mid \boldsymbol{y}\right)=q(\boldsymbol{z} \mid \boldsymbol{y}) q\left(\boldsymbol{\sigma}^{2} \mid \boldsymbol{y}\right)\).

  • \(q(\boldsymbol{z} \mid \boldsymbol{y})=\prod_{i}^{d} \mathcal{N}\left(z_{i} \mid \mu_{i}\left(\boldsymbol{y} ; W_{D}\right), m_{i}^{2}\left(\boldsymbol{y} ; W_{D}\right)\right)\).
  • \(q\left(\boldsymbol{\sigma}^{2} \mid \boldsymbol{y}\right)=\prod_{i}^{d} \operatorname{IG}\left(\sigma_{i}^{2} \mid \alpha_{i}\left(\boldsymbol{y} ; W_{S}\right), \beta_{i}\left(\boldsymbol{y} ; W_{S}\right)\right)\).

where

  • \(\mu_{i}\left(\boldsymbol{y} ; W_{D}\right)\) and \(m_{i}^{2}\left(\boldsymbol{y} ; W_{D}\right)\) : posterior params of the latent clean image \(\boldsymbol{z}\) → predicted by D-Net (Denoising network)
  • \(\alpha_{i}\left(\boldsymbol{y} ; W_{S}\right)\) and \(\beta_{i}\left(\boldsymbol{y} ; W_{S}\right)\) : posterior params of \(\boldsymbol{\sigma}^2\) → predicted by S-Net (Sigma network) ( see the toy sketch below )
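Below is a minimal PyTorch sketch of how D-Net and S-Net could output these posterior parameters. The tiny conv stacks here are illustrative stand-ins for the paper's deeper networks; only the output parameterization \((\mu, m^2)\) and \((\alpha, \beta)\) follows the text.

```python
# Toy D-Net / S-Net heads (illustrative, not the paper's architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class DNet(nn.Module):
    """Predicts the Gaussian posterior q(z|y) = N(mu, m^2) per pixel."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 2 * channels, 3, padding=1),
        )

    def forward(self, y):
        mu, log_m2 = self.body(y).chunk(2, dim=1)
        return mu, log_m2.exp()          # m^2 > 0 via exp of a log-variance

class SNet(nn.Module):
    """Predicts the Inverse-Gamma posterior q(sigma^2|y) = IG(alpha, beta) per pixel."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 2 * channels, 3, padding=1),
        )

    def forward(self, y):
        a, b = self.body(y).chunk(2, dim=1)
        return F.softplus(a) + 1e-6, F.softplus(b) + 1e-6   # alpha, beta > 0
```

At test time the denoised image can simply be read off as the posterior mean \(\mu(\boldsymbol{y} ; W_{D})\), which is why inference is a single forward pass through D-Net.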


( Figure 2 : overall VDN framework ; D-Net and S-Net both take the noisy image \(\boldsymbol{y}\) as input and output the posterior parameters above )


3-3. Variational Lower Bound of Marginal Data Likelihood

Decompose the marginal likelihood of the noisy image \(\boldsymbol{y}\)

  • \(\log p\left(\boldsymbol{y} ; \boldsymbol{z}, \boldsymbol{\sigma}^{2}\right)=\mathcal{L}\left(\boldsymbol{z}, \boldsymbol{\sigma}^{2} ; \boldsymbol{y}\right)+D_{K L}\left(q\left(\boldsymbol{z}, \boldsymbol{\sigma}^{2} \mid \boldsymbol{y}\right) \mid \mid p\left(\boldsymbol{z}, \boldsymbol{\sigma}^{2} \mid \boldsymbol{y}\right)\right)\).

    where \(\mathcal{L}\left(\boldsymbol{z}, \boldsymbol{\sigma}^{2} ; \boldsymbol{y}\right)=E_{q\left(\boldsymbol{z}, \boldsymbol{\sigma}^{2} \mid \boldsymbol{y}\right)}\left[\log p\left(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{\sigma}^{2}\right) p(\boldsymbol{z}) p\left(\boldsymbol{\sigma}^{2}\right)-\log q\left(\boldsymbol{z}, \boldsymbol{\sigma}^{2} \mid \boldsymbol{y}\right)\right]\) ( = ELBO )

  • \(\log p\left(\boldsymbol{y} ; \boldsymbol{z}, \boldsymbol{\sigma}^{2}\right) \geq \mathcal{L}\left(\boldsymbol{z}, \boldsymbol{\sigma}^{2} ; \boldsymbol{y}\right)\).


Rewrite

  • \(\mathcal{L}\left(\boldsymbol{z}, \boldsymbol{\sigma}^{2} ; \boldsymbol{y}\right)=E_{q\left(\boldsymbol{z}, \boldsymbol{\sigma}^{2} \mid \boldsymbol{y}\right)}\left[\log p\left(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{\sigma}^{2}\right)\right]-D_{K L}(q(\boldsymbol{z} \mid \boldsymbol{y}) \mid \mid p(\boldsymbol{z}))-D_{K L}\left(q\left(\boldsymbol{\sigma}^{2} \mid \boldsymbol{y}\right) \mid \mid p\left(\boldsymbol{\sigma}^{2}\right)\right)\).
  • Each of the three terms below can be computed analytically ( in closed form )!

  • Term 1)

    \(E_{q\left(z, \sigma^{2} \mid y\right)}\left[\log p\left(\boldsymbol{y} \mid \boldsymbol{z}, \boldsymbol{\sigma}^{2}\right)\right]=\sum_{i=1}^{d}\left\{-\frac{1}{2} \log 2 \pi-\frac{1}{2}\left(\log \beta_{i}-\psi\left(\alpha_{i}\right)\right)-\frac{\alpha_{i}}{2 \beta_{i}}\left[\left(y_{i}-\mu_{i}\right)^{2}+m_{i}^{2}\right]\right\}\).
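    ( This follows from two standard Inverse-Gamma identities : if \(\sigma_i^{2} \sim \operatorname{IG}(\alpha_i, \beta_i)\) under \(q\), then \(E_q[1/\sigma_i^{2}]=\alpha_i/\beta_i\) and \(E_q[\log \sigma_i^{2}]=\log \beta_i-\psi(\alpha_i)\) ; combined with \(E_{q(\boldsymbol{z} \mid \boldsymbol{y})}[(y_i-z_i)^{2}]=(y_i-\mu_i)^{2}+m_i^{2}\), plugging these into \(\log \mathcal{N}(y_i \mid z_i, \sigma_i^{2})\) gives the expression above. )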

  • Term 2)

    \(D_{K L}(q(\boldsymbol{z} \mid \boldsymbol{y}) \mid \mid p(\boldsymbol{z}))=\sum_{i=1}^{d}\left\{\frac{\left(\mu_{i}-x_{i}\right)^{2}}{2 \varepsilon_{0}^{2}}+\frac{1}{2}\left[\frac{m_{i}^{2}}{\varepsilon_{0}^{2}}-\log \frac{m_{i}^{2}}{\varepsilon_{0}^{2}}-1\right]\right\}\).

  • Term 3)

    \(D_{K L}\left(q\left(\boldsymbol{\sigma}^{2} \mid \boldsymbol{y}\right) \mid \mid p\left(\boldsymbol{\sigma}^{2}\right)\right)=\sum_{i=1}^{d}\left\{\left(\alpha_{i}-\frac{p^{2}}{2}+1\right) \psi\left(\alpha_{i}\right)+\left[\log \Gamma\left(\frac{p^{2}}{2}-1\right)-\log \Gamma\left(\alpha_{i}\right)\right]+\left(\frac{p^{2}}{2}-1\right)\left(\log \beta_{i}-\log \frac{p^{2} \xi_{i}}{2}\right)+\alpha_{i}\left(\frac{p^{2} \xi_{i}}{2 \beta_{i}}-1\right)\right\}\).


Final Objective Function : \(\min _{W_{D}, W_{S}}-\sum_{j=1}^{n} \mathcal{L}\left(z_{j}, \sigma_{j}^{2} ; y_{j}\right)\).
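A minimal PyTorch sketch of this objective as a training loss, assuming per-pixel network outputs shaped like the D-Net / S-Net heads sketched in 3-2. The function name and hyper-parameter handling are my own illustration, not the paper's code; it simply combines Term 1) ~ Term 3) above.

```python
import math
import torch

def neg_elbo(y, x, mu, m2, alpha, beta, eps0, p, xi):
    """Negative lower bound -L(z, sigma^2; y), summed over all pixels.

    y, x        : noisy image and its clean counterpart from a training pair
    mu, m2      : D-Net outputs -> posterior N(mu, m^2) of the latent clean image z
    alpha, beta : S-Net outputs -> posterior IG(alpha, beta) of the noise variance
    eps0, p, xi : prior hyper-parameters (xi is a per-pixel tensor; eps0, p scalars)
    """
    # Term 1: E_q[ log p(y | z, sigma^2) ]
    lik = (-0.5 * math.log(2 * math.pi)
           - 0.5 * (torch.log(beta) - torch.digamma(alpha))
           - 0.5 * (alpha / beta) * ((y - mu) ** 2 + m2))

    # Term 2: KL( q(z | y) || p(z) ), both Gaussian
    kl_z = ((mu - x) ** 2 / (2 * eps0 ** 2)
            + 0.5 * (m2 / eps0 ** 2 - torch.log(m2 / eps0 ** 2) - 1))

    # Term 3: KL( q(sigma^2 | y) || p(sigma^2) ), both Inverse-Gamma
    a0 = p ** 2 / 2 - 1                  # prior shape  p^2/2 - 1
    b0 = p ** 2 * xi / 2                 # prior scale  p^2 * xi / 2  (per pixel)
    kl_s = ((alpha - a0) * torch.digamma(alpha)
            + (math.lgamma(a0) - torch.lgamma(alpha))
            + a0 * (torch.log(beta) - torch.log(b0))
            + alpha * (b0 / beta - 1))

    # ELBO = Term1 - Term2 - Term3 ; training minimizes its negative
    return -(lik - kl_z - kl_s).sum()
```

Minimizing this summed negative ELBO over mini-batches of \((\boldsymbol{y}_j, \boldsymbol{x}_j)\) pairs trains \(W_D\) and \(W_S\) jointly.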


4. Conclusion

A new Variational Inference algorithm, called VDN ( Variational Denoising Network ), for blind image denoising

Main Idea : learn an approximation to the true posterior over the latent variables ( clean image & noise variance ), conditioned on the noisy input

The proposed VDN is a generative method, so it can estimate the noise distribution directly from the input data.
