Efficient Variational Inference for Gaussian Process Regression Networks (2013)


Abstract

( Multi-output regression ) the correlation between the outputs may vary across the input space

GPRNs (Gaussian Process Regression Networks)

  • flexible
  • intractable

Thus, the paper proposes 2 efficient VI methods for GPRNs


(1) GPRN-MF

  • adopts mean-field with full Gaussian over GPRN’s parameters


(2) GPRN-NPV

  • non-parametric VI
  • derive analytical forms of ELBO
  • closed-form updates of parameters
  • needs only O(N) parameters for the covariances


1. Introduction

Challenges in multi-output regression :

  • 1) develop flexible models able to capture the dependencies between Ys
  • 2) efficient inference


Various non-probabilistic approaches have been developed.

It is crucial to have full posterior distributions

GPs have proved to be very effective tools for both single- and multi-output regression


GP-based methods :

  • before) assume that the dependencies between the Ys are fixed

    ( = independent of the input space )

  • after ) correlation between Ys can be spatially adaptive

    GAUSSIAN PROCESS REGRESSION NETWORKS (GPRNs)


This paper proposes “efficient approximate inference methods for GPRNs”

(1) First method : simple MF approach of GPRN

  • show that…
    • 1) can obtain analytical expression of ELBO & closed-form update of variational params
    • 2) parameterize the corresponding covariances with only O(N) params


(2) Second method : exploits VI

  • non-parametric VI to approximate posterior of GPRN’s params
  • can approximate complex distributions that are not well captured by a single Gaussian
  • needs O(N) variational params


2. GPRN

Input : $x \in \mathbb{R}^D$.

Output : $y(x) \in \mathbb{R}^P$.

  • assumed to be a linear combination of Q noisy latent functions $f(x) \in \mathbb{R}^Q$
  • corrupted by Gaussian noise

Mixing Coefficients : $W(x) \in \mathbb{R}^{P \times Q}$


[ GPRN model ]

$$y(x) = W(x)\left[f(x) + \sigma_f \epsilon\right] + \sigma_y z$$

$$f_j(x) \sim \mathcal{GP}(0, \kappa_f), \quad j = 1, \ldots, Q$$

$$W_{ij}(x) \sim \mathcal{GP}(0, \kappa_w), \quad i = 1, \ldots, P; \ j = 1, \ldots, Q$$

$$\epsilon \sim \mathcal{N}(\epsilon; 0, I_Q), \quad z \sim \mathcal{N}(z; 0, I_P)$$
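As a concrete illustration of the generative model above, here is a minimal NumPy sketch that draws one sample from a GPRN prior. The RBF kernel, the shared lengthscale for $f$ and $W$, and all function names are assumptions for illustration, not the paper's exact setup:

```python
import numpy as np

def rbf_kernel(X, lengthscale=1.0):
    # Squared-exponential kernel matrix for 1-D inputs X of shape (N,)
    d = X[:, None] - X[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def sample_gprn(X, P=2, Q=1, sigma_f=0.1, sigma_y=0.05, seed=0):
    """Draw one sample of outputs y(x) from the GPRN generative model."""
    rng = np.random.default_rng(seed)
    N = len(X)
    K = rbf_kernel(X) + 1e-8 * np.eye(N)   # shared kernel for f and W (assumption)
    L = np.linalg.cholesky(K)
    # Latent functions f_j ~ GP(0, k_f), stacked as (Q, N)
    f = (L @ rng.standard_normal((N, Q))).T
    # Mixing weights W_ij ~ GP(0, k_w), shape (P, Q, N)
    W = np.einsum('nm,pqm->pqn', L, rng.standard_normal((P, Q, N)))
    f_hat = f + sigma_f * rng.standard_normal((Q, N))   # noisy latents f + σ_f ε
    y = np.einsum('pqn,qn->pn', W, f_hat)               # W(x) [f(x) + σ_f ε]
    y += sigma_y * rng.standard_normal((P, N))          # output noise σ_y z
    return y

X = np.linspace(0, 1, 50)
y = sample_gprn(X)
print(y.shape)  # (2, 50)
```

Because the weights $W_{ij}(x)$ are themselves functions of $x$, each draw induces output correlations that change across the input space.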


Advantage of GPRN model :

  • 1) dependencies of outputs y are induced via latent functions f

  • 2) mixing coefficients W(x) explicitly depend on x

    ( = correlations are spatially adaptive )


Notation

  • Observed Inputs : $X = \{x_i\}_{i=1}^{N}$

  • Observed Outputs : $\mathcal{D} = \{y_i\}_{i=1}^{N}$

  • concatenation of the latent function values & weights : $u = (\hat{f}, w)$

  • noisy version of the latent function values : $\hat{f} = f + \sigma_f \epsilon$

  • hyperparameters of GPRN : $\theta = \{\theta_f, \theta_w, \sigma_f, \sigma_y\}$


Prior :

u : $p(u \mid \theta_f, \theta_w, \sigma_f) = \mathcal{N}(u; 0, C_u)$.


Conditional Likelihood :

$$p(\mathcal{D} \mid u, \theta) = \prod_{n=1}^{N} \mathcal{N}\left(y(x_n);\ W(x_n)\hat{f}(x_n),\ \sigma_y^2 I_P\right)$$

  • $y(x) = W(x)[f(x) + \sigma_f \epsilon] + \sigma_y z$

  • $z \sim \mathcal{N}(z; 0, I_P)$
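Since the conditional likelihood is a product of Gaussians, its log reduces to a sum of squared residuals plus a normalizing constant. A minimal sketch (the function name and array layout are my own):

```python
import numpy as np

def gprn_loglik(Y, W, f_hat, sigma_y):
    """log p(D | u, θ) = Σ_n log N(y_n; W(x_n) f̂(x_n), σ_y² I_P).

    Y: (P, N) observed outputs, W: (P, Q, N) mixing weights evaluated at
    the N inputs, f_hat: (Q, N) noisy latent function values.
    """
    P, N = Y.shape
    mean = np.einsum('pqn,qn->pn', W, f_hat)   # W(x_n) f̂(x_n) for every n
    resid = Y - mean
    return (-0.5 * N * P * np.log(2 * np.pi * sigma_y**2)
            - 0.5 * np.sum(resid**2) / sigma_y**2)

# toy check: zero residuals leave only the normalizing constant
ll = gprn_loglik(np.zeros((2, 4)), np.zeros((2, 1, 4)), np.zeros((1, 4)), 0.5)
```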


Posterior :

$$p(u \mid \mathcal{D}, \theta) \propto p(u \mid \theta_f, \theta_w, \sigma_f)\, p(\mathcal{D} \mid u, \sigma_y)$$


3. VI for GPRNs

minimize KL-divergence :

  • $\mathrm{KL}\left(q(u)\,\|\,p(u \mid \mathcal{D})\right) = \mathbb{E}_q\left[\log \frac{q(u)}{p(u \mid \mathcal{D})}\right]$

maximize ELBO :

  • $\mathcal{L}(q) = \mathbb{E}_q[\log p(\mathcal{D} \mid f, w)] + \mathbb{E}_q[\log p(f, w)] + \mathcal{H}[q(f, w)]$


for mean-field method, we can obtain…

  • 1) analytical expression for ELBO
  • 2) need only O(N) params for covariances


3-1. MFVI for GPRN

factorized distributions :

$$q(f, w) = \prod_{j=1}^{Q} q(f_j) \prod_{i=1}^{P} \prod_{j=1}^{Q} q(w_{ij})$$

  • $q(f_j) = \mathcal{N}(f_j; \mu_{f_j}, \Sigma_{f_j})$
  • $q(w_{ij}) = \mathcal{N}(w_{ij}; \mu_{w_{ij}}, \Sigma_{w_{ij}})$


3-1-1. Closed-form ELBO

( full Gaussian mean-field approximation ) ELBO

(1) First term :

$$\mathbb{E}_q[\log p(\mathcal{D} \mid f, w)] = -\frac{NP}{2}\log(2\pi\sigma_y^2) - \frac{1}{2\sigma_y^2}\sum_{n=1}^{N}\left(y_n - \Omega_{w_n}\nu_{f_n}\right)^{T}\left(y_n - \Omega_{w_n}\nu_{f_n}\right) - \frac{1}{2\sigma_y^2}\sum_{i=1}^{P}\sum_{j=1}^{Q}\left[\operatorname{diag}(\Sigma_{f_j})^{T}\left(\mu_{w_{ij}} \circ \mu_{w_{ij}}\right) + \operatorname{diag}(\Sigma_{w_{ij}})^{T}\left(\mu_{f_j} \circ \mu_{f_j}\right)\right]$$

( $\Omega_{w_n}$, $\nu_{f_n}$ : variational means of the weights $W(x_n)$ and noisy latents $\hat{f}(x_n)$ )


(2) Second term :

$$\mathbb{E}_q[\log p(f, w)] = -\frac{1}{2}\sum_{j=1}^{Q}\left(\log|K_f| + \mu_{f_j}^{T}K_f^{-1}\mu_{f_j} + \operatorname{tr}(K_f^{-1}\Sigma_{f_j})\right) - \frac{1}{2}\sum_{i,j}\left(\log|K_w| + \mu_{w_{ij}}^{T}K_w^{-1}\mu_{w_{ij}} + \operatorname{tr}(K_w^{-1}\Sigma_{w_{ij}})\right)$$


(3) Third term :

$$\mathcal{H}[q(f, w)] = \frac{1}{2}\sum_{j=1}^{Q}\log\left|\Sigma_{f_j}\right| + \frac{1}{2}\sum_{i,j}\log\left|\Sigma_{w_{ij}}\right| + \mathrm{const}$$


3-1-2. Efficient Closed-form Updates for Variational Parameters

Parameters for $q(f_j)$

  • $\mu_{f_j} = \frac{1}{\sigma_y^2}\Sigma_{f_j}\sum_{i=1}^{P}\left(y_i - \sum_{k \neq j}\mu_{w_{ik}} \circ \mu_{f_k}\right) \circ \mu_{w_{ij}}$
  • $\Sigma_{f_j} = \left(K_f^{-1} + \frac{1}{\sigma_y^2}\sum_{i=1}^{P}\operatorname{diag}\left(\mu_{w_{ij}} \circ \mu_{w_{ij}} + \operatorname{Var}(w_{ij})\right)\right)^{-1}$


Parameters for $q(w_{ij})$

  • $\mu_{w_{ij}} = \frac{1}{\sigma_y^2}\Sigma_{w_{ij}}\left(y_i - \sum_{k \neq j}\mu_{f_k} \circ \mu_{w_{ik}}\right) \circ \mu_{f_j}$
  • $\Sigma_{w_{ij}} = \left(K_w^{-1} + \frac{1}{\sigma_y^2}\operatorname{diag}\left(\mu_{f_j} \circ \mu_{f_j} + \operatorname{Var}(f_j)\right)\right)^{-1}$
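One coordinate-ascent step of these updates can be sketched as follows; only the $q(f_j)$ update is shown (the $q(w_{ij})$ update is symmetric). Function names and array layout are mine, and $\operatorname{Var}(w_{ij})$ is passed in as a precomputed array for simplicity. Note the data-dependent part of the precision is diagonal, which is why only O(N) extra parameters are needed:

```python
import numpy as np

def update_q_fj(Kf, Y, mu_w, var_w, mu_f, j, sigma_y):
    """Closed-form coordinate update of q(f_j) = N(mu, Sigma).

    Y: (P, N) outputs; mu_w, var_w: (P, Q, N) weight means/variances;
    mu_f: (Q, N) current latent means; j: index of the latent to update.
    """
    # Covariance: precision = K_f^{-1} + diagonal term (O(N) parameters)
    d = np.sum(mu_w[:, j] ** 2 + var_w[:, j], axis=0)        # length-N diagonal
    Sigma = np.linalg.inv(np.linalg.inv(Kf) + np.diag(d) / sigma_y**2)
    # Mean: residual excludes latent j's own contribution (the k ≠ j sum)
    recon = np.einsum('pqn,qn->pn', mu_w, mu_f) - mu_w[:, j] * mu_f[j]
    s = np.sum((Y - recon) * mu_w[:, j], axis=0)             # Σ_i (...) ∘ μ_wij
    return Sigma @ s / sigma_y**2, Sigma

# toy run on random data
rng = np.random.default_rng(1)
P, Q, N = 2, 2, 5
mu_new, Sigma_new = update_q_fj(np.eye(N), rng.standard_normal((P, N)),
                                rng.standard_normal((P, Q, N)),
                                0.1 * np.ones((P, Q, N)),
                                np.zeros((Q, N)), 0, 0.3)
```

In a full implementation these updates would be swept cyclically over all $f_j$ and $w_{ij}$ until the ELBO converges.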


3-1-3. Hyper-parameters Learning

hyperparameters : $\theta = \{\theta_f, \theta_w, \sigma_f, \sigma_y\}$

learn by gradient-based optimization of ELBO


3-2. Non-parametric VI for GPRN

approximate the posterior of the GPRN using a mixture of K isotropic Gaussians

$$q(u) = \frac{1}{K}\sum_{k=1}^{K} q^{(k)}(u) = \frac{1}{K}\sum_{k=1}^{K}\mathcal{N}\left(u; \mu^{(k)}, \sigma_k^2 I\right)$$

  • in practice, K is very small, so the complexity is O(N)
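The mixture density itself is cheap to evaluate; a small sketch (names are mine) using a numerically stable log-mean-exp over the K components:

```python
import numpy as np

def npv_logpdf(u, mus, sigmas):
    """log q(u) for a uniform mixture of K isotropic Gaussians.

    mus: (K, D) component means; sigmas: (K,) component std-devs.
    """
    K, D = mus.shape
    d2 = np.sum((u[None, :] - mus) ** 2, axis=1)       # ‖u − μ(k)‖² per component
    log_comp = -0.5 * d2 / sigmas**2 - 0.5 * D * np.log(2 * np.pi * sigmas**2)
    m = log_comp.max()
    return m + np.log(np.mean(np.exp(log_comp - m)))   # stable log-mean-exp

# single standard-Gaussian component evaluated at its mean
lp = npv_logpdf(np.zeros(3), np.zeros((1, 3)), np.ones(1))
```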


3-2-1. Closed-form ELBO

The ELBO under the mixture $q(u) = \frac{1}{K}\sum_{k=1}^{K}\mathcal{N}(u; \mu^{(k)}, \sigma_k^2 I)$ cannot be computed analytically (in particular, the entropy of a Gaussian mixture has no closed form)

need approximation


The expectations decompose over mixture components (using the mean-field structure) :

(1) First term

$$\mathbb{E}_q[\log p(\mathcal{D} \mid f, w)] = -\frac{1}{2K\sigma_y^2}\sum_{k,n}\left(y_n - \Omega_{w_n}^{(k)}\nu_{f_n}^{(k)}\right)^{T}\left(y_n - \Omega_{w_n}^{(k)}\nu_{f_n}^{(k)}\right) - \frac{1}{2K}\left(\sum_{k,j}\frac{P\sigma_k^2}{\sigma_y^2}\mu_{f_j}^{(k)T}\mu_{f_j}^{(k)} + \sum_{k,i,j}\frac{\sigma_k^2}{\sigma_y^2}\mu_{w_{ij}}^{(k)T}\mu_{w_{ij}}^{(k)}\right) - \frac{1}{2K}\sum_{k}\left(\frac{\sigma_k^4}{\sigma_y^2}NPQ + NP\log(2\pi\sigma_y^2)\right)$$


(2) Second term

$$\mathbb{E}_q[\log p(f, w)] = -\frac{1}{2}\left(Q\log|K_f| + PQ\log|K_w|\right) - \frac{1}{2K}\left[\sum_{k,j}\left(\mu_{f_j}^{(k)T}K_f^{-1}\mu_{f_j}^{(k)} + \sigma_k^2\operatorname{tr}(K_f^{-1})\right) + \sum_{k,i,j}\left(\mu_{w_{ij}}^{(k)T}K_w^{-1}\mu_{w_{ij}}^{(k)} + \sigma_k^2\operatorname{tr}(K_w^{-1})\right)\right]$$


(3) Third term

$$\mathcal{H}[q(u)] \geq -\frac{1}{K}\sum_{k=1}^{K}\log\frac{1}{K}\sum_{j=1}^{K}\mathcal{N}\left(\mu^{(k)}; \mu^{(j)}, (\sigma_k^2 + \sigma_j^2)I\right)$$
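This entropy bound follows from Jensen's inequality; each pairwise term is the convolution of two isotropic Gaussians, hence the summed variances $\sigma_k^2 + \sigma_j^2$. A NumPy sketch under my own naming:

```python
import numpy as np

def entropy_lower_bound(mus, sigmas):
    """Jensen lower bound on H[q(u)] for a uniform isotropic Gaussian mixture.

    mus: (K, D) component means; sigmas: (K,) component std-devs.
    """
    K, D = mus.shape
    H = 0.0
    for k in range(K):
        d2 = np.sum((mus[k] - mus) ** 2, axis=1)          # ‖μ(k) − μ(j)‖²
        v = sigmas[k] ** 2 + sigmas ** 2                  # convolved variances
        logN = -0.5 * d2 / v - 0.5 * D * np.log(2 * np.pi * v)
        m = logN.max()
        H -= (m + np.log(np.mean(np.exp(logN - m)))) / K  # −(1/K) log-mean-exp
    return H

H1 = entropy_lower_bound(np.zeros((1, 2)), np.ones(1))    # single component
```

For K = 1 this evaluates the entropy of a Gaussian with doubled variance, which sits strictly below the exact single-Gaussian entropy, illustrating that it is a bound rather than the exact value.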


(1)~(3) together define a tractable lower bound on the ELBO


3-2-2. Optimization of Variational Parameters and Hyper-parameters

Optimization of the variational params $\{\mu_{f_j}^{(k)}, \mu_{w_{ij}}^{(k)}\}$ & the hyperparameters $\theta$, by gradient-based maximization of the (bounded) ELBO


3-3. Predictive Distribution

for non-parametric VI, the predictive mean turns out to be…

$$\mathbb{E}[y_* \mid x_*, \mathcal{D}] = \frac{1}{K}\sum_{k=1}^{K}\left(K_{w_*}K_w^{-1}\mu_w^{(k)}\right)\left(K_{f_*}K_f^{-1}\mu_f^{(k)}\right)$$

( $K_{w_*}$, $K_{f_*}$ : test/train cross-covariances; the first factor is reshaped into the $P \times Q$ mixing matrix at $x_*$ )
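The two factors are ordinary GP posterior-mean projections onto the test inputs. A sketch with assumed array shapes (mu_w as (K, P, Q, N), mu_f as (K, Q, N); names are mine):

```python
import numpy as np

def npv_predictive_mean(Ks_w, Kw, Ks_f, Kf, mu_w, mu_f):
    """Mixture-averaged GPRN predictive mean at M test inputs.

    Ks_w, Ks_f: (M, N) test/train cross-covariances; mu_w: (K, P, Q, N)
    and mu_f: (K, Q, N) are the component means.  Returns (P, M).
    """
    Bw = np.linalg.solve(Kw.T, Ks_w.T).T                 # K_{w*} K_w^{-1}
    Bf = np.linalg.solve(Kf.T, Ks_f.T).T                 # K_{f*} K_f^{-1}
    W_star = np.einsum('mn,kpqn->kpqm', Bw, mu_w)        # projected weight means
    f_star = np.einsum('mn,kqn->kqm', Bf, mu_f)          # projected latent means
    # average W*(x) f*(x) over the K mixture components
    return np.einsum('kpqm,kqm->pm', W_star, f_star) / mu_w.shape[0]

# identity-kernel sanity check: prediction reduces to Σ_q 1·1 = Q at each input
I3 = np.eye(3)
y_star = npv_predictive_mean(I3, I3, I3, I3,
                             np.ones((1, 2, 2, 3)), np.ones((1, 2, 3)))
```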
