Efficient Variational Inference for Gaussian Process Regression Networks (2013)
Abstract
( Multi-output regression ) the correlation between outputs may vary across the input space
GPRNs (Gaussian Process Regression Networks)
- flexible
- intractable
Thus, propose 2 efficient VI methods for GPRNs
(1) GPRN-MF
- adopts mean-field VI with a full Gaussian over the GPRN's parameters
(2) GPRN-NPV
- non-parametric VI
- derives an analytical form of the ELBO
- closed-form updates of the variational parameters
- only O(N) parameters for the covariances
1. Introduction
Challenges in multi-output regression :
- 1) developing flexible models able to capture the dependencies between the outputs
- 2) efficient inference
Various non-probabilistic approaches have been developed, but it is crucial to have full posterior probabilities.
GPs have proved to be very effective tools for both single- & multi-output regression.
GP-based methods :
- before) assumed that the dependencies between the outputs are fixed ( = independent of the input space )
- after) the correlation between outputs can be spatially adaptive
→ GAUSSIAN PROCESS REGRESSION NETWORKS (GPRNs)
This paper proposes “efficient approximate inference methods for GPRNs”
(1) First method : simple MF approximation of the GPRN
- shows that…
- 1) an analytical expression of the ELBO & closed-form updates of the variational params can be obtained
- 2) the corresponding covariances can be parameterized with only O(N) params
(2) Second method : exploits non-parametric VI
- uses non-parametric VI to approximate the posterior over the GPRN's params
- can approximate complex distributions that are not well approximated by a single Gaussian
- needs only O(N) variational params
2. GPRN
Input : x ∈ R^D
Output : y(x) ∈ R^P
- assumed to be a linear combination of Q noisy latent functions f(x) ∈ R^Q
- corrupted by Gaussian noise
Mixing coefficients : W(x) ∈ R^{P×Q}
[ GPRN model ]
$$\begin{aligned} \mathbf{y}(\mathbf{x}) &= \mathbf{W}(\mathbf{x})\left[\mathbf{f}(\mathbf{x}) + \sigma_f \boldsymbol{\epsilon}\right] + \sigma_y \mathbf{z} \\ f_j(\mathbf{x}) &\sim \mathcal{GP}(0, \kappa_f), \quad j = 1, \dots, Q \\ W_{ij}(\mathbf{x}) &\sim \mathcal{GP}(0, \kappa_w), \quad i = 1, \dots, P;\; j = 1, \dots, Q \\ \boldsymbol{\epsilon} &\sim \mathcal{N}(\boldsymbol{\epsilon}; \mathbf{0}, \mathbf{I}_Q), \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{z}; \mathbf{0}, \mathbf{I}_P) \end{aligned}$$
Advantages of the GPRN model :
- 1) dependencies between the outputs y are induced via the latent functions f
- 2) the mixing coefficients W(x) explicitly depend on x ( = the correlations are spatially adaptive )
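To make the generative model concrete, here is a minimal NumPy sketch that draws one sample from the GPRN prior. It is not the authors' code: the 1-D inputs, RBF kernels, lengthscales, and noise levels are all arbitrary choices for illustration.

```python
import numpy as np

def rbf(X, ell):
    # squared-exponential kernel on 1-D inputs, with jitter for stable Cholesky
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell**2) + 1e-10 * np.eye(len(X))

rng = np.random.default_rng(1)
N, P, Q = 100, 2, 2
sigma_f, sigma_y = 0.1, 0.05
X = np.linspace(0.0, 1.0, N)

Lf = np.linalg.cholesky(rbf(X, 0.2))   # latent-function kernel k_f
Lw = np.linalg.cholesky(rbf(X, 0.5))   # weight-function kernel k_w

f = Lf @ rng.normal(size=(N, Q))                                # f_j ~ GP(0, k_f)
W = np.einsum('nm,mpq->npq', Lw, rng.normal(size=(N, P, Q)))    # W_ij ~ GP(0, k_w)
f_hat = f + sigma_f * rng.normal(size=(N, Q))                   # noisy latent values
y = np.einsum('npq,nq->np', W, f_hat) + sigma_y * rng.normal(size=(N, P))
print(y.shape)  # (100, 2)
```

Because every W_ij(x) is itself a smooth function of x, the mixing of the latent functions (and hence the output correlation) changes across the input space.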
Notation
- Observed inputs : X = {x_i}, i = 1, …, N
- Observed outputs : D = {y_i}, i = 1, …, N
- concatenation of the (noisy) latent function values & weights : u = (f̂, w)
- noisy version of the latent function values : f̂ = f + σ_f ε
- hyperparameters of the GPRN : θ = {θ_f, θ_w, σ_f, σ_y}
Prior :
$$p(\mathbf{u} \mid \theta_f, \theta_w, \sigma_f) = \mathcal{N}(\mathbf{u}; \mathbf{0}, \mathbf{C}_u)$$
Conditional likelihood :
$$p(\mathcal{D} \mid \mathbf{u}, \boldsymbol{\theta}) = \prod_{n=1}^{N} \mathcal{N}\!\left(\mathbf{y}(\mathbf{x}_n); \mathbf{W}(\mathbf{x}_n)\hat{\mathbf{f}}(\mathbf{x}_n), \sigma_y^2 \mathbf{I}_P\right)$$
- where y(x) = W(x)[f(x) + σ_f ε] + σ_y z and z ∼ N(z; 0, I_P)
Posterior :
$$p(\mathbf{u} \mid \mathcal{D}, \boldsymbol{\theta}) \propto p(\mathbf{u} \mid \theta_f, \theta_w, \sigma_f)\, p(\mathcal{D} \mid \mathbf{u}, \sigma_y)$$
3. VI for GPRNs
minimize the KL-divergence :
$$\mathrm{KL}\left(q(\mathbf{u}) \,\|\, p(\mathbf{u} \mid \mathcal{D})\right) = \mathbb{E}_q\left[\log \frac{q(\mathbf{u})}{p(\mathbf{u} \mid \mathcal{D})}\right]$$
⇔ maximize the ELBO :
$$\mathcal{L}(q) = \mathbb{E}_q[\log p(\mathcal{D} \mid \mathbf{f}, \mathbf{w})] + \mathbb{E}_q[\log p(\mathbf{f}, \mathbf{w})] + \mathcal{H}[q(\mathbf{f}, \mathbf{w})]$$
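One step worth spelling out: minimizing the KL is equivalent to maximizing the ELBO because the log marginal likelihood decomposes as

$$\log p(\mathcal{D} \mid \boldsymbol{\theta}) = \mathcal{L}(q) + \mathrm{KL}\left(q(\mathbf{u}) \,\|\, p(\mathbf{u} \mid \mathcal{D})\right)$$

Since the left-hand side does not depend on q, raising L(q) must lower the KL term by the same amount; and since KL ≥ 0, L(q) is a lower bound on log p(D) — hence "evidence lower bound".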
for the mean-field method, we can obtain…
- 1) an analytical expression for the ELBO
- 2) covariance parameterizations needing only O(N) params
3-1. MFVI for GPRN
factorized distribution :
$$q(\mathbf{f}, \mathbf{w}) = \prod_{j=1}^{Q} q(\mathbf{f}_j) \prod_{i=1}^{P} \prod_{j=1}^{Q} q(\mathbf{w}_{ij})$$
- q(f_j) = N(f_j; μ_{f_j}, Σ_{f_j})
- q(w_ij) = N(w_ij; μ_{w_ij}, Σ_{w_ij})
3-1-1. Closed-form ELBO
( full Gaussian mean-field approximation ) ELBO
(1) First term :
$$\begin{aligned} \mathbb{E}_q[\log p(\mathcal{D} \mid \mathbf{f}, \mathbf{w})] = &-\frac{NP}{2}\log(2\pi\sigma_y^2) - \frac{1}{2\sigma_y^2}\sum_{n=1}^{N}\left(\mathbf{Y}_{\cdot n}^{T} - \boldsymbol{\Omega}_{w_n}\boldsymbol{\nu}_{f_n}\right)^{T}\left(\mathbf{Y}_{\cdot n}^{T} - \boldsymbol{\Omega}_{w_n}\boldsymbol{\nu}_{f_n}\right) \\ &- \frac{1}{2\sigma_y^2}\sum_{i=1}^{P}\sum_{j=1}^{Q}\left[\operatorname{diag}(\boldsymbol{\Sigma}_{f_j})^{T}(\boldsymbol{\mu}_{w_{ij}} \odot \boldsymbol{\mu}_{w_{ij}}) + \operatorname{diag}(\boldsymbol{\Sigma}_{w_{ij}})^{T}(\boldsymbol{\mu}_{f_j} \odot \boldsymbol{\mu}_{f_j})\right] \end{aligned}$$
(2) Second term :
$$\mathbb{E}_q[\log p(\mathbf{f}, \mathbf{w})] = -\frac{1}{2}\sum_{j=1}^{Q}\left(\log|\mathbf{K}_f| + \boldsymbol{\mu}_{f_j}^{T}\mathbf{K}_f^{-1}\boldsymbol{\mu}_{f_j} + \operatorname{tr}(\mathbf{K}_f^{-1}\boldsymbol{\Sigma}_{f_j})\right) - \frac{1}{2}\sum_{i,j}\left(\log|\mathbf{K}_w| + \boldsymbol{\mu}_{w_{ij}}^{T}\mathbf{K}_w^{-1}\boldsymbol{\mu}_{w_{ij}} + \operatorname{tr}(\mathbf{K}_w^{-1}\boldsymbol{\Sigma}_{w_{ij}})\right)$$
(3) Third term :
$$\mathcal{H}[q(\mathbf{f}, \mathbf{w})] = \frac{1}{2}\sum_{j=1}^{Q}\log\left|\boldsymbol{\Sigma}_{f_j}\right| + \frac{1}{2}\sum_{i,j}\log\left|\boldsymbol{\Sigma}_{w_{ij}}\right| + \mathrm{const}$$
( ⊙ denotes the element-wise product )
3-1-2. Efficient Closed-form Updates for Variational Parameters
Parameters for q(f_j) :
$$\boldsymbol{\mu}_{f_j} = \frac{1}{\sigma_y^2}\boldsymbol{\Sigma}_{f_j}\sum_{i=1}^{P}\left(\mathbf{Y}_{\cdot i} - \sum_{k \neq j}\boldsymbol{\mu}_{w_{ik}} \odot \boldsymbol{\mu}_{f_k}\right) \odot \boldsymbol{\mu}_{w_{ij}}$$
$$\boldsymbol{\Sigma}_{f_j} = \left(\mathbf{K}_f^{-1} + \frac{1}{\sigma_y^2}\sum_{i=1}^{P}\operatorname{diag}\left(\boldsymbol{\mu}_{w_{ij}} \odot \boldsymbol{\mu}_{w_{ij}} + \operatorname{Var}(\mathbf{w}_{ij})\right)\right)^{-1}$$
Parameters for q(w_ij) :
$$\boldsymbol{\mu}_{w_{ij}} = \frac{1}{\sigma_y^2}\boldsymbol{\Sigma}_{w_{ij}}\left(\mathbf{Y}_{\cdot i} - \sum_{k \neq j}\boldsymbol{\mu}_{f_k} \odot \boldsymbol{\mu}_{w_{ik}}\right) \odot \boldsymbol{\mu}_{f_j}$$
$$\boldsymbol{\Sigma}_{w_{ij}} = \left(\mathbf{K}_w^{-1} + \frac{1}{\sigma_y^2}\operatorname{diag}\left(\boldsymbol{\mu}_{f_j} \odot \boldsymbol{\mu}_{f_j} + \operatorname{Var}(\mathbf{f}_j)\right)\right)^{-1}$$
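The closed-form coordinate updates can be sketched directly in NumPy. This is a toy illustration, not the authors' code: the data, kernels, lengthscales, noise variance, and iteration count are all arbitrary, and ⊙ becomes element-wise `*`. Var(·) is the diagonal of the corresponding variational covariance.

```python
import numpy as np

def rbf(X, ell):
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell**2) + 1e-6 * np.eye(len(X))

rng = np.random.default_rng(0)
N, P, Q, sy2 = 30, 2, 2, 0.01
X = np.linspace(0.0, 1.0, N)
Y = np.stack([np.sin(2 * np.pi * X), np.cos(2 * np.pi * X)], axis=1)  # toy N x P targets

Kf_inv = np.linalg.inv(rbf(X, 0.3))
Kw_inv = np.linalg.inv(rbf(X, 0.5))

mu_f = rng.normal(size=(Q, N)) * 0.1      # variational means of f_j
mu_w = rng.normal(size=(P, Q, N)) * 0.1   # variational means of w_ij
var_f = np.ones((Q, N))                   # diag(Sigma_{f_j})
var_w = np.ones((P, Q, N))                # diag(Sigma_{w_ij})

for _ in range(100):
    for j in range(Q):
        # Sigma_{f_j} = (K_f^-1 + (1/sy2) sum_i diag(mu_wij^2 + Var(w_ij)))^-1
        S = np.linalg.inv(Kf_inv + np.diag((mu_w[:, j] ** 2 + var_w[:, j]).sum(0)) / sy2)
        var_f[j] = np.diag(S)
        # mu_{f_j} = (1/sy2) Sigma_{f_j} sum_i (Y_.i - sum_{k!=j} mu_wik*mu_fk) * mu_wij
        r = sum((Y[:, i] - sum(mu_w[i, k] * mu_f[k] for k in range(Q) if k != j)) * mu_w[i, j]
                for i in range(P))
        mu_f[j] = S @ r / sy2
    for i in range(P):
        for j in range(Q):
            S = np.linalg.inv(Kw_inv + np.diag(mu_f[j] ** 2 + var_f[j]) / sy2)
            var_w[i, j] = np.diag(S)
            r = (Y[:, i] - sum(mu_f[k] * mu_w[i, k] for k in range(Q) if k != j)) * mu_f[j]
            mu_w[i, j] = S @ r / sy2

pred = np.einsum('pqn,qn->np', mu_w, mu_f)  # E[W(x_n)] E[f(x_n)] at training inputs
print("train MSE:", np.mean((pred - Y) ** 2))
```

Note the O(N) claim in the sketch: each covariance differs from the prior only through a diagonal term, so only the N diagonal entries need to be stored and iterated.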
3-1-3. Hyper-parameters Learning
hyperparameters : θ={θf,θw,σf,σy}.
learn by gradient-based optimization of ELBO
3-2. Non-parametric VI for GPRN
approximate the posterior of the GPRN using a mixture of K isotropic Gaussians :
$$q(\mathbf{u}) = \frac{1}{K}\sum_{k=1}^{K} q^{(k)}(\mathbf{u}) = \frac{1}{K}\sum_{k=1}^{K}\mathcal{N}\left(\mathbf{u}; \boldsymbol{\mu}^{(k)}, \sigma_k^2 \mathbf{I}\right)$$
- in practice, K is very small, so the complexity is O(N)
3-2-1. Closed-form ELBO
the entropy term of the ELBO cannot be computed analytically under this mixture q(u)
→ need an approximation ( a lower bound )
The expectations decompose over the mixture components as follows :
(1) First term
$$\begin{aligned} \mathbb{E}_q[\log p(\mathcal{D} \mid \mathbf{f}, \mathbf{w})] = &-\frac{1}{2K\sigma_y^2}\sum_{k}\sum_{n}\left(\mathbf{Y}_{\cdot n}^{T} - \boldsymbol{\Omega}^{(k)}_{w_n}\boldsymbol{\nu}^{(k)}_{f_n}\right)^{T}\left(\mathbf{Y}_{\cdot n}^{T} - \boldsymbol{\Omega}^{(k)}_{w_n}\boldsymbol{\nu}^{(k)}_{f_n}\right) \\ &- \frac{1}{2K}\left(\sum_{k,j}\frac{P\sigma_k^2}{\sigma_y^2}\,\boldsymbol{\mu}^{(k)T}_{f_j}\boldsymbol{\mu}^{(k)}_{f_j} + \sum_{k,i,j}\frac{\sigma_k^2}{\sigma_y^2}\,\boldsymbol{\mu}^{(k)T}_{w_{ij}}\boldsymbol{\mu}^{(k)}_{w_{ij}}\right) \\ &- \frac{1}{2K}\left(\sum_{k}\frac{\sigma_k^4}{\sigma_y^2}NPQ + NP\log(2\pi\sigma_y^2)\right) \end{aligned}$$
(2) Second term
$$\mathbb{E}_q[\log p(\mathbf{f}, \mathbf{w})] = -\frac{1}{2}\left(Q\log|\mathbf{K}_f| + PQ\log|\mathbf{K}_w|\right) - \frac{1}{2K}\left[\sum_{k,j}\left(\boldsymbol{\mu}^{(k)T}_{f_j}\mathbf{K}_f^{-1}\boldsymbol{\mu}^{(k)}_{f_j} + \sigma_k^2\operatorname{tr}(\mathbf{K}_f^{-1})\right) + \sum_{k,i,j}\left(\boldsymbol{\mu}^{(k)T}_{w_{ij}}\mathbf{K}_w^{-1}\boldsymbol{\mu}^{(k)}_{w_{ij}} + \sigma_k^2\operatorname{tr}(\mathbf{K}_w^{-1})\right)\right]$$
(3) Third term ( lower bound on the mixture entropy )
$$\mathcal{H}[q(\mathbf{u})] \geq -\frac{1}{K}\sum_{k=1}^{K}\log\frac{1}{K}\sum_{j=1}^{K}\mathcal{N}\!\left(\boldsymbol{\mu}^{(k)}; \boldsymbol{\mu}^{(j)}, (\sigma_k^2 + \sigma_j^2)\mathbf{I}\right)$$
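The entropy bound can be checked numerically. A small NumPy sketch (component means and variances are arbitrary, chosen only for illustration) compares the Jensen lower bound against a Monte Carlo estimate of the true mixture entropy:

```python
import numpy as np

def log_iso_gauss(x, mu, var):
    # log N(x; mu, var * I)
    d = x.shape[0]
    return -0.5 * (d * np.log(2 * np.pi * var) + np.sum((x - mu) ** 2) / var)

rng = np.random.default_rng(0)
K, d = 3, 5
mus = rng.normal(size=(K, d)) * 2.0   # component means mu^(k)
s2 = np.array([0.5, 1.0, 2.0])        # component variances sigma_k^2

# Jensen bound: H[q] >= -(1/K) sum_k log (1/K) sum_j N(mu_k; mu_j, (s_k^2 + s_j^2) I)
H_lb = -np.mean([
    np.log(np.mean([np.exp(log_iso_gauss(mus[k], mus[j], s2[k] + s2[j])) for j in range(K)]))
    for k in range(K)
])

# Monte Carlo estimate of the true entropy -E_q[log q], for comparison
S = 4000
comp = rng.integers(0, K, size=S)
xs = mus[comp] + np.sqrt(s2[comp])[:, None] * rng.normal(size=(S, d))
log_q = np.array([
    np.log(np.mean([np.exp(log_iso_gauss(x, mus[j], s2[j])) for j in range(K)])) for x in xs
])
H_mc = -log_q.mean()
print(H_lb, "<=", H_mc)
```

Evaluating the bound costs only K² Gaussian densities between component means, which is what keeps the NPV objective cheap for small K.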
→ (1)~(3) together define a tight lower bound on the ELBO
3-2-2. Optimization of Variational Parameters and Hyper-parameters
Optimization of the variational params {μ^(k)_{f_j}, μ^(k)_{w_ij}} & the hyperparameters θ
3-3. Predictive Distribution
for non-parametric VI, the predictive mean turns out to be…
$$\mathbb{E}[\mathbf{y}_* \mid \mathbf{x}_*, \mathcal{D}] = \frac{1}{K}\sum_{k=1}^{K}\left(\mathbf{K}^{*}_{w}\mathbf{K}_w^{-1}\boldsymbol{\mu}^{(k)}_{w}\right)\left(\mathbf{K}^{*}_{f}\mathbf{K}_f^{-1}\boldsymbol{\mu}^{(k)}_{f}\right)$$
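A minimal sketch of this predictive mean: project each component's variational means to the test inputs with the usual GP conditional-mean matrices K* K^{-1}, mix the weights with the latent functions, and average over the K components. The variational means here are random stand-ins; in practice the μ^(k) come from the NPV optimization, and kernels/lengthscales are arbitrary.

```python
import numpy as np

def rbf(A, B, ell):
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * d2 / ell**2)

rng = np.random.default_rng(0)
N, Ns, P, Q, K = 20, 7, 2, 1, 3
X = np.linspace(0.0, 1.0, N)     # training inputs
Xs = np.linspace(0.0, 1.0, Ns)   # test inputs x_*

Kf = rbf(X, X, 0.3) + 1e-8 * np.eye(N)
Kw = rbf(X, X, 0.5) + 1e-8 * np.eye(N)
Af = rbf(Xs, X, 0.3) @ np.linalg.inv(Kf)   # K*_f K_f^-1
Aw = rbf(Xs, X, 0.5) @ np.linalg.inv(Kw)   # K*_w K_w^-1

# stand-in variational means for the K mixture components
mu_f = rng.normal(size=(K, Q, N))
mu_w = rng.normal(size=(K, P, Q, N))

ys = np.zeros((Ns, P))
for k in range(K):
    f_star = np.einsum('qn,sn->qs', mu_f[k], Af)    # projected latent means at x_*
    w_star = np.einsum('pqn,sn->pqs', mu_w[k], Aw)  # projected weight means at x_*
    ys += np.einsum('pqs,qs->sp', w_star, f_star)   # W(x_*) f(x_*) per component
ys /= K
print(ys.shape)
```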