[Paper Review] 16. InterFaceGAN : Interpreting the Disentangled Face Representation Learned by GANs


Contents

  0. Abstract
  1. Semantics in Latent Space
    (1) Single Semantic
    (2) Multiple Semantics
  2. Manipulation in Latent Space
    (1) Single Attribute Manipulation
    (2) Conditional Manipulation


0. Abstract

There is still little understanding of what GANs have learned in their latent representations

\(\rightarrow\) propose “InterFaceGAN”,

  • to interpret the disentangled face representation learned by SOTA GAN models


(1) find that GANs learn various semantics in some linear subspaces of the latent space

(2) after identifying these subspaces, realistically manipulate the corresponding facial attributes


1. Semantics in Latent Space

analyze the properties of the semantics that emerge in the latent representation

Notation

  • generator : \(g: \mathcal{Z} \rightarrow \mathcal{X}\)

  • semantic scoring function : \(f_{S}: \mathcal{X} \rightarrow \mathcal{S}\)

    where \(\mathcal{S} \subseteq \mathbb{R}^{m}\) = semantic space with \(m\) semantics


Bridge the latent space \(\mathcal{Z}\) and the semantic space \(\mathcal{S}\) with \(\mathbf{s}=f_{S}(g(\mathbf{z}))\),

  • \(\mathbf{s}\) = semantic scores
  • \(\mathbf{z}\) = sampled latent code


(1) Single Semantic

Interpolation

  • it is widely observed that when linearly interpolating between two latent codes \(\mathbf{z}_{1}\) and \(\mathbf{z}_{2}\),

    \(\rightarrow\) the appearance of the synthesis changes continuously (gradually)
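A minimal sketch of the interpolation itself, assuming \(\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{d})\) with a hypothetical \(d = 512\). The helper names `lerp` and `interpolation_path` are illustrative, and the generator \(g\) that would turn each code into an image is left out.

```python
import random


def lerp(z1, z2, t):
    """Linearly interpolate two latent codes: (1 - t) * z1 + t * z2."""
    return [(1.0 - t) * a + t * b for a, b in zip(z1, z2)]


def interpolation_path(z1, z2, steps):
    """Return `steps` evenly spaced codes from z1 to z2 (both endpoints included)."""
    return [lerp(z1, z2, i / (steps - 1)) for i in range(steps)]


# Hypothetical setup: d = 512 and z ~ N(0, I_d).
# A real generator g would map every code on the path to an image.
rng = random.Random(0)
d = 512
z1 = [rng.gauss(0.0, 1.0) for _ in range(d)]
z2 = [rng.gauss(0.0, 1.0) for _ in range(d)]
path = interpolation_path(z1, z2, steps=5)
```

Feeding `path` through a generator is what produces the gradually changing sequence of faces described above.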


Assumption

  • for any binary semantic (e.g., male/female),

    there exists a HYPERPLANE in the latent space that serves as the separation BOUNDARY

  • given a hyperplane with unit normal vector \(\mathbf{n} \in \mathbb{R}^{d}\),

    define the (signed) DISTANCE from \(\mathbf{z}\) to the hyperplane as: \(\mathrm{d}(\mathbf{n}, \mathbf{z})=\mathbf{n}^{T} \mathbf{z}\) (it can be negative, so it is not strictly a distance)

  • the semantic attribute reverses exactly when this “distance” changes its numerical sign

    \(\rightarrow\) assume the score and the distance are linearly dependent: \(f(g(\mathbf{z}))=\lambda \mathrm{d}(\mathbf{n}, \mathbf{z})\)

    • \(f(\cdot)\) : scoring function for the semantic
    • \(\lambda >0\) : measures how fast the semantic varies as the distance changes
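To make the assumption concrete, here is a small sketch of the signed distance and the assumed linear score \(f(g(\mathbf{z}))=\lambda \mathbf{n}^{T}\mathbf{z}\). The normal vector is random and hypothetical (a real one would come from a learned boundary), and \(d = 512\) and \(\lambda = 2\) are illustrative choices. Reflecting \(\mathbf{z}\) across the hyperplane flips the sign of the distance, which under this model flips the attribute.

```python
import math
import random


def unit(v):
    """Normalize v to unit length (the normal vector n must satisfy ||n|| = 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def signed_distance(n, z):
    """d(n, z) = n^T z: positive on one side of the hyperplane, negative on the other."""
    return sum(a * b for a, b in zip(n, z))


def linear_score(n, z, lam):
    """Assumed linear model f(g(z)) = lambda * d(n, z), with lambda > 0."""
    return lam * signed_distance(n, z)


rng = random.Random(1)
d = 512
n = unit([rng.gauss(0.0, 1.0) for _ in range(d)])  # hypothetical boundary normal
z = [rng.gauss(0.0, 1.0) for _ in range(d)]

# Reflect z across the hyperplane: z' = z - 2 (n^T z) n, so n^T z' = -n^T z.
dist = signed_distance(n, z)
z_reflected = [zi - 2.0 * dist * ni for zi, ni in zip(z, n)]
```

`linear_score(n, z, 2.0)` and `linear_score(n, z_reflected, 2.0)` have the same magnitude and opposite signs.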


(2) Multiple Semantics

case : \(m\) different semantics


Just the multivariate version of (1)!

  • \[\mathbf{s} \equiv f_{S}(g(\mathbf{z}))=\Lambda \mathbf{N}^{T} \mathbf{z}.\]
    • where \(\mathbf{s}=\left[s_{1}, \ldots, s_{m}\right]^{T}\) denotes the semantic scores
    • \(\Lambda=\operatorname{diag}\left(\lambda_{1}, \ldots, \lambda_{m}\right)\) : diagonal matrix of the linear coefficients
    • \(\mathbf{N}=\left[\mathbf{n}_{1}, \ldots, \mathbf{n}_{m}\right]\) : unit normal vectors of the \(m\) separation boundaries


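since \(\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{d})\) and \(\mathbf{s}\) is a linear transformation of \(\mathbf{z}\), \(\mathbf{s}\) is also Gaussian: \(\mathbf{s} \sim \mathcal{N}\left(\mathbf{0}, \boldsymbol{\Sigma}_{\mathbf{s}}\right)\).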

  • mean of \(s\) :

    • \(\mu_{\mathrm{s}} =\mathbb{E}\left(\Lambda \mathbf{N}^{T} \mathbf{z}\right)=\Lambda \mathbf{N}^{T} \mathbb{E}(\mathbf{z})=\mathbf{0}\).
  • covariance of \(s\) :

    • \(\boldsymbol{\Sigma}_{\mathbf{s}} =\mathbb{E}\left(\Lambda \mathbf{N}^{T} \mathbf{z} \mathbf{z}^{T} \mathbf{N} \Lambda^{T}\right)=\Lambda \mathbf{N}^{T} \mathbb{E}\left(\mathbf{z} \mathbf{z}^{T}\right) \mathbf{N} \Lambda^{T} =\Lambda \mathbf{N}^{T} \mathbf{N} \Lambda\), using \(\mathbb{E}\left(\mathbf{z} \mathbf{z}^{T}\right)=\mathbf{I}_{d}\) and \(\Lambda^{T}=\Lambda\) (diagonal).

    • different entries of \(\mathbf{s}\) are disentangled if and only if \(\boldsymbol{\Sigma}_{\mathbf{s}}\) is a diagonal matrix

      \(\rightarrow\) since every \(\lambda_{i}>0\), this holds iff \(\mathbf{N}^{T} \mathbf{N}\) is diagonal, i.e., \(\left\{\mathbf{n}_{1}, \ldots, \mathbf{n}_{m}\right\}\) must be mutually orthogonal
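A quick numerical check of \(\boldsymbol{\Sigma}_{\mathbf{s}}=\Lambda \mathbf{N}^{T} \mathbf{N} \Lambda\). This is a sketch with made-up toy values: \(d = 3\), \(m = 2\), \(\lambda = (1, 2)\), and two hand-picked pairs of unit normals, one with \(\mathbf{n}_{1}^{T}\mathbf{n}_{2}=0.6\) and one orthogonal pair. It compares the closed form against a Monte Carlo estimate of \(\mathbb{E}(\mathbf{s}\mathbf{s}^{T})\) with \(\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{d})\).

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def analytic_cov(normals, lams):
    """Sigma_s = Lambda N^T N Lambda, i.e. entry (i, j) = lam_i * lam_j * n_i^T n_j."""
    m = len(normals)
    return [[lams[i] * lams[j] * dot(normals[i], normals[j]) for j in range(m)]
            for i in range(m)]


def empirical_cov(normals, lams, num_samples, seed=0):
    """Monte Carlo estimate of E[s s^T] with s = Lambda N^T z and z ~ N(0, I_d)."""
    rng = random.Random(seed)
    d, m = len(normals[0]), len(normals)
    acc = [[0.0] * m for _ in range(m)]
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        s = [lams[i] * dot(normals[i], z) for i in range(m)]
        for i in range(m):
            for j in range(m):
                acc[i][j] += s[i] * s[j]
    return [[v / num_samples for v in row] for row in acc]


# Toy example in d = 3 with unit normals (illustrative values, not from the paper).
lams = [1.0, 2.0]
entangled = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]   # n1^T n2 = 0.6
orthogonal = [[1.0, 0.0, 0.0], [0.0, 0.8, 0.6]]  # n1^T n2 = 0
```

For the entangled pair the closed form is approximately \(\begin{bmatrix}1 & 1.2\\ 1.2 & 4\end{bmatrix}\), so \(s_{1}\) and \(s_{2}\) are correlated. For the orthogonal pair the off-diagonal entry is exactly zero, which is the disentangled case.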


2. Manipulation in Latent Space

how to use the semantics found in the latent space for image editing


(1) Single Attribute Manipulation

edit the original latent code \(\mathbf{z}\) by moving it along the boundary normal \(\mathbf{n}\):

  • \(\mathbf{z}_{\text {edit }}=\mathbf{z}+\alpha \mathbf{n}\).


Ex) if \(\alpha >0\), the synthesis will look more positive on that semantic,

since \(f\left(g\left(\mathbf{z}_{\text {edit }}\right)\right)=\lambda \mathbf{n}^{T}(\mathbf{z}+\alpha \mathbf{n})=f(g(\mathbf{z}))+\lambda \alpha\) (using \(\mathbf{n}^{T} \mathbf{n}=1\))
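The edit rule and the resulting score shift can be checked under the same linear-score assumption. This is a minimal sketch: `edit_latent` and `score` are hypothetical helpers, the normal is random rather than learned, and \(d = 512\), \(\lambda = 1.5\), \(\alpha = 3\) are illustrative values.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def edit_latent(z, n, alpha):
    """z_edit = z + alpha * n, where n is the unit normal of the attribute boundary."""
    return [zi + alpha * ni for zi, ni in zip(z, n)]


def score(z, n, lam):
    """Assumed linear model f(g(z)) = lambda * n^T z."""
    return lam * dot(n, z)


rng = random.Random(2)
d, lam, alpha = 512, 1.5, 3.0  # illustrative values, not from the paper
raw = [rng.gauss(0.0, 1.0) for _ in range(d)]
norm = math.sqrt(dot(raw, raw))
n = [x / norm for x in raw]  # unit normal of a hypothetical boundary
z = [rng.gauss(0.0, 1.0) for _ in range(d)]
z_edit = edit_latent(z, n, alpha)
```

`score(z_edit, n, lam)` equals `score(z, n, lam) + lam * alpha` up to floating-point error, matching the derivation above.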


(2) Conditional Manipulation

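when there is more than one attribute,

\(\rightarrow\) editing one may affect another! ( \(\because\) some semantics may be entangled )


for more precise control, propose CONDITIONAL manipulation

  • by manually forcing \(\mathbf{N}^{T} \mathbf{N}\) to be diagonal
  • use projection to make different vectors orthogonal!
    • e.g., for primal direction \(\mathbf{n}_{1}\) and condition \(\mathbf{n}_{2}\), move along \(\mathbf{n}_{1}-(\mathbf{n}_{1}^{T} \mathbf{n}_{2}) \mathbf{n}_{2}\), which is orthogonal to \(\mathbf{n}_{2}\), so attribute 1 changes while attribute 2 stays unaffected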
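A sketch of the projection step, under stated assumptions. With a single condition, the function below reduces to \(\mathbf{n}_{1}-(\mathbf{n}_{1}^{T} \mathbf{n}_{2}) \mathbf{n}_{2}\), then renormalizes the result. For several conditions, it first orthonormalizes them with Gram-Schmidt and then removes the primal direction's component along each one. That is one straightforward way to project onto the subspace orthogonal to all conditioned directions, not necessarily the authors' exact code. The names `n_age` and `n_glasses`, the 0.6 correlation, and \(d = 512\) are all hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_out(primal, conditions):
    """Remove from `primal` its components along every conditioned direction.

    The conditions are orthonormalized first (Gram-Schmidt), so the result is
    orthogonal to each vector in `conditions`; it is then renormalized.
    """
    basis = []
    for c in conditions:
        v = list(c)
        for b in basis:
            coef = dot(v, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(dot(v, v))
        basis.append([vi / norm for vi in v])
    out = list(primal)
    for b in basis:
        coef = dot(out, b)
        out = [oi - coef * bi for oi, bi in zip(out, b)]
    norm = math.sqrt(dot(out, out))
    return [oi / norm for oi in out]


rng = random.Random(3)
d = 512


def random_unit():
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


n_age = random_unit()  # hypothetical primal direction
# Hypothetical conditioned direction, deliberately correlated with n_age (about 0.6).
n_glasses = [0.6 * a + 0.8 * b for a, b in zip(n_age, random_unit())]
n_cond = project_out(n_age, [n_glasses])
z = [rng.gauss(0.0, 1.0) for _ in range(d)]
z_edit = [zi + 3.0 * ci for zi, ci in zip(z, n_cond)]
```

Because `n_cond` is orthogonal to `n_glasses`, the edit leaves \(\mathbf{n}_{\text{glasses}}^{T}\mathbf{z}\), and with it the linearly modeled eyeglasses score, unchanged. It still increases \(\mathbf{n}_{\text{age}}^{T}\mathbf{z}\).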


[Figure 2]


Implementation Details

5 key facial attributes :

  • pose / smile (expression) / age / gender / eyeglasses
