[Paper Review] 24. Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing


Contents

  0. Abstract
  1. Introduction
  2. Related Work
    1. Optimization-based editing methods
    2. Learning-based editing methods
    3. Local editing methods
  3. StyleMapGAN
    1. Stylemap-based generator
    2. Training procedure and losses
    3. Local Editing


0. Abstract

figure2


Editing REAL images with GANs suffers from two problems:

  • [ problem 1 ]

    time-consuming optimization for projecting a real image \(\rightarrow\) latent code

  • [ problem 2 ]

    inaccurate embedding through an encoder


Propose StyleMapGAN

  • intermediate latent space has SPATIAL dimensions
  • spatially variant modulation replaces AdaIN


1. Introduction

still challenging to apply manipulations to REAL IMAGES

  • since a GAN lacks an inverse mapping from an image back to its latent code


[ Manipulating REAL images ]

(1) image-to-image translation

  • directly synthesize an output image from a user’s input
  • problem : need pre-defined tasks & heavy supervision

(2) pretrained GAN

  • directly optimize the latent code for each image

(3) train extra encoder

  • more practical

  • project an image into its corresponding latent code

  • single feed-forward

  • BUT, low fidelity of projected images

    due to “absence of spatial dimension” in the latent space


StyleMapGAN

  • exploits style map, a novel representation of latent space

  • NOT a vector-based representation,

    BUT a tensor with explicit spatial dimensions


2. Related Work

1) Optimization-based editing methods

iteratively update latent vector of pre-trained GANs

examples)

  • Image2StyleGAN
  • In-DomainGAN
  • Neural Collage, pix2latent
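
To make "iteratively update the latent vector" concrete, here is a minimal sketch. It is not taken from any of the methods listed above: the generator is a toy linear map `G(z) = A @ z` (an assumption for illustration only), and the function name, step count and step size are hypothetical. The point is only that every image needs its own optimization loop with many steps.

```python
import numpy as np

def project_by_optimization(A, target, steps=500):
    """Fit z so that the toy generator G(z) = A @ z reproduces `target`.

    Plain gradient descent on ||A @ z - target||^2. A real GAN would
    replace A @ z with a forward pass through a pretrained generator.
    """
    # Step size chosen from the largest singular value so the loop is stable.
    lr = 0.5 / np.linalg.norm(A, 2) ** 2
    z = np.zeros(A.shape[1])
    for _ in range(steps):
        residual = A @ z - target        # G(z) - x
        grad = 2.0 * A.T @ residual      # gradient of the squared error w.r.t. z
        z = z - lr * grad
    return z

rng = np.random.default_rng(0)
A = rng.normal(size=(16, 4)) / 4.0       # toy "generator" weights (hypothetical)
z_true = rng.normal(size=4)
target = A @ z_true                      # toy "real image"
z_hat = project_by_optimization(A, target)
reconstruction_error = float(np.linalg.norm(A @ z_hat - target))
```

The loop runs hundreds of updates for a single target, which is the cost that a single feed-forward encoder avoids.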


but this paper exploits an encoder,

which is faster than the above methods!


2) Learning-based editing methods

train an extra encoder to DIRECTLY infer the latent code

examples)

  • ALI
  • BiGAN
  • ALAE


FAST, but the latent spaces of all those methods lack spatial dimensions!


3) Local editing methods

editing specific parts

examples)

  • Editing in Style
  • Structured Noise
  • SEAN


3. StyleMapGAN

GOAL :

  • project images to latent space
  • with an encoder
  • in real-time
  • and locally manipulate images on latent space


Propose StyleMapGAN…

  • 1) intermediate latent space with spatial dimensions
  • 2) spatially variant modulation based on the stylemap


1) Stylemap-based generator

figure2


Spatial dimensions

  • make projecting images through the encoder (inference) much more effective
  • enables local editing


Affine Transform

  • produces the modulation parameters \(\gamma_{i}\), \(\beta_{i}\) from the resized stylemaps

  • modulation operation of the i-th layer :

    \(h_{i+1}=\left(\gamma_{i} \otimes \frac{h_{i}-\mu_{i}}{\sigma_{i}}\right) \oplus \beta_{i}\).
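
A minimal NumPy sketch of the modulation formula above, to show what "spatially variant" means in practice. Assumptions: \(\gamma_{i}\) and \(\beta_{i}\) have the same shape as the activation (channels, height, width); \(\mu_{i}\) and \(\sigma_{i}\) are computed over the whole activation tensor (per-channel statistics, as in AdaIN, would be another plausible reading); the function name and the `eps` term are hypothetical additions.

```python
import numpy as np

def spatially_variant_modulation(h, gamma, beta, eps=1e-8):
    """Apply h_next = gamma * (h - mu) / sigma + beta, element-wise.

    h, gamma, beta: arrays of shape (C, H, W). Because gamma and beta
    carry spatial dimensions, each location can be modulated differently.
    """
    mu = h.mean()                         # assumed: statistics over the whole tensor
    sigma = h.std()
    normalized = (h - mu) / (sigma + eps)
    return gamma * normalized + beta

rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4, 4))            # toy activation
gamma = np.ones((3, 4, 4))
gamma[:, :, :2] = 2.0                     # left half is scaled twice as strongly
beta = np.zeros((3, 4, 4))
out = spatially_variant_modulation(h, gamma, beta)
```

With AdaIN, `gamma` and `beta` would hold one value per channel, so the left and right halves of the feature map could not be treated differently as they are here.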


Remove per-pixel noise

  • per-pixel noise : extra source of spatially varying inputs
  • BUT stylemap already provides spatially varying inputs!


2) Training procedure and losses

figure2

  • F : mapping network
  • G : synthesis network with stylemap resizer
  • E : encoder
  • D : discriminator


3) Local Editing

GOAL

  • transplant some parts of a reference image into an original image, according to a mask

  • project the original & reference images through the encoder

    and obtain stylemaps \(\mathbf{w}\) and \(\widetilde{\mathbf{w}}\)

  • edited stylemap \(\ddot{\mathbf{w}}\), for a mask \(\mathbf{m}\) :

    \(\ddot{\mathbf{w}}=\mathbf{m} \otimes \widetilde{\mathbf{w}} \oplus(1-\mathbf{m}) \otimes \mathbf{w}\).
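
The blending formula above can be sketched in a few lines of NumPy. Assumptions: stylemaps are arrays of shape (channels, height, width), the mask has already been resized to the stylemap's spatial resolution, and the function name and toy values are hypothetical.

```python
import numpy as np

def blend_stylemaps(w_original, w_reference, mask):
    """Return m * w_reference + (1 - m) * w_original.

    w_original, w_reference: stylemaps of shape (C, H, W).
    mask: array of shape (H, W) with values in [0, 1]; 1 selects the reference.
    """
    m = mask[np.newaxis, :, :]            # broadcast the mask over channels
    return m * w_reference + (1.0 - m) * w_original

w_original = np.zeros((2, 4, 4))          # toy stylemap of the original image
w_reference = np.ones((2, 4, 4))          # toy stylemap of the reference image
mask = np.zeros((4, 4))
mask[:2, :] = 1.0                         # take the top half from the reference
w_edited = blend_stylemaps(w_original, w_reference, mask)
```

The edited stylemap would then be fed to the generator G to synthesize the locally edited image; a soft mask (values between 0 and 1) blends the two stylemaps instead of switching hard between them.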


figure2
