[Paper Review] 24. Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing
Contents
- Abstract
- Introduction
- Related Work
- Optimization-based editing methods
- Learning-based editing methods
- Local editing methods
- StyleMapGAN
- Stylemap-based generator
- Training procedure and losses
- Local Editing
0. Abstract
Editing REAL images with GAN : suffer from…
-
[ problem 1 ]
time-consuming optimization for projecting real \(\rightarrow\) latent
-
[ problem 2 ]
inaccurate embedding through an encoder
Propose StyleMapGAN
- intermediate latent space has SPATIAL dimensions
- spatially variant modulation replaces AdaIN
1. Introduction
still challenging to apply manipulations to REAL IMAGES
- since GAN lacks an inverse mapping from image back to latent code
[ Manipulating REAL images ]
(1) image-to-image translation
- synthesize an output image, given a user’s input directly
- problem : need pre-defined tasks & heavy supervision
(2) pretrained GAN
- directly optimize the latent code for eadch image
(3) train extra encoder
-
more practical
-
project an image into its corresponding latent code
-
single feed-forward
-
BUT, low fidelity of projected images
due to “absence of spatial dimension” in the latent space
StyleMapGAN
-
exploits style map, a novel representation of latent space
-
vector-based representation (X)
tensor with explicit spatial dimensions (O)
2. Related Work
1) Optimization-based editing methods
iteratively update latent vector of pre-trained GANs
examples)
- Image2StyleGAN
- In-DomainGAN
- Neural Collage, pix2latent
but this paper exploits an encoder,
which is faster than the above methods!
2) Learning-based editing methods
train an extra encoder to DIRECTLY infer the latent code
examples)
- ALI
- BiGAN
- ALAE
FAST, but all those methods lack spatial dimensions!
3) Local editing methods
editing specific parts
examples)
- Editing in Style
- Structured Noise
- SEAN
3. StyleMapGAN
GOAL :
- project images to latent space
- with an encoder
- in real-time
- and locally manipulate images on latent space
Propose StyleMapGAN…
- 1) intermediate latent space with spatial dimensions
- 2) spatially variant modulation based on the stylemap
1) Stylemap-based generator
Spatial dimensions
- much more effective at inference
- enables local editing
Affine Transform
-
produces parameters for modulation, regarding the resized stylemaps
-
modulation operation of the i-th layer :
\(h_{i+1}=\left(\gamma_{i} \otimes \frac{h_{i}-\mu_{i}}{\sigma_{i}}\right) \oplus \beta_{i}\).
Remove per-pixel noise
- per-pixel noise : extra source of spatially varying inputs
- BUT stylemap already provides spatially varying inputs!
2) Training procedure and losses
- F : mapping network
- G : synthesis network with stylemap resizer
- E : encoder
- D : discriminator
3) Local Editing
GOAL
-
transplant some parts of reference image to an original image, w.r.t a mask
-
project original & reference image through the encoder
and obtain stylemaps \(\mathrm{w}\) and \(\widetilde{\mathrm{w}}\)
-
editied style map \(\ddot{\mathbf{w}}\) :
\(\ddot{\mathbf{w}}=\mathbf{m} \otimes \widetilde{\mathbf{w}} \oplus(1-\mathbf{m}) \otimes \mathbf{w}\).