[Paper Review] 20. Seeing what a GAN cannot generate
Contents
- Abstract
- Introduction
- Method
    - Quantifying distribution-level mode collapse
    - Quantifying instance-level mode collapse
 
0. Abstract
GAN training’s key issue : mode collapse
- little work has focused on “understanding & quantifying mode collapse”
\(\rightarrow\) this paper visualizes mode collapse at both the..
- 1) distribution level
- 2) instance level
1. Introduction
this paper aims to provide insights about dropped modes
( not to measure GAN quality with a single number )
Particularly, wish to know…
- does a GAN deviate from the target distn by ignoring difficult images altogether?
- are there specific, semantically meaningful parts and objects that a GAN decides not to learn?


2. Method
goal : visualize & understand the semantic concepts that a GAN “CANNOT” generate
( in both (1) distn & (2) image instance )
Proceed in 2 steps!
[Step 1] Generated Image Segmentation
- segment both “1) generated” & “2) target” images
- identify the types of objects that the generator omits ( compared to the distn of real images )
[Step 2] Layer Inversion
- visualize HOW the dropped object classes are omitted for individual images
(1) Quantifying distribution-level mode collapse
errors of GAN :
- analyzed by exploiting the “hierarchical structure” of scene images
- each scene = natural decomposition into objects \(\rightarrow\) we can estimate deviations from the true distn of scenes by estimating deviations of the constituent object statistics
Example)
a GAN that renders bedrooms should also render some curtains
- if the curtain statistics depart from those of true images, we know to look at curtains to see a specific flaw in the GAN
segment images with the Unified Perceptual Parsing network
( labels each pixel with one of 336 object classes )
Figure 2

- visualization of mean statistics for 2 networks
- mean segmentation frequency
    - network1 vs true distn
    - network2 vs true distn
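The mean segmentation frequencies compared above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code: the helper names `class_area_fractions` and `mean_frequency_gap`, the toy 2x2 label maps, and the use of a plain difference of means are all assumptions made for the example. The label maps are assumed to come from a segmenter such as the one mentioned above.

```python
import numpy as np


def class_area_fractions(label_maps, num_classes):
    """Per-image fraction of pixels assigned to each class.

    label_maps: integer array of shape (N, H, W), values in [0, num_classes).
    Returns an array of shape (N, num_classes); each row sums to 1.
    """
    label_maps = np.asarray(label_maps)
    flat = label_maps.reshape(label_maps.shape[0], -1)
    counts = np.stack([np.bincount(row, minlength=num_classes) for row in flat])
    return counts / flat.shape[1]


def mean_frequency_gap(generated_maps, real_maps, num_classes):
    """Mean class frequency of generated images minus that of real images."""
    gen_mean = class_area_fractions(generated_maps, num_classes).mean(axis=0)
    real_mean = class_area_fractions(real_maps, num_classes).mean(axis=0)
    return gen_mean - real_mean


# Toy data (hypothetical): 3 classes, two 2x2 images per set.
real = np.array([[[0, 1], [1, 2]], [[0, 0], [1, 2]]])
fake = np.array([[[0, 1], [1, 1]], [[0, 0], [1, 1]]])  # class 2 is never generated
gap = mean_frequency_gap(fake, real, num_classes=3)
print(gap)  # a negative entry marks a class the generator under-produces
```

In this toy case the entry for class 2 is negative, which is the kind of signal the figure uses to point at dropped object classes.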
 
Possible to summarize statistical differences in segmentation in a single number
- FSD (Fréchet Segmentation Distance)
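A minimal sketch of how such a single number could be computed, assuming FSD is the Fréchet distance between Gaussians fitted to per-image class statistics (the same form as FID, applied to segmentation statistics instead of Inception features): \(\lVert \mu_g - \mu_t \rVert^2 + \operatorname{Tr}(\Sigma_g + \Sigma_t - 2(\Sigma_g \Sigma_t)^{1/2})\). The function name and the random toy statistics below are hypothetical.

```python
import numpy as np


def frechet_segmentation_distance(gen_stats, real_stats):
    """Frechet distance between Gaussians fitted to two sets of per-image
    segmentation statistics, each of shape (N, num_classes)."""
    gen_stats = np.asarray(gen_stats, dtype=np.float64)
    real_stats = np.asarray(real_stats, dtype=np.float64)
    mu_g, mu_t = gen_stats.mean(axis=0), real_stats.mean(axis=0)
    sigma_g = np.atleast_2d(np.cov(gen_stats, rowvar=False))
    sigma_t = np.atleast_2d(np.cov(real_stats, rowvar=False))
    # Tr((Sigma_g Sigma_t)^{1/2}) is the sum of the square roots of the
    # eigenvalues of Sigma_g Sigma_t, which are real and non-negative for
    # PSD inputs; clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma_g @ sigma_t).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    mean_term = np.sum((mu_g - mu_t) ** 2)
    return float(mean_term + np.trace(sigma_g) + np.trace(sigma_t) - 2.0 * tr_sqrt)


# Toy statistics (hypothetical): 200 "images", 4 classes.
rng = np.random.default_rng(0)
stats = rng.normal(size=(200, 4))
shift = np.array([1.0, 0.0, 0.0, 0.0])
fsd_same = frechet_segmentation_distance(stats, stats)
fsd_shifted = frechet_segmentation_distance(stats + shift, stats)
print(fsd_same, fsd_shifted)
```

Identical statistics give a distance of (numerically) zero, and shifting only the mean by a unit vector leaves the covariance term untouched, so the distance reduces to the squared mean shift.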
[ Conclusion ]
- generated image segmentation statistics measure the entire distn
- BUT, they do not single out the specific images where an object should have been generated, but was not!
(2) Quantifying instance-level mode collapse
compare image pairs \((\mathbf{x},\mathbf{x}^{'})\)
- \(\mathbf{x}\) : real image ( contains a particular object, which is NOT in the GENERATED image )
- \(\mathbf{x}^{'}\) : projection of \(\mathbf{x}\) onto the space of images the GAN can generate ( output of the GAN for the best-matching input \(\mathbf{z}^{*}\), defined below )
Tractable Inversion Problem
seek \(\mathbf{x}^{\prime}=G\left(\mathbf{z}^{*}\right)\)
- where \(\mathbf{z}^{*}=\arg \min _{\mathbf{z}} \ell(G(\mathbf{z}), \mathbf{x})\)
- but solving this full inversion fails in practice ( due to the many layers in \(G\) )
- therefore, solve a tractable subproblem of the full inversion \(\rightarrow\) decompose \(G\) !
Decomposition of \(G\)
- \(G=G_{f}\left(g_{n}\left(\cdots\left(g_{1}(\mathbf{z})\right)\right)\right)\)
- thus, \(\operatorname{range}(G) \subset \operatorname{range}\left(G_{f}\right)\)
    - meaning : any image that cannot be generated by \(G_f\) cannot be generated by \(G\) either
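Why the inclusion holds, written out as a short derivation that uses only the decomposition of \(G\) (the second line is a consequence added here for illustration, not a formula quoted from the paper):

\[
\mathbf{x} \in \operatorname{range}(G)
\;\Rightarrow\; \exists\, \mathbf{z} :\ \mathbf{x} = G(\mathbf{z}) = G_{f}(\mathbf{r}), \quad \mathbf{r} = g_{n}\left(\cdots\left(g_{1}(\mathbf{z})\right)\right)
\;\Rightarrow\; \mathbf{x} \in \operatorname{range}\left(G_{f}\right)
\]

\[
\min_{\mathbf{r}} \ell\left(G_{f}(\mathbf{r}), \mathbf{x}\right) \;\leq\; \min_{\mathbf{z}} \ell\left(G(\mathbf{z}), \mathbf{x}\right)
\]

So whatever \(G_f\) fails to reconstruct is a lower bound on what the full \(G\) fails to reconstruct: an object missing from the best \(G_f\) output is also missing from every output of \(G\).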
 
Layer Inversion
- goal : visualize omissions
- by solving the easier problem of inverting only the later layers \(G_f\)
    - \(\mathbf{x}^{\prime}=G_{f}\left(\mathbf{r}^{*}\right)\), where \(\mathbf{r}^{*}=\underset{\mathbf{r}}{\arg \min }\ \ell\left(G_{f}(\mathbf{r}), \mathbf{x}\right)\)
- solve in 2 steps
    - step 1) construct a neural network \(E\) that approximately inverts the entire \(G\) & compute an estimate \(\mathbf{z}_{0}=E(\mathbf{x})\)
    - step 2) identify an intermediate representation \(\mathbf{r}^{*} \approx \mathbf{r}_{0}=g_{n}\left(\cdots\left(g_{1}\left(\mathbf{z}_{0}\right)\right)\right)\), where \(G_{f}\left(\mathbf{r}^{*}\right)\) closely recovers \(\mathbf{x}\)
Layer-wise Network Inversion
train a small network \(e_i\) to (approximately) invert \(g_i\)
- train to minimize both left & right inversion losses
    - \(\begin{aligned} \mathcal{L}_{\mathrm{L}} & \equiv \mathbb{E}_{\mathbf{z}}\left[\left\lVert \mathbf{r}_{i-1}-e\left(g_{i}\left(\mathbf{r}_{i-1}\right)\right)\right\rVert_{1}\right] \\ \mathcal{L}_{\mathrm{R}} & \equiv \mathbb{E}_{\mathbf{z}}\left[\left\lVert \mathbf{r}_{i}-g_{i}\left(e\left(\mathbf{r}_{i}\right)\right)\right\rVert_{1}\right] \\ e_{i} &=\underset{e}{\arg \min } \quad \mathcal{L}_{\mathrm{L}}+\lambda_{\mathrm{R}} \mathcal{L}_{\mathrm{R}} \end{aligned}\)
    - \(\mathbf{r}_{i-1}\), \(\mathbf{r}_{i}\) : input & output activations of layer \(g_i\) ( left : \(e \circ g_i \approx\) identity on \(\mathbf{r}_{i-1}\) ; right : \(g_i \circ e \approx\) identity on \(\mathbf{r}_{i}\) )
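The two losses can be evaluated on a toy layer as follows. This is only a sketch under stated assumptions: the layer \(g_i\) is a hypothetical leaky-ReLU layer with random weights, activations are sampled directly instead of being produced by earlier generator layers, and the candidate inverters are written by hand instead of being trained.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHA = 0.2                                           # leaky-ReLU slope (hypothetical)
W = 2.0 * np.eye(4) + 0.1 * rng.normal(size=(4, 4))   # toy weights of g_i
W_inv = np.linalg.inv(W)
W_inv_noisy = W_inv + 0.05 * rng.normal(size=(4, 4))  # an imperfect inverter


def g(r):
    """Toy layer g_i: linear map followed by leaky ReLU."""
    h = r @ W.T
    return np.where(h > 0, h, ALPHA * h)


def make_inverter(w_inv):
    """Candidate e: undo the leaky ReLU, then apply a (possibly wrong) inverse matrix."""
    def e(s):
        h = np.where(s > 0, s, s / ALPHA)
        return h @ w_inv.T
    return e


def inversion_losses(e, r_prev, lambda_r=1.0):
    """Left loss, right loss and their weighted sum (L1 norm per sample, mean over the batch)."""
    r_next = g(r_prev)
    loss_l = np.abs(r_prev - e(g(r_prev))).sum(axis=1).mean()
    loss_r = np.abs(r_next - g(e(r_next))).sum(axis=1).mean()
    return loss_l, loss_r, loss_l + lambda_r * loss_r


r_prev = rng.normal(size=(256, 4))                    # stand-in for sampled activations r_{i-1}
exact_total = inversion_losses(make_inverter(W_inv), r_prev)[2]
noisy_total = inversion_losses(make_inverter(W_inv_noisy), r_prev)[2]
print(exact_total, noisy_total)
```

An exact inverse drives both terms to (numerically) zero, while the perturbed inverter is penalized by both. In training, \(e\) would be a small network and the sum would be minimized by gradient descent.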
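once the layers are all inverted…
compose an inversion network for the entire \(G\)!
- \(E^{*}=e_{1}\left(e_{2}\left(\cdots\left(e_{n}\left(e_{f}(\mathbf{x})\right)\right)\right)\right)\)
- \(E^{*}\) is then fine-tuned jointly to invert \(G\) as a whole \(\rightarrow\) the result is the \(E\) used above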
Layer-wise image optimization
inverting entire \(G\) is difficult ( mentioned above )
Thus, start from…
- 1) \(\mathbf{r}_{0}=g_{n}\left(\cdots\left(g_{1}\left(\mathbf{z}_{0}\right)\right)\right)\)
- 2) seek an intermediate representation \(\mathbf{r}^{*}\)
    - where \(G_{f}\left(\mathbf{r}^{*}\right)\) becomes the reconstructed image
- (summary) \(\begin{aligned} \mathbf{z}_{0} & \equiv E(\mathbf{x}) \\ \mathbf{r} & \equiv \delta_{n}+g_{n}\left(\cdots\left(\delta_{2}+g_{2}\left(\delta_{1}+g_{1}\left(\mathbf{z}_{0}\right)\right)\right)\right) \\ \mathbf{r}^{*} &=\underset{\mathbf{r}}{\arg \min }\left(\ell\left(\mathbf{x}, G_{f}(\mathbf{r})\right)+\lambda_{\mathrm{reg}} \sum_{i}\left\lVert\delta_{i}\right\rVert^{2}\right) \end{aligned}\)
    - begin with the initial guess \(\mathbf{z}_{0}\)
    - then learn small perturbations \(\delta_i\) of each layer
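The optimization over the perturbations \(\delta_i\) can be sketched with a toy generator. Everything concrete here is an assumption made for illustration: two linear early layers, linear late layers \(G_f\), a random vector standing in for \(\mathbf{z}_0 = E(\mathbf{x})\), a squared-error \(\ell\), and plain gradient descent with hand-written gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions): linear early layers g_1, g_2 and linear late layers G_f.
W1 = rng.normal(size=(6, 4))   # g_1: z (4,) -> (6,)
W2 = rng.normal(size=(6, 6))   # g_2: (6,) -> r (6,)
Wf = rng.normal(size=(8, 6))   # G_f: r (6,) -> image (8,)

x = rng.normal(size=8)         # target "image"
z0 = rng.normal(size=4)        # stands in for the encoder estimate z_0 = E(x)
lam = 0.1                      # lambda_reg


def forward(d1, d2):
    """Return r = d2 + g_2(d1 + g_1(z0)) and the reconstruction G_f(r)."""
    r = d2 + W2 @ (d1 + W1 @ z0)
    return r, Wf @ r


def objective(d1, d2):
    """Squared reconstruction error plus the penalty on the perturbations."""
    _, x_rec = forward(d1, d2)
    return np.sum((x - x_rec) ** 2) + lam * (np.sum(d1 ** 2) + np.sum(d2 ** 2))


# Safe step size: the objective is quadratic with Hessian 2 A^T A + 2 lam I.
A = np.hstack([Wf @ W2, Wf])   # maps (d1, d2) to the change in G_f(r)
step = 1.0 / (2.0 * (np.linalg.norm(A, 2) ** 2 + lam))

d1, d2 = np.zeros(6), np.zeros(6)    # start from the unperturbed r_0
initial = objective(d1, d2)
for _ in range(500):
    _, x_rec = forward(d1, d2)
    grad_r = 2.0 * Wf.T @ (x_rec - x)            # gradient of the error w.r.t. r
    grad_d1 = W2.T @ grad_r + 2.0 * lam * d1     # chain rule through g_2
    grad_d2 = grad_r + 2.0 * lam * d2
    d1, d2 = d1 - step * grad_d1, d2 - step * grad_d2
final = objective(d1, d2)
r_star, x_prime = forward(d1, d2)
print(initial, final)
```

Starting from zero perturbations means the first reconstruction is exactly \(G_f(\mathbf{r}_0)\), and the regularizer keeps \(\mathbf{r}^{*}\) close to what the early layers can actually produce. With a real generator the gradients would come from automatic differentiation.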
 

