[Paper Review] 21. Editing in Style
Contents
- Abstract
- Related Works
- GAN-based Image Editing
- Local Semantics in Generative Models
- Feature Factorization
- Local Editing
0. Abstract
the ability to control & condition the output of GANs is still limited
\(\rightarrow\) introduce a simple & effective method for making local, semantically-aware edits to a target output image
1. Related Works
goal is NOT to propose new GAN,
BUT to offer local editing method for its output
( by changing the style of specific objects )
(1) GAN-based Image Editing
semantic image editing
- 1) latent code-based : for GLOBAL attribute editing
- 2) activation-based : for LOCAL editing
Latent code-based
- learn a manifold in latent space
- perform semantic edits, by traversing paths along this manifold
- example )
- use AE to disentangle image into semantic subspaces & reconstruct the image
- global changes in color/light/pose/…
Activation-based
- directly manipulate specific SPATIAL positions on activation tensor, at certain CNN layer
- example )
- GAN Dissection controls the presence/absence of objects at given position
This paper focuses on a latent code-based approach for local editing, which
- neither relies on external supervision
- nor involves complex spatial blending operations
2. Local Semantics in Generative Models
(1) Feature Factorization
DFF (Deep Feature Factorization)
- explains CNN’s learned representation, via saliency maps
- with this, it has been shown that…
  CNNs learn features that act as (1) semantic object & (2) object-part detectors
Inspired by DFF, the paper conducts a similar analysis on the generator's activations
- apply spherical k-means to \(C\)-dim activation vectors
  ( activation tensor : \(\mathbf{A} \in \mathbb{R}^{N \times C \times H \times W}\) )
- clustering generates a tensor of cluster membership
  ( membership : \(\mathbf{U} \in\{0,1\}^{N \times K \times H \times W}\) )
- \(K\) : user-defined
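The clustering step above can be sketched in NumPy as follows. This is only a minimal illustration, not the paper's code: the function name `spherical_kmeans`, the random initialisation, and the fixed iteration count `n_iter` are assumptions made here.

```python
import numpy as np

def spherical_kmeans(A, K, n_iter=20, seed=0):
    """Cluster the C-dim activation vectors of A (N, C, H, W) by cosine similarity.

    Returns the one-hot membership tensor U with shape (N, K, H, W).
    """
    N, C, H, W = A.shape
    # One C-dim vector per spatial position, projected onto the unit sphere.
    X = A.transpose(0, 2, 3, 1).reshape(-1, C)
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)

    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # random initialisation

    for _ in range(n_iter):
        labels = np.argmax(X @ centers.T, axis=1)  # closest center = highest cosine
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:  # keep the old center if a cluster is empty
                mean = members.sum(axis=0)
                centers[k] = mean / (np.linalg.norm(mean) + 1e-8)

    labels = np.argmax(X @ centers.T, axis=1)
    U = np.eye(K, dtype=np.uint8)[labels]               # (N*H*W, K) one-hot rows
    return U.reshape(N, H, W, K).transpose(0, 3, 1, 2)  # (N, K, H, W)


# Toy activations: N=2 images, C=8 channels, 4x4 spatial grid.
A = np.random.default_rng(1).standard_normal((2, 8, 4, 4))
U = spherical_kmeans(A, K=3)
```

Each spatial position ends up in exactly one of the \(K\) clusters, so `U` sums to 1 along the cluster axis.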
Result
- at certain layers of the generator, clusters correspond well to semantic objects & parts
- each pixel in the heatmap is color-coded to indicate cluster index
\(M_{k, c}\) : Contribution of channel \(c\) to semantic cluster \(k\)
- via cluster memberships, \(\mathbf{U} \in\{0,1\}^{N \times K \times H \times W}\)
- \(\boldsymbol{M}_{k, c}=\frac{1}{N H W} \sum_{n, h, w} \mathbf{A}_{n, c, h, w}^{2} \odot \mathbf{U}_{n, k, h, w}\).
- feature maps of \(\mathbf{A}\) are assumed to have zero mean & unit variance
  \(\rightarrow\) contribution is bounded : \(\boldsymbol{M} \in[0,1]^{K \times C}\)
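A short NumPy sketch of the contribution matrix, under the assumption that each channel of \(\mathbf{A}\) has been normalised to zero mean and unit variance. The function name `contribution_matrix` and the random memberships are illustrative only.

```python
import numpy as np

def contribution_matrix(A, U):
    """M[k, c] = 1/(N*H*W) * sum_{n,h,w} A[n,c,h,w]^2 * U[n,k,h,w]."""
    N, C, H, W = A.shape
    return np.einsum("nchw,nkhw->kc", A ** 2, U.astype(A.dtype)) / (N * H * W)


rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6, 8, 8))
# Normalise each channel to zero mean / unit variance, as the notes assume.
A = (A - A.mean(axis=(0, 2, 3), keepdims=True)) / A.std(axis=(0, 2, 3), keepdims=True)

# Hypothetical random one-hot memberships with K = 3 clusters.
labels = rng.integers(0, 3, size=(4, 8, 8))
U = np.eye(3, dtype=np.uint8)[labels].transpose(0, 3, 1, 2)  # (N, K, H, W)

M = contribution_matrix(A, U)  # shape (K, C)
```

Because every position belongs to exactly one cluster and each channel has unit variance, each column of `M` sums to 1, which is why every entry lies in \([0,1]\).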
(2) Local Editing
a) Style GAN review
- latent vector \(z\) ~ prior
- \(z\) is transformed to intermediate latent vector \(\boldsymbol{w} \in \mathbb{W}\)
  \(\rightarrow\) shows better disentanglement properties
- \(\mathbf{A} \in \mathbb{R}^{C \times H \times W}\) : input to a convolutional layer
- \(w\) : alters feature maps, via a per-layer style ( the \(\sigma\) interpolated in b) )
- motivated by style transfer
b) Conditioned Interpolation
Notation
- target image : \(S\)
- reference image : \(R\)
would like to transfer the appearance of a specified local object/part from \(R\) to \(S\)
[ Global transfer ]
- \(\sigma^{G}=\sigma^{S}+\lambda\left(\sigma^{R}-\sigma^{S}\right)\),
  where \(0 \leq \lambda \leq 1\)
[ Selective local editing ]
- control style interpolation with a matrix transformation
- \(\sigma^{G}=\sigma^{S}+Q\left(\sigma^{R}-\sigma^{S}\right)\).
- \(Q\) : diagonal matrix, whose diagonal is \(q \in[0,1]^{C}\)
  ( \(q\) : query vector )
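Both interpolation rules fit in one small NumPy helper, since multiplying by a diagonal \(Q\) is just an elementwise product with \(q\). The function name `interpolate_style` and the toy style vectors are assumptions for illustration.

```python
import numpy as np

def interpolate_style(sigma_S, sigma_R, q):
    """sigma_G = sigma_S + Q (sigma_R - sigma_S), with Q = diag(q)."""
    q = np.clip(np.asarray(q, dtype=float), 0.0, 1.0)
    return sigma_S + q * (sigma_R - sigma_S)  # elementwise == multiplying by diag(q)


sigma_S = np.array([1.0, 2.0, 3.0, 4.0])  # target style (toy values)
sigma_R = np.array([5.0, 6.0, 7.0, 8.0])  # reference style (toy values)

# Global transfer: every channel uses the same lambda = 0.5.
sigma_global = interpolate_style(sigma_S, sigma_R, np.full(4, 0.5))
# Selective transfer: only channels 0 and 2 take the reference style.
sigma_local = interpolate_style(sigma_S, sigma_R, [1, 0, 1, 0])
```

Setting every entry of `q` to the same \(\lambda\) recovers the global transfer, so the local rule is a strict generalisation of it.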
c) Choosing the query
best query \(q\) = one that favors channels that..
- affect the ROI (region of interest)
- while ignoring channels that have an effect outside the ROI
[ Simple Approach ]
- use \(M_{k^{\prime}, c}\) ( \(k^{\prime}\) : cluster of the ROI )
- clipping \(\boldsymbol{q}_{c}=\min \left(1, \lambda \boldsymbol{M}_{k^{\prime}, c}\right)\),
  where \(\boldsymbol{q}_{c}\) is the \(c\)-th channel element of \(\boldsymbol{q}\)
- updates all channels at the same time
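As a quick illustration of the clipping rule, with a made-up \(2 \times 3\) contribution matrix (the function name `simple_query` is an assumption):

```python
import numpy as np

def simple_query(M, k_prime, lam):
    """q_c = min(1, lam * M[k_prime, c]) for every channel c at once."""
    return np.minimum(1.0, lam * M[k_prime])


# Toy contribution matrix: K=2 clusters (rows), C=3 channels (columns).
M = np.array([[0.7, 0.1, 0.4],
              [0.3, 0.9, 0.6]])
q = simple_query(M, k_prime=0, lam=2.0)
```

Raising `lam` pushes all channels up together, so weakly relevant channels start to contribute before the most relevant ones are exhausted.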
[ Proposed Approach ]
- sequential approach
- first set the most relevant channel to the maximum slope of 1,
  before raising the slope of the second-most relevant, …
- solve this by sorting channels based on \(M_{k^{\prime}}\)
  & greedily assigning \(q_c=1\) to most relevant channels
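A hedged sketch of the greedy assignment. The notes do not say when the greedy loop stops, so the `budget` parameter below (a cap on how much the selected channels contribute to clusters other than \(k'\)) is purely an assumption of this example, as is the name `greedy_query`.

```python
import numpy as np

def greedy_query(M, k_prime, budget):
    """Assign q_c = 1 to channels in decreasing order of M[k_prime, c].

    `budget` is a hypothetical cap on the summed contribution of the selected
    channels to clusters other than k_prime (an assumption of this sketch).
    """
    relevance = M[k_prime]
    outside = M.sum(axis=0) - relevance  # contribution to all other clusters
    q = np.zeros_like(relevance, dtype=float)
    spent = 0.0
    for c in np.argsort(-relevance):     # most relevant channel first
        if spent + outside[c] > budget:
            break                        # next channel would leak too much outside the ROI
        q[c] = 1.0
        spent += outside[c]
    return q


# Toy contribution matrix: K=2 clusters (rows), C=4 channels (columns).
M = np.array([[0.9, 0.2, 0.6, 0.1],
              [0.1, 0.8, 0.4, 0.9]])
q = greedy_query(M, k_prime=0, budget=0.6)
```

Here channels 0 and 2 are switched fully on, while channels 1 and 3, which mostly affect the other cluster, stay at 0.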