[Paper Review] 28. StarGAN : Unified GAN for Multi-Domain Image-to-Image translation

Abstract
Introduction
StarGAN
1. Multi-domain Image-to-Image Translation
2. Training with Multiple Datasets

0. Abstract

existing approaches have limited scalability and robustness when handling more than 2 domains

$\rightarrow$ propose “StarGAN”

novel & scalable approach, that can perform image-to-image translations for multiple domains using only a SINGLE model

1. Introduction

Definition

1) attribute :
- ex) hair color, gender, age
2) attribute value :
- ex) black/blond/brown for hair color, male/female for gender
3) domain :
- set of images sharing the same attribute values
- ex) images of women

Previous methods :

Mapping among $k$ domains, $k(k-1)$ generators have to be trained
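To make the scaling argument concrete, the count can be tabulated. This is a minimal sketch; the function name `num_generators_pairwise` is made up for illustration.

```python
def num_generators_pairwise(k: int) -> int:
    """Generators needed when every ordered pair of domains gets its own model."""
    return k * (k - 1)

# Compare against the single generator that StarGAN uses for all domains.
for k in (2, 4, 8):
    print(f"k={k}: pairwise={num_generators_pairwise(k)}, StarGAN=1")
```

The pairwise count grows quadratically in $k$, while the single-model count stays at 1.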

Propose “StarGAN”

instead of learning fixed translation,

takes in as inputs “both image & domain information”

and learns to “flexibly translate the image into the corresponding domain”
use label ( binary / one hot vector ) to represent domain information
introduce simple & effective approach,

that enables joint training between domains of “different datasets”

by adding a “mask vector” to the domain label

$\rightarrow$ ignore unknown labels, and focus on the label provided by particular dataset

2. StarGAN

address multi-domain image-to-image translation
discuss how StarGAN incorporates multiple datasets, containing different label sets

1) Multi-domain Image-to-Image Translation

Notation

target domain label : $c$
original domain label : $c'$

training $G$ …. ( $G(x,c) \rightarrow y$ )

translate input image $x$
into output image $y$
conditioned on $c$

training $D$ …. ( $D: x \rightarrow\left\{D_{src}(x), D_{cls}(x)\right\}$ )

$D_{src}(x)$ : probability distribution over sources ( real vs fake )
$D_{cls}(x)$ : probability distribution over domain labels

[ Adversarial Loss ]

generated images vs real images
$\mathcal{L}_{adv}= \mathbb{E}_{x}\left[\log D_{src}(x)\right]+ \mathbb{E}_{x, c}\left[\log \left(1-D_{src}(G(x, c))\right)\right]$.
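The adversarial loss above can be estimated from a batch of discriminator outputs. The following is a minimal sketch in plain Python, assuming $D_{src}$ outputs are probabilities strictly between 0 and 1; the function name and the toy numbers are illustrative, not taken from the paper.

```python
import math

def adversarial_loss(d_src_real, d_src_fake):
    """Batch estimate of L_adv = E[log D_src(x)] + E[log(1 - D_src(G(x, c)))]."""
    real_term = sum(math.log(p) for p in d_src_real) / len(d_src_real)
    fake_term = sum(math.log(1.0 - p) for p in d_src_fake) / len(d_src_fake)
    return real_term + fake_term

# A discriminator that separates real from fake well gives a value near 0;
# an undecided one (all outputs 0.5) gives 2 * log(0.5).
confident = adversarial_loss([0.9, 0.95], [0.05, 0.1])
undecided = adversarial_loss([0.5, 0.5], [0.5, 0.5])
print(confident, undecided)
```

$D$ tries to push this value up, while $G$ tries to push it down by making $D_{src}(G(x,c))$ large.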

[ Domain Classification Loss ]

given input image $x$ & target domain label $c$…
goal : translate $x$ to $y$

which is properly classified to class $c$
decompose objective into 2 terms
- 1) domain classification loss of “real images” used to optimize $D$
  - $\mathcal{L}_{cls}^{r}=\mathbb{E}_{x, c^{\prime}}\left[-\log D_{cls}\left(c^{\prime} \mid x\right)\right]$.
- 2) domain classification loss of “fake images” used to optimize $G$
  - $\mathcal{L}_{cls}^{f}=\mathbb{E}_{x, c}\left[-\log D_{cls}(c \mid G(x, c))\right]$.
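Both classification terms are negative log-likelihoods; they differ only in which images and which labels are fed in. Below is a minimal sketch for the categorical case, assuming $D_{cls}$ returns one probability per domain. The helper name and the numbers are hypothetical; binary attributes would use a per-attribute binary cross-entropy instead.

```python
import math

def classification_loss(probs, labels):
    """Mean of -log D_cls(label | image) over a batch.

    probs  : list of per-image probability lists over domains
    labels : list of domain indices (c' for real images, c for fake images)
    """
    return sum(-math.log(p[c]) for p, c in zip(probs, labels)) / len(labels)

# L_cls^r: real images scored against their original labels c'
loss_real = classification_loss([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1])
# L_cls^f: generated images scored against the target labels c
loss_fake = classification_loss([[0.3, 0.3, 0.4]], [2])
print(loss_real, loss_fake)
```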

[ Reconstruction Loss ]

apply a cycle consistency loss to generator
$\mathcal{L}_{rec}=\mathbb{E}_{x, c, c^{\prime}}\left[\left\|x-G\left(G(x, c), c^{\prime}\right)\right\|_{1}\right]$.
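The cycle can be traced with a toy example. This sketch assumes a hypothetical stand-in generator `toy_generator` that treats the "domain" as the mean pixel value; it is not the StarGAN network. It also averages the absolute differences rather than summing them, which only rescales the L1 norm.

```python
def l1_distance(a, b):
    """Mean absolute difference between two flattened images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def toy_generator(image, target_mean):
    """Hypothetical G: shifts all pixels so that their mean equals target_mean."""
    shift = target_mean - sum(image) / len(image)
    return [p + shift for p in image]

def reconstruction_loss(image, c_target, c_original, generator):
    """L1 distance between x and G(G(x, c), c')."""
    fake = generator(image, c_target)             # translate to the target domain
    reconstructed = generator(fake, c_original)   # translate back to the original domain
    return l1_distance(image, reconstructed)

x = [0.2, 0.4, 0.6]                                   # original domain: mean 0.4
good = reconstruction_loss(x, 0.8, 0.4, toy_generator)
bad = reconstruction_loss(x, 0.8, 0.5, toy_generator)  # wrong original label
print(good, bad)
```

With the correct original label $c'$ the round trip returns (up to floating-point error) the input, so the loss is close to 0; with a wrong label it is not.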

[ Full Objective ]

Objective functions to optimize $G$ and $D$

$\mathcal{L}_{D}=-\mathcal{L}_{adv}+\lambda_{cls} \mathcal{L}_{cls}^{r}$.
$\mathcal{L}_{G}=\mathcal{L}_{adv}+\lambda_{cls} \mathcal{L}_{cls}^{f}+\lambda_{rec} \mathcal{L}_{rec}$.
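Combining the terms is simple arithmetic. The sketch below assumes the individual losses have already been computed as scalars; the default weights ($\lambda_{cls}=1$, $\lambda_{rec}=10$) are the values usually quoted from the paper's experiments and should be treated as tunable assumptions here.

```python
def discriminator_loss(l_adv, l_cls_real, lambda_cls=1.0):
    """L_D = -L_adv + lambda_cls * L_cls^r"""
    return -l_adv + lambda_cls * l_cls_real

def generator_loss(l_adv, l_cls_fake, l_rec, lambda_cls=1.0, lambda_rec=10.0):
    """L_G = L_adv + lambda_cls * L_cls^f + lambda_rec * L_rec"""
    return l_adv + lambda_cls * l_cls_fake + lambda_rec * l_rec

# Illustrative scalar values, not results from any experiment.
l_d = discriminator_loss(l_adv=-1.2, l_cls_real=0.5)
l_g = generator_loss(l_adv=-1.2, l_cls_fake=0.4, l_rec=0.05)
print(l_d, l_g)
```

Note the opposite sign on $\mathcal{L}_{adv}$ in the two objectives: both networks minimize their own loss, which makes them adversaries on that term.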

2) Training with Multiple Datasets

simultaneously incorporates “multiple datasets, containing different types of labels”

so that StarGAN can control all the labels at the test phase!

[ Mask Vector ]

introduce mask vector $m$

allows StarGAN to ignore unspecified labels
define a unified version of label as a vector
- $\tilde{c}=\left[c_{1}, \ldots, c_{n}, m\right]$.
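$n$ : number of datasets, and $m$ : $n$-dimensional one-hot vector indicating which dataset's labels are known

if two datasets are used ( CelebA & RaFD ) : $n=2$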

vector of the known label $c_i$ : can be either

1) Binary Vector ( for binary attributes )
2) One-hot Vector ( for categorical attributes )
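The unified label can be assembled as follows. This is a minimal sketch: the helper name `unified_label` and the label sizes (3 binary attributes for the first dataset, 4 categories for the second) are made up for brevity, and unknown labels are filled with zeros.

```python
def unified_label(known_label, dataset_index, label_sizes):
    """Build c~ = [c_1, ..., c_n, m] when only one dataset's labels are known.

    known_label   : label vector of the dataset the image comes from
    dataset_index : which dataset that is (0-based)
    label_sizes   : length of each dataset's label vector
    """
    if len(known_label) != label_sizes[dataset_index]:
        raise ValueError("label length does not match the chosen dataset")
    vec = []
    for i, size in enumerate(label_sizes):
        # Known labels are copied; unknown ones are zero-filled.
        vec.extend(known_label if i == dataset_index else [0] * size)
    # Mask vector m: one-hot over datasets, marking which block is valid.
    mask = [1 if i == dataset_index else 0 for i in range(len(label_sizes))]
    return vec + mask

sizes = [3, 4]  # hypothetical: 3 binary attributes, 4 categories
from_first = unified_label([1, 0, 1], 0, sizes)      # binary vector
from_second = unified_label([0, 0, 1, 0], 1, sizes)  # one-hot vector
print(from_first)   # [1, 0, 1, 0, 0, 0, 0, 1, 0]
print(from_second)  # [0, 0, 0, 0, 0, 1, 0, 0, 1]
```

The last $n$ entries tell the generator which label block to trust, so the zero-filled block is read as "unspecified" rather than as "all attributes off".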



Seunghan Lee