TeachAugment : Data Augmentation Optimization using Teacher Knowledge

Abstract
Introduction
Related Work
Data Augmentation Optimization using teacher knowledge
1. Preliminaries
2. TeachAugment
3. Improvement Techniques
Data Augmentation using NN

0. Abstract

Adversarial Data Augmentation strategies :

search augmentation, maximizing task loss
show improvement in model generalization

\(\rightarrow\) but require careful parameter tuning

TeachAugment

propose a DA optimization method based on adversarial strategy
without requiring careful tuning, by leveraging a teacher model

1. Introduction

AutoAugment : requires thousands of GPU hours

Online DA optimization

alternately update (1) augmentation policies & (2) target network
Advantages
- a) reduce computational costs
- b) simplify the DA pipeline
mostly based on adversarial strategy
- searches augmentation by maximizing task loss for target model
  
  ( = improve model generalization )
- problem : unstable
  - maximizing the loss can be achieved simply by collapsing the inherent meanings of images
  - to avoid collapse ….. regularize augmentation based on prior knowledge
    
    \(\rightarrow\) need lots of tuned parameters!

To alleviate tuning problem …. propose TeachAugment

online DA optimization using teacher knowledge
based on adversarial DA strategy
search augmentation where transformed image is RECOGNIZABLE for a TEACHER MODEL
do not require priors / hyperparameters

Propose DA using NN that represent 2 functions

(1) geometric augmentation
(2) color augmentation
why NN?
- a) update using GD
- b) reduce # of functions in the search space to “2”

Contributions

online DA ( w/o careful parameter tuning )
DA using NN

2. Related Work

Conventional DA :

geometric & color transformation are widely used!
using DA, improvements are made on…
- (1) image recognition accuracy
- (2) un/semi-supervised representation learning
usually improves model generalization,

but sometimes hurts performance, or induces unexpected biases

\(\rightarrow\) need to find effective augmentation policies

ex) AutoAugment

\(\rightarrow\) automatically search for effective data augmentation

Data Augmentation search

category 1) proxy task based
category 2) proxy task free

Proxy Task based

search DA strategies on proxy tasks that use subsets of data and/or small models to reduce computational costs
thus, might be SUB-OPTIMAL

Proxy Task free

DIRECTLY search DA strategies on the target network with all data
thus, potentially OPTIMAL
Ex) RandAugment, TrivialAugment
- randomize the parameter search & reduce the size of the search space
Ex) Adversarial AutoAugment, PointAugment
- update augmentation policies in an online manner
  
  ( = alternately update target network & augmentation policies )
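A minimal sketch of the randomized idea behind RandAugment-style methods: the search space shrinks to two numbers (how many operations to apply, and one shared magnitude), and the operations themselves are drawn uniformly at random. The operation names and the helper `sample_policy` are hypothetical and only serve as illustration.

```python
import random

# Hypothetical operation names, used only for illustration.
OPS = ["rotate", "shear_x", "translate_y", "color", "contrast", "sharpness"]

def sample_policy(num_ops, magnitude, rng):
    """Pick `num_ops` operations uniformly at random; all share one magnitude."""
    return [(rng.choice(OPS), magnitude) for _ in range(num_ops)]

rng = random.Random(0)
policy = sample_policy(num_ops=2, magnitude=9, rng=rng)
print(policy)  # a list of (operation, magnitude) pairs
```

Online methods such as Adversarial AutoAugment replace this fixed random sampling with a policy that is updated during training.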

This paper focuses on PROXY TASK FREE methods, updating policies in an ONLINE manner

reason 1) can directly search DA strategies on target network with all data
reason 2) unify the search & training process

3. Data Augmentation Optimization using teacher knowledge

(1) Preliminaries

Notation

dataset : \(x \sim \mathcal{X}\)
\(a_{\phi}\) : augmentation function, parameterized by \(\phi\)
\(f_\theta\) : target network
- fed into target network : \(f_{\theta}\left(a_{\phi}(x)\right)\)

Training procedure : \(\min _{\theta} \mathbb{E}_{x \sim \mathcal{X}} L\left(f_{\theta}\left(a_{\phi}(x)\right)\right)\).

( Adversarial DA : searches \(\phi\), maximizing the loss )

\(\rightarrow\) \(\max _{\phi} \min _{\theta} \mathbb{E}_{x \sim \mathcal{X}} L\left(f_{\theta}\left(a_{\phi}(x)\right)\right)\)

alternately updating \(\phi\) & \(\theta\)
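
Written as gradient steps, the alternating scheme might look as follows. This assumes plain gradient descent / ascent with illustrative step sizes \(\eta_{\theta}\) and \(\eta_{\phi}\); both the step sizes and the choice of optimizer are assumptions, not taken from the paper.

\[
\theta \leftarrow \theta-\eta_{\theta} \nabla_{\theta} L\left(f_{\theta}\left(a_{\phi}(x)\right)\right), \qquad \phi \leftarrow \phi+\eta_{\phi} \nabla_{\phi} L\left(f_{\theta}\left(a_{\phi}(x)\right)\right)
\]

The target network descends the loss, while the augmentation function ascends the same loss.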

PROBLEM?

maximizing the loss w.r.t. \(\phi\) can be achieved simply by collapsing the inherent meanings of \(x\)

\(\rightarrow\) solution : utilize teacher model to avoid the collapse !

(2) TeachAugment

Notation :

\(f_{\hat{\theta}}\) : teacher model
\(f_{\theta}\) : target model

Suggest 2 types of teacher model

(1) pre-trained teacher
(2) EMA teacher

( = weights are updated as an exponential moving average of target model’s weights )
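
A minimal sketch of the EMA teacher update, with weights represented as plain lists of floats. The helper `ema_update` and the decay value `0.9` are assumptions chosen for illustration; the notes do not state the decay used in the paper.

```python
def ema_update(teacher, target, decay):
    """Return new teacher weights: decay * teacher + (1 - decay) * target."""
    return [decay * t_hat + (1.0 - decay) * t for t_hat, t in zip(teacher, target)]

teacher = [0.0, 1.0]   # current teacher weights (toy values)
target = [1.0, 3.0]    # current target-model weights (toy values)
teacher = ema_update(teacher, target, decay=0.9)
print(teacher)         # the teacher moves a small step toward the target
```

A decay close to 1 makes the teacher change slowly, so it lags behind the target model.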

Proposed Objective :

\(\max _{\phi} \min _{\theta} \mathbb{E}_{x \sim \mathcal{X}}\left[L\left(f_{\theta}\left(a_{\phi}(x)\right)\right)-L\left(f_{\hat{\theta}}\left(a_{\phi}(x)\right)\right)\right]\).
- maximize for TARGET model
- minimize for TEACHER model
avoids collapsing the inherent meanings of images

( \(\because\) if not, loss for TEACHER model will explode!! )
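
A minimal sketch of the quantity the augmentation function maximizes, assuming \(L\) is the cross-entropy loss and using made-up class probabilities. The function names and numbers are illustrative only.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])

def augmentation_objective(target_probs, teacher_probs, label):
    """Quantity the augmentation model maximizes: L(target) - L(teacher)."""
    return cross_entropy(target_probs, label) - cross_entropy(teacher_probs, label)

label = 0
# Hard for the target, but the teacher still recognizes the image.
recognizable = augmentation_objective([0.3, 0.7], [0.9, 0.1], label)
# Collapsed image: the teacher is fooled as badly as the target.
collapsed = augmentation_objective([0.3, 0.7], [0.3, 0.7], label)
print(recognizable, collapsed)
```

The collapsed case scores lower than the recognizable one, so the augmentation function gains nothing from destroying the image content.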

objective is solved by ALTERNATELY updating the augmentation function & target model
process
- step 1) update TARGET network for \(n_{inner}\) steps
- step 2) update AUGMENTATION function
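
The two steps above can be sketched as a schematic loop. The callbacks `update_target` and `update_augmentation` are hypothetical stand-ins that only count calls; in a real setup they would each run one optimizer step.

```python
def train(num_outer, n_inner, update_target, update_augmentation):
    """Alternate: n_inner target updates, then one augmentation update."""
    for _ in range(num_outer):
        for _ in range(n_inner):
            update_target()        # step 1: minimize the loss w.r.t. theta
        update_augmentation()      # step 2: maximize the objective w.r.t. phi

# Hypothetical stand-ins that only count how often each update runs.
calls = {"target": 0, "augmentation": 0}

def update_target():
    calls["target"] += 1

def update_augmentation():
    calls["augmentation"] += 1

train(num_outer=3, n_inner=5, update_target=update_target,
      update_augmentation=update_augmentation)
print(calls)  # {'target': 15, 'augmentation': 3}
```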

(3) Improvement Techniques

training procedure : similar to GANs & actor-critic in RL

( lots of strategies to mitigate instabilities & improve training )

4. Data Augmentation using NN

Two NNs

(1) color augmentation model : \(c_{\phi_{c}}\)
(2) geometric augmentation model : \(g_{\phi_{g}}\)
(1) + (2) = \(a_{\phi}=g_{\phi_{g}} \circ c_{\phi_{c}}\)
- parameters : \(\phi=\left\{\phi_{c}, \phi_{g}\right\}\)

input image : \(x \in \mathbb{R}^{M \times 3}\)

( \(M\) : number of pixels )
data augmentation probability
- color : \(p_{c} \in(0,1)\)
- geometric : \(p_{g} \in(0,1)\)

Data Augmentation

[ color ] \(\tilde{x}_{i}=t\left(\alpha_{i} \odot x_{i}+\beta_{i}\right),\left(\alpha_{i}, \beta_{i}\right)=c_{\phi_{c}}\left(x_{i}, z, c\right)\)
[ geometric ] \(\hat{x}=\operatorname{Affine}(\tilde{x}, A+I), A=g_{\phi_{g}}(z, c)\)
- affine transformation of \(\tilde{x}\) with a parameter \(A+I\)
( + also learn the probabilities \(p_g\) & \(p_c\) )
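
A minimal sketch of the two transformations on toy inputs. Two things here are assumptions, not stated in the notes: \(t\) is taken to be a triangle wave that folds values back into \([0,1]\), and \(I\) is taken to be the \(2 \times 3\) identity matrix. The affine map is shown on a single 2-D coordinate rather than as a full image resampling, and \(\alpha\), \(\beta\), \(A\) are passed in directly instead of being produced by the networks \(c_{\phi_{c}}\) and \(g_{\phi_{g}}\).

```python
import math

def triangle_wave(v):
    """Fold any real value back into [0, 1] (assumed form of t)."""
    return math.acos(math.cos(math.pi * v)) / math.pi

def color_augment(pixel, alpha, beta):
    """Per-pixel color map: t(alpha * x + beta), applied channel-wise."""
    return [triangle_wave(a * x + b) for x, a, b in zip(pixel, alpha, beta)]

def affine_point(residual, point):
    """Apply (A + I) to a 2-D coordinate; I is assumed to be [[1,0,0],[0,1,0]]."""
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    m = [[residual[r][c] + identity[r][c] for c in range(3)] for r in range(2)]
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# alpha = 1, beta = 0 leaves an in-range pixel unchanged.
same_pixel = color_augment([0.2, 0.5, 0.8], [1.0] * 3, [0.0] * 3)
# A = 0 gives the identity transformation.
zero = [[0.0] * 3 for _ in range(2)]
same_point = affine_point(zero, (0.25, -0.5))
# A small residual in the last column translates the coordinate.
shifted = affine_point([[0.0, 0.0, 0.1], [0.0, 0.0, 0.0]], (0.25, -0.5))
print(same_pixel, same_point, shifted)
```

Parameterizing the affine matrix as a residual \(A\) added to \(I\) means that an output of zero from the geometric model corresponds to "no transformation".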


(paper 13) TeachAugment

Seunghan Lee
