Domain-Adversarial Training of Neural Networks ( 2015, 4320 )
Contents
- Abstract
- Introduction
- Domain Adaptation
- DANN (Domain Adaptation NN)
0. Abstract
data at train & test time : “different distn”
- train = SOURCE domain
- labeled data
- test = TARGET domain
- unlabeled ( or few-labeled ) data
Features :
- 1) “discriminative” for the main learning task ( on the source domain )
- 2) “indiscriminate” w.r.t. the shift between domains ( i.e., domain-invariant )
Example ) used in…
- 1) document sentiment analysis
- 2) image classification
1. Introduction
Costly to generate labeled data!
Domain Adaptation (DA)
- learning a discriminative predictor, in the presence of “shift between train/test distributions”
- mapping between domains, where target domain data are..
- 1) fully unlabeled : UNSUPERVISED DA
- 2) few-labeled : SEMI-SUPERVISED DA
This paper focuses on learning features that combine
- (1) discriminativeness
- (2) domain invariance
by using..
- 1) label predictor
  - MINIMIZE loss of the label classifier
- 2) domain classifier
  - MAXIMIZE loss of the domain classifier
  - ( works adversarially )
\(\rightarrow\) encourages “domain invariant” features
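The “maximize the domain loss from the feature extractor’s side” trick is usually realized with a gradient reversal layer. A minimal framework-free sketch of that idea (the class and `lam` parameter are illustrative, not the paper’s code):

```python
# Gradient reversal sketch: identity in the forward pass, sign-flipped
# gradient in the backward pass. The domain classifier above this layer
# still MINIMIZES its loss, while the feature extractor below it
# effectively MAXIMIZES it -> domain-invariant features.

class GradientReversal:
    """Pseudo-layer: forward(x) = x, backward(g) = -lam * g."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam  # trade-off weight between label loss and domain loss

    def forward(self, x):
        # features pass through unchanged
        return x

    def backward(self, grad_output):
        # flip the sign: descent for the domain classifier becomes
        # ascent for the feature extractor underneath this layer
        return -self.lam * grad_output
```

In an autodiff framework this would be a custom backward function; the sketch above only shows the sign flip that makes the two objectives adversarial.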
2. Domain Adaptation
Notation
- \(X\) : input space
- \(Y=\{0,1, \ldots, L-1\}\) : set of \(L\) possible labels
- 2 different distributions over \(X \times Y\)
- 1) Source domain : \(\mathcal{D}_{\mathrm{S}}\)
- 2) Target domain : \(\mathcal{D}_{\mathrm{T}}\)
Unsupervised DA
- given…
  - 1) \(n\) labeled source samples … \(S \sim \mathcal{D}_{\mathrm{S}}\)
  - 2) \(n^{\prime}\) unlabeled target samples … \(T \sim \mathcal{D}_{\mathrm{T}}^{X}\)
    ( \(\mathcal{D}_{\mathrm{T}}^{X}\) = marginal distn of \(\mathcal{D}_{\mathrm{T}}\) over \(X\) )
- notation
  - \(S=\left\{\left(\mathbf{x}_{i}, y_{i}\right)\right\}_{i=1}^{n} \sim\left(\mathcal{D}_{\mathrm{S}}\right)^{n}\)
  - \(T=\left\{\mathbf{x}_{i}\right\}_{i=n+1}^{N} \sim\left(\mathcal{D}_{\mathrm{T}}^{X}\right)^{n^{\prime}}\)
  - total # of samples : \(N = n+n^{\prime}\)
Goal
- build a classifier \(\eta: X \rightarrow Y\) with a low target risk :
  - \(R_{\mathcal{D}_{\mathrm{T}}}(\eta)=\operatorname{Pr}_{(\mathbf{x}, y) \sim \mathcal{D}_{\mathrm{T}}}(\eta(\mathbf{x}) \neq y)\)
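The target risk above is just the misclassification probability under \(\mathcal{D}_{\mathrm{T}}\); on a finite sample it is the error fraction. A tiny sketch (toy classifier and data are made up for illustration):

```python
# Empirical estimate of the target risk R_T(eta) = Pr[eta(x) != y]:
# the fraction of labeled samples the classifier gets wrong.

def empirical_risk(eta, samples):
    """samples: list of (x, y) pairs; returns the error fraction."""
    errors = sum(1 for x, y in samples if eta(x) != y)
    return errors / len(samples)

# assumed toy rule: predict label 1 iff x is positive
eta = lambda x: int(x > 0)

# hypothetical labeled target samples; the last one is misclassified
target_samples = [(-1.0, 0), (2.0, 1), (3.0, 1), (-0.5, 1)]
print(empirical_risk(eta, target_samples))  # → 0.25
```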
(1) Domain Divergence
( several notions of distance have been proposed for DA )
Goal :
- minimize the “target domain error”
- upper bound of the target domain error = “source domain error” + “domain divergence”
- \(\rightarrow\) (1) classify well in the source domain & (2) minimize the domain divergence
a) \(\mathcal{H}\) - divergence
- \(d_{\mathcal{H}}\left(\mathcal{D}_{\mathrm{S}}^{X}, \mathcal{D}_{\mathrm{T}}^{X}\right)=2 \sup _{\eta \in \mathcal{H}} \mid \underset{\mathbf{x} \sim \mathcal{D}_{\mathrm{S}}^{X}}{\operatorname{Pr}}[\eta(\mathbf{x})=1]-\underset{\mathbf{x} \sim \mathcal{D}_{\mathrm{T}}^{X}}{\operatorname{Pr}}[\eta(\mathbf{x})=1] \mid\).
b) empirical \(\mathcal{H}\) - divergence
- \(\hat{d}_{\mathcal{H}}(S, T)=2\left(1-\min _{\eta \in \mathcal{H}}\left[\frac{1}{n} \sum_{i=1}^{n} I\left[\eta\left(\mathbf{x}_{i}\right)=0\right]+\frac{1}{n^{\prime}} \sum_{i=n+1}^{N} I\left[\eta\left(\mathbf{x}_{i}\right)=1\right]\right]\right)\).
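When \(\mathcal{H}\) is small enough to enumerate, the min inside the empirical \(\mathcal{H}\)-divergence can be computed exactly. A sketch for 1-D data with \(\mathcal{H}\) = threshold classifiers in both orientations (the function and toy data are illustrative, not from the paper):

```python
import numpy as np

def empirical_h_divergence(S, T):
    """Empirical H-divergence 2 * (1 - min_eta [err_S(eta=0) + err_T(eta=1)])
    for 1-D samples, with H = {x -> I[x > t]} and {x -> I[x < t]}."""
    xs = np.sort(np.concatenate([S, T]))
    # candidate thresholds: midpoints between sorted samples, plus extremes
    thresholds = np.concatenate([[xs[0] - 1.0],
                                 (xs[:-1] + xs[1:]) / 2.0,
                                 [xs[-1] + 1.0]])
    best = np.inf
    for t in thresholds:
        for sign in (1, -1):  # sign=1: eta(x)=I[x>t]; sign=-1: eta(x)=I[x<t]
            pred_S = (sign * S > sign * t).astype(int)
            pred_T = (sign * T > sign * t).astype(int)
            # the classifier tries to output 1 on source and 0 on target
            err = np.mean(pred_S == 0) + np.mean(pred_T == 1)
            best = min(best, err)
    return 2.0 * (1.0 - best)

# well-separated toy domains -> divergence near its maximum of 2
S = np.array([5.0, 6.0, 7.0])
T = np.array([-1.0, -2.0, -3.0])
print(empirical_h_divergence(S, T))  # → 2.0
```

Intuition: if no classifier in \(\mathcal{H}\) can tell source from target samples apart (min error = 1), the divergence is 0; if some classifier separates them perfectly (min error = 0), it is 2.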