A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts
https://arxiv.org/pdf/2303.15361.pdf
0. Abstract
Robust model = generalizes well to test samples
- Problem) performance drop due to UNKNOWN test distribution
- Solution) TTA (Test-time adaptation)
TTA (Test-time adaptation)
- Adapt a pre-trained model to unlabeled data DURING TESTING
- Four categories
    - (1) SFDA (Source-free domain adaptation = Test-time domain adaptation)
    - (2) TTBA (Test-time batch adaptation)
    - (3) OTTA (Online test-time adaptation)
    - (4) TTPA (Test-time prior adaptation)
 
1. Introduction
Traditional ML: assume train distn = test distn
\(\rightarrow\) Not true in real world
To solve this issue…
- (1) DG (Domain Generalization)
    - Inductive setting ( only access to train data during training )
    - Train a model using data from (one or more) source domains
    - Inference on OOD target domain
 
- (2) DA (Domain Adaptation)
    - Transductive setting ( access to both train & test data for inference )
    - Leverage knowledge from a labeled source \(\rightarrow\) unlabeled target domain
 
- (3) TTA (Test-time Adaptation) \(\rightarrow\) main focus
TTA > (DG, DA)
- TTA vs. DG
    - DG) operates only during the training phase
    - TTA) can access test data from the target domain during the test phase ( adaptation with the availability of test data )
- TTA vs. DA
    - DA) requires access to both labeled source & unlabeled target data
        - not suitable for privacy-sensitive applications
    - TTA) only requires access to the pretrained model from the source domain
        - more secure & practical
Categories of TTA
( Notation: \(m\) unlabeled minibatches \(\left\{b_1, \cdots, b_m\right\}\) at test time )
- (1) SFDA (Source-free domain adaptation = Test-time domain adaptation)
- (2) TTBA (Test-time batch adaptation)
- (3) OTTA (Online test-time adaptation)
- (4) TTPA (Test-time prior adaptation)
- (1) SFDA: utilizes all \(m\) test batches for adaptation before generating final predictions
- (2) TTBA: individually adapts the pre-trained model to one or a few instances ( = predictions for each mini-batch are independent of the predictions for the other mini-batches )
- (3) OTTA: adapts the pre-trained model to the target data \(\left\{b_1, \cdots, b_m\right\}\) in an online manner ( = each mini-batch can be observed only once ); see the protocol sketch after this list
- (4) TTPA (Test-time prior adaptation; not the main focus)
    - (1)~(3): Data shift ( = covariate shift = \(X\) shift )
    - (4): Label shift ( = \(Y\) shift )
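A minimal sketch of how (1)–(3) differ purely as adaptation protocols; `adapt` and `predict` are hypothetical placeholders for any TTA update rule and inference routine:

```python
import copy

def sfda(model, batches, adapt, predict):
    """(1) SFDA: adapt on ALL m test batches first, then predict (transductive)."""
    for b in batches:                 # may revisit {b_1, ..., b_m} before predicting
        adapt(model, b)
    return [predict(model, b) for b in batches]

def ttba(model, batches, adapt, predict):
    """(2) TTBA: adapt a fresh copy per batch; predictions are independent."""
    outputs = []
    for b in batches:
        m = copy.deepcopy(model)      # reset, so nothing leaks across mini-batches
        adapt(m, b)
        outputs.append(predict(m, b))
    return outputs

def otta(model, batches, adapt, predict):
    """(3) OTTA: adapt online; each batch is seen only once, updates accumulate."""
    outputs = []
    for b in batches:
        adapt(model, b)               # the model keeps evolving with the stream
        outputs.append(predict(model, b))
    return outputs
```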
 
Outlines
- Concept of TTA & overview of the four topics ( SFDA, TTBA, OTTA, TTPA )
- Advanced algorithms for these topics
2. Related Research Topics
(1) DA & DG
Domain Shift
- (1) Covariate (\(X\)) shift
- (2) Label (\(Y\)) shift
DA & DG: Both are transfer learning techniques
- DA: Domain Adaptation
- DG: Domain Generalization
DA vs. DG:
- DG: inductive
    - Train a model using (source) train data & inference on (target) test data
- DA: transductive
    - Inference using both (source) train & (target) test data
        - Example of a transductive model) KNN
    - 4 categories ( of DA methods )
        - a) Input-level translation
        - b) Feature-level alignment
        - c) Output-level regularization
        - d) Prior estimation
 
 
DA methods for SFDA
The SFDA problem can be solved using DA methods,
if it is possible to generate TRAINING DATA from the source model
- (1) One-shot DA
    - Adapting to only “ONE unlabeled target” instance & “source” data
 
- (2) Online DA
    - Similar to One-shot DA, but streaming target data ( = deleted after adaptation )
 
- (3) Federated DA
    - The source side acquires feedback from the target data ( without sharing raw data )
 
(2) Hypothesis Transfer Learning (HTL)
Pretrained models retain information about previously encountered tasks
\(\rightarrow\) Still requires a certain amount of labeled data in the target domain
(3) Continual Learning & Meta-Learning
a) Continual Learning (CL)
- Learning a model for multiple tasks in a SEQUENCE
- Knowledge from previous tasks is gradually accumulated
- Three scenarios
    - (1) Task-incremental
    - (2) Domain-incremental
    - (3) Class-incremental
- Three categories
    - (1) Rehearsal-based
    - (2) Parameter-based regularization
    - (3) Generative-based
    - (1) vs. (2,3)
        - (1) Access to training data of previous tasks (O)
        - (2,3) Access to training data of previous tasks (X)
 
 
b) Meta-Learning
( Meta Learning = Learning to learn )
- Similar to CL
- But training data are randomly drawn from a task distribution & test data are tasks with few examples
- Offers a solution for TTA w/o incorporating test data in the meta-training stage
(4) Data-Free Knowledge Distillation
Knowledge Distillation (KD)
- Knowledge from teacher model \(\rightarrow\) student model
- To address privacy concerns … Data-Free KD is proposed
Two categories of Data-Free KD
- (1) Adversarial training
    - Generates worst-case synthetic samples for student learning
 
- (2) Data prior matching
    - Generates synthetic samples that satisfy certain priors
        - e.g.) class prior, batch-norm statistics
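A sketch of the batch-norm-statistics prior (in the spirit of DeepInversion): optimize random inputs so that their feature statistics match the teacher's stored BN statistics. The model choice, loss weight, and step budget below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torchvision

# Frozen teacher with BatchNorm layers (any BN-based classifier works here)
teacher = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Collect per-layer mismatch between batch statistics and BN running statistics
bn_losses = []
def bn_hook(module, inputs, output):
    x = inputs[0]
    mean = x.mean(dim=[0, 2, 3])
    var = x.var(dim=[0, 2, 3], unbiased=False)
    bn_losses.append(((mean - module.running_mean) ** 2).sum()
                     + ((var - module.running_var) ** 2).sum())

for m in teacher.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.register_forward_hook(bn_hook)

synth = torch.randn(16, 3, 224, 224, requires_grad=True)  # learnable "images"
labels = torch.randint(0, 1000, (16,))                    # assumed class prior
opt = torch.optim.Adam([synth], lr=0.05)

for step in range(200):                                   # illustrative budget
    bn_losses.clear()
    opt.zero_grad()
    logits = teacher(synth)
    loss = nn.functional.cross_entropy(logits, labels) + 0.1 * sum(bn_losses)
    loss.backward()
    opt.step()
# `synth` can now serve as transfer data for distilling into a student
```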
 
 
Compared with TTA…
- Data-Free KD focuses on
    - transfer between models (O)
    - transfer between datasets (X)
 
(5) Self-supervised & Semi-supervised Learning
Self-supervised Learning
- Learn from unlabeled data
Semi-supervised Learning
- Learning from both labeled & unlabeled data
- Common objective = (1) + (2)
    - (1) Supervised Loss ( calculated with labeled data )
    - (2) Unsupervised Loss ( calculated with labeled + unlabeled data )
- Depending on Loss (2), can be divided into...
    - a) Self-training
    - b) Consistency regularization
    - c) Model variations
- ( https://seunghan96.github.io/ssl/SemiSL_intro/ )
Self- & semi-supervised objectives can also be incorporated to update the pretrained model without supervision for TTA tasks ( a common pattern is sketched below )
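A minimal FixMatch-style sketch of the (1) + (2) objective above; the function name, confidence threshold, and weighting are illustrative assumptions, and for pure TTA the supervised term would be dropped:

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, x_lab, y_lab, x_weak, x_strong,
                         threshold=0.95, lambda_u=1.0):
    """(1) supervised loss + (2) pseudo-label consistency loss (FixMatch-style)."""
    sup = F.cross_entropy(model(x_lab), y_lab)             # (1) labeled data

    with torch.no_grad():                                  # pseudo-labels from the
        probs = F.softmax(model(x_weak), dim=1)            # weakly augmented view
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()                 # keep confident ones only

    unsup = (F.cross_entropy(model(x_strong), pseudo,      # consistency enforced
                             reduction="none") * mask).mean()  # on the strong view
    return sup + lambda_u * unsup
```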
3. Source-Free Domain Adaptation (SFDA)
(1) Problem Definition
a) Domain
Domain \(\mathcal{D}\): a joint distribution \(p(x, y)\) defined on the space \(\mathcal{X} \times \mathcal{Y}\),
- \(x \in \mathcal{X}\) and \(y \in \mathcal{Y}\) denote the input & output
Notation
- Target domain \(p_{\mathcal{T}}(x, y)\)
    - domain of our interest
    - unlabeled data
- Source domain \(p_{\mathcal{S}}(x, y)\)
    - labeled data
 
- ( Unless otherwise specified, \(\mathcal{Y}\) is a \(C\)-cardinality label set )
b) Settings
- Labeled source domain \(\mathcal{D}_{\mathcal{S}}=\left\{\left(x_1, y_1\right), \ldots,\left(x_{n_s}, y_{n_s}\right)\right\}\)
- Unlabeled target domain \(\mathcal{D}_{\mathcal{T}}=\left\{x_1, \ldots, x_{n_t}\right\}\)
- Data distribution shift: \(\mathcal{X}_{\mathcal{S}}=\mathcal{X}_{\mathcal{T}}\), \(p_{\mathcal{S}}(x) \neq p_{\mathcal{T}}(x)\), under the covariate shift assumption \(p_{\mathcal{S}}(y \mid x) = p_{\mathcal{T}}(y \mid x)\)
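For contrast with the label shift handled by TTPA (Section 1), the two shift types can be written explicitly:
\[
\text{covariate shift: } p_{\mathcal{S}}(y \mid x) = p_{\mathcal{T}}(y \mid x), \quad p_{\mathcal{S}}(x) \neq p_{\mathcal{T}}(x)
\]
\[
\text{label shift: } p_{\mathcal{S}}(x \mid y) = p_{\mathcal{T}}(x \mid y), \quad p_{\mathcal{S}}(y) \neq p_{\mathcal{T}}(y)
\]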
Unsupervised domain adaptation (UDA)
= leverage knowledge in \(\mathcal{D}_{\mathcal{S}}\) to help infer the label of each target sample in \(\mathcal{D}_{\mathcal{T}}\).
Three scenarios
- (1) Source classifier with accessible models and parameters
- (2) Source classifier as a black-box model
- (3) Source class means as representatives.
\(\rightarrow\) Utilizes all the test data to adjust the classifier learned from the training data
c) Source-free Domain Adaptation (SFDA)
Notation
- Pretrained classifier \(f_{\mathcal{S}}: \mathcal{X}_{\mathcal{S}} \rightarrow \mathcal{Y}_{\mathcal{S}}\) trained on \(\mathcal{D}_{\mathcal{S}}\)
- Unlabeled target domain \(\mathcal{D}_{\mathcal{T}}\)
SFDA:
- aims to leverage the labeled knowledge implied in \(f_{\mathcal{S}}\) to infer labels of all the samples in \(\mathcal{D}_{\mathcal{T}}\), in a transductive learning manner
- All test data (target data) are required to be seen during adaptation
(2) Taxonomy of SFDA algorithms

a) Pseudo-labeling
- Centroid-based pseudo labels
- Neighbor-based pseudo labels
- Complementary pseudo labels
- Optimization-based pseudo labels
- Ensemble-based pseudo labels
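As one concrete instance of the first family, a centroid-based pseudo-labeling sketch in the spirit of SHOT, simplified to a single pass (variable names are illustrative):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def centroid_pseudo_labels(features, logits):
    """(1) build class centroids as prediction-weighted feature means,
    (2) re-assign each target sample to its nearest centroid."""
    probs = F.softmax(logits, dim=1)                 # [N, C] soft predictions
    feats = F.normalize(features, dim=1)             # [N, D] target features

    # (1) weighted centroids: c_k = sum_i p_ik * f_i / sum_i p_ik
    centroids = probs.t() @ feats / probs.sum(dim=0, keepdim=True).t()
    centroids = F.normalize(centroids, dim=1)        # [C, D]

    # (2) nearest-centroid assignment by cosine similarity
    return (feats @ centroids.t()).argmax(dim=1)     # [N] pseudo-labels
```

SHOT additionally recomputes the centroids from these hard labels and re-assigns once more; the one-pass version above keeps the sketch short.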
b) Consistency Training
- Consistency under data variations
- Consistency under model variations
- Consistency under data & model variations ( sketched below )
- Miscellaneous consistency regularization
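A sketch of consistency under data & model variations via a mean-teacher style update: the teacher is an EMA copy of the student (e.g. initialized with `copy.deepcopy(student)`); names and the momentum value are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def consistency_step(student, teacher, x_view1, x_view2, optimizer, momentum=0.999):
    """Student matches the EMA teacher's prediction on another augmented view."""
    with torch.no_grad():
        target = F.softmax(teacher(x_view1), dim=1)        # stable teacher output

    log_pred = F.log_softmax(student(x_view2), dim=1)      # data variation: view 2
    loss = F.kl_div(log_pred, target, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    with torch.no_grad():                                  # model variation:
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(momentum).add_(ps, alpha=1 - momentum) # EMA teacher update
    return loss.item()
```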
c) Clustering-based Training
- Entropy minimization
- Mutual-information maximization
- Explicit clustering
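The first two items are often combined into an information-maximization objective (as in SHOT): confident predictions per sample, diverse predictions over the batch. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def information_maximization_loss(logits):
    """L = mean per-sample entropy - entropy of the marginal prediction,
    i.e., the negative mutual information between inputs and predictions."""
    probs = F.softmax(logits, dim=1)                          # [N, C]

    # (1) entropy minimization: make each prediction confident
    ent = -(probs * torch.log(probs + 1e-6)).sum(dim=1).mean()

    # (2) diversity: keep the marginal prediction spread over classes
    marginal = probs.mean(dim=0)                              # [C]
    div = -(marginal * torch.log(marginal + 1e-6)).sum()

    return ent - div   # minimizing this maximizes mutual information
```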
d) Source Distribution Estimation
- Data generation
- Data translation
- Data selection
- Feature estimation
- Virtual domain alignment
e) Others
- (3.2.5) Self-supervised Learning
- (3.2.6) Optimization Strategy
- (3.2.7) Beyond Vanilla Source Model
(3) Learning Scenarios of SFDA algorithms
a) Closed-set vs. Open-set
( Most existing SFDA methods focus on a closed-set scenario )
- Closed-set: \(\mathcal{C}_s=\mathcal{C}_t\)
- Partial-set: \(\mathcal{C}_t \subset \mathcal{C}_s\)
- Open-set: \(\mathcal{C}_s \subset \mathcal{C}_t\)
- Open-partial-set: \(\mathcal{C}_s \backslash \mathcal{C}_t \neq \emptyset,\ \mathcal{C}_t \backslash \mathcal{C}_s \neq \emptyset\)
Several recent studies even develop a unified framework for both open-set and open-partial-set scenarios.
b) Single-source vs. Multi-source
c) Single-target vs. Multi-target
Multi-target DA
- Multiple unlabeled target domains exist at the same time
- The domain label of each target sample may even be unknown
- Each target domain may come in a streaming manner \(\rightarrow\) model is successively adapted to different target domains
d) Unsupervised vs. Semi-supervised
e) White-box vs. Black-box
f) Active SFDA
A few target samples can be selected to be labeled by human annotators
g) Imbalanced SFDA
ex) ISFDA: class-imbalanced SFDA
- source & target label distns are different & extremely imbalanced
