A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts
https://arxiv.org/pdf/2303.15361.pdf
0. Abstract
Robust model = generalizes well to test samples
- Problem) performance drop due to UNKNOWN test distribution
- Solution) TTA (Test-time adaptation)
TTA (Test-time adaptation)
- Adapt a pre-trained model to unlabeled data DURING TESTING
- Four categories
    - (1) SFDA (Source-free domain adaptation = Test-time domain adaptation)
    - (2) TTBA (Test-time batch adaptation)
    - (3) OTTA (Online test-time adaptation)
    - (4) TTPA (Test-time prior adaptation)
 
1. Introduction
Traditional ML: assume train distn = test distn
\(\rightarrow\) Not true in real world
To solve this issue…
- (1) DG (Domain Generalization)
    - Inductive setting ( only access to train data during training )
    - Train a model using data from (one or more) source domains
    - Inference on OOD target domain
 
- (2) DA (Domain Adaptation)
    - Transductive setting ( access to both train & test data for inference )
    - Leverage knowledge from a labeled source \(\rightarrow\) unlabeled target domain
 
- (3) TTA (Test-time Adaptation) \(\rightarrow\) main focus
TTA > (DG, DA)
- TTA vs. DG
    - DG) operates only during the training phase
    - TTA) can access test data from the target domain during the test phase ( adaptation with the availability of test data )
- TTA vs. DA
    - DA) requires access to both labeled source & unlabeled target data
        - not suitable for privacy-sensitive applications
    - TTA) only requires access to the pretrained model from the source domain
        - more secure & practical
Categories of TTA
( Notation: \(m\) unlabeled minibatches \(\left\{b_1, \cdots, b_m\right\}\) at test time )
- (1) SFDA (Source-free domain adaptation = Test-time domain adaptation)
- (2) TTBA (Test-time batch adaptation)
- (3) OTTA (Online test-time adaptation)
- (4) TTPA (Test-time prior adaptation)
- (1) SFDA: utilizes all \(m\) test batches for adaptation before generating final predictions
- (2) TTBA: individually adapts the pre-trained model to one or a few instances ( = predictions for each mini-batch are independent of the predictions for the other mini-batches )
- (3) OTTA: adapts the pre-trained model to the target data \(\left\{b_1, \cdots, b_m\right\}\) in an online manner ( = each mini-batch can be observed only once ); see the protocol sketch after this list
- (4) TTPA (Test-time prior adaptation; not the main focus)
    - (1)~(3): Data shift ( = covariate shift = \(X\) shift )
    - (4): Label shift ( = \(Y\) shift )
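A minimal sketch of how (1)–(3) differ purely as adaptation protocols; `adapt` and `predict` are hypothetical placeholders for any TTA update rule and inference routine:

```python
import copy

def sfda(model, batches, adapt, predict):
    """(1) SFDA: adapt on ALL m test batches first, then predict (transductive)."""
    for b in batches:                 # may revisit {b_1, ..., b_m} before predicting
        adapt(model, b)
    return [predict(model, b) for b in batches]

def ttba(model, batches, adapt, predict):
    """(2) TTBA: adapt a fresh copy per batch; predictions are independent."""
    outputs = []
    for b in batches:
        m = copy.deepcopy(model)      # reset, so nothing leaks across mini-batches
        adapt(m, b)
        outputs.append(predict(m, b))
    return outputs

def otta(model, batches, adapt, predict):
    """(3) OTTA: adapt online; each batch is seen only once, updates accumulate."""
    outputs = []
    for b in batches:
        adapt(model, b)               # the model keeps evolving with the stream
        outputs.append(predict(model, b))
    return outputs
```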
 
Outlines
- Concept of TTA & overview of the four topics ( SFDA, TTBA, OTTA, TTPA )
- Advanced algorithms for these topics
2. Related Research Topics
(1) DA & DG
Domain Shift
- (1) Covariate (\(X\)) shift
- (2) Label (\(Y\)) shift
DA & DG: Both are transfer learning techniques
- DA: Domain Adaptation
- DG: Domain Generalization
DA vs. DG:
- DG: inductive
    - Train a model using (source) train data & inference on (target) test data
- DA: transductive
    - Inference using both (source) train & (target) test data
        - Example of a transductive model) KNN
    - 4 categories ( of DA methods )
        - a) Input-level translation
        - b) Feature-level alignment
        - c) Output-level regularization
        - d) Prior estimation
 
 
DA methods for SFDA
The SFDA problem can be solved using DA methods,
if it is possible to generate TRAINING DATA from the source model
- (1) One-shot DA
    - Adapting to only “ONE unlabeled target” instance & “source” data
 
- (2) Online DA
    - Similar to One-shot DA, but streaming target data ( = deleted after adaptation )
 
- (3) Federated DA
    - The source side acquires feedback from the target data ( without sharing raw data )
 
(2) Hypothesis Transfer Learning (HTL)
Pretrained models retain information about previously encountered tasks
\(\rightarrow\) Still requires a certain amount of labeled data in the target domain
(3) Continual Learning & Meta-Learning
a) Continual Learning (CL)
- Learning a model for multiple tasks in a SEQUENCE
- Knowledge from previous tasks is gradually accumulated
- Three scenarios
    - (1) Task-incremental
    - (2) Domain-incremental
    - (3) Class-incremental
- Three categories
    - (1) Rehearsal-based
    - (2) Parameter-based regularization
    - (3) Generative-based
    - (1) vs. (2,3)
        - (1) Access to training data of previous tasks (O)
        - (2,3) Access to training data of previous tasks (X)
 
 
b) Meta-Learning
( Meta Learning = Learning to learn )
- Similar to CL
- But training data are randomly drawn from a task distribution & test data are tasks with few examples
- Offers a solution for TTA w/o incorporating test data in the meta-training stage
(4) Data-Free Knowledge Distillation
Knowledge Distillation (KD)
- Knowledge from teacher model \(\rightarrow\) student model
- To address privacy concerns … Data-Free KD is proposed
Two categories of Data-Free KD
- (1) Adversarial training
    - Generates worst-case synthetic samples for student learning
 
- (2) Data prior matching
    - Generates synthetic samples that satisfy certain priors
        - e.g.) class prior, batch-norm statistics
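A sketch of the batch-norm-statistics prior (in the spirit of DeepInversion): optimize random inputs so that their feature statistics match the teacher's stored BN statistics. The model choice, loss weight, and step budget below are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torchvision

# Frozen teacher with BatchNorm layers (any BN-based classifier works here)
teacher = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Collect per-layer mismatch between batch statistics and BN running statistics
bn_losses = []
def bn_hook(module, inputs, output):
    x = inputs[0]
    mean = x.mean(dim=[0, 2, 3])
    var = x.var(dim=[0, 2, 3], unbiased=False)
    bn_losses.append(((mean - module.running_mean) ** 2).sum()
                     + ((var - module.running_var) ** 2).sum())

for m in teacher.modules():
    if isinstance(m, nn.BatchNorm2d):
        m.register_forward_hook(bn_hook)

synth = torch.randn(16, 3, 224, 224, requires_grad=True)  # learnable "images"
labels = torch.randint(0, 1000, (16,))                    # assumed class prior
opt = torch.optim.Adam([synth], lr=0.05)

for step in range(200):                                   # illustrative budget
    bn_losses.clear()
    opt.zero_grad()
    logits = teacher(synth)
    loss = nn.functional.cross_entropy(logits, labels) + 0.1 * sum(bn_losses)
    loss.backward()
    opt.step()
# `synth` can now serve as transfer data for distilling into a student
```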
 
 
Compared with TTA…
- Data-Free KD focuses on
    - transfer between models (O)
    - transfer between datasets (X)
 
(5) Self-supervised & Semi-supervised Learning
Self-supervised Learning
- Learn from unlabeled data
Semi-supervised Learning
- Learning from both labeled & unlabeled data
- Common objective = (1) + (2)
    - (1) Supervised Loss ( calculated with labeled data )
    - (2) Unsupervised Loss ( calculated with labeled + unlabeled data )
- Depending on Loss (2), can be divided into...
    - a) Self-training
    - b) Consistency regularization
    - c) Model variations
- ( https://seunghan96.github.io/ssl/SemiSL_intro/ )
Self- & semi-supervised objectives can also be incorporated to update the pretrained model without supervision for TTA tasks ( a common pattern is sketched below )
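A minimal FixMatch-style sketch of the (1) + (2) objective above; the function name, confidence threshold, and weighting are illustrative assumptions, and for pure TTA the supervised term would be dropped:

```python
import torch
import torch.nn.functional as F

def semi_supervised_loss(model, x_lab, y_lab, x_weak, x_strong,
                         threshold=0.95, lambda_u=1.0):
    """(1) supervised loss + (2) pseudo-label consistency loss (FixMatch-style)."""
    sup = F.cross_entropy(model(x_lab), y_lab)             # (1) labeled data

    with torch.no_grad():                                  # pseudo-labels from the
        probs = F.softmax(model(x_weak), dim=1)            # weakly augmented view
        conf, pseudo = probs.max(dim=1)
        mask = (conf >= threshold).float()                 # keep confident ones only

    unsup = (F.cross_entropy(model(x_strong), pseudo,      # consistency enforced
                             reduction="none") * mask).mean()  # on the strong view
    return sup + lambda_u * unsup
```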
3. Source-Free Domain Adaptation (SFDA)
(1) Problem Definition
a) Domain
Domain \(\mathcal{D}\): a joint distribution \(p(x, y)\) defined on the space \(\mathcal{X} \times \mathcal{Y}\),
- \(x \in \mathcal{X}\) and \(y \in \mathcal{Y}\) denote the input & output
Notation
- Target domain \(p_{\mathcal{T}}(x, y)\)
    - domain of our interest
    - unlabeled data
- Source domain \(p_{\mathcal{S}}(x, y)\)
    - labeled data
 
- ( Unless otherwise specified, \(\mathcal{Y}\) is a \(C\)-cardinality label set )
b) Settings
- Labeled source domain \(\mathcal{D}_{\mathcal{S}}=\left\{\left(x_1, y_1\right), \ldots,\left(x_{n_s}, y_{n_s}\right)\right\}\)
- Unlabeled target domain \(\mathcal{D}_{\mathcal{T}}=\left\{x_1, \ldots, x_{n_t}\right\}\)
- Data distribution shift: \(\mathcal{X}_{\mathcal{S}}=\mathcal{X}_{\mathcal{T}}\), \(p_{\mathcal{S}}(x) \neq p_{\mathcal{T}}(x)\), under the covariate shift assumption \(p_{\mathcal{S}}(y \mid x) = p_{\mathcal{T}}(y \mid x)\)
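For contrast with the label shift handled by TTPA (Section 1), the two shift types can be written explicitly:
\[
\text{covariate shift: } p_{\mathcal{S}}(y \mid x) = p_{\mathcal{T}}(y \mid x), \quad p_{\mathcal{S}}(x) \neq p_{\mathcal{T}}(x)
\]
\[
\text{label shift: } p_{\mathcal{S}}(x \mid y) = p_{\mathcal{T}}(x \mid y), \quad p_{\mathcal{S}}(y) \neq p_{\mathcal{T}}(y)
\]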
Unsupervised domain adaptation (UDA)
= leverage knowledge in \(\mathcal{D}_{\mathcal{S}}\) to help infer the label of each target sample in \(\mathcal{D}_{\mathcal{T}}\).
Three scenarios
- (1) Source classifier with accessible models and parameters
- (2) Source classifier as a black-box model
- (3) Source class means as representatives.
\(\rightarrow\) Utilizes all the test data to adjust the classifier learned from the training data
c) Source-free Domain Adaptation (SFDA)
Notation
- Pretrained classifier \(f_{\mathcal{S}}: \mathcal{X}_{\mathcal{S}} \rightarrow \mathcal{Y}_{\mathcal{S}}\) trained on \(\mathcal{D}_{\mathcal{S}}\)
- Unlabeled target domain \(\mathcal{D}_{\mathcal{T}}\)
SFDA:
- aims to leverage the labeled knowledge implied in \(f_{\mathcal{S}}\) to infer labels of all the samples in \(\mathcal{D}_{\mathcal{T}}\), in a transductive learning manner
- All test data (target data) are required to be seen during adaptation
(2) Taxonomy of SFDA algorithms

a) Pseudo-labeling
- Centroid-based pseudo labels
- Neighbor-based pseudo labels
- Complementary pseudo labels
- Optimization-based pseudo labels
- Ensemble-based pseudo labels
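As one concrete instance of the first family, a centroid-based pseudo-labeling sketch in the spirit of SHOT, simplified to a single pass (variable names are illustrative):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def centroid_pseudo_labels(features, logits):
    """(1) build class centroids as prediction-weighted feature means,
    (2) re-assign each target sample to its nearest centroid."""
    probs = F.softmax(logits, dim=1)                 # [N, C] soft predictions
    feats = F.normalize(features, dim=1)             # [N, D] target features

    # (1) weighted centroids: c_k = sum_i p_ik * f_i / sum_i p_ik
    centroids = probs.t() @ feats / probs.sum(dim=0, keepdim=True).t()
    centroids = F.normalize(centroids, dim=1)        # [C, D]

    # (2) nearest-centroid assignment by cosine similarity
    return (feats @ centroids.t()).argmax(dim=1)     # [N] pseudo-labels
```

SHOT additionally recomputes the centroids from these hard labels and re-assigns once more; the one-pass version above keeps the sketch short.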
b) Consistency Training
- Consistency under data variations
- Consistency under model variations
- Consistency under data & model variations ( sketched below )
- Miscellaneous consistency regularization
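A sketch of consistency under data & model variations via a mean-teacher style update: the teacher is an EMA copy of the student (e.g. initialized with `copy.deepcopy(student)`); names and the momentum value are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def consistency_step(student, teacher, x_view1, x_view2, optimizer, momentum=0.999):
    """Student matches the EMA teacher's prediction on another augmented view."""
    with torch.no_grad():
        target = F.softmax(teacher(x_view1), dim=1)        # stable teacher output

    log_pred = F.log_softmax(student(x_view2), dim=1)      # data variation: view 2
    loss = F.kl_div(log_pred, target, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    with torch.no_grad():                                  # model variation:
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(momentum).add_(ps, alpha=1 - momentum) # EMA teacher update
    return loss.item()
```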
c) Clustering-based Training
- Entropy minimization
- Mutual-information maximization
- Explicit clustering
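The first two items are often combined into an information-maximization objective (as in SHOT): confident predictions per sample, diverse predictions over the batch. A minimal sketch:

```python
import torch
import torch.nn.functional as F

def information_maximization_loss(logits):
    """L = mean per-sample entropy - entropy of the marginal prediction,
    i.e., the negative mutual information between inputs and predictions."""
    probs = F.softmax(logits, dim=1)                          # [N, C]

    # (1) entropy minimization: make each prediction confident
    ent = -(probs * torch.log(probs + 1e-6)).sum(dim=1).mean()

    # (2) diversity: keep the marginal prediction spread over classes
    marginal = probs.mean(dim=0)                              # [C]
    div = -(marginal * torch.log(marginal + 1e-6)).sum()

    return ent - div   # minimizing this maximizes mutual information
```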
d) Source Distribution Estimation
- Data generation
- Data translation
- Data selection
- Feature estimation
- Virtual domain alignment
e) Others
- (3.2.5) Self-supervised Learning
- (3.2.6) Optimization Strategy
- (3.2.7) Beyond Vanilla Source Model
(3) Learning Scenarios of SFDA algorithms
a) Closed-set vs. Open-set
( Most existing SFDA methods focus on a closed-set scenario )
- Closed-set: \(\mathcal{C}_s=\mathcal{C}_t\)
- Partial-set: \(\mathcal{C}_t \subset \mathcal{C}_s\)
- Open-set: \(\mathcal{C}_s \subset \mathcal{C}_t\)
- Open-partial-set: \(\mathcal{C}_s \backslash \mathcal{C}_t \neq \emptyset,\ \mathcal{C}_t \backslash \mathcal{C}_s \neq \emptyset\)
Several recent studies even develop a unified framework for both open-set and open-partial-set scenarios.
b) Single-source vs. Multi-source
c) Single-target vs. Multi-target
Multi-target DA
- Multiple unlabeled target domains exist at the same time
- The domain label of each target sample may even be unknown
- Each target domain may come in a streaming manner \(\rightarrow\) model is successively adapted to different target domains
d) Unsupervised vs. Semi-supervised
e) White-box vs. Black-box
f) Active SFDA
A few target samples can be selected to be labeled by human annotators
g) Imbalanced SFDA
ex) ISFDA: class-imbalanced SFDA
- source & target label distns are different & extremely imbalanced
