Contrastive Learning for Unsupervised Domain Adaptation of Time Series
Contents
- Abstract
- Introduction
- Related Work
- Unsupervised Domain Adaptation (UDA)
- UDA for TS
- Problem Definition
- Proposed CLUDA Framework
- Architecture
- Adversarial Training for UDA
- Capturing Contextual Representations
- Aligning the Contextual Representation Across Domains
- Experimental Setup
- Results
0. Abstract
Unsupervised domain adaptation (UDA)
- learn a model from the LABELED source domain that performs well on the UNLABELED target domain
CLUDA
develop a novel framework for UDA of TS data
propose a CL framework for MTS
- preserve label information for the prediction task.
capture the variation in the contextual representations between the SOURCE & TARGET domains
- via a custom nearest-neighbor CL
First framework to learn domain-invariant representations for UDA of TS data
1. Introduction
Need for effective domain adaptation of TS, to learn **domain-invariant representations**
Unsupervised domain adaptation (UDA)
Few works have focused on UDA of TS
Previous works
- utilize a tailored feature extractor to capture temporal dynamics of MTS via RNNs, LSTMs, CNNs …
- minimize the “domain discrepancy” of learned features via …
  - (1) adversarial-based methods (Purushotham et al., 2017; Wilson et al., 2020; 2021; Jin et al., 2022)
  - (2) restrictions through metric-based methods (Cai et al., 2021; Liu \& Xue, 2021).
Transfer Learning
- pre-train a NN via CL …
  to capture the contextual representation of TS from an UNLABELED source domain
- BUT … operate on a LABELED target domain ( \(\neq\) UDA )
\(\rightarrow\) No existing CL-based method for UDA of TS
\(\rightarrow\) propose a novel framework for UDA of TS based on CL ( = CLUDA )
Components of CLUDA
(1) Adversarial training
- to minimize the domain discrepancy between source & target domains
(2) Semantic-preserving augmentations
(3) Custom nearest-neighbor CL
- further align the contextual representation across source and target domains
Experiments
Datasets 1 ( established benchmark datasets )
- WISDM (Kwapisz et al., 2011)
- HAR (Anguita et al., 2013)
- HHAR (Stisen et al., 2015)
\(\rightarrow\) CLUDA increases accuracy on the target domains by a substantial margin.
Datasets 2
( two large-scale real-world medical datasets )
- MIMIC-IV (Johnson et al., 2020)
- AmsterdamUMCdb (Thoral et al., 2021)
Contributions
- Propose CLUDA
  ( unsupervised domain adaptation of time series )
- Capture domain-invariant, contextual representations in CLUDA
  - via a custom approach combining (1) nearest-neighbor CL \& (2) adversarial learning
2. Related Work
(1) Unsupervised Domain Adaptation (UDA)
LABELED source \(\rightarrow\) UNLABELED target
Typically aim to minimize domain discrepancy
3 Categories
(1) Adversarial-based
- reduce domain discrepancy via domain discriminator networks
  ( force the feature extractor to learn domain-invariant feature representations )
(2) Contrastive
- aims to bring the source \& target embeddings of the same class closer together via minimization of a CL loss
- target labels are UNKNOWN … rely on pseudo-labels
(3) Metric-based
- reduce domain discrepancy by minimizing a distance metric (e.g., MMD) between source \& target feature distributions
(2) UDA for TS
a) Variational recurrent adversarial deep domain adaptation (VRADA)
first UDA method for MTS that uses adversarial learning for reducing domain discrepancy.
- Feature extractor = variational RNN
- trains the (1) classifier and the (2) domain discriminator (adversarially)
b) Convolutional deep domain adaptation for time series (CoDATS)
VRADA + ( Feature extractor = CNN )
c) Time series sparse associative structure alignment (TS-SASA)
Metric-based method
- Intra-variables & Inter-variables attention mechanisms are aligned between the domains via the minimization of maximum mean discrepancy (MMD).
d) Adversarial spectral kernel matching (AdvSKM)
Metric-based method
- aligns the two domains via MMD
- introduces a spectral kernel mapping, whose output is used to minimize the MMD between the domains
  ( a minimal MMD sketch follows after this list )
\(\rightarrow\) [ Common ] Aim to align the features across SOURCE and TARGET domains.
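The metric-based methods above (TS-SASA, AdvSKM) rely on minimizing MMD between source and target features. Below is a minimal sketch of a biased empirical MMD\(^2\) estimate with a Gaussian kernel; the kernel choice and bandwidth are illustrative assumptions, not the exact kernels used in those papers.

```python
import torch

def gaussian_kernel(a: torch.Tensor, b: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Pairwise Gaussian (RBF) kernel between two batches of feature vectors."""
    d2 = torch.cdist(a, b) ** 2                       # squared Euclidean distances, shape (len(a), len(b))
    return torch.exp(-d2 / (2 * sigma ** 2))

def mmd(source_feats: torch.Tensor, target_feats: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    """Biased empirical estimate of MMD^2 between source and target feature distributions."""
    k_ss = gaussian_kernel(source_feats, source_feats, sigma).mean()
    k_tt = gaussian_kernel(target_feats, target_feats, sigma).mean()
    k_st = gaussian_kernel(source_feats, target_feats, sigma).mean()
    return k_ss + k_tt - 2 * k_st
```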
Research Gap
Existing works merely align the features across source & target domains.
Even when the aligned source and target feature distributions overlap, such class-agnostic alignment can mix source and target samples of different classes.
3. Problem Definition
Classification task
2 distributions over the TS
- a) SOURCE domain \(\mathcal{D}_S\)
- b) TARGET domain \(\mathcal{D}_T\)
Labeled samples from the SOURCE domain
- given by \(\mathcal{S}=\left\{\left(x_i^s, y_i^s\right)\right\}_{i=1}^{N_s} \sim \mathcal{D}_S\),
Unlabeled samples from the TARGET domain
- given by \(\mathcal{T}=\left\{x_i^t\right\}_{i=1}^{N_t} \sim \mathcal{D}_T\)
Each \(x_i\) is a sample of MTS with \(M\) channels observed over \(T\) time steps,
- denoted by \(x_i=\left\{x_{i t}\right\}_{t=1}^T \in \mathbb{R}^{M \times T}\)
Goal : build a classifier
- that generalizes well over \(\mathcal{T}\)
- by leveraging the labeled \(\mathcal{S}\).
( At evaluation ) use the labeled \(\mathcal{T}_{\text {test }}=\left\{\left(x_i^t, y_i^t\right)\right\}_{i=1}^{N_{\text {test }}} \sim \mathcal{D}_T\)
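For concreteness, a minimal sketch of the data convention in PyTorch; all shapes, batch sizes, and the number of classes are hypothetical.

```python
import torch

N_s, N_t, M, T = 32, 32, 9, 128             # batch sizes, channels M, time steps T (assumed)

x_s = torch.randn(N_s, M, T)                 # labeled source samples x_i^s ~ D_S
y_s = torch.randint(0, 6, (N_s,))            # source labels y_i^s (6 classes assumed)
x_t = torch.randn(N_t, M, T)                 # unlabeled target samples x_i^t ~ D_T (labels only at test time)
```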
4. Proposed CLUDA Framework
Overview of our CLUDA framework
- Domain adversarial training
- Capture the contextual representation
- Align contextual representation across domains.
(1) Architecture
(1) Feature extractor \(F(\cdot)\)
- Input = \(x^s\) and \(x^t\)
- Output = \(z^s\) and \(z^t\)
- Momentum-updated version = \(\tilde{F}(\cdot)\)
(2) Classifier network \(C(\cdot)\)
- Predict \(y^s\) using \(z^s\).
(3) Discriminator network \(D(\cdot)\)
- Trained to distinguish \(z^s\) & \(z^t\).
- Introduce domain labels
- \(d=0\) for SOURCE
- \(d=1\) for TARGET
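A minimal sketch of the three components in PyTorch; the 1D-CNN backbone, layer sizes, and embedding dimension are assumptions for illustration, not the exact CLUDA architecture.

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """F(.): maps an MTS of shape (batch, M, T) to an embedding z (assumed 1D-CNN backbone)."""
    def __init__(self, in_channels: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=8, padding="same"), nn.ReLU(),
            nn.Conv1d(64, emb_dim, kernel_size=8, padding="same"), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),        # -> (batch, emb_dim)
        )

    def forward(self, x):
        return self.net(x)

class Classifier(nn.Module):
    """C(.): predicts y^s from z^s."""
    def __init__(self, emb_dim: int, num_classes: int):
        super().__init__()
        self.net = nn.Linear(emb_dim, num_classes)

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """D(.): distinguishes source (d=0) from target (d=1) embeddings."""
    def __init__(self, emb_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(emb_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return self.net(z)        # logit for the domain label
```

The momentum-updated extractor \(\tilde{F}(\cdot)\) is a copy of \(F(\cdot)\) whose weights follow the update rule given in Section (3) below.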
(2) Adversarial Training for UDA
Minimize a combination of two losses:
(1) Prediction loss \(L_c\)
\(L_c=\frac{1}{N_s} \sum_i^{N_s} L_{\mathrm{pred}}\left(C\left(F\left(x_i^s\right)\right), y_i^s\right)\).
(2) Domain classification loss \(L_{\mathrm{disc}}\)
\(L_{\text {disc }}=\frac{1}{N_s} \sum_i^{N_s} L_{\text {pred }}\left(D\left(R\left(F\left(x_i^s\right)\right)\right), d_i^s\right)+\frac{1}{N_t} \sum_i^{N_t} L_{\text {pred }}\left(D\left(R\left(F\left(x_i^t\right)\right)\right), d_i^t\right)\).
Adversarial learning
- \(D(\cdot)\) = trained to MINIMIZE the loss
- \(F(\cdot)\) = trained to MAXIMIZE the loss
- achieved by the gradient reversal layer \(R(\cdot)\) between \(F(\cdot)\) and \(D(\cdot)\),
  defined by \(R(x)=x, \quad \frac{\mathrm{d} R}{\mathrm{d} x}=-\mathbf{I}\)
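A minimal sketch of the gradient reversal layer: identity in the forward pass, negated gradient in the backward pass (some implementations scale the reversed gradient by a factor \(\lambda\); here it is fixed to 1 as in the notes).

```python
import torch

class GradReverse(torch.autograd.Function):
    """Gradient reversal: R(x) = x in the forward pass, dR/dx = -I in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output        # negate the gradient flowing back into F(.)

def grad_reverse(x):
    return GradReverse.apply(x)
```

Usage sketch: `domain_logits = D(grad_reverse(F(x)))`, so minimizing \(L_{\mathrm{disc}}\) w.r.t. \(D(\cdot)\) simultaneously maximizes it w.r.t. \(F(\cdot)\).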
(3) Capturing Contextual Representations
(1) Encourage \(F(\cdot)\) to learn label-preserving information captured by the context.
(2) Hypothesize that (a) < (b)
- (a) discrepancy between the contextual representations of two domains
- (b) discrepancy between their feature spaces
Leverage CL in the form of MoCo
- apply semantic-preserving augmentations to each sample of MTS
- 2 views of each sample
  - query \(x_q\) …… \(z_q=F\left(x_q\right)\)
  - key \(x_k\) ……. \(z_k=\tilde{F}\left(x_k\right)\)
Momentum-updated feature extractor
- \(\theta_{\tilde{F}} \leftarrow m \theta_{\tilde{F}}+(1-m) \theta_F\).
Contrastive loss
- \(L_{\mathrm{CL}}=-\frac{1}{N} \sum_{i=1}^N \log \frac{\exp \left(Q\left(z_{q i}\right) \cdot z_{k i} / \tau\right)}{\exp \left(Q\left(z_{q i}\right) \cdot z_{k i} / \tau\right)+\sum_{j=1}^J \exp \left(Q\left(z_{q i}\right) \cdot z_{k j} / \tau\right)}\).
Since we have two domains (i.e., source and target)
\(\rightarrow\) two CL losses ( \(L_{\mathrm{CL}}^s\) & \(L_{\mathrm{CL}}^t\) )
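A minimal sketch of the momentum update and a MoCo-style contrastive loss applied per domain; the projection head \(Q(\cdot)\), the queue handling, and the hyperparameters \(m\) and \(\tau\) are simplified assumptions.

```python
import torch
import torch.nn.functional as F_nn

@torch.no_grad()
def momentum_update(f, f_tilde, m: float = 0.999):
    """theta_F_tilde <- m * theta_F_tilde + (1 - m) * theta_F."""
    for p, p_tilde in zip(f.parameters(), f_tilde.parameters()):
        p_tilde.data.mul_(m).add_(p.data, alpha=1 - m)

def moco_loss(q, k, queue, tau: float = 0.1):
    """MoCo-style InfoNCE: positive = momentum key of the same sample, negatives = queue of past keys.
    q: (N, D) projected queries Q(z_q); k: (N, D) keys z_k; queue: (J, D)."""
    q, k, queue = (F_nn.normalize(t, dim=1) for t in (q, k, queue))
    l_pos = (q * k).sum(dim=1, keepdim=True)              # (N, 1) similarity to own key
    l_neg = q @ queue.t()                                  # (N, J) similarities to queue keys
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long)      # positive sits at index 0
    return F_nn.cross_entropy(logits, labels)
```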
(4) Aligning the Contextual Representation Across Domains
Further aligns the contextual representation across the SOURCE & TARGET
First nearest-neighbor CL approach for UDA of TS
**Nearest-neighbor CL (NNCL)**
- facilitate the classifier \(C(\cdot)\) to make accurate predictions for the target domain
- by creating positive pairs between domains
- explicitly align the representations across domains.
\(L_{\mathrm{NNCL}}=-\frac{1}{N_t} \sum_{i=1}^{N_t} \log \frac{\exp \left(z_{q i}^t \cdot N N_s\left(z_{k i}^t\right) / \tau\right)}{\sum_{j=1}^{N_s} \exp \left(z_{q i}^t \cdot z_{q j}^s / \tau\right)}\).
- \(NN_s(\cdot)\) retrieves the nearest neighbor of the target key embedding among the source queries \(\left\{z_{q j}^s\right\}_{j=1}^{N_s}\).
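A minimal sketch of the NNCL loss, under the assumption that embeddings are \(\ell_2\)-normalized and nearest neighbors are taken by cosine similarity.

```python
import torch
import torch.nn.functional as F_nn

def nncl_loss(z_q_t, z_k_t, z_q_s, tau: float = 0.1):
    """Nearest-neighbor CL: the positive for each target query is the source query closest to its target key;
    all source queries form the denominator. z_q_t: (N_t, D), z_k_t: (N_t, D), z_q_s: (N_s, D)."""
    z_q_t, z_k_t, z_q_s = (F_nn.normalize(t, dim=1) for t in (z_q_t, z_k_t, z_q_s))
    nn_idx = (z_k_t @ z_q_s.t()).argmax(dim=1)             # NN_s(z_k^t): nearest source query per target key
    pos = (z_q_t * z_q_s[nn_idx]).sum(dim=1) / tau         # numerator logits
    all_logits = (z_q_t @ z_q_s.t()) / tau                 # (N_t, N_s) denominator logits
    return -(pos - torch.logsumexp(all_logits, dim=1)).mean()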
(5) Training
\(L=L_c+\lambda_{\mathrm{disc}} \cdot L_{\mathrm{disc}}+\lambda_{\mathrm{CL}} \cdot\left(L_{\mathrm{CL}}^s+L_{\mathrm{CL}}^t\right)+\lambda_{\mathrm{NNCL}} \cdot L_{\mathrm{NNCL}}\).
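A minimal sketch of how the four terms combine; the \(\lambda\) weights are hypothetical hyperparameter values.

```python
def total_loss(l_pred, l_disc, l_cl_s, l_cl_t, l_nncl,
               lam_disc=1.0, lam_cl=0.1, lam_nncl=0.05):
    """L = L_c + lam_disc*L_disc + lam_cl*(L_CL^s + L_CL^t) + lam_nncl*L_NNCL (weights assumed)."""
    return l_pred + lam_disc * l_disc + lam_cl * (l_cl_s + l_cl_t) + lam_nncl * l_nncl
```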
5. Experimental Setup
Setup follows earlier works of UDA on time series
- Wilson et al., 2020; 2021; Cai et al., 2021; Liu \& Xue, 2021
(1) Datasets
- Established benchmark datasets
- WISDM (Kwapisz et al., 2011), HAR (Anguita et al., 2013), and HHAR (Stisen et al., 2015).
- each participant (subject) = one domain
- randomly sample 10 source-target domain pairs for evaluation.
- Real-world setting with medical datasets
- MIMIC-IV (Johnson et al., 2020) and AmsterdamUMCdb (Thoral et al., 2021).
- each age group = one domain
(2) Baselines
Model w/o UDA
- uses the feature extractor \(F(\cdot)\) and the classifier \(C(\cdot)\) with the same architecture as in our CLUDA.
- trained only on the source domain.
Model w/ UDA ( for TS )
- (1) VRADA (Purushotham et al., 2017)
- (2) CoDATS (Wilson et al., 2020)
- (3) TS-SASA (Cai et al., 2021)
- (4) AdvSKM (Liu \& Xue, 2021)
Model w/ UDA ( general-purpose, not designed for TS )
- (5) CAN (Kang et al., 2019)
- (6) CDAN (Long et al., 2018)
- (7) DDC (Tzeng et al., 2014)
- (8) DeepCORAL (Sun \& Saenko, 2016)
- (9) DSAN (Zhu et al., 2020)
- (10) HoMM (Chen et al., 2020a)
- (11) MMDA (Rahman et al., 2020).
6. Results
(1) Established benchmark datasets
Average accuracy of each method for 10 source-target domain pairs
- on the WISDM, HAR, and HHAR datasets
Embedding visualization ( to study the domain discrepancy )
- (a) Embeddings of w/o UDA
  - significant domain shift between source and target
  - two clusters for each class ( i.e., one for each domain )
- (b) CDAN as the best baseline
  - reduces the domain shift by aligning the features of source and target for some classes,
  - BUT mixes the different classes of the different domains ( e.g., blue class of source and green class of target overlap ).
- (c) CLUDA
  - pulls together the source (target) classes for the source (target) domain ( due to the CL )
  - pulls both source and target domains together for each class ( due to the alignment ).
Ablation Study
Figure 2-(b)