Deep Semi-supervised Learning for Time Series Classification (2022)
Contents
- Abstract
- Introduction
- From Images to Time Series
- Problem Formulation
- Backbone Architecture
- Data Augmentation
- Methods
- Mean Teacher
- Virtual Adversarial Training
- MixMatch
- Ladder Net
0. Abstract
Semi-supervised Learning
- studied mostly on CV ….. rarely on TS
This paper : discusses the transferability of SOTA models from images to TS
- model backbone
- data augmentation strategies
1. Introduction
Question : Can we transfer SOTA SSL models from images to TS?
- SOTA models : MixMatch, Virtual Adversarial Training, Mean Teacher, Ladder Net
2 modifications
- (1) modification of a suitable backbone architecture
- (2) adaptation of an appropriate DA strategy
propose four new deep SSL algorithms for TSC
( + meaningful data augmentation )
2. From Images to Time Series
(1) Problem Formulation
Time Series : \(\left\{\left\{x_{1,1}^{(i)}, \ldots, x_{1, t}^{(i)}\right\}, \ldots,\left\{x_{c, 1}^{(i)}, \ldots, x_{c, t}^{(i)}\right\}\right\}\).
- \(t\) : length
- \(c\) : number of covariates
- \(c=1\) : univariate
- \(c>1\) : multivariate
- \(x^{(i)} \in \mathcal{X} \subseteq \mathbb{R}^{c \times t}\).
Input Space : \(\mathcal{X}\)
Target Space : \(\mathcal{Y}\)
- \(y^{(i)} \in \mathcal{Y}\) …. categorical variable
Goal of SSL : train a prediction model \(f: \mathcal{X} \mapsto \mathcal{Y}\) on a dataset \(\mathcal{D}=\left(\mathcal{D}^l, \mathcal{D}^u\right)\)
- labeled dataset : \(\mathcal{D}^l=\left\{\left(x^{(i)}, y^{(i)}\right)\right\}_{i=1}^{n_l}\)
- unlabeled dataset : \(\mathcal{D}^u=\left\{x^{(i)}\right\}_{i=n_l+1}^n\)
( where \(n = n_l + n_u\) and \(n_l \ll n_u\) )
Batch of data : \(\mathcal{B} \subset \mathcal{D}\) …… \(\mathcal{B}=\left(\mathcal{B}^l, \mathcal{B}^u\right)\)
- (labeled) \(\mathcal{B}^l \subseteq \mathcal{D}^l\)
- (unlabeled ) \(\mathcal{B}^u \subseteq \mathcal{D}^u\)
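A minimal sketch of how this batch structure could be realized in PyTorch; the shapes, batch sizes, and the zip-based pairing of loaders are illustrative assumptions, not the paper's setup:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: n_l labeled and n_u unlabeled series, c covariates, length t
n_l, n_u, c, t = 100, 4000, 3, 128
x_l, y_l = torch.randn(n_l, c, t), torch.randint(0, 5, (n_l,))
x_u = torch.randn(n_u, c, t)

# Two loaders, so every training step sees a mixed batch B = (B^l, B^u)
loader_l = DataLoader(TensorDataset(x_l, y_l), batch_size=16, shuffle=True)
loader_u = DataLoader(TensorDataset(x_u), batch_size=64, shuffle=True)

for (xb_l, yb_l), (xb_u,) in zip(loader_l, loader_u):
    pass  # supervised loss on B^l, unsupervised loss on B^u
```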
(2) Backbone Architecture
Dimension
- Image : 3d tensor
- TS : 2d tensor ( channels : # of covariates )
Fully Convolutional Network (FCN)
- use it as a backbone architecture
- outperforms a variety of models on 44 different TSC problems
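A minimal PyTorch sketch of an FCN backbone in this spirit; the filter counts and kernel sizes (128/256/128 with kernels 8/5/3) follow the commonly cited FCN-for-TSC configuration and should be treated as assumptions:

```python
import torch.nn as nn

class FCN(nn.Module):
    """Sketch of an FCN backbone for TSC: three Conv1d blocks,
    then global average pooling over the time axis."""
    def __init__(self, c_in, n_classes):
        super().__init__()
        def block(ci, co, k):
            return nn.Sequential(
                nn.Conv1d(ci, co, k, padding=k // 2),
                nn.BatchNorm1d(co),
                nn.ReLU(),
            )
        self.features = nn.Sequential(
            block(c_in, 128, 8), block(128, 256, 5), block(256, 128, 3)
        )
        self.head = nn.Linear(128, n_classes)

    def forward(self, x):        # x: (batch, c, t) -- each TS is a 2d tensor
        z = self.features(x)     # (batch, 128, t')
        z = z.mean(dim=-1)       # global average pooling over time
        return self.head(z)      # class logits
```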
(3) Data Augmentation
Regularization-based semi-supervised methods
- injection of random noise into the model
DA strategies : \(g\left(x^{(i)}\right), g: \mathcal{X} \mapsto \mathcal{X}\)
- perturb the input \(x^{(i)}\) of a sample, while preserving the meaning of its label \(y^{(i)}\)
propose the use of the RandAugment strategy
- removes the need for a separate search phase
( for each batch … )
\(\rightarrow\) \(N\) out of \(K\) augmentation strategies are randomly chosen
( + a magnitude hyperparameter controls the augmentation intensity )
Augmentation policies
- warping in the time dimension
- warping the magnitude
- addition of Gaussian Noise
- random rescaling
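A rough NumPy sketch of RandAugment-style TS augmentation with the four policies above; the operator definitions and the magnitude scaling factors are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
# all operators assume x has shape (c, t)

def add_noise(x, m):                 # additive Gaussian noise
    return x + rng.normal(0, 0.02 * m, x.shape)

def rescale(x, m):                   # random global rescaling
    return x * rng.uniform(1 - 0.1 * m, 1 + 0.1 * m)

def magnitude_warp(x, m, knots=4):   # multiply by a smooth random curve
    t = x.shape[-1]
    xs = np.linspace(0, t - 1, knots)
    curve = np.interp(np.arange(t), xs, rng.normal(1, 0.1 * m, knots))
    return x * curve

def time_warp(x, m, knots=4):        # distort the time axis, then resample
    t = x.shape[-1]
    xs = np.linspace(0, t - 1, knots)
    steps = np.interp(np.arange(t), xs, rng.normal(1, 0.1 * m, knots))
    warped = np.cumsum(steps)
    warped = warped / warped[-1] * (t - 1)
    return np.stack([np.interp(warped, np.arange(t), ch) for ch in x])

POLICIES = [add_noise, rescale, magnitude_warp, time_warp]   # K = 4

def rand_augment(x, n=2, magnitude=1.0):
    """Apply N randomly chosen policies out of K, scaled by a shared magnitude."""
    for op in rng.choice(POLICIES, size=n, replace=False):
        x = op(x, magnitude)
    return x
```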
3. Methods
(1) Mean Teacher
- consistency-regularization-based model
- teacher model ( = exponential moving average of the consecutive student models )
\(\rightarrow\) used to enforce consistency in model predictions
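A minimal PyTorch sketch of the idea; the EMA decay, the MSE consistency loss, and the loss weight `w_u` are common Mean Teacher defaults assumed here, not values taken from the paper:

```python
import copy
import torch
import torch.nn.functional as F

def ema_update(teacher, student, alpha=0.99):
    # teacher weights <- EMA of student weights (teacher gets no gradient updates)
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.data.mul_(alpha).add_(sp.data, alpha=1 - alpha)

def mean_teacher_step(student, teacher, x_l, y_l, x_u, augment, w_u=1.0):
    sup = F.cross_entropy(student(augment(x_l)), y_l)
    with torch.no_grad():                      # teacher provides the consistency target
        target = teacher(augment(x_u)).softmax(dim=-1)
    cons = F.mse_loss(student(augment(x_u)).softmax(dim=-1), target)
    return sup + w_u * cons

# usage: teacher = copy.deepcopy(student); call ema_update after each optimizer step
```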
(2) Virtual Adversarial Training (VAT)
- consistency-regularization-based model
- a small data perturbation is learned ( the one causing the maximum change in prediction )
- the model's current predictions serve as auxiliary ( virtual ) labels for the perturbed inputs
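A compact PyTorch sketch of the VAT perturbation search via one step of power iteration; the hyperparameters `xi` and `eps` are illustrative:

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=2.0, n_iter=1):
    """Find the perturbation r_adv that maximally changes the prediction,
    then penalize the KL divergence it induces."""
    with torch.no_grad():
        p = model(x).softmax(dim=-1)           # current predictions = virtual labels
    d = torch.randn_like(x)                    # random initial direction
    for _ in range(n_iter):                    # power iteration on the direction
        d = xi * F.normalize(d.flatten(1), dim=1).view_as(x)
        d.requires_grad_()
        kl = F.kl_div(model(x + d).log_softmax(dim=-1), p, reduction="batchmean")
        d = torch.autograd.grad(kl, d)[0]
    r_adv = eps * F.normalize(d.flatten(1), dim=1).view_as(x)
    return F.kl_div(model(x + r_adv).log_softmax(dim=-1), p, reduction="batchmean")
```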
(3) MixMatch
- various semi-supervised techniques ( e.g. consistency regularization, Mixup, pseudo-labeling ) are combined within one holistic approach
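A sketch of MixMatch's core ingredients: label guessing over K augmentations, temperature sharpening, and Mixup. The temperature, Beta parameter, and K are the usual MixMatch defaults, assumed here:

```python
import torch

def sharpen(p, T=0.5):
    """Temperature sharpening of the averaged guessed labels."""
    p = p ** (1 / T)
    return p / p.sum(dim=-1, keepdim=True)

def guess_labels(model, x_u, augment, k=2):
    """Average predictions over K augmentations of x_u, then sharpen."""
    with torch.no_grad():
        p = torch.stack([model(augment(x_u)).softmax(-1) for _ in range(k)]).mean(0)
    return sharpen(p)

def mixup(x1, y1, x2, y2, alpha=0.75):
    """Mixup of inputs and (one-hot / soft) labels; lam >= 0.5 keeps the
    mixed sample closer to its first input, as in MixMatch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1 - lam)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```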
(4) Ladder Net
- reconstruction-based SSL model ( inspired by denoising autoencoders )
- extends a supervised encoder model with a corresponding decoder network ( enabling an unsupervised reconstruction loss )
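A heavily simplified PyTorch sketch of the ladder idea: a noisy encoder produces class logits while a decoder reconstructs the clean activations. The real Ladder Network adds lateral skip connections and per-layer denoising functions, which are omitted here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LadderSketch(nn.Module):
    def __init__(self, c_in, n_classes, hidden=64):
        super().__init__()
        self.enc1 = nn.Conv1d(c_in, hidden, 5, padding=2)
        self.enc2 = nn.Conv1d(hidden, hidden, 5, padding=2)
        self.head = nn.Linear(hidden, n_classes)
        self.dec2 = nn.Conv1d(hidden, hidden, 5, padding=2)
        self.dec1 = nn.Conv1d(hidden, c_in, 5, padding=2)

    def forward(self, x, noise=0.1):
        with torch.no_grad():                  # clean pass: reconstruction targets
            z1 = F.relu(self.enc1(x))
        # noisy pass: used for classification, as in a denoising autoencoder
        h1 = F.relu(self.enc1(x + noise * torch.randn_like(x)))
        h2 = F.relu(self.enc2(h1 + noise * torch.randn_like(h1)))
        logits = self.head(h2.mean(dim=-1))    # supervised head (labeled data only)
        r1 = F.relu(self.dec2(h2))             # decoder mirrors the encoder
        r0 = self.dec1(r1)
        recon = F.mse_loss(r1, z1) + F.mse_loss(r0, x)   # unsupervised loss
        return logits, recon
```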