Precursor-of-Anomaly Detection for Irregular Time Series

https://arxiv.org/pdf/2306.15489

Abstract
Introduciton
Preliminaries
Proposed Methods
1. Problem statement
2. Overall workflow
3. NN based on co-evolving NCDEs
4. Training algorithm
Experiments
1. Regular TS
2. Irregular TS

Abstract

Precursor-of-Anomaly (PoA) detection.

Propose a novel type of AD

(Conventional) AD vs. PoA

(Conventional) AD: Focuses on determining whether a **GIVEN time series ** is anomaly
PoA: Focuses on determining whether a **FUTURE time series ** is anomaly

Proposed algorithm: Neural controlled differential equation-based NN

1. Introduction

Contributions

First to solve both AD & PoA
Propose PAD (NCDE-based unified framework)
Design a (1) multi-task & (2) knowledge distillation learning method
Self-supervised learning (SSL)
Propose an augmentation method to create artificial anomalies for SSL
Regular & Irregular TS experiments + Code provided
- https://github.com/sheoyon-jhin/PAD

2. Preliminaries

Differential equations

Neural Ordinary Differential Equations (NODE)
Neural Controlled Differential Equations (NCDE)

NODE vs. NCDE

NODE:
- \(\mathbf{z}(T)=\mathbf{z}(0)+\int_0^T f\left(\mathbf{z}(t), t ; \theta_f\right) d t\).
NCDE:
- \(\begin{aligned} z(T) & =z(0)+\int_0^T f\left(z(t) ; \theta_f\right) d X(t) \\ & =z(0)+\int_0^T f\left(z(t) ; \theta_f\right) \frac{d X(t)}{d t} d t \end{aligned}\).

3. Proposed Methods

(1) Problem statement

Notation & Settings

MTS: \(\mathbf{x}_{0: T}=\left\{\mathbf{x}_0, \mathbf{x}_1, \ldots, \mathbf{x}_T\right\}\), where., \(\mathbf{x}_t \in \mathbb{R}^N\).
- (Irregular TS) Time-interval between two consecutive observations is not a constant
- (Regular TS) Time0interval is fixed
Window-based approach
- \(x_{0:T}\) is divided into non-overlapping windows
  - i.e., \(w_i=\) \(\left[\mathbf{x}_{t_0^l}, \mathbf{x}_{t_i^l}, \ldots, \mathbf{x}_{t_b^l}\right]\) where \(t_j^i=i \times b+j-1\) with a window size of \(b\).
  - \(\left\lceil\frac{T}{b}\right\rceil\) windows in total for \(x_{0, T}\).

Task: For a given input window \(w_i\), …

(AD) Decides whether \(w_i\) contains abnormal observations
(PoA) Decides whether \(w_{i+1}\) contains abnormal observations

(Both are binary classification tasks)

(2) Overall workflow

(1) SSL
- Create augmented training data
(2) Two co-evolving NCDE layers
- Produce the last hidden representations \(\mathbf{h}(T)\) and \(\mathbf{z}(T)\)
(3) Training stage: Anomaly NCDE gets two inputs:
- \(w_i\) for the anomaly detection
- \(w_{i+1}\) for the PoA detection
(4) Two output layers:
- For the anomaly detection
- For the PoA detection
\(\rightarrow\) These two different tasks are integrated into a single training method!
- (Multi-task learning) Shared parameter \(\theta_c\)
(5) Two outputs:
- Creates the two outputs \(\hat{y}_i^a\) and \(\hat{y}_{i+1}^a\) for the knowledge distillation

(3) NN based on Co-evolving NCDEs

Dual co-evolving NCDEs

(1) One for the AD
(2) One for PoA

[Preprocessing step]

Given a discrete TS sample \(\mathbf{x}_{1: T}\)….

\(\rightarrow\) Create a continuous path \(X(t)\) using an interpolation method

[Co-evolving NCDEs]

Obtain two hidden vectors \(h(T)\) and \(z(T)\)
\(\mathbf{h}(T)=\mathbf{h}(0)+\int_0^T f\left(\mathbf{h}(t) ; \theta_f, \theta_c\right) \frac{d X(t)}{d t} d t\).
\(\mathbf{z}(T)=\mathbf{z}(0)+\int_0^T g\left(\mathbf{z}(t) ; \theta_g, \theta_c\right) \frac{d X(t)}{d t} d t\).

(Multi-task) Architecture details:

\(f\left(\mathrm{~h}(t) ; \theta_f, \theta_e\right) =\underbrace{\rho(\mathrm{FC}(\phi(\mathrm{FC}(\mathrm{~h}(t)))))}_{\theta_f}+\underbrace{\rho(\mathrm{FC}(\phi(\mathrm{FC}(\mathrm{~h}(t)))))}_{\theta_0}\).
\(g\left(z(t) ; \theta_g, \theta_e\right) =\underbrace{\rho(\mathrm{FC}(\phi(\mathrm{FC}(z(t)))))}_{\theta_0}+\underbrace{\rho(\mathrm{FC}(\phi(\mathrm{FC}(z(t)))))}_{\theta_0}\).
- \(\phi\): ReLU
- \(\beta\) : Tanh
Note that two NCDEa co-evolve by using the shared params \(\theta_c\)

Output layers:

(For AD) \(y_i^e=\sigma(F C_{\theta_0}(h(T))\)
(For PoA) \(S_i^f=\sigma\left(F C_{\theta_p}(z(T))\right)\).

(4) Training Algorithm

Loss function: (CE loss)

\(L_{K D D}=C E\left(\hat{\rho}_{i+1}^a+\hat{\rho}_{i+1}^p\right)\).
- \(\hat{y}_{i+1}^a\) : Anomaly NCDE model’s output
- \(\hat{y}_{i+1}^p\): PoA NCDE model’s output
\(L_a=C E\left(\hat{y}_i^a, y_i\right)\).
- \(y_i\): GT of anomaly detection.

4. Experiments

(1) Regular TS

(2) Irregular TS

Twitter Facebook LinkedIn

Precursor-of-Anomaly Detection for Irregular Time Series

Seunghan Lee

Precursor-of-Anomaly Detection for Irregular Time Series

Contents

Abstract

1. Introduction

Contributions

2. Preliminaries

3. Proposed Methods

(1) Problem statement

(2) Overall workflow

(3) NN based on Co-evolving NCDEs

(4) Training Algorithm

4. Experiments

(1) Regular TS

(2) Irregular TS

You May Also Enjoy