Change Point Detection in Time Series Data using Autoencoders with a Time-Invariant Representation (2020, 4)Permalink
ContentsPermalink
- Abstract
- Introduction
- Problem Formulation
- AE based CPD
- Preprocessing
- Feature Encoding
- Post processing & Peak detection
0. AbstractPermalink
CPD = locate abrupt property changes
This paper…
-
uses autoencoder (AE) based methodology with novel loss function
-
mitigate the issue of false detection alarms using a post-processing procedure
Allow user to indicate whether
- change points should be sought in time domain, frequency domain, or both
Detectable change points : abrupt changes in…
- slope / mean / variance / autocorrelation / frequency spectrum
1. IntroductionPermalink
CPD’s goal
- 1) goal in itself
- 2) pre-processing tool to segment a TS in homogeneous segments
CPD’s categories
-
1) online CPD :
- real-time detection
- dissimilarity = based on difference in distn of 2 intervals
-
2) retrospective (offline) CPD :
- robust detections
-
at the cost of needing more future data
- ( this paper focuses on this method )
Many CPD algorithms compare “past & future TS”, by means of “dissimilarity” measure
Past Works ( algorithm : assumption(model) )
- ex 1) CUSUM : parametric probability distribution
- ex 2) GLR : autoregressive model
- ex 3) subspace method : state-space model
- ex 4) Bayesian online CPD
→ performance depends on “how well actual data follows the assumed model”
- ex 5) KDE : parameter free
- ex 6) Related Density Ratio estimation : parameter free
ex 7) autoencoder
-
pros
- absence of distn assumptions
- extract complex features in cost-efficient ways
-
cons
-
no guarantee that distance between consecutive features reflect actual dissimilarity
- correlated nature of TS samples is not properly used
- absence of post-processing procedure preceding detection of peaks in the dissimilarity measure leads to high FP detection alarms
-
TIRE (partially Time Invariate REpresentation)Permalink
-
new AE-based CPD
-
contribution
-
1) novel adaptation of AE with a loss function that promotes time-invariant features ( + define new dissimilarity measure )
-
2) focus on non-iid data, use discrete Fourier transform to obtain temporally localized spectral information, propose an approach that combines “time & frequency” domain info
-
2. Problem FormulationPermalink
X : time series of…
- # of channel : d
- Xi : ith channel
- length : T ( where 0=T0<T1<⋯<Tp=T )
- T1,T2,⋯ : change points
(X[Tk+1],…,X[Tk+1]) : subsequence of time series
- realization of discrete time weak-sense stationary stochastic (WSS) process
Goal of CPD
- estimate change points, w.o prior knowledge on “number & location”
Dissimilarity Measure
- dissimilarity between (a) & (b)
- (a) (X[t−N+1],…,X[t])
- (b) (X[t+1],…,X[t+N])
- N : user defined “window size”
Goal (1) :
- develop a CPD-tailored feature embedding
- and corresponding dissimilarity measure Dt
- ( Dt peaks, when the WSS restriction is violated )
Goal (2) :
- determine all local maxima
- label each local maximum, which exceed certain threshold τ
3. AE based CPDPermalink
(1) PreprocessingPermalink
step 1) divide each channel into window size of N
- xit=[Xi[t−N+1],…,Xi[t]]T∈RN.
step 2) combine for every t ( into single vector )
- yt=[(x1t)T,…,(xdt)T]T∈RNd.
step 3) Transformation
- (1) : DFT (discrete Fourier transform)
- DFT on each window
- to obtain “temporally localized spectral information”
- DFT on each window
-
(2) : cropped to length M
-
(1) + (2) = F:RN→RM
-
zt=[F(x1t)T,…,F(xdt)T]T∈RMd>
( = frequency-domain counterpart of yt )
(2) Feature EncodingPermalink
[1] use AEs to extract features…
-
from **time-domain(TD) windows **{yt}t
-
from frequency-domain (FD) windows {zt}t.
[2] proposal of new loss function
- that promotes time-invariance of the features in consecutive windows
Notation
- ENC input : yt∈RNd
- ENC output : ht=σ(Wyt+b)
- DEC output : ˜yt=σ′(W′ht+b′)
- choose σ=σ′ to be the hyperbolic tangent function
- minimize ∣∣yt−˜yt∣∣
BUT, ht will also contain information “NOT relevant to CPD”
- ex) phase shift, noise …
→ solve by introducing….
ht=[(st)T,(ut)T]T.
- 1) time invariant features : st∈Rs
- invariant over time within a WSS segment
- ex) mean, amplitude, frequency
- 2) instantaneous features : ut∈Rh−s
- all other info
Minimize the loss function…
-
(original) ∑t(∣∣yt−˜yt∣∣2+λ∣∣st−st−1∣∣2)
-
(SGD) ∑t∈Tj(∣∣yt−˜yt∣∣2+λ∣∣st−st−1∣∣2)
-
but, t∈Tj does not generally imply that t−1∈Tj.
→ ∑t∈Tj(∣∣yt−˜yt∣∣2+λK∑K−1k=0∣∣st−k−st−k−1∣∣2).
- can think of it as K+1 parallel autoencoders
the number of instantaneous features should be taken as small as possible!
(3) Post processing & Peak detectionPermalink
construct dissimilarity measure!
1) PostprocessingPermalink
- combine TD & FD time-invariant features into single feature!
- st=[α⋅(sTDt)T,β⋅(sFDt)T]T.
- zero-delay weighted moving average filter to smoothen!
- ˜st[i]=∑N−1k=−N+1v[N−k]⋅st+k[i].
- triangular shaped weighting window
Dissimilarity measure
- Dt=∣∣˜st−˜st+N∣∣2.
- by expert knowledge…
- case 1) α=1 and β=0
- case 2) α=0 and β=1
- case 3) α=Q({DFDt}t,0.95) and β=Q({DTDt}t,0.95).
2) Peak DetectionPermalink
difficult to automatically select a representative peak!
→ propose to reuse the impulse response v from the moving average filtering
˜Dt=∑N−1k=−N+1v[N−k]⋅Dt+k..
detected alarms correspond to all local maxima of (˜DN,˜DN+1,…,˜DT−N).
Prominence of peak : minimum height needed
Given that Dt is a local maximum…
- step 1) define the 2 closest time stamps left & right of t,
for which the dissimilarity measure is larger than Dt ( = tL & tR )
- tL=max{sup{t∗∣Dt∗>Dt and t∗<t},N}.
- tR=min{inf{t∗∣Dt∗>Dt and t∗>t},T−N}.
- step 2) Prominence P(Dt) of local maximum Dt :
- P(Dt)=Dt−max{mintL<t∗<tDt∗,mint<t∗<tRDt∗}.