Change Point Detection in Time Series Data using Autoencoders with a Time-Invariant Representation (2020, 4)Permalink

ContentsPermalink

  1. Abstract
  2. Introduction
  3. Problem Formulation
  4. AE based CPD
    1. Preprocessing
    2. Feature Encoding
    3. Post processing & Peak detection


0. AbstractPermalink

CPD = locate abrupt property changes

This paper…

  • uses autoencoder (AE) based methodology with novel loss function

  • mitigate the issue of false detection alarms using a post-processing procedure


Allow user to indicate whether

  • change points should be sought in time domain, frequency domain, or both


Detectable change points : abrupt changes in…

  • slope / mean / variance / autocorrelation / frequency spectrum


1. IntroductionPermalink

CPD’s goal

  • 1) goal in itself
  • 2) pre-processing tool to segment a TS in homogeneous segments


CPD’s categories

  • 1) online CPD :

    • real-time detection
    • dissimilarity = based on difference in distn of 2 intervals
  • 2) retrospective (offline) CPD :

    • robust detections
    • at the cost of needing more future data

    • ( this paper focuses on this method )


Many CPD algorithms compare “past & future TS”, by means of “dissimilarity” measure


Past Works ( algorithm : assumption(model) )

  • ex 1) CUSUM : parametric probability distribution
  • ex 2) GLR : autoregressive model
  • ex 3) subspace method : state-space model
  • ex 4) Bayesian online CPD

performance depends on “how well actual data follows the assumed model”

  • ex 5) KDE : parameter free
  • ex 6) Related Density Ratio estimation : parameter free


ex 7) autoencoder

  • pros

    • absence of distn assumptions
    • extract complex features in cost-efficient ways
  • cons

    • no guarantee that distance between consecutive features reflect actual dissimilarity

    • correlated nature of TS samples is not properly used
    • absence of post-processing procedure preceding detection of peaks in the dissimilarity measure leads to high FP detection alarms


TIRE (partially Time Invariate REpresentation)Permalink

  • new AE-based CPD

  • contribution

    • 1) novel adaptation of AE with a loss function that promotes time-invariant features ( + define new dissimilarity measure )

    • 2) focus on non-iid data, use discrete Fourier transform to obtain temporally localized spectral information, propose an approach that combines “time & frequency” domain info


2. Problem FormulationPermalink

X : time series of…

  • # of channel : d
    • Xi : ith channel
  • length : T ( where 0=T0<T1<<Tp=T )
    • T1,T2, : change points


(X[Tk+1],,X[Tk+1]) : subsequence of time series

  • realization of discrete time weak-sense stationary stochastic (WSS) process


Goal of CPD

  • estimate change points, w.o prior knowledge on “number & location”


Dissimilarity Measure

  • dissimilarity between (a) & (b)
    • (a) (X[tN+1],,X[t])
    • (b) (X[t+1],,X[t+N])
  • N : user defined “window size”


Goal (1) :

  • develop a CPD-tailored feature embedding
  • and corresponding dissimilarity measure Dt
    • ( Dt peaks, when the WSS restriction is violated )

Goal (2) :

  • determine all local maxima
  • label each local maximum, which exceed certain threshold τ


3. AE based CPDPermalink

(1) PreprocessingPermalink

step 1) divide each channel into window size of N

  • xit=[Xi[tN+1],,Xi[t]]TRN.


step 2) combine for every t ( into single vector )

  • yt=[(x1t)T,,(xdt)T]TRNd.


step 3) Transformation

  • (1) : DFT (discrete Fourier transform)
    • DFT on each window
      • to obtain “temporally localized spectral information”
  • (2) : cropped to length M

  • (1) + (2) = F:RNRM

  • zt=[F(x1t)T,,F(xdt)T]TRMd>

    ( = frequency-domain counterpart of yt )


(2) Feature EncodingPermalink

[1] use AEs to extract features…

  • from **time-domain(TD) windows **{yt}t

  • from frequency-domain (FD) windows {zt}t.


[2] proposal of new loss function

  • that promotes time-invariance of the features in consecutive windows


Notation

  • ENC input : ytRNd
  • ENC output : ht=σ(Wyt+b)
  • DEC output : ˜yt=σ(Wht+b)
    • choose σ=σ to be the hyperbolic tangent function
  • minimize yt˜yt


BUT, ht will also contain information “NOT relevant to CPD”

  • ex) phase shift, noise …

solve by introducing….

ht=[(st)T,(ut)T]T.

  • 1) time invariant features : stRs
    • invariant over time within a WSS segment
    • ex) mean, amplitude, frequency
  • 2) instantaneous features : utRhs
    • all other info


Minimize the loss function…

  • (original) t(yt˜yt2+λ∣∣stst12)

  • (SGD) tTj(yt˜yt2+λ∣∣stst12)

  • but, tTj does not generally imply that t1Tj.

    tTj(yt˜yt2+λKK1k=0∣∣stkstk12).

    • can think of it as K+1 parallel autoencoders


figure2


the number of instantaneous features should be taken as small as possible!


(3) Post processing & Peak detectionPermalink

construct dissimilarity measure!

1) PostprocessingPermalink

  • combine TD & FD time-invariant features into single feature!
  • st=[α(sTDt)T,β(sFDt)T]T.
    • zero-delay weighted moving average filter to smoothen!
    • ˜st[i]=N1k=N+1v[Nk]st+k[i].
      • triangular shaped weighting window


Dissimilarity measure

  • Dt=∣∣˜st˜st+N2.
  • by expert knowledge…
    • case 1) α=1 and β=0
    • case 2) α=0 and β=1
    • case 3) α=Q({DFDt}t,0.95) and β=Q({DTDt}t,0.95).


2) Peak DetectionPermalink

difficult to automatically select a representative peak!

propose to reuse the impulse response v from the moving average filtering

˜Dt=N1k=N+1v[Nk]Dt+k..


detected alarms correspond to all local maxima of (˜DN,˜DN+1,,˜DTN).

Prominence of peak : minimum height needed


Given that Dt is a local maximum…

  • step 1) define the 2 closest time stamps left & right of t, for which the dissimilarity measure is larger than Dt ( = tL & tR )
    • tL=max{sup{tDt>Dt and t<t},N}.
    • tR=min{inf{tDt>Dt and t>t},TN}.
  • step 2) Prominence P(Dt) of local maximum Dt :
    • P(Dt)=Dtmax{mintL<t<tDt,mint<t<tRDt}.

Tags:

Categories:

Updated: