Think Globally, Act Locally ; A DNN approach to High-Dimensional TSF (2019,120)

Abstract
Introduction
Problem Setting
LeveledInit : Handling Diverse Scales with TCN
DeepGLO : A Deep Global Local Forecaster
1. Global : TCN-MF
2. Combining the Global model with Local features

Abstract

HIGH-dimension TS ( millions of correlated TS)

\(\rightarrow\) need to exploit GLOBAL patterns & COUPLE them with local calibration

DeepGLO

deep forecasting model, which thinks globally & acts locally
hybrid model, that combines…
- 1) global matrix factorization ( regularized by TCN )
- 2) temporal network, that captures local properties

1. Introduction

2 shortcomings of causal convolutions

1) difficult to train on data, with wide variation in scales
- ex) TS1 = [ 20,24,25,19,… ] & TS2 = [ 8000,8130,7830, …]
2) focus only on LOCAL past data
- alternative : use 2D convolutions & recurrent connections
  - able to take multiple input \(\rightarrow\) capture global properties
  - but… not scale beyond thousands of TS
- alternative ) TRMF ( Temporally Regularized Matrix Factorization )
  - express all TS as “linear combination” of basis TS
  - but…. can only model linear temporal dependencies

propose DeepGLO

1) think globally & act locally ( leverage both local & global patterns )
2) able in “wide variations” in scale

2. Problem Setting

(1) Forecasting Task

Notation

raw TS : \(\mathbf{Y}=\left[\mathbf{Y}^{(\mathrm{tr})} \mathbf{Y}^{(\mathrm{te})}\right]\)
- train set : \(\mathbf{Y}^{(\mathrm{tr})} \in \mathbb{R}^{n \times t}\)……. \(t\) : number of points observed
- test set : \(\mathbf{Y}^{(\mathrm{te})} \in \mathbb{R}^{n \times \tau}\)…….. \(\tau\) : window size for forecasting
- \(\mathbf{y}^{(i)}\) : \(i\)-th TS
covariates : \(\mathbf{Z} \in \mathbb{R}^{n \times r \times(t+\tau)}\)
- \(i\)-th TS and \(j\)-th time point : \(\mathbf{z}_{j}^{(i)}=\mathbf{Z}[i,:, j]\) ( \(r\)-dim )

Task :

given original TS \(\mathbf{Y}^{(\mathrm{tr})}\) & \(\mathbf{Z}\)
predict future in the test range ( = \(\hat{\mathbf{Y}}^{(\mathrm{te})} \in \mathbb{R}^{n \times \tau}\) )

(2) Objective

a) normalized absolute deviation ( = WAPE )

\(\mathcal{L}\left(Y^{(\mathrm{obs})}, Y^{(\mathrm{pred})}\right)=\frac{\sum_{i=1}^{n} \sum_{j=1}^{\tau} \mid Y_{i j}^{(\mathrm{obs})}-Y_{i j}^{(\mathrm{pred})} \mid }{\sum_{i=1}^{n} \sum_{j=1}^{\tau} \mid Y_{i j}^{(\mathrm{obs})} \mid }\).

b) squared-loss

\(\mathcal{L}_{2}\left(Y^{(\mathrm{obs})}, Y^{(\mathrm{pred})}\right)=(1 / n \tau)\left\ \mid Y^{(\mathrm{obs})}-Y^{(\mathrm{pred})}\right\ \mid _{F}^{2}\).

3. LeveledInit : Handling Diverse Scales with TCN

LeveledInit = simple initialization scheme for TCN

designed to handle high-dim TS with wide variation in scale
without apriori normalization

Previous works

choice of normalization parameters have significant effect on performance

Scheme

Goal : deal with wide variation in scale
start with initial parameters, that results in approximately predicting the AVERAGE value of a given window of past time points \(\mathbf{y}_{j-l: j-1}\) as the future prediction \(\hat{y}_{j}\)

4. DeepGLO : A Deep Global Local Forecaster

leverage both GLOBAL & LOCAL features
present a global component : TCN-MF (TCN regularized Matrix Factorization)
- represent each of TS as a linear combination of \(k\) basis TS

(1) Global : TCN-MF

propose a LOW-rank MF model

uses TCN for regularization
idea : factorize \(\mathbf{Y}^{(tr)}\) into low-rank factors..
- 1) \(\mathbf{F} \in \mathbb{R}^{n \times k}\)
- 2) \(\mathbf{X}^{(\mathrm{tr})} \in \mathbb{R}^{k \times t}\), where \(k \ll n\)
  - ( = comprised of \(k\) basis TS, that capture global temporal patterns )

Temporal Regularization by TCN

TCN that captures temporal patterns in \(\mathbf{Y}^{(tr)}\)

\(\rightarrow\) encourage temporal structures in \(\mathbf{X}^{(\mathrm{tr})} \in \mathbb{R}^{k \times t}\)

Regularization :

\(\mathcal{R}\left(\mathbf{X}^{(\mathrm{tr})} \mid \mathcal{T}_{X}(\cdot)\right):=\frac{1}{ \mid \mathcal{J} \mid } \mathcal{L}_{2}\left(\mathbf{X}[:, \mathcal{J}], \mathcal{T}_{X}(\mathbf{X}[:, \mathcal{J}-1])\right)\).

Objective function :

\(\mathcal{L}_{G}\left(\mathbf{Y}^{(\operatorname{tr})}, \mathbf{F}, \mathbf{X}^{(\operatorname{tr})}, \mathcal{T}_{X}\right):=\mathcal{L}_{2}\left(\mathbf{Y}^{(\operatorname{tr})}, \mathbf{F X}^{(\operatorname{tr})}\right)+\lambda_{\mathcal{T}} \mathcal{R}\left(\mathbf{X}^{(\operatorname{tr})} \mid \mathcal{T}_{X}(\cdot)\right)\).

Training

low-rank factors \(\mathbf{F}\) , \(\mathbf{X}^{(tr)}\) & temporal network \(\mathcal{T}_X(\cdot)\)

\(\rightarrow\) trained alternatively, to minimize the loss!

Prediction

trained \(\mathcal{T}_X(\cdot)\) ( local network )

\(\rightarrow\) used for multi-step look ahead prediction
\(\left[\hat{x}_{j-l+1}, \cdots, \hat{x}_{j}\right]:=\mathcal{T}_{X}\left(\mathbf{x}_{j-l: j-1}\right)\).
final global predictions : \(\mathbf{Y}^{(t e)}=\mathbf{F} \hat{\mathbf{X}}^{(t e)}\)

(2) Combining the Global model with Local features

Final hybrid model : TCN \(\mathcal{T}_Y(\cdot \mid \Theta_Y)\)

input : \(\mathbf{Y}^{(\mathrm{tr})}\) & \(\mathbf{Z}\)
predict \(\hat{\mathbf{Y}}^{(\mathrm{te})} \in \mathbb{R}^{n \times \tau}\)

Twitter Facebook LinkedIn

(paper) Think Globally, Act Locally ; A DNN approach to High-Dimensional TSF

Seunghan Lee