Multivariate Time Series Forecasting with Transfer Entropy Graph (2020, 5)

Contents

  1. Abstract

  2. Introduction

  3. Preliminaries

    1. Neural Granger Causality
    2. Graph Neural Network
  4. Methodology

    1. Problem Formulation
    2. Causality Graph Structure with Transfer Entropy

    3. Feature Extraction of Multiple Receptive Fields
    4. Node Embedding based on Causality Matrix


Abstract

Most methods:

  • assume that the predicted value of a single variable is affected by ALL other variables

    \(\rightarrow\) ignore the causal relationships among variables


propose CauGNN (graph neural network with Neural Granger Causality)

  • node = each variable
  • edge = causal relationship between variables

CNN filters with different perception scales are used ( for feature extraction )


1. Introduction

LSTNet

  • encodes short-term local information into low-dim vectors using a 1D-CNN
  • decodes the vectors through an RNN

\(\rightarrow\) but existing models cannot explicitly model the pairwise causal dependencies among MTS variables


G-causality ( Granger Causality Analysis )

  • quantitative characterization of TS causality
  • but, cannot handle non-linear relationships well


Transfer Entropy (TE)

  • causal analysis, which can deal with non-linear cases


CauGNN

  • proposed method

  • 1) after pairwise TE is calculated, the TE matrix can be obtained

    \(\rightarrow\) used as “adjacency matrix”

  • 2) CNN filters with different perception scales

    \(\rightarrow\) for TS feature extraction

  • 3) GNN for embedding & forecasting


[ Contributions ]

  1. consider MTS as a graph structure with causality

  2. use transfer entropy to extract causality

    & construct TE graph


2. Preliminaries

(1) Neural Granger Causality

Granger Causality

if adding the historical information of variable \(i\) improves the prediction of variable \(j\),

\(\rightarrow\) variable \(i\) is the cause of variable \(j\)


Granger Causality vs Neural Granger Causality

  • Granger Causality
    • traditional linear model
    • use the AR model for prediction
  • Neural Granger Causality
    • use DL & regularization for non-linearity


Neural Granger Network

consists of 2 parts

  • 1) variable selection module
  • 2) prediction module


a) variable selection module

  • FC layer ( input = historical TS )
  • select key variables by group Lasso regularization
    • if a specific group is not significant \(\rightarrow\) its weights will be zero (0)


b) prediction module

  • MLP or LSTM
  • general prediction method


Objective function

  • \(\min _{\boldsymbol{W}} \sum_{t=K}^{T}\left(x_{i t}-g_{i}\left(\boldsymbol{x}_{(t-1):(t-K)}\right)\right)^{2}+\lambda \sum_{j=1}^{p}\left\|\left(\boldsymbol{W}_{:j}^{11}, \ldots, \boldsymbol{W}_{:j}^{1K}\right)\right\|_{F}\).
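
A minimal PyTorch sketch of this objective ( the class name `NeuralGrangerMLP`, the lag \(K\), hidden size, and penalty weight `lam` are illustrative assumptions, not values from the paper ) : the first-layer weights are grouped by input variable, and the group-Lasso penalty drives insignificant groups to zero.

```python
import torch
import torch.nn as nn

class NeuralGrangerMLP(nn.Module):
    """cMLP-style sketch: predict variable i from the last K lags of all p variables."""
    def __init__(self, p, K, hidden=16):
        super().__init__()
        self.p, self.K = p, K
        self.fc1 = nn.Linear(p * K, hidden)   # first layer: its weights define the variable groups
        self.fc2 = nn.Linear(hidden, 1)

    def forward(self, x_lags):                # x_lags : (batch, p * K), lags grouped per variable
        return self.fc2(torch.relu(self.fc1(x_lags)))

    def group_lasso_penalty(self):
        # sum_j || (W^{11}_{:j}, ..., W^{1K}_{:j}) ||_F : one group = all K lags of input variable j
        W = self.fc1.weight.view(-1, self.p, self.K)   # (hidden, p, K)
        return W.norm(dim=(0, 2)).sum()

# one (illustrative) training step for target variable i
model = NeuralGrangerMLP(p=5, K=3)
x_lags = torch.randn(32, 5 * 3)               # historical inputs x_{(t-1):(t-K)}
y = torch.randn(32, 1)                        # targets x_{it}
lam = 0.1
loss = ((y - model(x_lags)) ** 2).sum() + lam * model.group_lasso_penalty()
loss.backward()
```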


(2) Graph Neural Network

most GNN approaches fit within the framework of “neural message passing”


a GNN can be viewed as… a message-passing algorithm,

where node representations are iteratively computed from the features of neighboring nodes

( using a differentiable aggregation function )


GCN

  • can be regarded as…
    • 1) approximation of spectral-domain convolution of graph signals
    • 2) sampling & aggregating of neighborhood information


GraphSAGE

  • enabling training in batches


GIN (Graph Isomorphism Network)

  • enable more complex forms of aggregation


GAT

  • attending over the neighbors via self-attention


3. Methodology

( figure 2 )

(1) Problem Formulation

  • observed TS : \(\boldsymbol{X}_{n}=\left[\boldsymbol{x}_{1}, \boldsymbol{x}_{2}, \ldots, \boldsymbol{x}_{t}\right]\)
    • \(\boldsymbol{x}_{i} \in \mathbb{R}^{n}\) \((i=1, \ldots, t)\)
    • \(n\) : number of variables
  • goal : predict \(\boldsymbol{x}_{t+h}\) ( \(h\) : forecasting horizon )


(2) Causality Graph Structure with Transfer Entropy

(Information) Entropy : \(H(\boldsymbol{X})=-\sum p(x) \log _{2} p(x)\).

  • larger entropy \(\rightarrow\) more information


Conditional Entropy : \(H(\boldsymbol{X} \mid \boldsymbol{Y})=-\sum \sum p(x, y) \log _{2} p(x \mid y)\)

  • remaining uncertainty about \(\mathbf{X}\), given that \(\mathbf{Y}\) is known
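
A small NumPy sketch of these two quantities, estimating the probabilities with histograms ( bin count and the toy data are illustrative assumptions ) :

```python
import numpy as np

def entropy(x, bins=8):
    """H(X) = -sum p(x) log2 p(x), with p(x) estimated from a histogram."""
    p, _ = np.histogram(x, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def conditional_entropy(x, y, bins=8):
    """H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), from a joint histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    py = pxy.sum(axis=0)                        # marginal p(y)
    py_safe = np.where(py > 0, py, 1.0)         # avoid 0/0; those cells are masked out below
    p_x_given_y = pxy / py_safe
    nz = pxy > 0
    return -(pxy[nz] * np.log2(p_x_given_y[nz])).sum()

rng = np.random.default_rng(0)
y = rng.normal(size=1000)
x = y + 0.5 * rng.normal(size=1000)             # X depends on Y
print(entropy(x), conditional_entropy(x, y))    # knowing Y reduces the uncertainty about X
```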


TE of variables \(\mathbf{Y}\) to \(\mathbf{X}\) :

\(\begin{aligned} T_{\boldsymbol{Y} \rightarrow \boldsymbol{X}}=& \sum p\left(x_{t+1}, \boldsymbol{x}_{t}^{(k)}, \boldsymbol{y}_{t}^{(l)}\right) \log _{2} p\left(x_{t+1} \mid \boldsymbol{x}_{t}^{(k)}, \boldsymbol{y}_{t}^{(l)}\right) \\ &-\sum p\left(x_{t+1}, \boldsymbol{x}_{t}^{(k)}\right) \log _{2} p\left(x_{t+1} \mid \boldsymbol{x}_{t}^{(k)}\right) \\ =& \sum p\left(x_{t+1}, \boldsymbol{x}_{t}^{(k)}, \boldsymbol{y}_{t}^{(l)}\right) \log _{2} \frac{p\left(x_{t+1} \mid \boldsymbol{x}_{t}^{(k)}, \boldsymbol{y}_{t}^{(l)}\right)}{p\left(x_{t+1} \mid \boldsymbol{x}_{t}^{(k)}\right)} \\ =& H\left(\boldsymbol{X}_{t+1} \mid \boldsymbol{X}_{t}\right)-H\left(\boldsymbol{X}_{t+1} \mid \boldsymbol{X}_{t}, \boldsymbol{Y}_{t}\right) \end{aligned}\).


  • “increase in information of \(X\)”, when \(Y\) changes from “unknown” to “known”
  • asymmetric


Causal relationship between \(X\) & \(Y\)

  • \(T_{\boldsymbol{X}, \boldsymbol{Y}}=T_{\boldsymbol{X} \rightarrow \boldsymbol{Y}}-T_{\boldsymbol{Y} \rightarrow \boldsymbol{X}}\).
    • \(T_{\boldsymbol{X}, \boldsymbol{Y}} > 0\) : \(\boldsymbol{X}\) is the cause of \(\boldsymbol{Y}\)
    • \(T_{\boldsymbol{X}, \boldsymbol{Y}} < 0\) : \(\boldsymbol{Y}\) is the cause of \(\boldsymbol{X}\)
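
A minimal sketch of \(T_{\boldsymbol{Y} \rightarrow \boldsymbol{X}}\) and the net direction \(T_{\boldsymbol{X}, \boldsymbol{Y}}\) for the simplest case \(k = l = 1\), with distributions estimated by histograms ( helper names, bin count, and the toy series are assumptions for illustration ). It uses the identity \(H(A \mid B) = H(A, B) - H(B)\) to express the TE through joint entropies.

```python
import numpy as np

def joint_entropy(cols, bins=8):
    """Joint entropy of the given 1-d series, estimated from a multi-dim histogram."""
    p, _ = np.histogramdd(np.column_stack(cols), bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def transfer_entropy(source, target, bins=8):
    """T_{source -> target} with k = l = 1:
    H(x_{t+1} | x_t) - H(x_{t+1} | x_t, y_t)
    = H(x_{t+1}, x_t) - H(x_t) - H(x_{t+1}, x_t, y_t) + H(x_t, y_t)."""
    x_next, x_now, y_now = target[1:], target[:-1], source[:-1]
    return (joint_entropy([x_next, x_now], bins) - joint_entropy([x_now], bins)
            - joint_entropy([x_next, x_now, y_now], bins) + joint_entropy([x_now, y_now], bins))

# net causal direction: T_{X,Y} = T_{X->Y} - T_{Y->X}
rng = np.random.default_rng(0)
y = rng.normal(size=2000)
x = np.roll(y, 1) + 0.3 * rng.normal(size=2000)    # X follows Y with a lag, so Y "causes" X
t_xy = transfer_entropy(x, y) - transfer_entropy(y, x)
print("X causes Y" if t_xy > 0 else "Y causes X")  # expected: "Y causes X"
```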


use Neural Granger to capture causality

  • the causality matrix \(\mathbf{T}\) of the MTS \(X_n\) can be formulated as…

    \(t_{i j}=\left\{\begin{array}{lc} T_{\boldsymbol{x}_{i}, \boldsymbol{x}_{j}}, & T_{\boldsymbol{x}_{i}, \boldsymbol{x}_{j}}>c \\ 0, & \text { otherwise } \end{array}\right.\).

\(\rightarrow\) use as “adjacency matrix of MTS graph”
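
A sketch of this construction, reusing the `transfer_entropy` helper from the previous sketch ( the threshold `c` is a hyperparameter; its value here is only illustrative ) :

```python
import numpy as np

def causality_matrix(X, c=0.05, bins=8):
    """Thresholded causality matrix T for an MTS X of shape (t, n).
    Assumes the transfer_entropy(source, target) helper sketched above."""
    n = X.shape[1]
    T = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            t_ij = (transfer_entropy(X[:, i], X[:, j], bins)
                    - transfer_entropy(X[:, j], X[:, i], bins))
            T[i, j] = t_ij if t_ij > c else 0.0
    return T   # used as the adjacency matrix of the MTS graph
```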


(3) Feature Extraction of Multiple Receptive Fields

necessary to consider trend & seasonality

\(\rightarrow\) thus, need to extract TS features over multiple period lengths

\(\rightarrow\) use multiple CNN filters ( with different receptive fields )

  • \(\boldsymbol{h}_{\boldsymbol{i}}=\operatorname{ReLU}\left(\boldsymbol{W}_{\boldsymbol{i}} * \boldsymbol{x}+\boldsymbol{b}_{\boldsymbol{i}}\right), \boldsymbol{h}=\left[\boldsymbol{h}_{\mathbf{1}} \oplus \boldsymbol{h}_{\mathbf{2}} \oplus \ldots \oplus \boldsymbol{h}_{\boldsymbol{p}}\right] .\).
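
A minimal PyTorch sketch of this step ( kernel sizes, channel count, window length, and the max-pooling over time are illustrative assumptions to obtain a fixed-size feature vector per variable ) :

```python
import torch
import torch.nn as nn

class MultiReceptiveFieldCNN(nn.Module):
    """Apply 1D convolutions with several kernel sizes and concatenate the features per variable."""
    def __init__(self, kernel_sizes=(3, 5, 7), out_channels=8):
        super().__init__()
        self.convs = nn.ModuleList(nn.Conv1d(1, out_channels, k) for k in kernel_sizes)

    def forward(self, x):                        # x : (n_vars, window) -- one MTS window
        x = x.unsqueeze(1)                       # (n_vars, 1, window): each variable as a 1-channel series
        feats = []
        for conv in self.convs:
            h = torch.relu(conv(x))              # (n_vars, out_channels, window - k + 1)
            feats.append(h.max(dim=-1).values)   # pool over time to a fixed-size vector
        return torch.cat(feats, dim=-1)          # H : (n_vars, d), d = out_channels * len(kernel_sizes)

H = MultiReceptiveFieldCNN()(torch.randn(5, 32))   # 5 variables, window of 32 steps
print(H.shape)                                     # torch.Size([5, 24])
```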


(4) Node Embedding based on Causality Matrix

after feature extraction,

the input MTS is converted into a feature matrix \(\boldsymbol{H} \in \mathbb{R}^{n \times d}\)

  • one \(d\)-dim feature vector for each of the \(n\) nodes (variables)

adjacency of nodes is determined by causality matrix \(T\)


inspired by k-GNNs

  • only perform information fusion between “neighbors”

    ( ignore information of non-neighbors )


Forward-pass update :

  • \(\boldsymbol{h}_{i}^{(l+1)}=\sigma\left(\boldsymbol{h}_{i}^{(l)} \boldsymbol{W}_{1}^{(l)}+\sum_{j \in N(i)} \boldsymbol{h}_{j}^{(l)} \boldsymbol{W}_{2}^{(l)}\right)\).
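
A minimal sketch of this layer with a dense adjacency matrix ( class and variable names are illustrative ) : \(\boldsymbol{W}_{1}\) transforms a node's own features, and \(\boldsymbol{W}_{2}\) the summed features of its causal neighbors given by \(T\).

```python
import torch
import torch.nn as nn

class CausalGraphLayer(nn.Module):
    """h_i^{(l+1)} = sigma( h_i^{(l)} W1 + sum_{j in N(i)} h_j^{(l)} W2 )"""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W1 = nn.Linear(in_dim, out_dim, bias=False)   # self term
        self.W2 = nn.Linear(in_dim, out_dim, bias=False)   # neighbor term

    def forward(self, H, A):
        # H : (n, in_dim) node features, A : (n, n) adjacency from the causality matrix T
        neighbor_sum = (A > 0).float() @ H                  # sum features of causal neighbors only
        return torch.relu(self.W1(H) + self.W2(neighbor_sum))

n, d = 5, 24
H = torch.randn(n, d)                                       # feature matrix from the CNN step
A = (torch.rand(n, n) > 0.7).float() * torch.rand(n, n)     # stand-in for the thresholded TE matrix
out = CausalGraphLayer(d, 16)(H, A)
print(out.shape)                                            # torch.Size([5, 16])
```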

