METRO : A Generic GNN Framework for MTS Forecasting (2020, 131)
Contents
- Abstract
- Introduction
- Preliminaries
- METRO
- Temporal Graph Embedding Unit
- Single-scale Graph Update Unit
- Cross-scale Graph Fusion Kernel
- Predictor and Training Strategy
0. Abstract
METRO
- generic framework built on “Multi-scale Temporal Graph NNs”
- models “dynamic & cross-scale variable correlations” simultaneously
- represents MTS as a “series of temporal graphs”
- can capture both
  - inter-step correlations
  - intra-step correlations
  via message passing & node embedding
1. Introduction
GNN for MTS forecasting
- captures the interdependencies & dynamic trends over a group of variables
  - nodes = variables
  - edges = interactions between variables
- can be used “even for data w.o. explicit graph structures”
  - learns hidden graph structures
- previous works
  - assume that “MTS are static”
    → however, “interdependencies between variables can be temporally dynamic”
  - ex) MTGNN, TEGNN, ASTGCN
Previous GNN works
- MTGNN & TEGNN
  - learn/compute adjacency matrices of graphs from the whole input TS
  - hold them fixed for “subsequent temporal modeling”
- ASTGCN
  - for ‘traffic prediction’
    ( connections between nodes are known in advance )
  - considers an “adaptive adjacency matrix” ( attention values )
  - but only at a relatively large scale : the matrix changes only with regard to the input series example
- STDN
  - flow gating mechanism, to learn “dynamic similarity” by time step
  - but can only be applied to consecutive & grid-partitioned regional maps
Multi-scale patterns
- [ approach 1 ]
  - treat time stamps of MTS as “side information”
  - combine the “embedded time stamps” with the MTS’s embedding
- [ approach 2 ]
  - first, apply filters ( with different kernel sizes ) to the MTS
  - second, sum/concatenate the multi-scale features
- in common
  - can incorporate information “across different scales”
  - but use just a simple linear-weight transformation
    ( relations between pairs of scales are ignored )
Problems
- 1) dependencies between variables are dynamic
- 2) temporal patterns in MTS occur at multiple scales
→ propose METRO ( Multi-scalE Temporal gRaph neural netwOrks )
4 components of METRO
- 1) TGE (Temporal Graph Embedding)
- obtains “node embeddings” under multi-scale
- 2) SGU (Single-scale Graph Update)
- dynamically leverages graph structures of nodes in each single scale
- propagates info from a node’s intra/inter-step neighbors
- 3) CGF (Cross-scale Graph Fusion) kernel
- walks along the time axis & fuses features across scales
- 4) predictor
- decoder ( decodes node embeddings )
- gives final predictions
2. Preliminaries
MTS forecasting
Notation
- MTS : X = {X(1), X(2), …, X(t)}
- at time i : X(i) = [x_{i,1}, …, x_{i,n}] ∈ R^{n×m}, i ∈ [1, …, t]
  - x_{i,k} ∈ R^m, k ∈ [1, …, n]
- n : number of variables ( TS )
- m : dimension of each observation
- Goal : predict Δ steps ahead
  - output : Ŷ(t+Δ) = X(t+Δ) = [x_{t+Δ,1}, …, x_{t+Δ,n}]
  - input : {X(t−w+1), …, X(t)}
Graphs
(1) Graph : G = (V, E)
- static graph
- vertices & edges are invariant over time
(2) Temporal Graph : G(t) = (V(t), E(t))
- graph as a function of time
- 2 types of temporal graphs
  - 1) INTRA-step TG : V(t) is made up of the variables within time step t
  - 2) INTER-step TG : V(t) consists of the variables of time step t and its previous time step t−1
(3) Multi-scale Temporal Graph : G_s(t) = (V_s(t), E_s(t))
- each node in the graph represents an observation of “time-length w_s”
3. METRO
(1) TGE (Temporal Graph Embedding Unit)
TGE = “encoder”
- obtains “embeddings of TEMPORAL features of each variable”
- example)
  - input : {X(t0), …, X(t0+w−1)}
  - output : temporal feature at time step t, t ∈ [t0, …, t0+w−1], under scale i :
    - H^0_{s_i}(t) = emb(s_i, t) = f(X(t), …, X(t+w_{s_i}−1))
- notation
  - n : # of variables
  - d : embedding dimension
  - f : embedding function
    - ex) unpadded convolution with multiple filters of size 1×w_{s_i}
      ( outputs of the multiple convolution channels are “concatenated” )
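The embedding function f can be sketched as an unpadded convolution over the time axis. Below is a minimal NumPy stand-in (the shapes, the random filter initialization, and the einsum contraction are my assumptions for illustration, not the paper's implementation):

```python
import numpy as np

def temporal_graph_embedding(X, w_s, d, rng=None):
    """Sketch of the TGE embedding function f (hypothetical shapes).

    X   : (w, n, m) input window -- w time steps, n variables, m dims
    w_s : kernel length for scale s (unpadded conv along time)
    d   : embedding dimension (d conv channels, concatenated)

    Returns H0 of shape (w - w_s + 1, n, d): one node embedding per
    variable for every valid time step under this scale.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    w, n, m = X.shape
    # d filters, each of size (w_s, m) -- a 1 x w_s conv over each variable
    filters = rng.standard_normal((d, w_s, m))
    out_len = w - w_s + 1
    H0 = np.empty((out_len, n, d))
    for t in range(out_len):            # slide the unpadded window
        patch = X[t:t + w_s]            # (w_s, n, m)
        # channel c: contract over the time window and feature dims
        H0[t] = np.einsum('wnm,cwm->nc', patch, filters)
    return H0

X = np.random.default_rng(1).standard_normal((12, 5, 3))  # w=12, n=5, m=3
H0 = temporal_graph_embedding(X, w_s=4, d=8)
print(H0.shape)  # (9, 5, 8)
```

With kernel length w_s = 4 the valid (unpadded) output covers 12 − 4 + 1 = 9 time steps, matching the “observation of time-length w_s per node” definition above.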
(2) SGU (Single-scale Graph Update Unit)
modeling “hidden relations among variables”
- by training/updating “adjacency matrices”
- problem : can message aggregation over time work with FIXED structures?
Solution : introduce SGU
- adaptively learns the “INTRA & INTER” step adjacency matrices of variables
- proposes “graph-based temporal message passing”
  - uses “historical temporal information” of nodes
- ex) SGU operates on scales s1, s2, s3 separately
- example : G_{s1}(:) … jointly considers
  - 1) message passing WITHIN a time step
    - G_{s1}(t_i) at a single time stamp
  - 2) message passing BETWEEN time steps
    - G_{s1}(t_{i−1}) & G_{s1}(t_i)
Process
- 1) learn messages
  - messages between nodes of 2 adjacent time steps :
    - m^l(t−j) = msg(H^l(t−j−1), H^l(t−j), A^l_{t−j−1,t−j})
      where A^l_{t−j−1,t−j} = g_m(H^l(t−j−1), H^l(t−j))
  - ex) msg : GCN
  - ex) g_m : transfer entropy / node-embedding-based deep layer model
- 2) aggregate messages
  - m̃^l(t) = agg(m^l(t−k−1), …, m^l(t−1))
  - ex) simple concatenation / RNN / Transformer …
Update
- embeddings of nodes ( at step t ) are updated based on..
  - 1) the aggregated message
  - 2) the memory of the previous layer
- Ĥ^{l+1}(t) = upd(m̃^l(t), H^l(t), A^l_t)
  where A^l_t = g_u(m̃^l(t), H^l(t))
  - ex) upd : memory update function ( GRUs, LSTMs )
  - ex) g_u : graph-based GRU
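One inter-step SGU message-passing step can be sketched as below. Every parameterization here is a simplified stand-in of my own choosing: g_m is a dot-product similarity with softmax, msg is a single GCN-style multiplication, and upd is a sigmoid-gated blend rather than a full graph-based GRU:

```python
import numpy as np

def _softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sgu_step(H_prev, H_curr, Wm, Wu):
    """One SGU message-passing step (hypothetical parameterization).

    H_prev, H_curr : (n, d) node embeddings at steps t-1 and t
    g_m  : inter-step adjacency from embedding similarity
    msg  : GCN-style aggregation A @ H_prev @ Wm
    upd  : gated memory update (stand-in for a graph-based GRU)
    """
    n, d = H_curr.shape
    # g_m -- adjacency learned from the two embeddings, rows sum to 1
    A = _softmax(H_curr @ H_prev.T / np.sqrt(d))      # (n, n)
    # msg -- propagate previous-step features through A
    m = A @ H_prev @ Wm                               # (n, d)
    # upd -- sigmoid gate blends current memory with the message
    z = 1.0 / (1.0 + np.exp(-(H_curr @ Wu + m)))      # (n, d)
    return (1 - z) * H_curr + z * np.tanh(m)

rng = np.random.default_rng(0)
n, d = 5, 8
H_prev, H_curr = rng.standard_normal((2, n, d))
Wm, Wu = rng.standard_normal((2, d, d)) * 0.1
H_next = sgu_step(H_prev, H_curr, Wm, Wu)
print(H_next.shape)  # (5, 8)
```

Stacking this step over j = k, …, 1 and feeding the collected m^l(t−j) into agg reproduces the learn-then-aggregate pattern above.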
(3) CGF (Cross-scale Graph Fusion) Kernel
CGF
- diffuses information “ACROSS SCALES”
  - “scales” = ‘sizes of receptive fields’
- need to decide “which time steps in which scales” can interact
Standard = the “HIGHER scale”
Sampled fine features (time steps) = the “LOWER scales”
- Z^l_{s−i}(t) = samp(Ĥ^{l+1}_{s_1}(:), …, Ĥ^{l+1}_{s_{i−1}}(:))
  - Ĥ^{l+1}_{s_j}(:) : updated output features of the SGU unit
  - Z^l_{s−i} : selected lower-scale features, concatenated along the time axis
Information diffusion between
- 1) Z^l_{s−i}(t)
- 2) Ĥ^{l+1}_{s_i}(t)
using a graph learning function g_f :
- H^{l+1}_{s_1:s_{i−1}}(:), H^{l+1}_{s_i}(t) ← fuse(Z^l_{s−i}(t), Ĥ^{l+1}_{s_i}(t), A^l_{(s−i,s_i),t})
  where A^l_{(s−i,s_i),t} = g_f(Z^l_{s−i}(t), Ĥ^{l+1}_{s_i}(t))
  - H^{l+1}_{s_1:s_{i−1}}(:) : cross-scale fused embeddings of the sampled lower-scale features
  - H^{l+1}_{s_i}(t) : the standard higher-scale features at time step t
  - ex) fuse : GCNs
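A minimal sketch of the fusion step, assuming a similarity-based g_f and a GCN-style fuse that diffuses information in both directions (lower → higher and back); all weights and shapes here are illustrative assumptions:

```python
import numpy as np

def _softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def cgf_fuse(Z_low, H_high, W):
    """Sketch of the CGF kernel (hypothetical parameterization).

    Z_low  : (k, d) sampled lower-scale features, concatenated in time
    H_high : (n, d) higher-scale (standard) features at time step t
    g_f    : cross-scale adjacency from embedding similarity
    fuse   : GCN-style diffusion across the bipartite cross-scale graph
    """
    d = H_high.shape[1]
    # g_f -- (n, k) cross-scale graph linking higher- to lower-scale nodes
    A = _softmax(H_high @ Z_low.T / np.sqrt(d))
    # fuse -- diffuse lower-scale info into the higher scale, and back
    H_high_new = np.tanh(H_high + A @ Z_low @ W)    # (n, d)
    Z_low_new = np.tanh(Z_low + A.T @ H_high @ W)   # (k, d)
    return Z_low_new, H_high_new

rng = np.random.default_rng(0)
k, n, d = 6, 5, 8
Z_low = rng.standard_normal((k, d))
H_high = rng.standard_normal((n, d))
W = rng.standard_normal((d, d)) * 0.1
Z_new, H_new = cgf_fuse(Z_low, H_high, W)
print(Z_new.shape, H_new.shape)  # (6, 8) (5, 8)
```

Updating both sides mirrors the fuse equation above, which rewrites the sampled lower-scale features H^{l+1}_{s_1:s_{i−1}}(:) as well as the higher-scale H^{l+1}_{s_i}(t).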
(4) Predictor and Training Strategy
Ŷ(t+Δ) = pre(H_{s_1:s_p}(:))
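As a closing illustration, pre can be as simple as a linear decoder over the fused node embeddings; this is a hypothetical choice for the sketch, since the framework leaves the predictor generic:

```python
import numpy as np

def predictor(H_fused, Wp):
    """Sketch of pre as a linear decoder (hypothetical choice).

    H_fused : (n, d) final fused embedding per variable
    Wp      : (d, m) decoder weights
    Returns Y_hat of shape (n, m), the estimate of X(t + Delta).
    """
    return H_fused @ Wp

rng = np.random.default_rng(0)
n, d, m = 5, 8, 3
Y_hat = predictor(rng.standard_normal((n, d)), rng.standard_normal((d, m)))
print(Y_hat.shape)  # (5, 3)
```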