REST: Reciprocal Framework for Spatiotemporal-coupled Predictions
0. Abstract
proposes to jointly …
- (1) mine the “spatial” dependencies
- (2) model “temporal” patterns
propose the Reciprocal SpatioTemporal (REST) framework
- introduces Edge Inference Networks (EINs) to couple with GCNs
  - EINs generate “multi-modal” directed weighted graphs to serve GCNs
- GCNs utilize these graphs ( = spatial dependencies ) to make predictions
  & introduce feedback to optimize EINs
- design a phased heuristic approach
  - effectively stabilizes the training procedure and prevents early stopping
1. Introduction
Problem : Domain knowledge is required to construct an accurate graph
→ not only COSTLY, but also SUB-OPTIMAL !
3 Challenges in “spatiotemporal-coupled” predictions
- (1) Data property aspect
  - 1-1) lacks existing edge labels
  - 1-2) the information of TS may be limited and noisy
    → making it difficult to measure the distance among TS & cluster them as a graph
- (2) Learning aspect
  - without an effective inductive bias, a model easily overfits the noise & the learning procedure may become unstable
- (3) Practicality aspect
  - mining potential links between two arbitrary TS pairs also brings a significant computational burden
    ( as the possible links are on the order of $n^2$ )
Existing research
- (1) based on “predefined” graph structure
- (2) “infer potential links” with strong domain knowledge
propose a novel Reciprocal Spatiotemporal (REST) framework
→ to address 3 challenges
REST
consists of 2 integral parts
- (1) Edge Inference Networks (EINs)
- for mining “spatial” dependencies among TS
- (2) GCNs ( e.g., DCRNNs )
- for making “spatio temporal” prediction
Spatial → Temporal
- Spatial dependencies inferred by EINs promote GCNs to make more accurate prediction
Temporal → Spatial
- GCNs help EINs learn better distance measurement
How does EIN overcome the 3 challenges?
- (1) Data property challenge
  - EINs project TS from the TIME domain → FREQUENCY domain
    - enriches the original TS & quantifies the multi-modal spatial dependencies among them
- (2) Practicality challenge
  - ( before each training epoch )
    - EINs first sample a fixed number of possible TS neighbors for all the central TS vertices of interest
  - ( during the training procedure )
    - EINs try to learn a more accurate “distance function”, with the help of the GCN part
  - theoretically explores all possible linkages from the whole dataset,
    while keeping the sparsity of the graph at $kn \ll n^2$ for training
    - $k$ : predefined number of neighbor candidates, with $k \ll n$
- (3) Learning challenge
- propose a phased heuristic approach as a warm-up to drive the REST framework.
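The per-epoch candidate sampling described above can be sketched as follows. This is a simplified brute-force version: the function name `sample_candidates` and the use of plain Euclidean distance over feature vectors are illustrative assumptions (REST drives sampling with the EIN-learned distances), but it shows how keeping only $k$ candidates per central vertex caps the training graph at $kn$ edges instead of $n^2$:

```python
import numpy as np

def sample_candidates(features, k):
    """For every central vertex, keep only the k most similar vertices
    (by feature distance) as neighbor candidates, so the training graph
    stays at k*n edges instead of n^2."""
    n = features.shape[0]
    # pairwise Euclidean distances between feature vectors
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)        # exclude self-loops
    # indices of the k nearest candidates per central vertex
    return np.argsort(dist, axis=1)[:, :k]
```

Re-running this before each epoch (with updated distances) is what lets the framework eventually consider all possible linkages while each epoch trains on a sparse graph.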
2. Preliminaries
(1) Observation Records ( length = p )
UTS : $x_i$
- broken down to $x_i = \{x_i^0, x_i^1, \dots, x_i^p\}$ for time series $i$ in the past $p$ time steps.
MTS : $X = \{x_1; x_2; \dots; x_n\}$
(2) Prediction Trend ( length = q )
UTS : $\hat{y}_i = \{\hat{y}_i^{p+1}, \hat{y}_i^{p+2}, \dots, \hat{y}_i^{p+q}\}$
- $\hat{y}_i^t \ (t \in (p, p+q])$ : prediction value of TS $i$ in the next $t$-th time step.
MTS : $\hat{Y} = \{\hat{y}_1; \hat{y}_2; \dots; \hat{y}_n\}$
(3) Spatial Dependencies
graph $G = (V, E, W)$
- $|V| = N$
- within a mini-batch…
  - central vertices : vertices of interest
  - adjacent vertices : other vertices that can reach the central vertices within $K$ steps
- consider multi-modal, weighted and directed spatial dependencies
  ( $W = \{w^m\}, \ m = 0, 1, \dots, M$ )
  - weight $w_{ij}^m \in W$ refers to the spatial dependency from TS $i$ to $j$ under modality $m$.
(4) Spatial Temporal Coupled Prediction
Given $N$ TS …. jointly learn $f(\cdot)$ and $g(\cdot)$
$$[X^1, X^2, \dots, X^p, G^0] \overset{f(X)}{\underset{g(G)}{\Longrightarrow}} [\hat{Y}^{p+1}, \dots, \hat{Y}^{p+q}, G^1, \dots, G^M]$$
3. Model Architecture
(1) Spatial Inference
Edge Inference Networks (EINs)
- [ Goal ] discover and quantify spatial dependencies among TS vertices
- [ Problem ] information of TS observations may be limited & noisy
- [ Solution ] project the observations from the TIME domain to the FREQUENCY domain
  - use Mel-Frequency Cepstrum Coefficients (MFCCs)
    - widely used in audio compression and speech recognition
    - use this frequency warping to represent TS
How to calculate MFCCs ?
$$X[k] = \mathrm{fft}(x[n])$$
$$Y[c] = \log\left( \sum_{k=f_{c-1}}^{f_{c+1}} |X[k]|^2 \, B_c[k] \right)$$
$$c_x[n] = \frac{1}{C} \sum_{c=1}^{C} Y[c] \cos\left( \frac{\pi n (c - \frac{1}{2})}{C} \right)$$
- $x[n]$ : TS observations
- $\mathrm{fft}(\cdot)$ : fast Fourier transform
- $B_c[k]$ : filter banks
- $C$ : # of MFCCs to retain
- $c_x[n] \ (= c_x)$ : MFCCs of TS $x$
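The three steps (FFT → log filter-bank energies → cosine transform) can be sketched in NumPy. This is a minimal version assuming triangular mel filter banks and treating the whole series as a single FFT frame (real MFCC pipelines window the signal into frames first); the function names and the choice of 20 filters / 8 coefficients are illustrative:

```python
import numpy as np

def mel_filter_banks(n_filters, n_fft, sample_rate):
    """Triangular filter banks B_c[k] spaced evenly on the mel scale."""
    high_mel = 2595.0 * np.log10(1.0 + (sample_rate / 2) / 700.0)
    mel_points = np.linspace(0.0, high_mel, n_filters + 2)
    hz_points = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)
    # f_c : FFT-bin index of each filter's center frequency
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    banks = np.zeros((n_filters, n_fft // 2 + 1))
    for c in range(1, n_filters + 1):
        left, center, right = bins[c - 1], bins[c], bins[c + 1]
        for k in range(left, center):                 # rising slope
            banks[c - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):                # falling slope
            banks[c - 1, k] = (right - k) / max(right - center, 1)
    return banks

def mfcc(x, sample_rate=100, n_filters=20, n_coeffs=8):
    """MFCCs of one series x, following the three equations above
    (here C, the number of log energies, equals n_filters)."""
    n_fft = len(x)
    X = np.fft.rfft(x)                       # X[k] = fft(x[n])
    power = np.abs(X) ** 2                   # |X[k]|^2
    banks = mel_filter_banks(n_filters, n_fft, sample_rate)
    Y = np.log(banks @ power + 1e-10)        # Y[c], with a floor for empty banks
    C = n_filters
    n_idx = np.arange(n_coeffs)
    c_idx = np.arange(1, C + 1)
    # c_x[n] = (1/C) sum_c Y[c] cos(pi * n * (c - 1/2) / C)   (a DCT)
    basis = np.cos(np.pi * np.outer(n_idx, c_idx - 0.5) / C)
    return (basis @ Y) / C
```

The resulting `c_x` vectors are the per-series features the EINs compare.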
Spatial Inference with EINs
- [ Input ] MFCCs
- [ Inference ] estimate the spatial dependencies between two TS, using a sigmoid
  - $a_{ij} \in \mathbb{R}^M$ : inferred asymmetric distance from TS $i$ to $j$ under $M$ modalities
  - $c_i \in \mathbb{R}^C$ : namely $c_x[n]$, refers to the MFCCs of TS $i$
  - $W \in \mathbb{R}^{2C \times M}$ and $b \in \mathbb{R}^M$
    ( = parameters for TS distance inference )
  - $c_i$ is concatenated with $c_i - c_j$
    → models the directed spatial dependencies
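Given the shapes above ($W \in \mathbb{R}^{2C \times M}$, concatenation of $c_i$ with $c_i - c_j$, sigmoid output), the inference step is plausibly a linear map followed by a sigmoid. A sketch under that assumption (`ein_infer` is an illustrative name, not the paper's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ein_infer(c_i, c_j, W, b):
    """a_ij = sigmoid([c_i, c_i - c_j] @ W + b)
    c_i, c_j : (C,) MFCC vectors; W : (2C, M); b : (M,).
    Returns the (M,) multi-modal dependency from TS i to j."""
    features = np.concatenate([c_i, c_i - c_j])   # (2C,) directed features
    return sigmoid(features @ W + b)              # (M,) values in (0, 1)
```

Because the features for the pair $(i, j)$ differ from those for $(j, i)$, the inferred dependencies are asymmetric, matching the directed-graph setting.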
EINs play two important roles : (1) sampling & (2) inferring
During the data preparation phase…
- (1) sampling : EINs go through the entire dataset to select …
- 1-1) possible adjacent candidates ( = purple vertices )
- 1-2) central vertices of interest ( = yellow vertex )
- (2) inferring : infer and quantify their spatial dependencies under M modalities for GCNs
(2) Temporal Prediction
integrate a GCN-based spatiotemporal prediction model as the backend to make predictions
- ex) DCRNN, Graph WaveNet [28]
In this paper, the directed spatial dependencies are mainly discussed, with diffusion convolution on the random walk Laplacian.
( considering only one modality … )
(1) random walk Laplacian : $L_{rw} = I - D^{-1}A$
- $D^{-1}A$ : transition matrix of the random walk
(2) bidirectional diffusion convolution
$$Z \star_G g_\theta \approx \sum_{k=0}^{K-1} \left( \theta_{k,0} (D_I^{-1} A)^k + \theta_{k,1} (D_O^{-1} A^\top)^k \right) Z$$
- $Z$ : inputs of the GCN filter
- $g_\theta$ : diffusion convolution filter
- $\theta \in \mathbb{R}^{K \times 2}$ : trainable parameters
- $A$ : adjacency matrix
- $D_I$ & $D_O$ : in-degree & out-degree matrices
- the diffusion convolution can be truncated at a predefined graph convolution depth $K$, which is empirically not more than 3
- due to the sparsity of most graphs, the complexity of recursively calculating the above is $O(K|E|) \ll O(N^2)$.
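The truncated expansion above can be computed recursively, since $P^{k+1}Z = P(P^kZ)$. A minimal dense NumPy sketch (a real implementation would keep $A$ sparse so each recursive step costs $O(|E|)$; `diffusion_conv` is an illustrative name):

```python
import numpy as np

def diffusion_conv(Z, A, theta):
    """Z *_G g_theta ~= sum_k (theta[k,0] (D_I^-1 A)^k
                               + theta[k,1] (D_O^-1 A^T)^k) Z
    Z : (N, d) node features; A : (N, N) adjacency; theta : (K, 2)."""
    d_in = np.maximum(A.sum(axis=0), 1e-10)   # in-degrees  (column sums)
    d_out = np.maximum(A.sum(axis=1), 1e-10)  # out-degrees (row sums)
    P_f = A / d_in[:, None]                   # D_I^{-1} A   (forward walk)
    P_b = A.T / d_out[:, None]                # D_O^{-1} A^T (backward walk)
    out = np.zeros_like(Z, dtype=float)
    Zf = Z.astype(float)
    Zb = Z.astype(float)
    for k in range(theta.shape[0]):
        out += theta[k, 0] * Zf + theta[k, 1] * Zb
        # recursive step: P^{k+1} Z = P (P^k Z)
        Zf, Zb = P_f @ Zf, P_b @ Zb
    return out
```

With $K = 1$ and $\theta_{0,0} = 1,\ \theta_{0,1} = 0$, the filter reduces to the identity, which is a handy sanity check.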
Consider “Multimodality”
- an enhanced diffusion convolution is formulated by
  $$h_s = \mathrm{ReLU}\left( \sum_{m=0}^{M-1} \sum_{k=0}^{K-1} Z \star_{G_m} g_\Theta \right)$$
- $h_s \in \mathbb{R}^{N \times d_o}$ : spatial hidden states
  - output of the diffusion convolution operators
- $M$ : predefined number of modalities
  - forward $A$ & backward $A^\top$ in the bidirectional random walk are deemed as 1 modality
- $\Theta \in \mathbb{R}^{M \times K \times d_i \times d_o}$ : multi-modal, high-order GCN params
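The multi-modal sum can be sketched as a loop over modality graphs, each contributing $K$ diffusion terms weighted by its own slice of $\Theta$. For brevity this sketch uses only a forward walk per modality (REST pairs forward and backward walks into one modality); `multimodal_gcn` is an illustrative name:

```python
import numpy as np

def multimodal_gcn(Z, graphs, Theta):
    """h_s = ReLU(sum_m sum_k Z *_{G_m} g_Theta)
    Z : (N, d_i) features; graphs : list of M (N, N) adjacency matrices;
    Theta : (M, K, d_i, d_o) trainable parameters."""
    M, K, d_i, d_o = Theta.shape
    h = np.zeros((Z.shape[0], d_o))
    for m, A in enumerate(graphs):
        # row-normalized transition matrix D^{-1} A for modality m
        P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-10)
        Zk = Z.astype(float)
        for k in range(K):
            h += Zk @ Theta[m, k]      # k-hop term under modality m
            Zk = P @ Zk                # next diffusion step
    return np.maximum(h, 0.0)          # ReLU
```

Each modality graph $G_m$ is one of the EIN-inferred graphs, so richer spatial dependencies show up as extra additive terms rather than a larger single graph.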
DCRNNs
Adopt the DCRNN architecture!
- to capture temporal dependencies, DCGRUs replace the matrix multiplications in GRUs with diffusion convolution
- DCGRUs are stacked to construct the encoders and decoders
$$r^t = \sigma(f_r \star_{G_m} [X^t, H^{t-1}] + b_r)$$
$$u^t = \sigma(f_u \star_{G_m} [X^t, H^{t-1}] + b_u)$$
$$C^t = \tanh(f_C \star_{G_m} [X^t, (r^t \odot H^{t-1})] + b_C)$$
$$H^t = u^t \odot H^{t-1} + (1 - u^t) \odot C^t$$
- $X^t$ : observations of all included vertices ( = all colored vertices )
- $H^{t-1}$ : temporal hidden states generated by the last DCGRUs
- $\star_{G_m}$ : diffusion convolution operator given graph $G_m$
  - $G_m$ : inferred by EINs
- $r^t$ and $u^t$ : outputs of the reset and update gates at time $t$
- $f_r, f_u$, and $f_C$ : GCN filters with different trainable parameters
- $H^t$ : temporal hidden states
Based on the temporal hidden states $H^t$ from the decoders…
→ $\hat{Y}^{t+1} = W^\top H^t + b$
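One DCGRU step following the four equations above, with the diffusion convolution abstracted as a caller-supplied `gconv(Z, W)` so the gate structure stands out (all names here are illustrative, not the DCRNN codebase's API):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dcgru_step(X_t, H_prev, gconv, params):
    """One DCGRU step: the GRU's matrix products are replaced by a graph
    convolution gconv(Z, W) (e.g., the diffusion convolution above).
    params holds the pairs (W_r, b_r), (W_u, b_u), (W_C, b_C)."""
    (W_r, b_r), (W_u, b_u), (W_C, b_C) = params
    Z = np.concatenate([X_t, H_prev], axis=-1)      # [X^t, H^{t-1}]
    r = sigmoid(gconv(Z, W_r) + b_r)                # reset gate r^t
    u = sigmoid(gconv(Z, W_u) + b_u)                # update gate u^t
    Zc = np.concatenate([X_t, r * H_prev], axis=-1) # [X^t, r^t ⊙ H^{t-1}]
    C = np.tanh(gconv(Zc, W_C) + b_C)               # candidate state C^t
    return u * H_prev + (1.0 - u) * C               # H^t
```

Passing `gconv = lambda Z, W: Z @ W` recovers a plain GRU step, which makes the "replace multiplication with diffusion convolution" substitution explicit.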