REST: Reciprocal Framework for Spatiotemporal-coupled Predictions


Contents

  0. Abstract
  1. Introduction
  2. Preliminaries
  3. Model Architecture


0. Abstract

proposes to jointly …

  • (1) mine the “spatial” dependencies
  • (2) model “temporal” patterns


propose Reciprocal SpatioTemporal (REST)

  • introduces Edge Inference Networks (EINs) to couple with GCNs.

    • generate “multi-modal” directed weighted graphs to serve GCNs.

    • GCNs utilize these graphs ( = spatial dependencies ) to make predictions

      & introduce feedback to optimize EINs

  • design a phased heuristic approach

    • effectively stabilizes the training procedure and prevents early stopping


1. Introduction

Problem : Domain knowledge is required to construct an accurate graph

not only COSTLY, but also SUB-OPTIMAL !


3 Challenges in “spatiotemporal-coupled” predictions

  • (1) Data property aspect

    • 1-1) lacks existing edge labels

    • 1-2) the information of TS may be limited and noisy

      making it difficult to find the distance among TS & cluster them as a graph

  • (2) Learning aspect

    • without an effective inductive bias, a model easily overfits the noise & the learning procedure may become unstable
  • (3) Practicality aspect

    • mining potential links between arbitrary TS pairs also brings a significant computational burden

      ( as the number of possible links is on the order of $n^2$ )


Existing research

  • (1) based on “predefined” graph structure
  • (2) “infer potential links” with strong domain knowledge


propose a novel Reciprocal Spatiotemporal (REST) framework

to address the 3 challenges above


( Figure 2 )


REST

consists of 2 integral parts

  • (1) Edge Inference Networks (EINs)
    • for mining “spatial” dependencies among TS
  • (2) GCNs ( e.g., DCRNNs )
    • for making “spatio temporal” prediction


Spatial → Temporal

  • Spatial dependencies inferred by EINs promote GCNs to make more accurate prediction

Temporal → Spatial

  • GCNs help EINs learn better distance measurement


How do EINs overcome the 3 challenges?

  • (1) Data property challenge
    • EINs project TS from the TIME domain to the FREQUENCY domain
    • enrich the original TS & quantify the multi-modal spatial dependencies among them
  • (2) Practicality challenge
    • ( before each training epoch )
      • EINs firstly sample a fixed number of possible TS neighbors for all the central TS vertices of interest
    • ( during the training procedure )
      • EINs try to learn a more accurate “distance function”, with the help of the GCN part
    • theoretically explore all possible linkages from the whole dataset,

      while keeping the graph sparsity at $kn \ll n^2$ for training,

      • $k$ : predefined number of neighbor candidates, with $k \ll n$
  • (3) Learning challenge
    • propose a phased heuristic approach as a warm-up to drive the REST framework.


2. Preliminaries

(1) Observation Records ( length = p )

UTS : $x^i$

  • broken down to $x^i = \{x_0^i, x_1^i, \dots, x_p^i\}$ for time series $i$ in the past $p$ time steps.

MTS : $X = \{x^1; x^2; \dots; x^n\}$


(2) Prediction Trend ( length = q )

UTS : $\hat{y}^i = \{\hat{y}_{p+1}^i, \hat{y}_{p+2}^i, \dots, \hat{y}_{p+q}^i\}$

  • $\hat{y}_t^i\ \big(t \in (p, p+q]\big)$ : prediction value of TS $i$ in the next $t$-th time step.

MTS : $\hat{Y} = \{\hat{y}^1; \hat{y}^2; \dots; \hat{y}^n\}$


(3) Spatial Dependencies

graph $G = (V, E, W)$

  • $|V| = N$.

  • within a mini-batch…

    • central vertices : vertices of interest
    • adjacent vertices : other vertices that can reach the central vertices within K steps
  • consider multi-modal, weighted and directed spatial dependencies

    ( $W = \{w^m,\ m = 0, 1, \dots, M\}$ )

    • weight $w_{ij}^m \in W$ refers to the spatial dependency from TS $i$ to $j$ under modality $m$.


(4) Spatial Temporal Coupled Prediction

Given $N$ TS … jointly learn $f(\cdot)$ and $g(\cdot)$

$$[X_1, X_2, \dots, X_p, G_0] \xrightarrow[g(G)]{f(X)} [\hat{Y}_{p+1}, \dots, \hat{Y}_{p+q}, G_1, \dots, G_M]$$
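
As a quick orientation, here is a shape-only sketch of this formulation; the concrete numbers and the identity stand-in for $G_0$ are illustrative assumptions, not values from the paper.

```python
# Shape-only sketch of the spatiotemporal-coupled prediction setup.
import numpy as np

n, p, q, M = 100, 12, 6, 2      # series, past steps, future steps, modalities (all illustrative)
X = np.random.rand(p, n)        # observation records X_1, ..., X_p (one row per time step)
G0 = np.eye(n)                  # initial graph G_0 (identity used here as a stand-in)

# A jointly learned pair f(.), g(.) would map [X_1, ..., X_p, G_0] to:
Y_hat = np.zeros((q, n))        # prediction trend Y_hat_{p+1}, ..., Y_hat_{p+q}
G = np.zeros((M, n, n))         # inferred multi-modal spatial dependencies G_1, ..., G_M
```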


3. Model Architecture

(1) Spatial Inference

Edge Inference Networks (EINs)

  • [ Goal ] discover and quantify spatial dependencies among TS vertices
  • [ Problem ] information of TS observations may be limited & noisy
  • [ Solution ] project the observations from the TIME domain to the FREQUENCY domain
    • Use Mel-Frequency Cepstrum Coefficients (MFCCs)
      • widely used in audio compression and speech recognition
    • use this frequency warping to represent TS


How to calculate MFCCs?

$$
\begin{aligned}
X[k] &= \mathrm{fft}(x[n]) \\
Y[c] &= \log\left(\sum_{k=f_{c-1}}^{f_{c+1}} |X[k]|^2\, B_c[k]\right) \\
c_x[n] &= \frac{1}{C}\sum_{c=1}^{C} Y[c]\,\cos\left(\frac{\pi n\,(c-\tfrac{1}{2})}{C}\right)
\end{aligned}
$$

  • $x[n]$ : TS observations
  • $\mathrm{fft}(\cdot)$ : fast Fourier transform
  • $B_c[k]$ : filter banks
  • $C$ : # of MFCCs to retain
  • $c_x[n]\ (= c_x)$ : MFCCs of TS $x$
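
A compact NumPy sketch of the three formulas above (FFT → log filter-bank energies → DCT). The evenly spaced triangular filter bank and the small epsilon inside the log are my own simplifications; the notes do not specify those details.

```python
# Minimal MFCC-style feature extraction following the equations above.
import numpy as np

def triangular_filter_bank(C: int, num_bins: int) -> np.ndarray:
    """B_c[k]: C triangular filters spread evenly over the FFT bins (a simplification)."""
    centers = np.linspace(0, num_bins - 1, C + 2)
    k = np.arange(num_bins)
    B = np.zeros((C, num_bins))
    for c in range(1, C + 1):
        left, mid, right = centers[c - 1], centers[c], centers[c + 1]
        rise = (k - left) / (mid - left + 1e-9)
        fall = (right - k) / (right - mid + 1e-9)
        B[c - 1] = np.clip(np.minimum(rise, fall), 0.0, None)
    return B

def mfcc(x: np.ndarray, C: int = 13) -> np.ndarray:
    """c_x: C cepstral coefficients of one univariate series x[n]."""
    X = np.fft.rfft(x)                                                     # X[k] = fft(x[n])
    Y = np.log(triangular_filter_bank(C, X.size) @ np.abs(X) ** 2 + 1e-9)  # Y[c]
    n = np.arange(C)[:, None]
    c = np.arange(1, C + 1)[None, :]
    return (np.cos(np.pi * n * (c - 0.5) / C) @ Y) / C                     # c_x[n]

# one C-dimensional descriptor per time series:
# C_feat = np.stack([mfcc(series) for series in X_data])
```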


Spatial Inference with EINs

  • [ Input ] MFCCs

  • [ Inference ] estimate the spatial dependencies between two TS, using sigmoid


$$a_{ij} = \sigma\big(W \cdot \mathrm{concat}([c_i,\ c_i - c_j]) + b\big)$$

  • $a_{ij} \in \mathbb{R}^M$ : inferred asymmetric distance from TS $i$ to $j$ under $M$ modalities

  • $c_i \in \mathbb{R}^C$ : namely $c_x[n]$, refers to the MFCCs of TS $i$

  • $W \in \mathbb{R}^{2C \times M}$ and $b \in \mathbb{R}^M$

    ( = parameters for TS distance inference )

  • $c_i$ is concatenated with $c_i - c_j$

    → models the directed spatial dependencies
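
A hedged PyTorch sketch of this inference step; the module and argument names are my own, only the formula above comes from the notes.

```python
# Sketch of the EIN edge-inference step: a_ij = sigmoid(W . concat([c_i, c_i - c_j]) + b).
import torch
import torch.nn as nn

class EdgeInference(nn.Module):
    def __init__(self, C: int, M: int):
        super().__init__()
        # nn.Linear holds W in R^{2C x M} (stored transposed) and b in R^M
        self.proj = nn.Linear(2 * C, M)

    def forward(self, c_i: torch.Tensor, c_j: torch.Tensor) -> torch.Tensor:
        """c_i, c_j: (batch, C) MFCC vectors -> (batch, M) directed weights from TS i to j."""
        feats = torch.cat([c_i, c_i - c_j], dim=-1)   # concat([c_i, c_i - c_j])
        return torch.sigmoid(self.proj(feats))        # a_ij in (0, 1)^M

# a_ij = EdgeInference(C=13, M=2)(c[i_idx], c[j_idx])
# swapping i and j changes the result, so the inferred dependencies are directed
```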


EINs play two important roles : (1) sampling & (2) inferring

During the data preparation phase…

  • (1) sampling : EINs go through the entire dataset to select …
    • 1-1) possible adjacent candidates ( = purple vertices )
    • 1-2) central vertices of interest ( = yellow vertex )
  • (2) inferring : infer and quantify their spatial dependencies under M modalities for GCNs
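
The notes do not spell out the exact sampling rule, so the following is purely illustrative: score every possible neighbor with the EdgeInference sketch above and keep the top-$k$ candidates per central vertex, which preserves the $kn \ll n^2$ sparsity mentioned earlier.

```python
# Illustrative candidate sampling for the EINs' "sampling" role (assumes the
# EdgeInference sketch above); run once per epoch, with gradients disabled.
import torch

@torch.no_grad()
def sample_candidates(ein: "EdgeInference", c: torch.Tensor, centers: torch.Tensor, k: int):
    """c: (n, C) MFCCs of all TS; centers: indices of the central vertices of interest.
    Returns a (|centers|, k) index tensor of adjacent candidates (~k*n of the n^2 links)."""
    rows = []
    for i in centers:
        c_i = c[i].expand_as(c)                 # compare central vertex i against all n series
        rows.append(ein(c_i, c).mean(dim=-1))   # collapse the M modalities into one score
    scores = torch.stack(rows)                  # (|centers|, n)
    return scores.topk(k, dim=-1).indices       # keep only k neighbor candidates per center
```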


(2) Temporal Prediction

integrate a GCN-based spatiotemporal prediction model as the backend to make predictions

  • ex) DCRNN, Graph WaveNet [28]


In this paper, we will mainly discuss the directed spatial dependencies and consider diffusion convolution on the random walk Laplacian.


( considering only one modality … )

(1) random walk Laplacian : $L_{rw} = I - D^{-1}A$

  • $D^{-1}A$ : the random-walk transition matrix

(2) bidirectional diffusion convolution

  • $$Z \star_G g_\theta = \sum_{k=0}^{K-1}\left(\theta_{k,0}\,(D_I^{-1}A)^k + \theta_{k,1}\,(D_O^{-1}A)^k\right)Z$$
    • $Z$ : inputs of the GCN filter
    • $g_\theta$ : diffusion convolution filter
      • $\theta \in \mathbb{R}^{K \times 2}$ : trainable parameters
    • $A$ : adjacency matrix
    • $D_I$ & $D_O$ : in-degree & out-degree matrices
  • diffusion convolution can be truncated at a predefined graph convolution depth $K$, which is empirically no more than 3
  • due to the sparsity of most graphs, the complexity of recursively calculating the above is $O(K|E|) \ll O(N^2)$


Consider “Multimodality”

  • an enhanced diffusion convolution is formulated by ..

    $$h_s = \mathrm{ReLU}\left(\sum_{m=0}^{M-1}\sum_{k=0}^{K-1} Z \star_{G_m} g_{\Theta}\right)$$

  • $h_s \in \mathbb{R}^{N \times d_o}$ : spatial hidden states

    • output of the diffusion convolution operators
    • $M$ : predefined number of modalities
    • the forward $A$ & backward $A$ of the bidirectional random walk are deemed as 1 modality
  • $\Theta \in \mathbb{R}^{M \times K \times d_i \times d_o}$ : multi-modal, high-order GCN parameters
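
Below is a PyTorch sketch of this enhanced diffusion convolution, computing the walk powers recursively so that each hop costs one (sparse-friendly) matrix product, which is what keeps the complexity near $O(K|E|)$. The parameter layout, with separate weights for the forward and backward walks of each modality, is my assumption rather than the paper's exact parametrization.

```python
# Sketch of a multi-modal diffusion convolution; the ReLU from the h_s formula
# is applied by the caller (see the usage lines at the bottom).
import torch
import torch.nn as nn

class MultiModalDiffusionConv(nn.Module):
    def __init__(self, M: int, K: int, d_in: int, d_out: int):
        super().__init__()
        # assumed layout: separate weights per modality, hop and walk direction
        self.theta = nn.Parameter(torch.randn(M, K, 2, d_in, d_out) * 0.01)

    def forward(self, Z: torch.Tensor, adjs) -> torch.Tensor:
        """Z: (N, d_in) filter inputs; adjs: list of M adjacency matrices (N, N) from the EINs."""
        out = torch.zeros(Z.size(0), self.theta.size(-1), device=Z.device)
        for m, A in enumerate(adjs):
            P_in = A / (A.sum(dim=0).unsqueeze(1) + 1e-9)    # D_I^{-1} A
            P_out = A / (A.sum(dim=1, keepdim=True) + 1e-9)  # D_O^{-1} A
            Zk_in, Zk_out = Z, Z                             # k = 0: zero-hop term
            for k in range(self.theta.size(1)):
                out = out + Zk_in @ self.theta[m, k, 0] + Zk_out @ self.theta[m, k, 1]
                Zk_in, Zk_out = P_in @ Zk_in, P_out @ Zk_out  # recursion: one matmul per hop
        return out

# conv = MultiModalDiffusionConv(M=2, K=3, d_in=1, d_out=16)
# h_s = torch.relu(conv(Z, [A_0, A_1]))   # h_s = ReLU(sum_m sum_k Z *_{G_m} g_Theta)
```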


DCRNNs

Adopt the DCRNN architecture!

  • to capture temporal dependencies, DCGRUs replace the matrix multiplications in GRUs with diffusion convolution

  • DCGRUs are stacked to construct encoders and decoders

$$
\begin{aligned}
r_t &= \sigma\big(f_r \star_{G_m} [X_t, H_{t-1}] + b_r\big) \\
u_t &= \sigma\big(f_u \star_{G_m} [X_t, H_{t-1}] + b_u\big) \\
C_t &= \tanh\big(f_C \star_{G_m} [X_t, (r_t \odot H_{t-1})] + b_C\big) \\
H_t &= u_t \odot H_{t-1} + (1 - u_t) \odot C_t
\end{aligned}
$$

  • $X_t$ : observations of all included vertices ( = all colored vertices )
  • $H_{t-1}$ : temporal hidden states generated by the previous DCGRU
  • $\star_{G_m}$ : diffusion convolution operator given graph $G_m$
    • $G_m$ : inferred by EINs
    • $r_t$ and $u_t$ : outputs of the reset and update gates at time $t$
    • $f_r$, $f_u$, and $f_C$ : GCN filters with different trainable parameters
  • $H_t$ : the new temporal hidden states


Based on the temporal hidden states $H_t$ from the decoders…

$$\hat{Y}_{t+1} = W H_t + b$$
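
A rough PyTorch sketch of one DCGRU cell plus the final read-out, reusing the MultiModalDiffusionConv sketch above as the graph filters $f_r$, $f_u$, $f_C$; this is an illustrative reconstruction of the gate equations, not the authors' DCRNN code.

```python
# Sketch of a DCGRU cell: GRU matrix multiplications replaced with (linear)
# diffusion convolutions, following the gate equations above.
import torch
import torch.nn as nn

class DCGRUCell(nn.Module):
    def __init__(self, M: int, K: int, d_x: int, d_h: int):
        super().__init__()
        self.f_r = MultiModalDiffusionConv(M, K, d_x + d_h, d_h)  # reset-gate filter
        self.f_u = MultiModalDiffusionConv(M, K, d_x + d_h, d_h)  # update-gate filter
        self.f_C = MultiModalDiffusionConv(M, K, d_x + d_h, d_h)  # candidate-state filter
        self.b_r = nn.Parameter(torch.zeros(d_h))
        self.b_u = nn.Parameter(torch.zeros(d_h))
        self.b_C = nn.Parameter(torch.zeros(d_h))

    def forward(self, X_t, H_prev, adjs):
        """X_t: (N, d_x) observations at step t; H_prev: (N, d_h) previous hidden states."""
        XH = torch.cat([X_t, H_prev], dim=-1)
        r_t = torch.sigmoid(self.f_r(XH, adjs) + self.b_r)   # reset gate
        u_t = torch.sigmoid(self.f_u(XH, adjs) + self.b_u)   # update gate
        C_t = torch.tanh(self.f_C(torch.cat([X_t, r_t * H_prev], dim=-1), adjs) + self.b_C)
        return u_t * H_prev + (1 - u_t) * C_t                 # H_t

# Stacked encoder/decoder DCGRUs produce the decoder hidden states H_t;
# a linear read-out then gives the next-step prediction:
# readout = nn.Linear(d_h, 1)
# y_hat_next = readout(H_t)        # Y_hat_{t+1} = W H_t + b
```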

