METRO : A Generic GNN Framework for MTS Forecasting (2020, 131)

Contents

  1. Abstract
  2. Introduction
  3. Preliminaries
  4. METRO
    1. Temporal Graph Embedding Unit
    2. Single-scale Graph Update Unit
    3. Cross-scale Graph Fusion Kernel
    4. Predictor and Training Strategy

0. Abstract

METRO

  • generic framework with “Multi-scale Temporal Graphs NN”

  • models the “dynamic & cross-scale variable correlations” simultaneously

  • represents MTS as a “series of temporal graphs”

    can capture both

    • inter-step correlation
    • intra-step correlation

    via message-passing & node embedding


1. Introduction

GNN for MTS forecasting

  • capture the interdependencies & dynamic trends over a group of variables

    • nodes = variables
    • edges = interactions of variables
  • can be used even for “data w.o explicit graph structures”

    • by learning hidden graph structures
  • previous works

    • assume that “MTS are static”

      however, “interdependencies between variables can be temporally dynamic”

      • ex) MTGNN, TEGNN, ASTGCN

(Figure 2)


Previous GNN works

  • MTGNN & TEGNN

    • learn/compute adjacency matrices of graphs, from the whole input TS
    • hold them for “subsequent temporal modeling”
  • ASTGCN

    • for ‘traffic prediction’

      ( connections between nodes are known in advance )

    • considers “adaptive adjacency matrix” ( attention value )

    • but only at a relatively large scale : the attention-based matrix changes only with regard to the input series example

  • STDN

    • flow gating mechanism, to learn “dynamic similarity” by time step
    • but, can only be applied on consecutive & grid-partitioned regional maps


Multi-scale patterns

  • [ approach 1 ]

    • treat time stamps of MTS as “side information”
    • combine the “embedded time stamps” with MTS’s embedding
  • [ approach 2 ]

    • first, adopt filters ( with different kernel sizes ) to MTS
    • second, sum/concatenate the multi-scale features
  • common

    • can incorporate information “across different scales”

    • but uses just a simple linear-weight transformation

      ( relations between pairs of scales are ignored )
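
The filter-based multi-scale approach ([ approach 2 ]) can be sketched in a few lines. A minimal NumPy version, assuming simple moving-average filters in place of learned kernels, truncated to a common length and concatenated:

```python
import numpy as np

def multi_scale_features(x, kernel_sizes=(2, 4, 8)):
    """Apply filters of different kernel sizes to a series, then
    concatenate the multi-scale features along a new feature axis.
    Moving averages stand in for learned filters (an assumption)."""
    out_len = len(x) - max(kernel_sizes) + 1
    feats = []
    for k in kernel_sizes:
        f = np.convolve(x, np.ones(k) / k, mode="valid")  # unpadded conv
        feats.append(f[:out_len])                          # align lengths
    return np.stack(feats, axis=-1)                        # (out_len, #scales)

z = multi_scale_features(np.arange(20, dtype=float))       # shape (13, 3)
```

Note the limitation the text points out: the scales are merely stacked side by side; nothing models how one scale relates to another.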


Problems

  • 1) dependencies between variables are dynamic
  • 2) temporal patterns in MTS occur at multiple scales

propose METRO ( Multi-scalE Temporal gRaph neural netwOrks )


4 components of METRO

  • 1) TGE (Temporal Graph Embedding)
    • obtains “node embeddings” under multi-scale
  • 2) SGU (Single-scale Graph Update)
    • dynamically leverages graph structures of nodes in each single scale
    • propagates info from a node’s intra/inter-step neighbors
  • 3) CGF (Cross-scale Graph Fusion) kernel
    • walks along the time axis & fuses features across scales
  • 4) predictor
    • decoder ( decodes node embeddings )
    • gives final predictions


2. Preliminaries

MTS forecasting

Notation

  • MTS : $X = \{X(1), X(2), \dots, X(t)\}$
    • at time $i$ : $X(i) = [x_{i,1}, \dots, x_{i,n}] \in \mathbb{R}^{n \times m}$, $i \in [1, \dots, t]$
      • $x_{i,k} \in \mathbb{R}^{m}$, $k \in [1, \dots, n]$
    • $n$ : number of variables ( TS )
    • $m$ : number of dimensions ( features per variable )
  • Goal : predict $\Delta$ steps ahead
    • output : $\hat{Y}(t+\Delta) = X(t+\Delta) = [x_{t+\Delta,1}, \dots, x_{t+\Delta,n}]$
    • input : $\{X(t-w+1), \dots, X(t)\}$
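
A minimal sketch of the input/output windows defined by this notation, with assumed toy sizes ($n$ variables, $m$ features each):

```python
import numpy as np

# Assumed toy sizes, not from the paper.
n, m, t, w, delta = 4, 3, 100, 12, 6

rng = np.random.default_rng(0)
X = rng.random((t, n, m))             # X(1..t), each X(i) in R^{n x m}

def make_sample(X, t_end, w, delta):
    """Input window {X(t-w+1), ..., X(t)} and target X(t+delta)."""
    window = X[t_end - w:t_end]       # shape (w, n, m)
    target = X[t_end + delta - 1]     # shape (n, m)
    return window, target

window, target = make_sample(X, t_end=50, w=w, delta=delta)
```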


Graphs

(1) Graph : $G = (V, E)$

  • static graph
  • vertices & edges are invariant over time


(2) Temporal Graph : $G(t) = (V(t), E(t))$

  • graph as a function of time
  • 2 types of temporal graphs
    • 1) INTRA-step TG : $V(t)$ is made up of variables within time step $t$
    • 2) INTER-step TG : $V(t)$ consists of variables of time step $t$ and its previous time step $t-1$


(3) Multi-scale Temporal Graph : $G_S(t) = (V_S(t), E_S(t))$

  • each node in the graph represents an observation of time-length $w_s$


3. METRO

(1) TGE (Temporal Graph Embedding Unit)

TGE = “encoder”

  • obtains “embeddings of TEMPORAL features of each variable”

  • example)

    • input : $\{X(t_0), \dots, X(t_0 + w - 1)\}$

    • output :

      • temporal feature at time step $t$, $t \in [t_0, \dots, t_0 + w - 1]$,

        under scale $i$ is..

      • $H^0_{s_i}(t) = \mathrm{emb}(s_i, t) = f\big(X(t), \dots, X(t + w_{s_i} - 1)\big)$

    • notation

      • n : # of variables

      • d : embedding dimension

      • f : embedding function

        • ex) unpadded convolution with multiple filters of size $1 \times w_{s_i}$

          ( outputs of the multiple convolution channels are “concatenated” )
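
A rough NumPy sketch of such an embedding function $f$: unpadded windows of length $w_s$ are projected by multiple filters whose outputs are concatenated as channels. The random filters stand in for learned ones, and $m = 1$ is assumed for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)

def tge_embed(X, w_s, n_filters=8):
    """TGE sketch: per-variable unpadded 1-D convolution with n_filters
    filters of size w_s; channel outputs are concatenated, giving node
    embeddings H^0_s(t) of dimension d = n_filters.
    X: (T, n) input with one feature per variable (m = 1 assumed)."""
    T, n = X.shape
    filters = rng.normal(size=(n_filters, w_s))   # assumed; normally learned
    out = np.empty((T - w_s + 1, n, n_filters))
    for t in range(T - w_s + 1):
        window = X[t:t + w_s]                     # (w_s, n)
        out[t] = window.T @ filters.T             # (n, d) node embeddings
    return out

H0 = tge_embed(rng.normal(size=(30, 5)), w_s=4)   # shape (27, 5, 8)
```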


(2) SGU (Single-scale Graph Update Unit)

modeling “hidden relations among variables”

  • by training/updating “adjacency matrices”
  • problem : message aggregation over time, with FIXED structures?


Solution : introduce SGU

  • adaptively learn “INTRA & INTER” step adjacency matrix of variables

  • propose “graph-based temporal message passing”

    • use “historical temporal information” of nodes
  • ex) SGU operates on scales $s_1$, $s_2$, $s_3$ separately

    (Figure 2)

  • example : $G_{s_1}(:)$ … jointly considers

    • 1) message passing WITHIN a time step
      • $G_{s_1}(t_i)$ at each time stamp
    • 2) message passing BETWEEN time steps
      • $G_{s_1}(t_1)$ & $G_{s_1}(t_2)$


Process

  • 1) learn messages

    • messages between nodes of 2 adjacent time steps :

      • $m^l(t_j) = \mathrm{msg}\big(H^l(t_{j-1}), H^l(t_j), A^l_{t_{j-1}, t_j}\big)$,

        where $A^l_{t_{j-1}, t_j} = g_m\big(H^l(t_{j-1}), H^l(t_j)\big)$

    • ex) msg : GCN

    • ex) gm : transfer entropy / node-embedding-based deep layer model

  • 2) aggregate messages

    • $\tilde{m}^l(t) = \mathrm{agg}\big(m^l(t-k-1), \dots, m^l(t-1)\big)$
    • ex) simple concatenation / RNN / Transformer …
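
A toy sketch of the two steps above. The softmax-similarity graph learner for $g_m$ is an assumption (the paper also lists transfer entropy); msg is plain GCN-style propagation, and agg is the simple concatenation option:

```python
import numpy as np

rng = np.random.default_rng(0)

def gm(H_prev, H_cur):
    """Assumed graph learner g_m: row-softmax over embedding similarity."""
    logits = H_prev @ H_cur.T
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)   # inter-step adjacency A^l

def msg(H_prev, H_cur, A):
    """GCN-style message: previous-step node features diffused via A."""
    return A.T @ H_prev                       # m^l(t_j), shape (n, d)

def agg(messages):
    """Aggregation by simple concatenation (one of the listed options)."""
    return np.concatenate(messages, axis=-1)  # aggregated message

# Toy embeddings for 3 nodes, d = 4, over three adjacent steps.
H1, H2, H3 = (rng.normal(size=(3, 4)) for _ in range(3))
m2 = msg(H1, H2, gm(H1, H2))
m3 = msg(H2, H3, gm(H2, H3))
m_tilde = agg([m2, m3])                       # shape (3, 8)
```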


Update

  • embedding of nodes ( in step t ) are updated, based on..

    • 1) aggregated message
    • 2) memory of previous layer
  • $\hat{H}^{l+1}(t) = \mathrm{upd}\big(\tilde{m}^l(t), H^l(t), A^l_t\big)$,

    where $A^l_t = g_u\big(\tilde{m}^l(t), H^l(t)\big)$

    • ex) upd : memory update function ( GRUs, LSTMs )
    • ex) gu : graph-based GRU
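
The update step could look like the following GRU-flavoured sketch; the weights `Wz`, `Wh` and the single-gate form are assumptions for illustration, not the paper's exact upd / $g_u$:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def upd(m_tilde, H, Wz, Wh):
    """GRU-flavoured update: the aggregated message acts as the GRU input
    and H^l(t) as the hidden state (weights Wz, Wh are assumed)."""
    x = np.concatenate([m_tilde, H], axis=-1)  # (n, 2d)
    z = sigmoid(x @ Wz)                        # update gate, (n, d)
    h_cand = np.tanh(x @ Wh)                   # candidate embedding
    return (1 - z) * H + z * h_cand            # updated H^{l+1}(t)

n, d = 3, 4
H_next = upd(rng.normal(size=(n, d)), rng.normal(size=(n, d)),
             rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d)))
```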


(3) CGF (Cross-scale Graph Fusion) Kernel

CGF

  • diffuse information “ACROSS SCALES”
  • “scales” = ‘size of receptive fields’
  • need to decide “which time steps in which scales” can interact


The “HIGHER scale” serves as the standard

Fine features (time steps) are sampled from the “LOWER scales”

(Figure 2)

  • $Z^l_{s_i}(t) = \mathrm{samp}\big(\hat{H}^{l+1}_{s_1}(:), \dots, \hat{H}^{l+1}_{s_{i-1}}(:)\big)$
    • $\hat{H}^{l+1}_{s_j}(:)$ : updated output features of the SGU unit
    • $Z^l_{s_i}$ : selected lower-scale features, concatenated along the time axis


Information diffusion, between

  • 1) $Z^l_{s_i}(t)$
  • 2) $\hat{H}^{l+1}_{s_i}(t)$

using a graph learning function $g_f$ :

$H^{l+1}_{s_1:s_{i-1}}(:),\ H^{l+1}_{s_i}(t) = \mathrm{fuse}\big(Z^l_{s_i}(t), \hat{H}^{l+1}_{s_i}(t), A^l_{(s_i, s_i), t}\big)$, where $A^l_{(s_i, s_i), t} = g_f\big(Z^l_{s_i}(t), \hat{H}^{l+1}_{s_i}(t)\big)$

  • $H^{l+1}_{s_1:s_{i-1}}(:)$ : cross-scale fused embeddings of the sampled lower-scale features
  • $H^{l+1}_{s_i}(t)$ : the standard high scale at time step $t$
  • ex) fuse : GCNs
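
A toy sketch of samp, $g_f$, and fuse under simplifying assumptions: last-step sampling, a softmax-similarity adjacency for $g_f$, and GCN-style diffusion for fuse (the paper leaves these choices open):

```python
import numpy as np

rng = np.random.default_rng(2)

def samp(lower_scale_feats):
    """samp sketch: take the latest time step from each lower scale and
    stack the nodes (a stand-in for the paper's sampling strategy)."""
    return np.concatenate([H[-1] for H in lower_scale_feats], axis=0)

def gf(Z, H_t):
    """Assumed cross-scale graph learner: column-softmax similarity between
    lower-scale nodes (rows of Z) and scale-s_i nodes (rows of H_t)."""
    logits = Z @ H_t.T
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)   # (n_low, n)

def fuse(Z, H_t, A):
    """GCN-style fusion: diffuse lower-scale features into the high scale."""
    return H_t + A.T @ Z                      # fused embedding at step t

# Two lower scales, each with 5 steps, 3 nodes, d = 4.
Z = samp([rng.normal(size=(5, 3, 4)), rng.normal(size=(5, 3, 4))])  # (6, 4)
H_t = rng.normal(size=(3, 4))                 # high-scale nodes at step t
H_fused = fuse(Z, H_t, gf(Z, H_t))            # (3, 4)
```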


(Figure 2)


(4) Predictor and Training Strategy

$\hat{Y}(t+\Delta) = \mathrm{pre}\big(H_{s_1:s_p}(:)\big)$

  • pre : decoder mapping the fused embeddings of all $p$ scales to the final prediction
