Diffusion Convolutional Recurrent Neural Network : Data-driven Traffic Forecasting
Contents
- Abstract
- Introduction
- Methodology
- Traffic Forecasting Problem
- Spatial Dependency Modeling
- Temporal Dynamics Modeling
0. Abstract
Traffic forecasting is challenging due to…
- 1) complex spatial dependency
- 2) non-linear temporal dynamics
- 3) difficulty of long-term forecasting
Introduce DCRNN (Diffusion Convolutional Recurrent Neural Network)
- for “traffic forecasting”
- incorporates both (1) spatial & (2) temporal dependency in traffic flow
DCRNN captures…
- 1) “spatial dependency” : using “bidirectional random walks” on the graph
- 2) “temporal dependency” : using the “enc-dec” architecture, with “scheduled sampling”
1. Introduction
Difficulty of traffic forecasting
ex) road 1 & 3 : close in Euclidean space, but very different behaviors
\(\rightarrow\) spatial structure in traffic is non-Euclidean & directional
DCRNN
- represent “pair-wise spatial correlations” between traffic sensors, using a “directed graph”
- 1) nodes : sensors
- 2) edge : proximity between sensor pairs
- model the dynamics of the traffic flow as a “diffusion process”
\(\rightarrow\) propose the diffusion convolution to capture “spatial dependency”
- integrates…
- 1) diffusion convolution
- 2) seq2seq
- 3) scheduled sampling
2. Methodology
(1) Traffic Forecasting Problem
Goal : predict the future traffic speed, given traffic flow from \(N\) correlated sensors
Graph ( Sensor Network )
- Weighted directed graph : \(\mathcal{G}=(\mathcal{V}, \mathcal{E}, \boldsymbol{W})\)
- \(\mathcal{V}\) : set of nodes ( \(\mid \mathcal{V} \mid=N\) )
- \(\mathcal{E}\) : set of edges
- \(\boldsymbol{W} \in \mathbb{R}^{N \times N}\) : weighted adjacency matrix
( representing node proximity … e.g., a function of their road network distance )
- Traffic flow observed on \(\mathcal{G}\) : \(\boldsymbol{X} \in \mathbb{R}^{N \times P}\)
- \(P\) : # of features ( ex. velocity, volume … )
- \(\boldsymbol{X}^{(t)}\) : graph signal observed at time \(t\)
- goal : learn function \(h(\cdot)\), where…
- \(\left[\boldsymbol{X}^{\left(t-T^{\prime}+1\right)}, \cdots, \boldsymbol{X}^{(t)} ; \mathcal{G}\right] \stackrel{h(\cdot)}{\longrightarrow}\left[\boldsymbol{X}^{(t+1)}, \cdots, \boldsymbol{X}^{(t+T)}\right]\).
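As a shape-level illustration (all sizes below are hypothetical, e.g., a METR-LA-like sensor network), \(h(\cdot)\) maps the last \(T^{\prime}\) graph signals to the next \(T\):

```python
import numpy as np

N, P = 207, 2          # e.g., 207 sensors, 2 features per sensor (hypothetical sizes)
T_in, T_out = 12, 12   # T' historical steps -> T future steps

X_past = np.random.rand(T_in, N, P)   # [X^(t-T'+1), ..., X^(t)]
# h(., G) would map X_past (plus the graph G) to the next T_out graph signals:
X_future = np.zeros((T_out, N, P))    # placeholder for [X^(t+1), ..., X^(t+T)]
```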
(2) Spatial Dependency Modeling
model the spatial dependency by relating “traffic flow” to a “diffusion process”
Diffusion process
Notation
- 1) random walk on \(\mathcal{G}\)
- 2) restart probability \(\alpha \in [0,1]\)
- 3) state transition matrix \(\boldsymbol{D}_{O}^{-1} \boldsymbol{W}\)
- \(\boldsymbol{D}_{O}=\operatorname{diag}(\boldsymbol{W} \mathbf{1})\) : out-degree diagonal matrix ( \(\mathbf{1} \in \mathbb{R}^{N}\) : all-one vector )
- converges to a stationary distribution \(\mathcal{P} \in \mathbb{R}^{N \times N}\)
- \(\mathcal{P}_{i,:} \in \mathbb{R}^{N}\) ( \(i\) th row ) = likelihood of diffusion from node \(v_{i} \in \mathcal{V}\)
- can be calculated in closed-form, \(\mathcal{P}=\sum_{k=0}^{\infty} \alpha(1-\alpha)^{k}\left(\boldsymbol{D}_{O}^{-1} \boldsymbol{W}\right)^{k}\)
( in practice, use finite \(K\)-step truncation )
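A minimal numpy sketch of this \(K\)-step truncation, assuming a small hand-made adjacency matrix `W` (not from the paper):

```python
import numpy as np

# hypothetical 3-node directed weighted adjacency matrix
W = np.array([[0.0, 1.0, 0.5],
              [0.2, 0.0, 1.0],
              [1.0, 0.3, 0.0]])
alpha, K = 0.1, 10                       # restart probability & truncation steps

D_O_inv = np.diag(1.0 / W.sum(axis=1))   # D_O^{-1}, inverse out-degree matrix
T = D_O_inv @ W                          # state transition matrix D_O^{-1} W

# finite K-step truncation of  P = sum_{k>=0} alpha (1 - alpha)^k (D_O^{-1} W)^k
P = sum(alpha * (1 - alpha) ** k * np.linalg.matrix_power(T, k) for k in range(K))
print(P.round(3))   # row i ~ likelihood of diffusion starting from node v_i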
Diffusion Convolution
Diffusion Convolution Operation…
- over a graph signal \(\boldsymbol{X} \in \mathbb{R}^{N \times P}\)
- and a filter \(f_{\boldsymbol{\theta}}\)
\(\boldsymbol{X}_{:, p} \star_{\mathcal{G}} f_{\boldsymbol{\theta}}=\sum_{k=0}^{K-1}\left(\theta_{k, 1}\left(\boldsymbol{D}_{O}^{-1} \boldsymbol{W}\right)^{k}+\theta_{k, 2}\left(\boldsymbol{D}_{I}^{-1} \boldsymbol{W}^{\boldsymbol{\top}}\right)^{k}\right) \boldsymbol{X}_{:, p} \quad \text { for } p \in\{1, \cdots, P\}\).
- \(\boldsymbol{\theta} \in \mathbb{R}^{K \times 2}\) = parameters for the filter
- \(\boldsymbol{D}_{O}^{-1} \boldsymbol{W}, \boldsymbol{D}_{I}^{-1} \boldsymbol{W}^{\top}\) = transition matrices of the diffusion process & reverse one
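A minimal numpy sketch of the bidirectional diffusion convolution for a single feature column (the function name is illustrative, not the paper's implementation):

```python
import numpy as np

def diffusion_conv(x, W, theta):
    """Bidirectional diffusion convolution of one graph signal x (shape [N])
    with filter parameters theta (shape [K, 2]) over adjacency matrix W."""
    K = theta.shape[0]
    T_fwd = np.diag(1.0 / W.sum(axis=1)) @ W     # D_O^{-1} W   : forward diffusion
    T_bwd = np.diag(1.0 / W.sum(axis=0)) @ W.T   # D_I^{-1} W^T : reverse diffusion
    out = np.zeros_like(x)
    for k in range(K):
        out = out + theta[k, 0] * (np.linalg.matrix_power(T_fwd, k) @ x) \
                  + theta[k, 1] * (np.linalg.matrix_power(T_bwd, k) @ x)
    return out
```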
Diffusion Convolutional Layer
- maps \(P\)-dim features to \(Q\)-dim outputs
- Input : \(\boldsymbol{X} \in \mathbb{R}^{N \times P}\)
- Output : \(\boldsymbol{H} \in \mathbb{R}^{N \times Q}\)
- \(\boldsymbol{H}_{:, q}=a\left(\sum_{p=1}^{P} \boldsymbol{X}_{:, p} \star_{\mathcal{G}} f_{\boldsymbol{\Theta}_{q, p,:,:}}\right) \quad \text { for } q \in\{1, \cdots, Q\}\).
- \(a\) : activation function / \(\boldsymbol{\Theta} \in \mathbb{R}^{Q \times P \times K \times 2}\) : filter parameter tensor
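Continuing the numpy sketch above (reusing `diffusion_conv` and `W`; sizes and activation are illustrative), the layer maps \(\boldsymbol{X} \in \mathbb{R}^{N \times P}\) to \(\boldsymbol{H} \in \mathbb{R}^{N \times Q}\):

```python
def diffusion_conv_layer(X, W, Theta, activation=np.tanh):
    """X: [N, P] graph signal, Theta: [Q, P, K, 2] filter tensor -> H: [N, Q]."""
    N, P = X.shape
    Q = Theta.shape[0]
    H = np.zeros((N, Q))
    for q in range(Q):
        # sum the per-feature diffusion convolutions, then apply the activation
        H[:, q] = activation(sum(diffusion_conv(X[:, p], W, Theta[q, p]) for p in range(P)))
    return H

# ex) hypothetical sizes: N=3 nodes, P=2 input features, Q=4 outputs, K=2 diffusion steps
X = np.random.rand(3, 2)
Theta = np.random.randn(4, 2, 2, 2) * 0.1
H = diffusion_conv_layer(X, W, Theta)   # H has shape (3, 4)
```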
(3) Temporal Dynamics Modeling
use GRU ( + replace matrix multiplication in GRU with “diffusion convolution” )
\(\rightarrow\) DCGRU (Diffusion Convolutional Gated Recurrent Unit)
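Concretely, every matrix multiplication in the GRU gates is replaced by the diffusion convolution \(\star_{\mathcal{G}}\) (gate equations from the paper):
- \(\boldsymbol{r}^{(t)}=\sigma\left(\boldsymbol{\Theta}_{r} \star_{\mathcal{G}}\left[\boldsymbol{X}^{(t)}, \boldsymbol{H}^{(t-1)}\right]+\boldsymbol{b}_{r}\right)\)
- \(\boldsymbol{u}^{(t)}=\sigma\left(\boldsymbol{\Theta}_{u} \star_{\mathcal{G}}\left[\boldsymbol{X}^{(t)}, \boldsymbol{H}^{(t-1)}\right]+\boldsymbol{b}_{u}\right)\)
- \(\boldsymbol{C}^{(t)}=\tanh \left(\boldsymbol{\Theta}_{C} \star_{\mathcal{G}}\left[\boldsymbol{X}^{(t)},\left(\boldsymbol{r}^{(t)} \odot \boldsymbol{H}^{(t-1)}\right)\right]+\boldsymbol{b}_{c}\right)\)
- \(\boldsymbol{H}^{(t)}=\boldsymbol{u}^{(t)} \odot \boldsymbol{H}^{(t-1)}+\left(1-\boldsymbol{u}^{(t)}\right) \odot \boldsymbol{C}^{(t)}\)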
in multi-step ahead forecasting, use “seq2seq”
- both ENC & DEC are “DCGRU”
- scheduled sampling : how the decoder is fed ( a minimal sketch follows at the end of this section )
- ( training ) generates predictions, given either (a) the previous ground truth observation or (b) the previous prediction
- with prob \(\epsilon_i\) : (a)
- with prob \(1- \epsilon_i\) : (b)
( during the training process, \(\epsilon_i\) gradually decreases to \(0\) )
- ( testing ) generates predictions, given only (b) its own previous predictions
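A minimal sketch of the scheduled-sampling decision during training (the decay constant `k` and the function names are illustrative assumptions; the paper uses an inverse sigmoid decay for \(\epsilon_i\)):

```python
import numpy as np

def epsilon(i, k=2000.0):
    """Sampling probability at training iteration i; decays from ~1 toward 0
    (e.g., an inverse sigmoid decay)."""
    return k / (k + np.exp(i / k))

def decoder_input(i, ground_truth, prev_prediction, rng=np.random.default_rng()):
    """With prob epsilon_i feed (a) the ground truth; otherwise feed (b) the model's
    previous prediction. At test time, only (b) is available."""
    return ground_truth if rng.random() < epsilon(i) else prev_prediction
```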