Probabilistic Time Series Forecasting with Implicit Quantile Networks (2021)

Contents

  0. Abstract
  1. Introduction
  2. Background
    1. Quantile Regression
    2. CRPS
  3. Forecasting with IQN


0. Abstract

proposes a general method for probabilistic time-series forecasting

combine (1) & (2)

  • (1) autoregressive RNN

    \(\rightarrow\) to model “temporal dynamics”

  • (2) Implicit Quantile Networks

    \(\rightarrow\) to learn a “large class of distn” over a time-series target


This method is favorable in terms of…

  • point-wise prediction accuracy
  • estimating the underlying temporal distn


1. Introduction

[ Traditional ]

  • univariate point forecast

  • requires learning “one model per individual t.s”

    ( does not scale to large datasets )


[ Deep Learning ]

  • RNN, LSTM…
  • main advantages
    • end-to-end
    • ease of incorporating exogenous covariates
    • automatic feature extraction


Desirable to make a probabilistic output

\(\rightarrow\) provide uncertainty bounds!

  • method 1) model the data distn explicitly
  • method 2) Bayesian NN


Propose IQN-RNN

  • DL-based univariate t.s method that learns an implicit distn over outputs

  • does not make any assumption on the underlying distn

  • probabilistic output is generated by IQN

    & trained by minimizing CRPS (=Continuous Ranked Probability Score )


Contributions

  • model data distn using IQNs
  • model t.s via autoregressive RNN


2. Background

(1) Quantile Regression

Quantile function corresponding to c.d.f \(F: \mathbb{R} \rightarrow[0,1]\)

  • \(Q(\tau)=\inf \{x \in \mathbb{R}: \tau \leq F(x)\}\).
  • For a continuous and strictly monotonic c.d.f., \(Q=F^{-1}\)
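The generalized inverse \(Q(\tau)=\inf \{x : \tau \leq F(x)\}\) can be sketched numerically on a discrete grid (the function name and the Uniform(0,1) example are illustrative, not from the paper):

```python
import numpy as np

def quantile_from_cdf(F, tau, grid):
    """Q(tau) = inf{ x : tau <= F(x) }, approximated on a discrete grid."""
    candidates = grid[F(grid) >= tau]
    return candidates.min()

# example: Uniform(0, 1), whose c.d.f. is F(x) = x on [0, 1]
grid = np.linspace(0.0, 1.0, 1001)
q50 = quantile_from_cdf(lambda x: np.clip(x, 0.0, 1.0), 0.5, grid)
```

For a continuous, strictly increasing c.d.f. this recovers \(F^{-1}\) up to the grid resolution.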

Minimize quantile loss :

  • \(L_{\tau}(y, \hat{y})=\tau(y-\hat{y})_{+}+(1-\tau)(\hat{y}-y)_{+}\).

    ( where \((\cdot)_{+}\) denotes ReLU, i.e. \(\max(\cdot, 0)\) )

( Figure 2 )
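The quantile (pinball) loss above can be sketched directly; note the asymmetric penalty, which is what pushes the minimizer toward the \(\tau\)-quantile:

```python
import numpy as np

def quantile_loss(y, y_hat, tau):
    """Pinball loss: tau * (y - y_hat)_+ + (1 - tau) * (y_hat - y)_+."""
    return tau * np.maximum(y - y_hat, 0.0) + (1.0 - tau) * np.maximum(y_hat - y, 0.0)

# under-prediction is penalized tau : (1 - tau) relative to over-prediction
under = quantile_loss(2.0, 1.0, 0.9)  # y > y_hat: loss = 0.9 * |y - y_hat|
over = quantile_loss(1.0, 2.0, 0.9)   # y < y_hat: loss = 0.1 * |y - y_hat|
```

With \(\tau = 0.9\), under-predicting costs 9x more than over-predicting by the same amount.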


(2) CRPS

Continuous Ranked Probability Score

  • described by a c.d.f. \(F\) given the observation \(y\) :

    \(\operatorname{CRPS}(F, y)=\int_{-\infty}^{\infty}(F(x)-\mathbb{1}\{y \leq x\})^{2} d x\).

  • can be rewritten as… (using quantile loss ) :

    \(\operatorname{CRPS}(F, y)=2 \int_{0}^{1} L_{\tau}(y, Q(\tau)) d \tau\).
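The quantile-loss form of CRPS suggests a simple numerical approximation: average \(2 L_{\tau}(y, Q(\tau))\) over a grid of \(\tau\) values (a midpoint-rule sketch; the function name is mine, not the paper's):

```python
import numpy as np

def crps_via_quantile_loss(Q, y, n=10_000):
    """Approximate CRPS(F, y) = 2 * integral_0^1 L_tau(y, Q(tau)) dtau by a midpoint rule."""
    taus = (np.arange(n) + 0.5) / n
    q = Q(taus)
    pinball = taus * np.maximum(y - q, 0.0) + (1.0 - taus) * np.maximum(q - y, 0.0)
    return 2.0 * pinball.mean()

# sanity check: for Uniform(0, 1), Q(tau) = tau, and CRPS(F, 0.5) = 1/12 in closed form
approx = crps_via_quantile_loss(lambda t: t, 0.5)
```

This identity is also why training with randomly sampled \(\tau\) and the quantile loss amounts to minimizing CRPS in expectation.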


3. Forecasting with IQN

in univariate time series setting…

  • forecast \(\left(y_{T-h}, y_{T-h+1}, \ldots, y_{T}\right)\)
  • given the full series \(y=\left(y_{0}, y_{1}, \ldots, y_{T}\right)\), i.e. condition on the observations before \(T-h\)


Notation

  • \(\tau_{0}=\mathbb{P}\left[Y_{0} \leq y_{0}\right]\).
  • \(\tau_{t}=\mathbb{P}\left[Y_{t} \leq y_{t} \mid Y_{0} \leq y_{0}, \ldots, Y_{t-1} \leq y_{t-1}\right]\).

\(\rightarrow\) rewrite \(y\) as …

  • \(\left(F_{Y_{0}}^{-1}\left(\tau_{0}\right), F_{Y_{1} \mid Y_{0}}^{-1}\left(\tau_{1}\right), \ldots, F_{Y_{T} \mid Y_{0}, \ldots, Y_{T-1}}^{-1}\left(\tau_{T}\right)\right)\).
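This rewriting is inverse-transform sampling: drawing \(\tau \sim U(0,1)\) and applying \(Q = F^{-1}\) reproduces the target distribution. A minimal check with an Exponential(1) target (chosen for illustration; not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# tau ~ U(0, 1), pushed through Q(tau) = -ln(1 - tau), the Exponential(1) quantile function
taus = rng.uniform(size=100_000)
samples = -np.log1p(-taus)

empirical_mean = samples.mean()  # Exponential(1) has mean 1
```

Conversely, any observed series can be represented by its sequence of (conditional) quantile levels \(\tau_0, \ldots, \tau_T\).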


Probabilistic Time Series Forecasting

  • a single function \(g\) can represent the distn of all \(Y_t\),

    given \(X_t\) & the previous observations \(y_0, \ldots, y_{t-1}\)

  • when using an IQN, \(\tau_t\) is used in addition to the two inputs above!

    • mapping from \(\tau_{t} \sim \mathrm{U}([0,1])\) to \(y_{t}\)


IQN-RNN

  • learn \(y_{t}=g\left(X_{t}, \tau_{t}, y_{0}, y_{1}, \ldots, y_{t-1}\right)\) ….. for \(t \in [T-h,T]\)
  • can be written as \(q \circ\left[\psi_{t} \odot(1+\phi)\right]\)

  • Notation

    • \(\odot\) : Hadamard element-wise product

    • \(X_{t}\) : (typically) time-dependent features…. known for all time steps

    • \(\psi_{t}\) : state of an RNN

      ( takes \(\operatorname{concat}(X_{t}, y_{t-1}, \psi_{t-1})\) as input )

    • \(\phi\) : embeds \(\tau_{t}\)

      ( \(\phi\left(\tau_{t}\right)=\operatorname{ReLU}\left(\sum_{i=0}^{n-1} \cos \left(\pi i \tau_{t}\right) w_{i}+b_{i}\right)\) )

    • \(q\) : additional generator layer
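The composition \(q \circ\left[\psi_{t} \odot(1+\phi)\right]\) can be sketched with NumPy; the sizes, random weights, linear \(q\), and the stand-in RNN state are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 8  # hidden size d and number of cosine features n (assumed, not from the paper)

W = rng.normal(scale=0.1, size=(n, d))  # weights of the tau-embedding phi
b = np.zeros(d)
w_q = rng.normal(scale=0.1, size=d)     # final generator layer q (linear here, for simplicity)

def phi(tau):
    """Cosine embedding: ReLU( sum_i cos(pi * i * tau) * w_i + b_i )."""
    feats = np.cos(np.pi * np.arange(n) * tau)
    return np.maximum(feats @ W + b, 0.0)

def iqn_head(psi_t, tau_t):
    """y_t = q( psi_t ⊙ (1 + phi(tau_t)) )."""
    return (psi_t * (1.0 + phi(tau_t))) @ w_q

psi_t = rng.normal(size=d)    # stand-in for the RNN state at step t
y_med = iqn_head(psi_t, 0.5)  # query the model at the (approximate) median level
```

Varying \(\tau_t\) while holding \(\psi_t\) fixed traces out the predicted quantile function at step \(t\).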


(a) Training

  • step 1) quantiles are sampled

    ( for each observation / at each time step )

  • step 2) passed to both the network & the quantile loss

( Figure 2 )
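The two training steps can be sketched as follows; the zero-output `network` is a hypothetical stand-in for the IQN-RNN forward pass, only the sampling-and-loss pattern is from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def quantile_loss(y, y_hat, tau):
    return tau * np.maximum(y - y_hat, 0.0) + (1.0 - tau) * np.maximum(y_hat - y, 0.0)

# toy batch of observed targets, shape (batch, time)
y = rng.normal(size=(4, 10))

def network(tau):
    """Hypothetical stand-in for the IQN-RNN forward pass at quantile levels tau."""
    return np.zeros_like(tau)

# step 1) sample one quantile level per observation and per time step
tau = rng.uniform(size=y.shape)

# step 2) the same tau is passed to both the network and the quantile loss
loss = quantile_loss(y, network(tau), tau).mean()
```

Averaging the quantile loss over the sampled \(\tau\) is a Monte Carlo estimate of the CRPS objective.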


(b) Inference

  • step 1) quantiles are sampled

    ( for each observation / at each time step )

  • step 2) passed to the network only ( no quantile loss at inference )
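At inference, drawing many \(\tau\) values and passing them through the network yields Monte Carlo samples of the forecast, from which point forecasts and uncertainty bounds follow. A sketch with a hypothetical stand-in network (here the standard logistic quantile function, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def network(tau):
    """Hypothetical stand-in for the trained IQN-RNN at one forecast step."""
    return np.log(tau / (1.0 - tau))  # quantile function of the standard logistic distn

# step 1) draw many quantile levels; step 2) pass them to the network only
taus = rng.uniform(size=10_000)
samples = network(taus)

# samples of the predictive distribution -> point forecast + uncertainty bounds
median = np.quantile(samples, 0.5)
lo, hi = np.quantile(samples, [0.05, 0.95])
```

The empirical quantiles of the samples approximate the model's predictive distribution at that time step.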
