Probabilistic Time Series Forecasting with Implicit Quantile Networks (2021)
Contents
- Abstract
- Introduction
- Background
    - Quantile Regression
    - CRPS
- Forecasting with IQN
0. Abstract
Proposes a general method for probabilistic time-series forecasting that combines (1) & (2) :
- (1) an autoregressive RNN \(\rightarrow\) models the “temporal dynamics”
- (2) Implicit Quantile Networks (IQN) \(\rightarrow\) learn a “large class of distn” over a time-series target
This method is favorable in terms of…
- point-wise prediction accuracy
- estimating the underlying temporal distn
1. Introduction
[ Traditional ]
- univariate point forecasts
- require learning “one model per individual t.s” ( does not scale to large datasets )
[ Deep Learning ]
- RNN, LSTM…
- main advantages
    - end-to-end training
    - ease of incorporating exogenous covariates
    - automatic feature extraction
 
Desirable to make a probabilistic output
\(\rightarrow\) provide uncertainty bounds!
- method 1) model the data distn explicitly
- method 2) Bayesian NN
Propose IQN-RNN
- “DL-based univariate t.s method that learns an implicit distn over outputs”
- does not make any assumption about the underlying distn
- probabilistic output is generated by an IQN & trained by minimizing CRPS ( = Continuous Ranked Probability Score )
Contributions
- model data distn using IQNs
- model t.s via autoregressive RNN
2. Background
(1) Quantile Regression
Quantile function corresponding to c.d.f \(F: \mathbb{R} \rightarrow[0,1]\)
- \(Q(\tau)=\inf \{x \in \mathbb{R}: \tau \leq F(x)\}\).
- For a continuous and strictly monotonic c.d.f., \(Q=F^{-1}\)
Minimize the quantile loss :
- \(L_{\tau}(y, \hat{y})=\tau(y-\hat{y})_{+}+(1-\tau)(\hat{y}-y)_{+}\) ( where \((\cdot)_{+}=\max(\cdot, 0)\) is the ReLU )
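The quantile loss above can be written in a few lines. This is a minimal sketch for scalar inputs; the function name `quantile_loss` is chosen here for illustration and is not taken from the paper.

```python
def quantile_loss(y: float, y_hat: float, tau: float) -> float:
    """L_tau(y, y_hat) = tau * (y - y_hat)_+ + (1 - tau) * (y_hat - y)_+ ."""
    relu = lambda v: max(v, 0.0)
    return tau * relu(y - y_hat) + (1.0 - tau) * relu(y_hat - y)


# For tau = 0.9, under-predicting by 2 costs more than over-predicting by 2.
under = quantile_loss(y=10.0, y_hat=8.0, tau=0.9)
over = quantile_loss(y=10.0, y_hat=12.0, tau=0.9)
print(under, over)
```

The asymmetry is what pushes the minimizer toward the \(\tau\)-quantile: for high \(\tau\), predictions below the target are penalized more heavily than predictions above it.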

(2) CRPS
Continuous Ranked Probability Score
- for a forecast described by a c.d.f. \(F\), given the observation \(y\) : \(\operatorname{CRPS}(F, y)=\int_{-\infty}^{\infty}(F(x)-\mathbb{1}\{y \leq x\})^{2} d x\)
- can be rewritten using the quantile loss : \(\operatorname{CRPS}(F, y)=2 \int_{0}^{1} L_{\tau}(y, Q(\tau)) d \tau\)
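The quantile-loss form of CRPS can be checked numerically. The sketch below assumes a Gaussian forecast (not something the paper requires) so that the integral over \(\tau\) can be compared with the well-known closed-form CRPS of a normal distribution. The midpoint rule and the helper names are illustrative choices.

```python
import math
from statistics import NormalDist


def quantile_loss(y, y_hat, tau):
    return tau * max(y - y_hat, 0.0) + (1.0 - tau) * max(y_hat - y, 0.0)


def crps_from_quantiles(y, quantile_fn, n=20000):
    """Midpoint-rule approximation of 2 * int_0^1 L_tau(y, Q(tau)) d tau."""
    total = 0.0
    for k in range(n):
        tau = (k + 0.5) / n  # midpoints stay strictly inside (0, 1)
        total += quantile_loss(y, quantile_fn(tau), tau)
    return 2.0 * total / n


def crps_gaussian(y, mu, sigma):
    """Closed-form CRPS of N(mu, sigma^2) at observation y."""
    z = (y - mu) / sigma
    std = NormalDist()
    return sigma * (z * (2.0 * std.cdf(z) - 1.0) + 2.0 * std.pdf(z) - 1.0 / math.sqrt(math.pi))


forecast = NormalDist(mu=1.0, sigma=2.0)  # assumed toy forecast distribution
approx = crps_from_quantiles(0.3, forecast.inv_cdf)
exact = crps_gaussian(0.3, 1.0, 2.0)
print(approx, exact)
```

Averaging the quantile loss over randomly sampled \(\tau\) is a Monte Carlo version of the same integral, which is why sampling \(\tau\) during training amounts to minimizing CRPS.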
3. Forecasting with IQN
in univariate time series setting…
- forecast \(\left(y_{T-h}, y_{T-h+1}, \ldots, y_{T}\right)\)
- with \(y=\left(y_{0}, y_{1}, \ldots, y_{T}\right)\)
Notation
- \(\tau_{0}=\mathbb{P}\left[Y_{0} \leq y_{0}\right]\).
- \(\tau_{t}=\mathbb{P}[Y_{t} \leq y_{t} \mid Y_{0} \leq \left.y_{0}, \ldots, Y_{t-1} \leq y_{t-1}\right]\).
\(\rightarrow\) rewrite \(y\) as …
- \(\left(F_{Y_{0}}^{-1}\left(\tau_{0}\right), F_{Y_{1} \mid Y_{0}}^{-1}\left(\tau_{1}\right), \ldots, F_{Y_{T} \mid Y_{0}, \ldots, Y_{T-1}}^{-1}\left(\tau_{T}\right)\right)\).
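To make the rewriting concrete, here is a toy example in which the conditional inverse c.d.f.s are known in closed form. It assumes a Gaussian AR(1) process with \(Y_0 \sim N(0,1)\) and \(Y_t \mid y_{t-1} \sim N(a\,y_{t-1}, s^2)\), conditioning on the observed values. The process, the constants `A` and `S`, and the function names are illustrative assumptions, not part of the paper.

```python
import random
from statistics import NormalDist

A, S = 0.8, 0.5  # hypothetical AR(1) coefficient and noise scale
STD = NormalDist()


def path_from_quantiles(taus):
    """Map (tau_0, ..., tau_T) to (y_0, ..., y_T) via conditional inverse c.d.f.s."""
    ys = [STD.inv_cdf(taus[0])]  # y_0 = F_{Y_0}^{-1}(tau_0)
    for tau in taus[1:]:
        # y_t = F_{Y_t | y_{t-1}}^{-1}(tau_t) for the assumed Gaussian AR(1)
        ys.append(A * ys[-1] + S * STD.inv_cdf(tau))
    return ys


def quantiles_from_path(ys):
    """Recover each tau_t as the conditional c.d.f. evaluated at y_t."""
    taus = [STD.cdf(ys[0])]
    for prev, cur in zip(ys, ys[1:]):
        taus.append(STD.cdf((cur - A * prev) / S))
    return taus


rng = random.Random(0)
taus = [rng.uniform(0.01, 0.99) for _ in range(6)]
ys = path_from_quantiles(taus)
recovered = quantiles_from_path(ys)
print(ys)
```

In this toy case the series and its sequence of quantile levels carry the same information, which is the idea behind replacing the explicit distn by a learned map from \(\tau_t\) to \(y_t\).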
Probabilistic Time Series Forecasting
- a single function \(g\) can represent the distn of every \(Y_t\), given \(X_t\) & the previous observations \(y_0, \ldots, y_{t-1}\)
- with an IQN, \(\tau_t\) is used in addition to the two inputs above : \(g\) is a mapping from \(\tau_{t} \sim \mathrm{U}([0,1])\) to \(y_{t}\)
 
IQN-RNN
- learn \(y_{t}=g\left(X_{t}, \tau_{t}, y_{0}, y_{1}, \ldots, y_{t-1}\right)\) ….. for \(t \in [T-h,T]\)
- \(g\) can be written as \(q \circ\left[\psi_{t} \odot(1+\phi)\right]\)
- Notation
    - \(\odot\) : Hadamard (element-wise) product
    - \(X_{t}\) : (typically) time-dependent features, known for all time steps
    - \(\psi_{t}\) : state of an RNN ( takes concat(\(X_{t}, y_{t-1},\psi_{t-1}\)) as input )
    - \(\phi\) : embeds \(\tau_{t}\) ( \(\phi\left(\tau_{t}\right)=\operatorname{ReLU}\left(\sum_{i=0}^{n-1} \cos \left(\pi i \tau_{t}\right) w_{i}+b_{i}\right)\) )
    - \(q\) : additional generator layer
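The composition \(q \circ [\psi_t \odot (1+\phi)]\) can be sketched in plain Python. Several things here are assumptions made for illustration: the embedding uses one weight column per output dimension (\(w_{ij}\), \(b_j\)) so that \(\phi(\tau_t)\) has the same size as \(\psi_t\), \(q\) is a single linear layer, the RNN state is a random stand-in, and all weights are random rather than trained.

```python
import math
import random


def cosine_embedding(tau, weights, biases):
    """phi(tau)_j = ReLU(sum_i cos(pi * i * tau) * w[i][j] + b[j])."""
    n = len(weights)
    feats = [math.cos(math.pi * i * tau) for i in range(n)]
    return [
        max(sum(feats[i] * weights[i][j] for i in range(n)) + biases[j], 0.0)
        for j in range(len(biases))
    ]


def modulate(psi, phi):
    """psi * (1 + phi), element-wise (Hadamard) modulation of the RNN state."""
    return [p * (1.0 + f) for p, f in zip(psi, phi)]


def generator(h, q_weights, q_bias):
    """q: assumed here to be one linear layer producing a scalar y_t."""
    return sum(w * v for w, v in zip(q_weights, h)) + q_bias


rng = random.Random(0)
n, d = 8, 4  # hypothetical embedding size and RNN state size
W = [[rng.gauss(0, 0.5) for _ in range(d)] for _ in range(n)]
b = [0.0] * d
psi_t = [rng.gauss(0, 1) for _ in range(d)]  # stand-in for a real RNN state
q_w = [rng.gauss(0, 1) for _ in range(d)]

for tau in (0.1, 0.5, 0.9):
    phi = cosine_embedding(tau, W, b)
    y_t = generator(modulate(psi_t, phi), q_w, 0.0)
    print(tau, y_t)
```

With untrained weights the outputs are not ordered in \(\tau\). It is training on the quantile loss that makes the output behave like a quantile function.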
 
(a) Training
- step 1) quantiles \(\tau_t\) are sampled ( for each observation / at each time step )
- step 2) they are passed to both the network & the quantile loss
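The two training steps can be sketched as a loss computation over one series. The `toy_network` below is a hypothetical stand-in for \(g\) (a fixed Gaussian AR(1) quantile function with no learnable parameters), used only so the example runs. In actual training the network's parameters would be updated to reduce this loss, e.g. by gradient descent.

```python
import random
from statistics import NormalDist


def quantile_loss(y, y_hat, tau):
    return tau * max(y - y_hat, 0.0) + (1.0 - tau) * max(y_hat - y, 0.0)


def toy_network(prev_y, tau, a=0.8, s=0.5):
    """Hypothetical stand-in for g: the tau-quantile of N(a * prev_y, s^2)."""
    return a * prev_y + s * NormalDist().inv_cdf(tau)


def sequence_loss(ys, network, rng):
    """Average quantile loss over one series, with a fresh tau at each time step."""
    total = 0.0
    for prev_y, y in zip(ys, ys[1:]):
        tau = rng.uniform(1e-6, 1.0 - 1e-6)    # step 1: sample tau_t (kept off 0 and 1)
        y_hat = network(prev_y, tau)           # step 2a: tau_t is fed to the network
        total += quantile_loss(y, y_hat, tau)  # step 2b: the same tau_t is fed to the loss
    return total / (len(ys) - 1)


series = [0.0, 0.3, 0.1, 0.5, 0.2]  # made-up observations
loss = sequence_loss(series, toy_network, random.Random(0))
print(loss)
```

Using the same \(\tau_t\) in the network and in the loss is what ties the output to a specific quantile level.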

(b) Inference
- step 1) quantiles are sampled ( for each observation / at each time step )
- step 2) they are passed to the network only ( no quantile loss is computed at inference )
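Inference can be sketched as an autoregressive sampling loop: a new \(\tau_t\) is drawn at each step and the prediction is fed back as the next input. As before, `toy_network` is a hypothetical stand-in for a trained \(g\) (a Gaussian AR(1) quantile function). The horizon, the number of paths and the starting value are arbitrary.

```python
import random
from statistics import NormalDist


def toy_network(prev_y, tau, a=0.8, s=0.5):
    """Hypothetical stand-in for a trained g: the tau-quantile of N(a * prev_y, s^2)."""
    return a * prev_y + s * NormalDist().inv_cdf(tau)


def sample_path(last_y, horizon, network, rng):
    """Draw one forecast path by sampling a new tau at every step."""
    path, prev_y = [], last_y
    for _ in range(horizon):
        tau = rng.uniform(1e-6, 1.0 - 1e-6)  # step 1: sample tau_t
        prev_y = network(prev_y, tau)        # step 2: network output, fed back next step
        path.append(prev_y)
    return path


rng = random.Random(0)
paths = [sample_path(1.0, 5, toy_network, rng) for _ in range(2000)]

# Empirical quantiles across paths give uncertainty bounds per forecast step.
first_step = sorted(p[0] for p in paths)
median = first_step[len(first_step) // 2]
lower, upper = first_step[int(0.05 * len(first_step))], first_step[int(0.95 * len(first_step))]
print(lower, median, upper)
```

Repeating the loop many times yields a set of sample paths, from which point forecasts (e.g. the median) and prediction intervals can be read off.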
