An Experimental Review on DL Architectures for TS forecasting (2021)
Contents
- Abstract
- Introduction
- DL architectures for TSF
- MLP
- RNN
- CNN
0. Abstract
the paper addresses 2 main tasks
- 1) comprehensive review of latest works using DL
- 2) experimental study comparing performance
1. Introduction
- thorough review of existing DL techniques for TSF
- provide an experimental comparison between architectures
- 7 types of DL models
- 1) MLP
- 2) Elman Recurrent
- 3) LSTM
- 4) Echo State
- 5) GRU
- 6) CNN
- 7) TCN
Contributions
- 1) updated exhaustive review on the most relevant DL techniques
- 2) comparative analysis that evaluates the performance
- 3) open-source DL framework for TSF
2. DL architectures for TSF
Forecasting Problem
forecasting = fitting a model to predict future values, considering past values (= lag)
[ Notation ]
- \(X=\left\{x_{1}, x_{2}, \ldots, x_{T}\right\}\) : historical data
- \(\hat{X}=\left\{\hat{x}_{T+1}, \hat{x}_{T+2}, \ldots, \hat{x}_{T+H}\right\}\) : vector of predicted values
- \(H\) : desired forecasting horizon
- task : predict \(\left\{x_{T+1}, \ldots, x_{T+H}\right\}\)
- goal : minimize the prediction error ( computed in the sketch below )
  \(E=\sum_{i=1}^{H} \mid x_{T+i}-\hat{x}_{T+i} \mid\)
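A minimal numpy sketch of this error computation, with made-up values for the true continuation and the forecast ( not from the paper ):

```python
# Sum of absolute errors over the forecasting horizon H (toy numbers, purely illustrative).
import numpy as np

x_future = np.array([12.0, 13.5, 15.0])   # true values x_{T+1}, ..., x_{T+H}
x_hat    = np.array([11.8, 13.9, 14.2])   # predictions \hat{x}_{T+1}, ..., \hat{x}_{T+H}

E = np.abs(x_future - x_hat).sum()        # E = sum_i |x_{T+i} - \hat{x}_{T+i}|
print(E)                                  # ~1.4
```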
2 types
- 1) univariate
- 2) multivariate
\(\rightarrow\) depending on the number of variables at each time step
( this paper only deals with univariate time series )
Review DL networks below
(1) Fully Connected NN
- MLP
(2) Recurrent NN
- ERNN ( Elman RNN )
- LSTM
- ESN ( Echo State Network )
- GRU
(3) CNN
- CNN
- TCN
(1) MLP
- most basic type
- early 90s : ANNs posed as an alternative to traditional statistical models
- universal function approximators & flexible enough to adapt to the data without prior assumptions
- conclusion
  - proved the importance of preprocessing steps
    ( simple model + carefully selected input > naive MLP )
  - unable to capture the temporal order of a time series
    ( \(\because\) each input is treated independently ; the lag window is just a flat vector, as in the sketch below )
  - \(\rightarrow\) RNNs, CNNs are better!
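A minimal MLP forecaster sketch, assuming TensorFlow/Keras and a fixed lag window ( `lag`, `horizon`, layer sizes, and the toy sine series are illustrative, not the paper's settings ). The lag window is flattened into an ordinary feature vector, which is exactly why the MLP cannot exploit temporal order:

```python
# Minimal MLP forecaster sketch (TensorFlow/Keras; all hyperparameters illustrative).
import numpy as np
import tensorflow as tf

def make_windows(series, lag, horizon):
    """Slice a 1D series into (lag -> horizon) supervised pairs."""
    X, y = [], []
    for i in range(len(series) - lag - horizon + 1):
        X.append(series[i:i + lag])
        y.append(series[i + lag:i + lag + horizon])
    return np.array(X), np.array(y)

lag, horizon = 24, 4
series = np.sin(np.linspace(0, 50, 1000))        # toy univariate series
X, y = make_windows(series, lag, horizon)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(lag,)),                # lag values treated as independent features
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(horizon),              # direct multi-step output
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```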
(2) RNN
- takes temporal dependencies into consideration
a) ERNN (Elman RNN)
- tackle the problem of dealing with time patterns in data
- change FC layer \(\rightarrow\) Recurrent layer
- TBPTT (Truncated Backpropagation Through Time)
- problem : exploding/vanishing gradients ( the recurrence itself is sketched below )
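A minimal numpy sketch of the Elman recurrence ( shapes and random weights are illustrative ); it only shows the forward pass that carries the hidden state through time, while training would still use (truncated) BPTT:

```python
# Forward pass of an Elman (simple) recurrent layer in numpy (illustrative shapes).
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 1, 8, 24

W_x = rng.normal(scale=0.1, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))  # hidden-to-hidden (recurrent) weights
b   = np.zeros(hidden_dim)

x = rng.normal(size=(T, input_dim))   # one univariate window of length T
h = np.zeros(hidden_dim)              # initial hidden state

for t in range(T):
    # h_t = tanh(W_x x_t + W_h h_{t-1} + b): the recurrence that gives the network
    # memory of previous time steps.
    h = np.tanh(W_x @ x[t] + W_h @ h + b)

print(h.shape)  # (8,) -- final hidden state summarising the window
```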
b) LSTM
- solution to ERNN's problems
- able to model temporal dependencies over larger horizons,
  without forgetting the short-term patterns
- extracting meaningful info from time series : LSTM > MLP, ERNN
  ( Keras sketch below )
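A minimal Keras LSTM sketch under the same lag-window idea as the MLP above ( toy series and hyperparameters are illustrative ). The structural change is that the window is fed as a sequence of shape ( lag, 1 ), so the recurrent cell can use the temporal order:

```python
# Minimal LSTM forecaster sketch (TensorFlow/Keras; all hyperparameters illustrative).
import numpy as np
import tensorflow as tf

lag, horizon = 24, 4
series = np.sin(np.linspace(0, 50, 1000))                    # toy univariate series
idx = range(len(series) - lag - horizon + 1)
X = np.array([series[i:i + lag] for i in idx]).reshape(-1, lag, 1)   # (samples, time steps, features)
y = np.array([series[i + lag:i + lag + horizon] for i in idx])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(lag, 1)),
    tf.keras.layers.LSTM(64),        # gated cell keeps both long- and short-term patterns
    tf.keras.layers.Dense(horizon),  # direct multi-step forecast
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```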
c) ESNs ( Echo State Networks )
simplifies the training procedure of RNNs ( ERNN, LSTM )
Comparison
- [ previous RNNs ] find the best values for ALL the neurons' weights
- [ ESNs ] tune just the weights of the output ( readout ) neurons
  ( makes training a simple linear-regression task )
ESNs
- NN with a random RNN, called the reservoir, as the hidden layer
- sparse interconnectivity ( around 1% )
- non-trainable reservoir neurons make this network very time-efficient
  ( reservoir + readout sketched below )
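A minimal numpy ESN sketch ( reservoir size, sparsity, spectral radius, and ridge penalty are illustrative assumptions ): the sparse random reservoir is never trained, and only the readout is fitted, here with ridge regression, i.e. a plain linear fit:

```python
# Minimal ESN sketch: fixed random sparse reservoir + ridge-regression readout.
# All sizes/constants are illustrative, not the paper's settings.
import numpy as np

rng = np.random.default_rng(0)
n_res, sparsity, spectral_radius = 200, 0.01, 0.9

# Sparse random reservoir (~1% non-zero connections), rescaled so its largest
# eigenvalue magnitude is < 1 (echo state property).
W = rng.normal(size=(n_res, n_res)) * (rng.random((n_res, n_res)) < sparsity)
W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, size=n_res)            # input weights (univariate input)

series = np.sin(np.linspace(0, 50, 1000))            # toy univariate series
states, h = [], np.zeros(n_res)
for x_t in series[:-1]:
    h = np.tanh(W_in * x_t + W @ h)                  # reservoir update (never trained)
    states.append(h)
states = np.array(states)
targets = series[1:]                                 # one-step-ahead targets

# Readout: ridge regression = a simple linear fit of targets on reservoir states.
lam = 1e-6
W_out = np.linalg.solve(states.T @ states + lam * np.eye(n_res), states.T @ targets)

h = np.tanh(W_in * series[-1] + W @ h)               # update with the last observed value
x_next_hat = h @ W_out                               # one-step-ahead forecast for x_{T+1}
```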
d) GRUs
- solves the exploding/vanishing gradient problems of ERNNs
- essentially a simplification of LSTM ( fewer gates, fewer parameters )
- often performs on par with, or better than, the other recurrent networks
(3) CNN
- automatically extracts features from high-dim raw data with a grid topology
- distortion invariance
- features are extracted regardless of where they are in the data
- makes CNNs suitable for dealing with 1D data
- 3 principles
- 1) local connectivity
- 2) shared weights ( \(\rightarrow\) fewer parameters )
- 3) translation equivariance
- can also be used WITH recurrent blocks
  ( Conv1D sketch below )
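A minimal Conv1D forecaster sketch ( Keras ; toy series and hyperparameters are illustrative ). The same small kernel is slid over the lag window ( shared weights, local connectivity ), so a local pattern is detected wherever it occurs in the window:

```python
# Minimal 1D CNN forecaster sketch (TensorFlow/Keras; all hyperparameters illustrative).
import numpy as np
import tensorflow as tf

lag, horizon = 24, 4
series = np.sin(np.linspace(0, 50, 1000))                    # toy univariate series
idx = range(len(series) - lag - horizon + 1)
X = np.array([series[i:i + lag] for i in idx]).reshape(-1, lag, 1)
y = np.array([series[i + lag:i + lag + horizon] for i in idx])

model = tf.keras.Sequential([
    tf.keras.Input(shape=(lag, 1)),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),  # shared kernel slid over time
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(horizon),                                # direct multi-step forecast
])
model.compile(optimizer="adam", loss="mae")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```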
a) TCN (Temporal Convolutional Network)
- inspired by WaveNet
- 1) convolutions are CAUSAL, so no information leaks from the future into the past
- 2) can process a sequence of ANY length & map it to an output of the SAME length
- to learn long-term dependencies
  \(\rightarrow\) use DILATED causal convolutions ( increase the receptive field )
- employs residual connections
  ( residual block sketched below )
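A minimal sketch of a TCN-style residual block in Keras ( not the paper's implementation ; filter counts and dilation rates are illustrative ): causal padding keeps each convolution from seeing future time steps, exponentially increasing dilation rates widen the receptive field, and a 1x1 convolution matches channel counts for the residual connection:

```python
# Minimal TCN-style residual block sketch (TensorFlow/Keras; hyperparameters illustrative).
import tensorflow as tf

def tcn_block(x, filters, kernel_size, dilation_rate):
    """Dilated causal convolutions with a residual (skip) connection."""
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="causal",
                               dilation_rate=dilation_rate, activation="relu")(x)
    y = tf.keras.layers.Conv1D(filters, kernel_size, padding="causal",
                               dilation_rate=dilation_rate, activation="relu")(y)
    # 1x1 conv so the shortcut has the same number of channels as y.
    shortcut = tf.keras.layers.Conv1D(filters, 1, padding="same")(x)
    return tf.keras.layers.Add()([y, shortcut])

lag, horizon = 24, 4
inputs = tf.keras.Input(shape=(lag, 1))
h = inputs
for d in (1, 2, 4, 8):                 # exponentially growing dilations -> larger receptive field
    h = tcn_block(h, filters=32, kernel_size=3, dilation_rate=d)
outputs = tf.keras.layers.Dense(horizon)(h[:, -1, :])    # forecast from the last time step's features
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")

print(model(tf.random.normal((2, lag, 1))).shape)        # (2, 4): horizon-step forecast per sample
```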