An Experimental Review on DL Architectures for TS forecasting (2021)

Contents

  0. Abstract
  1. Introduction
  2. DL architectures for TSF
    1. MLP
    2. RNN
    3. CNN


0. Abstract

the paper faces 2 main challenges

  • 1) comprehensive review of latest works using DL
  • 2) experimental study comparing performance


1. Introduction

  • thorough review of existing DL techniques for TSF
  • provide an experimental comparison between architectures
  • 7 types of DL models
    • 1) MLP
    • 2) Elman Recurrent
    • 3) LSTM
    • 4) Echo State
    • 5) GRU
    • 6) CNN
    • 7) TCN


Contributions

  • 1) updated exhaustive review on the most relevant DL techniques
  • 2) comparative analysis that evaluates the performance
  • 3) open-source DL framework for TSF


2. DL architectures for TSF

Forecasting Problem

forecasting = fitting a model to predict future values from past values (= lags)


[ Notation ]

  • \(X=\left\{x_{1}, x_{2}, \ldots, x_{T}\right\}\) : historical data
  • \(\hat{X}=\left\{\hat{x}_{T+1}, \hat{x}_{T+2}, \ldots, \hat{x}_{T+H}\right\}\) : vector of predicted values

  • \(H\) : desired forecasting horizon

  • task : predict \(\left\{x_{T+1}, \ldots, x_{T+H}\right\} .\)

  • goal : minimize prediction error,

    \(E=\sum_{i=1}^{H}\left|x_{T+i}-\hat{x}_{T+i}\right|\).
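
A minimal sketch of this setup in Python / NumPy ( not the paper's code ; `make_windows` and `abs_error` are hypothetical helper names ) :

```python
import numpy as np

def make_windows(series, lag, horizon):
    """Slice a univariate series into (lag window, next-horizon values) training pairs."""
    X, Y = [], []
    for t in range(lag, len(series) - horizon + 1):
        X.append(series[t - lag:t])      # past `lag` values used as input
        Y.append(series[t:t + horizon])  # next `horizon` values used as target
    return np.array(X), np.array(Y)

def abs_error(y_true, y_pred):
    """Prediction error E = sum_i | x_{T+i} - xhat_{T+i} |."""
    return np.abs(np.asarray(y_true) - np.asarray(y_pred)).sum()
```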


2 types

  • 1) univariate
  • 2) multivariate

\(\rightarrow\) depending on the number of variables at each time step

( this paper only deals with univariate time series )


Review DL networks below

(1) Fully Connected NN

  • MLP

(2) Recurrent NN

  • ERNN ( Elman RNN )
  • LSTM
  • ESN ( Echo State Network )
  • GRU

(3) CNN

  • CNN
  • TCN


(1) MLP

  • most basic type

  • early '90s : ANNs were proposed as an alternative to traditional statistical models

  • universal function approximators & flexible enough to adapt to data without prior assumptions

  • conclusion

    • proved the importance of preprocessing steps

      ( simple model + carefully selected input > naive MLP )

    • unable to capture the temporal order of the time series

      ( \(\because\) treat each input independently )

  • RNNs, CNNs are better!
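
A minimal MLP forecaster sketch ( assuming `tf.keras` ; `LAG`, `HORIZON` and layer sizes are illustrative, not from the paper ) ; note the lag window is fed as a flat vector, which is why the temporal order is not exploited :

```python
import tensorflow as tf

LAG, HORIZON = 24, 12  # assumed window / horizon sizes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(LAG,)),            # lag window as a flat feature vector
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(HORIZON),          # one output per forecast step
])
model.compile(optimizer="adam", loss="mae")
# model.fit(x_train, y_train, ...) with x_train: (n, LAG), y_train: (n, HORIZON)
```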


(2) RNN

  • take temporal dependency into consideration


a) ERNN (Elman RNN)

  • tackle the problem of dealing with time patterns in data
  • change FC layer \(\rightarrow\) Recurrent layer
  • trained with TBPTT (Truncated Backpropagation Through Time)
  • problem : exploding/vanishing gradient
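
A minimal Elman-style sketch ( assuming `tf.keras` ; sizes are illustrative ) ; the fully connected hidden layer is swapped for a recurrent one :

```python
import tensorflow as tf

LAG, HORIZON = 24, 12  # assumed values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(LAG, 1)),          # univariate series -> 1 feature per time step
    tf.keras.layers.SimpleRNN(32),           # Elman-style recurrent hidden layer
    tf.keras.layers.Dense(HORIZON),
])
model.compile(optimizer="adam", loss="mae")
```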


b) LSTM

  • solution to ERNN’s problems

  • able to model temporal dependencies in larger horizons,

    without forgetting the short-term patterns

  • extracting meaningful info from time series : LSTM > MLP, ERNN
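
A minimal stacked-LSTM sketch ( assuming `tf.keras` ; sizes are illustrative, not from the paper ) :

```python
import tensorflow as tf

LAG, HORIZON = 24, 12  # assumed values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(LAG, 1)),
    tf.keras.layers.LSTM(64, return_sequences=True),  # pass the full sequence to the next LSTM
    tf.keras.layers.LSTM(32),                          # gated memory keeps long- and short-term patterns
    tf.keras.layers.Dense(HORIZON),
])
model.compile(optimizer="adam", loss="mae")
```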


c) ESNs ( Echo State Networks )

simplifies the training procedure of RNNs ( ERNN, LSTM )


Comparison

  • [ previous RNNs ] train the weights of ALL neurons

  • [ ESNs ] tune only the weights of the output neurons

    ( makes training a simple linear regression task )


ESNs

  • NN with a random RNN, called the reservoir, as the hidden layer
  • sparse interconnectivity ( around 1% )
  • non-trainable reservoir neurons make this network very time efficient
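
A minimal NumPy sketch of the idea ( `esn_forecast` and all hyper-parameter values are hypothetical ; the small ridge term is only for numerical stability ) ; the reservoir is random and fixed, so only the linear readout is fitted :

```python
import numpy as np

def esn_forecast(x_train, y_train, x_test,
                 n_reservoir=200, density=0.01, spectral_radius=0.9,
                 ridge=1e-6, seed=0):
    """Fit a linear readout on top of a fixed random reservoir, then forecast x_test windows."""
    rng = np.random.default_rng(seed)

    # fixed (non-trainable) random weights: input -> reservoir and reservoir -> reservoir
    w_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, 1))
    w_res = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_reservoir))
    w_res *= rng.random((n_reservoir, n_reservoir)) < density            # sparse (~1%) connectivity
    w_res *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w_res)))  # echo state property

    def reservoir_state(window):
        s = np.zeros(n_reservoir)
        for value in window:                 # drive the reservoir with the lag window
            s = np.tanh(w_in[:, 0] * value + w_res @ s)
        return s

    states_train = np.array([reservoir_state(w) for w in x_train])
    # training the readout reduces to (ridge-regularised) linear regression
    w_out = np.linalg.solve(states_train.T @ states_train + ridge * np.eye(n_reservoir),
                            states_train.T @ y_train)

    states_test = np.array([reservoir_state(w) for w in x_test])
    return states_test @ w_out               # shape: (n_test_windows, horizon)
```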


d) GRUs

  • solve the exploding/vanishing gradient problems of ERNNs
  • essentially a simplification of the LSTM
  • often reported to perform on par with or better than other recurrent networks
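
A minimal sketch ( assuming `tf.keras` ; sizes are illustrative ) ; in practice the GRU layer is a drop-in replacement for the LSTM layer :

```python
import tensorflow as tf

LAG, HORIZON = 24, 12  # assumed values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(LAG, 1)),
    tf.keras.layers.GRU(64),          # update/reset gates instead of LSTM's three gates + cell state
    tf.keras.layers.Dense(HORIZON),
])
model.compile(optimizer="adam", loss="mae")
```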


(3) CNN

  • automatically extract features from high-dim raw data with a grid topology
  • distortion invariance
    • features are extracted regardless of where they occur in the data
    • makes CNNs suitable for dealing with 1D data such as time series
  • 3 principles
    • 1) local connectivity
    • 2) shared weights ( \(\rightarrow\) fewer parameters )
    • 3) translation equivariance
  • can also be used WITH recurrent blocks
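
A minimal 1D-CNN sketch ( assuming `tf.keras` ; sizes are illustrative, not from the paper ) ; a small shared kernel is slid over the lag window ( local connectivity + shared weights ) :

```python
import tensorflow as tf

LAG, HORIZON = 24, 12  # assumed values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(LAG, 1)),                                # univariate lag window
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),  # shared kernel slid over time
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Conv1D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(HORIZON),
])
model.compile(optimizer="adam", loss="mae")
```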


a) TCN (Temporal Convolutional Network)

  • inspired by WaveNet

  • 1) convolutions are CAUSAL to prevent information loss

  • 2) can process a sequence of ANY length & map it to an output of SAME length

  • to learn long-term dependencies..

    \(\rightarrow\) use DILATED causal convolutions ( increase receptive field )

  • employs residual connections
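
A minimal sketch of a TCN-style block ( assuming `tf.keras` ; `residual_block`, filter counts and the dilation schedule are illustrative, not the WaveNet/TCN reference implementation ) :

```python
import tensorflow as tf

LAG, HORIZON = 24, 12  # assumed values

def residual_block(x, dilation, filters=32, kernel_size=3):
    """Two dilated causal convolutions plus a skip connection."""
    h = tf.keras.layers.Conv1D(filters, kernel_size, padding="causal",
                               dilation_rate=dilation, activation="relu")(x)
    h = tf.keras.layers.Conv1D(filters, kernel_size, padding="causal",
                               dilation_rate=dilation, activation="relu")(h)
    if x.shape[-1] != filters:                        # 1x1 conv so channels match for the sum
        x = tf.keras.layers.Conv1D(filters, 1, padding="same")(x)
    return tf.keras.layers.Add()([x, h])

inputs = tf.keras.Input(shape=(LAG, 1))
x = inputs
for d in (1, 2, 4, 8):                                # doubling dilations widen the receptive field
    x = residual_block(x, dilation=d)
x = tf.keras.layers.Lambda(lambda t: t[:, -1, :])(x)  # last (causal) time step summarises the window
outputs = tf.keras.layers.Dense(HORIZON)(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")
```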
