N-Beats : Neural Basis Expansion Analysis for Interpretable Time Series Forecasting (2020)
Contents
- Abstract
- Introduction
- Problem Statement
- N-Beats
- Basic Block
- Doubly Residual Stacking
- Interpretability
- Ensemble
0. Abstract
focuses on solving the UNIVARIATE time series POINT forecasting problem using DL
proposes an architecture based on…
- 1) backward & forward residual links
- 2) very deep stack of FC layers
Desired Properties
- 1) interpretable
- 2) applicable without modification to a wide array of target domains
- 3) fast to train
1. Introduction
Contributions
- Deep Neural Architecture
- Interpretable DL for time series
2. Problem Statement
Problem
- univariate point forecasting problem
Notation
- length- \(H\) forecast horizon
- length- \(T\) observed series history \(\left[y_{1}, \ldots, y_{T}\right] \in \mathbb{R}^{T}\)
- lookback window of length \(t \leq T\) ( ends with \(y_T\) )
- \(\mathbf{x} \in \mathbb{R}^{t}=\left[y_{T-t+1}, \ldots, y_{T}\right]\).
- \(\widehat{\mathbf{y}}\) : forecast of \(\mathbf{y}\)
Task
- predict \(\mathbf{y} \in \mathbb{R}^{H}=\left[y_{T+1}, y_{T+2}, \ldots, y_{T+H}\right] .\)
Forecasting performance
\[\begin{aligned} \text{SMAPE} &=\frac{200}{H} \sum_{i=1}^{H} \frac{\left|y_{T+i}-\widehat{y}_{T+i}\right|}{\left|y_{T+i}\right|+\left|\widehat{y}_{T+i}\right|}, \quad \text{MAPE}=\frac{100}{H} \sum_{i=1}^{H} \frac{\left|y_{T+i}-\widehat{y}_{T+i}\right|}{\left|y_{T+i}\right|} \\ \text{MASE} &=\frac{1}{H} \sum_{i=1}^{H} \frac{\left|y_{T+i}-\widehat{y}_{T+i}\right|}{\frac{1}{T+H-m} \sum_{j=m+1}^{T+H}\left|y_{j}-y_{j-m}\right|}, \quad \text{OWA}=\frac{1}{2}\left[\frac{\text{SMAPE}}{\text{SMAPE}_{\text{Naïve2}}}+\frac{\text{MASE}}{\text{MASE}_{\text{Naïve2}}}\right] \end{aligned}\]
- \(m\) : periodicity of the data
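A minimal NumPy sketch of these four metrics (not from the paper's code; function names and the precomputed Naïve2 forecast argument are illustrative):

```python
import numpy as np

def smape(y, y_hat):
    # symmetric MAPE over the H-step horizon, in percent
    return 200.0 / len(y) * np.sum(np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat)))

def mape(y, y_hat):
    return 100.0 / len(y) * np.sum(np.abs(y - y_hat) / np.abs(y))

def mase(y, y_hat, y_full, m):
    # y_full = [y_1, ..., y_{T+H}] (history + ground-truth horizon), m = periodicity
    scale = np.mean(np.abs(y_full[m:] - y_full[:-m]))  # mean |y_j - y_{j-m}|
    return np.mean(np.abs(y - y_hat)) / scale

def owa(y, y_hat, y_hat_naive2, y_full, m):
    # average of sMAPE and MASE, each relative to the Naïve2 baseline forecast
    return 0.5 * (smape(y, y_hat) / smape(y, y_hat_naive2)
                  + mase(y, y_hat, y_full, m) / mase(y, y_hat_naive2, y_full, m))
```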
3. N-Beats
3 key principles
- base architecture should be simple & generic, yet expressive (deep)
- should not rely on time-series-specific feature engineering or input scaling
- should be extendable towards making its outputs human interpretable
(1) Basic Block
( for the \(\ell\)-th block )
- Input : \(\mathbf{x}_{\ell}\) ( for the first block, this is the overall model input, i.e. the history lookback window; for later blocks it is the residual output of the preceding block )
- Output : \(\widehat{\mathbf{x}}_{\ell}\) (backcast) and \(\widehat{\mathbf{y}}_{\ell}\) (forecast)
Block architecture
Consists of 2 parts
**[part 1]** FC Network
- produces the forward \(\theta_{\ell}^{f}\) and the backward \(\theta_{\ell}^{b}\) predictors of expansion coefficients
**[part 2]** backward \(g_{\ell}^{b}\) and forward \(g_{\ell}^{f}\) basis layers
- accept \(\theta_{\ell}^{b}\) and \(\theta_{\ell}^{f}\)
- project them onto the set of basis functions
- produce the “backcast \(\widehat{\mathbf{x}}_{\ell}\)” and the “forecast \(\widehat{\mathbf{y}}_{\ell}\)”
Step 1)
\(\begin{aligned} \mathbf{h}_{\ell, 1} &=\mathrm{FC}_{\ell, 1}\left(\mathbf{x}_{\ell}\right), \quad \mathbf{h}_{\ell, 2}=\mathrm{FC}_{\ell, 2}\left(\mathbf{h}_{\ell, 1}\right), \quad \mathbf{h}_{\ell, 3}=\mathrm{FC}_{\ell, 3}\left(\mathbf{h}_{\ell, 2}\right), \quad \mathbf{h}_{\ell, 4}=\mathrm{FC}_{\ell, 4}\left(\mathbf{h}_{\ell, 3}\right) \\ \theta_{\ell}^{b} &=\operatorname{LINEAR}_{\ell}^{b}\left(\mathbf{h}_{\ell, 4}\right), \quad \theta_{\ell}^{f}=\operatorname{LINEAR}_{\ell}^{f}\left(\mathbf{h}_{\ell, 4}\right) \end{aligned}\).
- LINEAR : \(\theta_{\ell}^{f}=\mathbf{W}_{\ell}^{f} \mathbf{h}_{\ell, 4}\)
- FC layer : fully connected layer with ReLU activation, i.e. \(\mathbf{h}_{\ell, 1}=\operatorname{ReLU}\left(\mathbf{W}_{\ell, 1} \mathbf{x}_{\ell}+\mathbf{b}_{\ell, 1}\right)\)
Step 2)
Compute \(\widehat{\mathbf{x}}_{\ell}=g_{\ell}^{b}\left(\theta_{\ell}^{b}\right)\) and \(\widehat{\mathbf{y}}_{\ell}=g_{\ell}^{f}\left(\theta_{\ell}^{f}\right)\)
\[\widehat{\mathbf{y}}_{\ell}=\sum_{i=1}^{\operatorname{dim}\left(\theta_{\ell}^{f}\right)} \theta_{\ell, i}^{f} \mathbf{v}_{i}^{f}, \quad \widehat{\mathbf{x}}_{\ell}=\sum_{i=1}^{\operatorname{dim}\left(\theta_{\ell}^{b}\right)} \theta_{\ell, i}^{b} \mathbf{v}_{i}^{b}\]
- \(\mathbf{v}_{i}^{f}\) and \(\mathbf{v}_{i}^{b}\) : forecast and backcast basis vectors
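A minimal PyTorch sketch of one basic block covering Step 1 and Step 2, written in its generic form where \(g_{\ell}^{b}\) and \(g_{\ell}^{f}\) are learnable linear basis layers; the hidden width and \(\operatorname{dim}(\theta)\) defaults are assumptions, not the paper's hyperparameters:

```python
import torch
import torch.nn as nn

class NBeatsBlock(nn.Module):
    def __init__(self, lookback: int, horizon: int, hidden: int = 512, theta_dim: int = 32):
        super().__init__()
        # Step 1: 4-layer FC stack with ReLU, then two LINEAR heads for theta^b, theta^f
        self.fc = nn.Sequential(
            nn.Linear(lookback, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.theta_b = nn.Linear(hidden, theta_dim, bias=False)
        self.theta_f = nn.Linear(hidden, theta_dim, bias=False)
        # Step 2: basis layers g^b, g^f (generic case: learnable linear projections)
        self.g_b = nn.Linear(theta_dim, lookback)   # backcast basis
        self.g_f = nn.Linear(theta_dim, horizon)    # forecast basis

    def forward(self, x):                       # x: (batch, lookback)
        h = self.fc(x)
        backcast = self.g_b(self.theta_b(h))    # x_hat_l
        forecast = self.g_f(self.theta_f(h))    # y_hat_l
        return backcast, forecast
```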
(2) Doubly Residual Stacking
a) Classical residual network
- adds the input of a stack of layers to its output before passing the result on to the next stack
b) DenseNet
- extends ResNet’s principle by introducing extra connections from the output of each stack to the input of every other stack that follows it
c) (proposed) Hierarchical Doubly Residual Stacking
problem of a) & b)
- network structures that are DIFFICULT TO INTERPRET
- propose a novel hierarchical doubly residual topology
Two residual branches
- 1) running over the backcast prediction of each layer
- 2) running over the forecast branch of each layer

Residual update (see the sketch below): \(\mathbf{x}_{\ell}=\mathbf{x}_{\ell-1}-\widehat{\mathbf{x}}_{\ell-1}\), and the overall model forecast is the sum of the partial forecasts, \(\widehat{\mathbf{y}}=\sum_{\ell} \widehat{\mathbf{y}}_{\ell}\)
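A minimal sketch of the doubly residual stacking rule, reusing the hypothetical `NBeatsBlock` from the previous sketch: the backcast residual feeds the next block, and the partial forecasts are summed:

```python
import torch
import torch.nn as nn
# NBeatsBlock is the sketch defined above

class NBeatsNet(nn.Module):
    def __init__(self, lookback: int, horizon: int, n_blocks: int = 30):
        super().__init__()
        self.blocks = nn.ModuleList(
            NBeatsBlock(lookback, horizon) for _ in range(n_blocks)
        )
        self.horizon = horizon

    def forward(self, x):                          # x: (batch, lookback)
        residual = x                               # x_1 = overall model input
        forecast = x.new_zeros(x.size(0), self.horizon)
        for block in self.blocks:
            backcast, block_forecast = block(residual)
            residual = residual - backcast         # x_{l+1} = x_l - x_hat_l
            forecast = forecast + block_forecast   # y_hat = sum_l y_hat_l
        return forecast
```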
(3) Interpretability
proposes 2 configurations, based on the selection of \(g_{\ell}^{b}\) and \(g_{\ell}^{f}\)
generic architecture
- does not rely on TS-specific knowledge
- outputs of block \(\ell\) :
\(\widehat{\mathbf{y}}_{\ell}=\mathbf{V}_{\ell}^{f} \boldsymbol{\theta}_{\ell}^{f}+\mathbf{b}_{\ell}^{f}, \quad \widehat{\mathbf{x}}_{\ell}=\mathbf{V}_{\ell}^{b} \boldsymbol{\theta}_{\ell}^{b}+\mathbf{b}_{\ell}^{b}\)
a) Trend model
\(\widehat{\mathbf{y}}_{s, \ell}=\sum_{i=0}^{p} \theta_{s, \ell, i}^{f} t^{i}\).
- (in matrix form) \(\widehat{\mathbf{y}}_{s, \ell}^{t r}=\mathbf{T} \theta_{s, \ell}^{f}\)
- \(\mathbf{T}=\left[\mathbf{1}, \mathbf{t}, \ldots, \mathbf{t}^{p}\right]\) : matrix of powers of \(\mathbf{t}\)
- \(\mathbf{t}=[0,1, \ldots, H-1]^{T} / H\) : time vector running over the forecast horizon
b) Seasonality model
\(\widehat{\mathbf{y}}_{s, \ell}=\sum_{i=0}^{\lfloor H / 2-1\rfloor} \theta_{s, \ell, i}^{f} \cos (2 \pi i t)+\theta_{s, \ell, i+\lfloor H / 2\rfloor}^{f} \sin (2 \pi i t)\).
- (in matrix form) \(\widehat{\mathbf{y}}_{s, \ell}^{\text {seas }}=\mathbf{S} \theta_{s, \ell}^{f}\).
- \(\mathbf{S}=[\mathbf{1}, \cos (2 \pi \mathbf{t}), \ldots, \cos (2 \pi\lfloor H / 2-1\rfloor \mathbf{t}), \sin (2 \pi \mathbf{t}), \ldots, \sin (2 \pi\lfloor H / 2-1\rfloor \mathbf{t})]\)
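A minimal NumPy sketch of the fixed (non-learnable) basis matrices \(\mathbf{T}\) and \(\mathbf{S}\); the time-vector convention \(\mathbf{t}=[0,1,\ldots,H-1]/H\) follows the trend model above, and the polynomial-degree default \(p=2\) is an assumption:

```python
import numpy as np

def trend_basis(H: int, p: int = 2) -> np.ndarray:
    # T = [1, t, ..., t^p], shape (H, p+1)
    t = np.arange(H) / H
    return np.stack([t ** i for i in range(p + 1)], axis=1)

def seasonality_basis(H: int) -> np.ndarray:
    # S = [1, cos(2*pi*t), ..., cos(2*pi*k*t), sin(2*pi*t), ..., sin(2*pi*k*t)], k = H//2 - 1
    t = np.arange(H) / H
    k = H // 2 - 1
    cos = [np.cos(2 * np.pi * i * t) for i in range(k + 1)]   # i = 0 gives the constant column
    sin = [np.sin(2 * np.pi * i * t) for i in range(1, k + 1)]
    return np.stack(cos + sin, axis=1)

# forecast of an interpretable block: y_hat = T @ theta_trend  or  S @ theta_seas
```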
(4) Ensemble
ensembling is a more powerful regularization technique than dropout or an L2-norm penalty; the ensemble aggregates models trained with different loss functions (sMAPE, MASE, MAPE) and different lookback-window lengths
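A minimal sketch of the aggregation step, assuming forecasts from independently trained ensemble members are combined by the median (the aggregation choice here is an assumption of this sketch):

```python
import numpy as np

def ensemble_forecast(member_forecasts) -> np.ndarray:
    # member_forecasts: list of (H,) arrays, one forecast per ensemble member
    return np.median(np.stack(member_forecasts, axis=0), axis=0)
```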