Robust Probabilistic Time Series Forecasting (2022)


Contents

  1. Abstract
  2. Related Work
    1. Robust forecasting
    2. DL for TS forecasting
    3. Adversarial Attacks & TS
    4. Certified Adversarial Defenses
    5. Exposure bias
  3. Preliminaries
    1. Probabilistic TS forecasting
    2. Adversarial Attacks on Probabilistic Autoregressive Forecasting Models
  4. Defining Robustness for Probabilistic TS forecasting
    1. Generalized Input Perturbations
    2. Formal Mathematical Definition of Robustness


0. Abstract

Probabilistic TS forecasting

  • quantify uncertainties
  • but DL forecasting models : prone to input perturbations

\(\rightarrow\) Propose a framework for ROBUST probabilistic TS forecasting


Step 1. generalize the concept of adversarial input perturbations

Step 2. extend the randomized smoothing technique to attain robust probabilistic forecasters

Step 3. Experiments


1. Related Work

(1) Robust forecasting

deal with..

  • outliers
  • missing data
  • change points


(2) DL for TS forecasting

with extensive TS data… rise of DL

probabilistic approach

  • (1) use NN as backbone & last layer = likelihood function

  • (2) directly generate the quantile forecasts
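
The two probabilistic approaches above correspond to two different training losses. A minimal sketch, assuming a Gaussian likelihood for approach (1) and the standard pinball (quantile) loss for approach (2); the function names are illustrative and not taken from the paper:

```python
import math

def gaussian_nll(x, mu, sigma):
    """Approach (1): negative log-likelihood of x under N(mu, sigma^2).

    mu and sigma would be produced by the last layer of the network.
    """
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (x - mu) ** 2 / (2 * sigma ** 2)

def pinball_loss(x, q_pred, q):
    """Approach (2): pinball loss for a predicted q-quantile, with q in (0, 1).

    Under-prediction is weighted by q, over-prediction by (1 - q).
    """
    diff = x - q_pred
    return max(q * diff, (q - 1) * diff)
```

For a high quantile such as q = 0.9, under-predicting the observation costs nine times as much as over-predicting it by the same amount, which pushes the prediction toward the upper tail.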


(3) Adversarial Attacks & TS

Image Classification

  • susceptible to hardly human-perceptible changes…..
  • cause them to completely misclassify the inputs


In TS…

  • mainly focus on TS CLASSIFICATION models
  • attacks against probabilistic FORECASTING models were first devised by Dang-Nhu (2020), using the reparameterization trick


(4) Certified Adversarial Defenses

Adversarial training :

  • most successful defense scheme against attacks

Theoretical performance guarantee has not been established…


Randomized Smoothing

  • more scalable & model-agnostic approach
  • provided practical accuracy on classification!


How about TS forecasting ( with probabilistic output )…?
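
The core idea of randomized smoothing is to average a model's output over random noise added to the input. A minimal sketch of that idea for a forecaster, assuming isotropic Gaussian noise and a hypothetical toy point forecaster (`toy_mean_forecast`); the paper's construction for probabilistic outputs is not covered in these notes and may differ:

```python
import random

def toy_mean_forecast(x, tau=3, a=0.9):
    """Hypothetical point forecaster: AR(1)-style decay from the last observation."""
    return [a ** h * x[-1] for h in range(1, tau + 1)]

def smoothed_forecast(f, x, sigma=0.1, n_samples=1000, seed=0):
    """Monte Carlo estimate of E[f(x + eps)] with eps ~ N(0, sigma^2 I)."""
    rng = random.Random(seed)
    total = None
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        out = f(noisy)
        total = out if total is None else [t + o for t, o in zip(total, out)]
    return [t / n_samples for t in total]
```

The smoothing only queries `f` as a black box, which is what makes the approach model-agnostic.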


(5) Exposure bias

Exposure bias

  • autoregressive sequence generation

    \(\rightarrow\) training condition \(\neq\) inference condition
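
The gap between the two conditions can be illustrated with a toy example. A minimal sketch, assuming a noise-free AR(1) process and a slightly mis-specified model (both coefficients are made up for illustration): with ground-truth inputs the error stays small, while feeding the model its own outputs lets the error compound.

```python
def one_step(model_coef, prev):
    """One-step-ahead prediction of the (hypothetical) learned model."""
    return model_coef * prev

true_coef, model_coef = 0.9, 0.95  # illustrative values only
truth = [1.0]
for _ in range(10):
    truth.append(true_coef * truth[-1])

# Training-like condition (teacher forcing): condition on the true previous value.
teacher_forced = [one_step(model_coef, truth[t]) for t in range(10)]

# Inference condition (free running): condition on the model's own previous output.
free_running = []
prev = truth[0]
for _ in range(10):
    prev = one_step(model_coef, prev)
    free_running.append(prev)

tf_err = [abs(p - x) for p, x in zip(teacher_forced, truth[1:])]
fr_err = [abs(p - x) for p, x in zip(free_running, truth[1:])]
```

In this toy setup both errors coincide at the first step, and the free-running error is several times larger than the teacher-forced error by the last step.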


2. Preliminaries

(1) Probabilistic TS forecasting

Notation

  • \(N\) : # of TS
    • \(i\)-th TS =
      • (1) observation \(x_{i, t} \in \mathbb{R}\)
      • (2) input covariates \(z_{i, t} \in \mathbb{R}^{d}\)
  • BACKCAST :
    • (1) \(\boldsymbol{x}=x_{1: T} \in \mathcal{X}=\bigcup_{T=1}^{\infty} \mathbb{R}^{T}\)
    • (2) \(z_{1: T+\tau} \in \mathcal{Z}\)
  • FORECAST : \(x_{T+1: T+\tau} \in \mathcal{Y}=\mathbb{R}^{\tau}\)


Probabilistic Forecaster :

  • \(f: \mathcal{X} \times \mathcal{Z} \rightarrow \mathcal{P}(\mathcal{Y})\).

  • \(\left(Y_{1}, \ldots, Y_{\tau}\right)=f\left(x_{1}, \ldots, x_{T}, z_{1} \ldots, z_{T+\tau}\right)\).

    • \(\left(Y_{1}, \ldots, Y_{\tau}\right)\) : random variables associated with future targets \(\left(x_{T+1}, \ldots, x_{T+\tau}\right)\)


For simplicity, omit covariates \(z_{1: T+\tau}\)

\(\rightarrow\) \(\mathbf{Y}=\left(Y_{1}, \ldots, Y_{\tau}\right)=f(\boldsymbol{x})\)

  • \(x_{T+1}, \ldots, x_{T+\tau}\) : Y_true
  • \(\boldsymbol{y}=\left(y_{1}, \ldots, y_{\tau}\right)=\left(\hat{x}_{T+1}, \ldots, \hat{x}_{T+\tau}\right)\) : Y_pred
    • \(\left(\hat{x}_{T+1}, \ldots, \hat{x}_{T+\tau}\right) \sim f(\boldsymbol{x})\).
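
To make the notation concrete, here is a minimal sketch of a probabilistic forecaster \(f\) from which sample paths \(\left(\hat{x}_{T+1}, \ldots, \hat{x}_{T+\tau}\right)\) can be drawn. It assumes a toy Gaussian AR(1) model with made-up parameters `a` and `sigma`; it is an illustration of the interface, not the model used in the paper.

```python
import random

def sample_forecast(x, tau, a=0.9, sigma=0.1, rng=None):
    """Draw one path (x_hat_{T+1}, ..., x_hat_{T+tau}) ~ f(x) by ancestral sampling."""
    rng = rng or random.Random()
    path, prev = [], x[-1]
    for _ in range(tau):
        # Each step is conditioned on the previously sampled value (autoregressive).
        prev = a * prev + rng.gauss(0.0, sigma)
        path.append(prev)
    return path

def forecast_mean(x, tau, n_samples=2000, seed=0):
    """Monte Carlo estimate of E[Y_h] for h = 1..tau from sampled paths."""
    rng = random.Random(seed)
    paths = [sample_forecast(x, tau, rng=rng) for _ in range(n_samples)]
    return [sum(p[h] for p in paths) / n_samples for h in range(tau)]
```

Quantiles or other statistics of \(\mathbf{Y}\) can be estimated from the same set of sampled paths.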


(2) Adversarial Attacks on Probabilistic Autoregressive Forecasting Models

Adversarial Perturbation ( = attack ) : \(\boldsymbol{\delta}\)

Adversarial target values : \(\mathbf{t}_{\mathrm{adv}} \in \mathbb{R}^{m}\)

  • chosen to be significantly different from \(\mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x})}\left[\chi\left(Y_{1}, \ldots, Y_{\tau}\right)\right]\)

Statistic \(\chi: \mathbb{R}^{\tau} \rightarrow \mathbb{R}^{m}\)

\(\rightarrow\) \(\underset{\boldsymbol{\delta}: \mid \mid \boldsymbol{\delta} \mid \mid \leqslant \eta}{\operatorname{argmin}} \mid \mid \mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x}+\boldsymbol{\delta})}\left[\chi\left(Y_{1}, \ldots, Y_{\tau}\right)\right]-\mathbf{t}_{\mathrm{adv}} \mid \mid _{2}^{2}\)


Focus on attacking subsets of prediction outputs

  • \(\chi_{H}\left(Y_{1}, \ldots, Y_{\tau}\right)=\left(Y_{h_{1}}, \ldots, Y_{h_{m}}\right)\).
    • \(H\) : subset of prediction indices


Adversary searches for a minimal-norm perturbation \(\boldsymbol{\delta}\), giving the perturbed input \(\boldsymbol{x}^{\prime}=\boldsymbol{x}+\boldsymbol{\delta}\),

  • for which the subset of perturbed forecasts is SIGNIFICANTLY DIFFERENT from the original forecasts


Constrained Optimization \(\rightarrow\) Regularized Optimization

  • \(\min _{\boldsymbol{\delta}} L(\boldsymbol{\delta}):= \mid \mid \boldsymbol{\delta} \mid \mid ^{2}+\lambda \cdot \mid \mid \mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x}+\boldsymbol{\delta})}\left[\mathbf{Y}_{H}\right]-\mathbf{t}_{\mathrm{adv}} \mid \mid _{2}^{2}\).
    • compute via reparameterization trick
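
The regularized objective can be minimized by gradient descent once the forecast samples are written as a deterministic function of the input plus noise that does not depend on \(\boldsymbol{\delta}\). A minimal sketch, assuming a toy Gaussian AR(1) forecaster, a single attacked horizon \(H=\{h\}\), and a hand-derived gradient; all parameter values are illustrative, and an actual attack on a deep model would use automatic differentiation instead:

```python
import random

def attack_last_step(x, t_adv, h=3, a=0.9, sigma=0.1, lam=10.0,
                     lr=0.01, n_steps=500, n_samples=200, seed=0):
    """Gradient descent on L(delta) = ||delta||^2 + lam * (E[Y_h] - t_adv)^2.

    Reparameterization for the toy AR(1) model:
        Y_h = a^h * (x_T + delta_T) + sum_j a^(h-j) * sigma * eps_j,
    where eps_j ~ N(0, 1) is sampled once and does not depend on delta.
    """
    rng = random.Random(seed)
    c = a ** h
    noise = [
        sum(a ** (h - j) * sigma * rng.gauss(0.0, 1.0) for j in range(1, h + 1))
        for _ in range(n_samples)
    ]
    mean_noise = sum(noise) / n_samples
    delta = [0.0] * len(x)
    for _ in range(n_steps):
        # Monte Carlo estimate of E[Y_h | x + delta] with the fixed noise samples.
        mc_mean = c * (x[-1] + delta[-1]) + mean_noise
        grad = [2.0 * d for d in delta]                  # gradient of ||delta||^2
        grad[-1] += 2.0 * lam * (mc_mean - t_adv) * c    # gradient through the samples
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta
```

In this toy model only the last input value influences the forecast, so the perturbation concentrates on \(x_{T}\) and the other components of \(\boldsymbol{\delta}\) stay at zero. Larger \(\lambda\) trades a bigger perturbation for a forecast closer to \(\mathbf{t}_{\mathrm{adv}}\).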


3. Defining Robustness for Probabilistic TS forecasting

Adversarial attacks :

  • proposed only in terms of additive input
  • there can be more distinct types of perturbation

\(\rightarrow\) generalize the notion of adversarial input perturbations


(1) Generalized Input Perturbations

Notation :

  • Input Perturbation : \(T_{\mathcal{X}}: \mathcal{X} \rightarrow \mathcal{X}\)
  • Output Transformation : \(T_{\mathcal{Y}}: \mathcal{Y} \rightarrow \mathcal{Y}\)

  • forecast output ( under input perturbation ) : \(f\left(T_{\mathcal{X}}(\boldsymbol{x})\right)\)
  • Original forecast output ( under output transformation ) : \(\left(T_{\mathcal{Y}}\right)_{\#} f(\boldsymbol{x})\)


Goal : \(f \circ T_{\mathcal{X}} \approx T_{\mathcal{Y}} \circ f\)


2 example classes of perturbations

  • (1) additive adversarial attacks
  • (2) time shift with new noisy observations


(a) Additive Adversarial Perturbation

Deceives the forecaster to deviate from original forecasts on the subset \(H\)

\(T_{\mathcal{X}}(\boldsymbol{x})=\boldsymbol{x}+\boldsymbol{\delta}^{\star}(\boldsymbol{x})\).

  • where \(\boldsymbol{\delta}^{\star}(\boldsymbol{x})=\underset{ \mid \mid \boldsymbol{\delta} \mid \mid \leqslant \eta}{\operatorname{argmax}} \mid \mid \mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x}+\boldsymbol{\delta})}\left[\mathbf{Y}_{H}\right]-\mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x})}\left[\mathbf{Y}_{H}\right] \mid \mid ^{2}\)


\(f \circ T_{\mathcal{X}} \approx T_{\mathcal{Y}} \circ f\) reduces to…

  • \(f\left(\boldsymbol{x}+\boldsymbol{\delta}^{\star}(\boldsymbol{x})\right) \approx f(\boldsymbol{x})\).
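
Worked example (an illustrative assumption, not from the paper) : suppose the forecast mean is linear in the input and the norm is Euclidean

  • \(\mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x})}\left[\mathbf{Y}_{H}\right]=W \boldsymbol{x}\), with \(W \in \mathbb{R}^{m \times T}\)
  • then \(\mid \mid \mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x}+\boldsymbol{\delta})}\left[\mathbf{Y}_{H}\right]-\mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x})}\left[\mathbf{Y}_{H}\right] \mid \mid ^{2}= \mid \mid W \boldsymbol{\delta} \mid \mid ^{2}\)
  • maximizer : \(\boldsymbol{\delta}^{\star}=\pm \eta \boldsymbol{v}_{1}\), where \(\boldsymbol{v}_{1}\) is the top right singular vector of \(W\)
  • worst-case gap : \(\eta^{2} \sigma_{\max }(W)^{2}\)

In this special case, the mean forecast is robust to additive attacks exactly when \(\eta \cdot \sigma_{\max }(W)\) is small.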


want our forecaster to be insensitive to perturbation!


(b) Time Shift with New Noisy Observation

Notation

  • input TS : \(\boldsymbol{x}=\left(x_{1}, \ldots, x_{T}\right)\)
  • \(k \ll \tau\) new observations : \(\left\{\tilde{x}_{T+1}, \ldots, \tilde{x}_{T+k}\right\}\)


Want (a) & (b) to be consistent

  • (a) \(f(\boldsymbol{x})=\left(Y_{1}, Y_{2}, \ldots, Y_{k+1}, Y_{k+2}, \ldots\right)\).
  • (b) \(f\left(\boldsymbol{x} ; \tilde{x}_{T+1}, \ldots, \tilde{x}_{T+k}\right)=\left(Y_{k+1}^{\prime}, Y_{k+2}^{\prime}, \ldots\right)\).


That means, the transformation is..

  • (1) \(T_{\mathcal{X}}(\boldsymbol{x})=\left(\boldsymbol{x} ; \tilde{x}_{T+1}, \ldots, \tilde{x}_{T+k}\right)\).
  • (2) \(T_{\mathcal{Y}}\left(y_{1}, y_{2}, \ldots, y_{k+1}, y_{k+2}, \ldots\right)=\left(y_{k+1}, y_{k+2}, \ldots\right)\).


\(f \circ T_{\mathcal{X}} \approx T_{\mathcal{Y}} \circ f\) reduces to…

  • \(Y_{k+1} \approx Y_{k+1}^{\prime}, Y_{k+2} \approx Y_{k+2}^{\prime}, \ldots\).
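
The consistency requirement can be checked numerically by implementing \(T_{\mathcal{X}}\) and \(T_{\mathcal{Y}}\) directly. A minimal sketch, assuming a hypothetical AR(1) mean forecaster and comparing only forecast means (not full distributions) on the horizons that both forecasts cover:

```python
def mean_forecast(x, tau, a=0.9):
    """Hypothetical mean forecaster: E[Y_h] = a^h * (last observation)."""
    return [a ** h * x[-1] for h in range(1, tau + 1)]

def t_x(x, new_obs):
    """Input perturbation T_X: append the k new observations to the input."""
    return list(x) + list(new_obs)

def t_y(y, k):
    """Output transformation T_Y: drop the first k forecast steps."""
    return list(y[k:])

def consistency_gap(x, new_obs, tau, a=0.9):
    """Largest mean difference between f(T_X(x)) and T_Y(f(x)) on the shared horizons."""
    k = len(new_obs)
    shifted = mean_forecast(t_x(x, new_obs), tau, a)[: tau - k]
    original = t_y(mean_forecast(x, tau, a), k)
    return max(abs(s - o) for s, o in zip(shifted, original))
```

When the new observation equals the original mean forecast, the gap is zero up to floating-point error; a noisy observation shifts all later forecasts, and the gap measures how strongly.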


example) (de-)amplified relative to the ground truth

  • \(\tilde{x}_{T+1}:=(1+\rho) x_{T+1}\),

    where adversarial parameter \(\rho > -1\)
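
A minimal sketch of this (de-)amplification for \(k=1\), assuming a hypothetical AR(1) mean forecaster with coefficient `a`. The gap is measured against the forecast that would be obtained from the unperturbed ground truth \(x_{T+1}\):

```python
def amplify(x_next, rho):
    """Return x_tilde = (1 + rho) * x_next; rho > -1 keeps the sign of x_next."""
    if rho <= -1:
        raise ValueError("rho must be greater than -1")
    return (1 + rho) * x_next

def first_step_gap(x_next, rho, a=0.9):
    """Absolute change in the next-step mean forecast caused by the amplification."""
    return abs(a * amplify(x_next, rho) - a * x_next)
```

Positive \(\rho\) amplifies the observation and \(-1<\rho<0\) de-amplifies it; in this toy model the forecast gap grows linearly in \(\mid \rho \mid\).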


(2) Formal Mathematical Definition of Robustness

pass
