Robust Probabilistic Time Series Forecasting (2022)
Contents
- Abstract
- Related Work
- Robust forecasting
- DL for TS forecasting
- Adversarial Attacks & TS
- Certified Adversarial Defenses
- Exposure bias
- Preliminaries
- Probabilistic TS forecasting
- Adversarial Attacks on Probabilistic Autoregressive Forecasting Models
- Defining Robustness for Probabilistic TS forecasting
- Generalized Input Perturbations
- Formal Mathematical Definition of Robustness
0. Abstract
Probabilistic TS forecasting
- quantify uncertainties
- but DL forecasting models : prone to input perturbations
\(\rightarrow\) Propose a framework for ROBUST probabilistic TS forecasting
Step 1. generalize the concept of adversarial input perturbations
Step 2. extend the randomized smoothing technique to attain robust probabilistic forecasters
Step 3. Experiments
1. Related Work
(1) Robust forecasting
deals with…
- outliers
- missing data
- change points
(2) DL for TS forecasting
with extensive TS data… rise of DL
probabilistic approaches
- (1) use NN as backbone & last layer = likelihood function
- (2) directly generate the quantile forecasts
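Approach (1) can be sketched as follows — a toy numpy stand-in for an NN backbone with a Gaussian likelihood head (the names and weights are illustrative, not any specific model's):

```python
import numpy as np

def likelihood_head(hidden):
    """Map a backbone hidden state to Gaussian likelihood parameters (mu, sigma)."""
    W_mu, W_sigma = np.array([0.5, -0.2]), np.array([0.1, 0.3])  # toy last-layer weights
    mu = hidden @ W_mu
    sigma = np.log1p(np.exp(hidden @ W_sigma))  # softplus keeps sigma > 0
    return mu, sigma

# The forecast for the next step is then the distribution N(mu, sigma^2).
mu, sigma = likelihood_head(np.array([1.0, 2.0]))
print(mu, sigma)
```

The model is trained by maximizing the log-likelihood of the observed targets under this output distribution.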
(3) Adversarial Attacks & TS
Image Classification
- susceptible to hardly human-perceptible changes…
- cause them to completely misclassify the inputs
In TS…
- mainly focus on TS CLASSIFICATION models
- attack against probabilistic FORECASTING models was first devised by Dang-Nhu et al. (2020), using the reparameterization trick
(4) Certified Adversarial Defenses
Adversarial training :
- most successful defense scheme against attacks
Theoretical performance guarantee has not been established…
Randomized Smoothing
- more scalable & model-agnostic approach
- provided practical accuracy on classification!
How about TS forecasting ( with probabilistic output )…?
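As a sketch of how randomized smoothing might carry over to forecasting — average the forecasts over Gaussian-noised copies of the input. This is a toy numpy illustration under our own assumptions (`base_forecaster` is a hypothetical stand-in for any model mapping a history to a forecast path), not the paper's exact construction:

```python
import numpy as np

def smoothed_forecast(base_forecaster, x, sigma=0.1, n_samples=100, rng=None):
    """Average forecasts over Gaussian-perturbed copies of the input history x."""
    rng = np.random.default_rng(rng)
    forecasts = [base_forecaster(x + sigma * rng.standard_normal(x.shape))
                 for _ in range(n_samples)]
    return np.mean(forecasts, axis=0)

# Toy base forecaster: repeat the last observation tau = 3 times.
naive = lambda x: np.repeat(x[-1], 3)
print(smoothed_forecast(naive, np.array([1.0, 2.0, 3.0]), sigma=0.0))  # → [3. 3. 3.]
```

The intuition is the same as in classification: the smoothed model's output changes slowly in the input, which is what makes robustness certificates possible.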
(5) Exposure bias
Exposure bias
- autoregressive sequence generation
\(\rightarrow\) training condition \(\neq\) inference condition
2. Preliminaries
(1) Probabilistic TS forecasting
Notation
- \(N\) : # of TS
- \(i\)-th TS =
- (1) observation \(x_{i, t} \in \mathbb{R}\)
- (2) input covariates \(z_{i, t} \in \mathbb{R}^{d}\)
- \(i\)-th TS split into:
- BACKCAST :
- (1) \(\boldsymbol{x}=x_{1: T} \in \mathcal{X}=\bigcup_{T=1}^{\infty} \mathbb{R}^{T}\)
- (2) \(z_{1: T+\tau} \in \mathcal{Z}\)
- FORECAST : \(x_{T+1: T+\tau} \in \mathcal{Y}=\mathbb{R}^{\tau}\)
Probabilistic Forecaster :
- \(f: \mathcal{X} \times \mathcal{Z} \rightarrow \mathcal{P}(\mathcal{Y})\)
- \(\left(Y_{1}, \ldots, Y_{\tau}\right)=f\left(x_{1}, \ldots, x_{T}, z_{1}, \ldots, z_{T+\tau}\right)\)
- \(\left(Y_{1}, \ldots, Y_{\tau}\right)\) : r.v, associated with future targets \(\left(x_{T+1}, \ldots, x_{T+\tau}\right)\)
For simplicity, omit covariates \(z_{1: T+\tau}\)
\(\rightarrow\) \(\mathbf{Y}=\left(Y_{1}, \ldots, Y_{\tau}\right)=f(\boldsymbol{x})\)
- \(x_{T+1}, \ldots, x_{T+\tau}\) : Y_true
- \(\boldsymbol{y}=\left(y_{1}, \ldots, y_{\tau}\right)=\left(\hat{x}_{T+1}, \ldots, \hat{x}_{T+\tau}\right)\) : Y_pred
- \(\left(\hat{x}_{T+1}, \ldots, \hat{x}_{T+\tau}\right) \sim f(\boldsymbol{x})\).
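A minimal sketch of this interface — a forecaster returning a distribution over the next \(\tau\) values, from which prediction paths \(\hat{x}_{T+1}, \ldots, \hat{x}_{T+\tau}\) are sampled. The Gaussian toy model below is our own assumption, purely for illustration:

```python
import numpy as np

class GaussianForecaster:
    """Toy probabilistic forecaster f: X -> P(Y), returning distribution parameters."""
    def __init__(self, tau):
        self.tau = tau

    def __call__(self, x):
        # Toy model: future values centered at the last observation,
        # with spread equal to the history's standard deviation.
        mu = np.full(self.tau, x[-1])
        sigma = np.full(self.tau, x.std() + 1e-8)
        return mu, sigma  # parameters of P(Y)

def sample_forecast(forecaster, x, rng=None):
    """Draw one sample path (y_1, ..., y_tau) ~ f(x)."""
    rng = np.random.default_rng(rng)
    mu, sigma = forecaster(x)
    return mu + sigma * rng.standard_normal(mu.shape)

f = GaussianForecaster(tau=4)
y = sample_forecast(f, np.array([1.0, 2.0, 3.0]))
print(y.shape)  # → (4,)
```

Point forecasts (e.g. the mean or a quantile) are then summary statistics of many such sample paths.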
(2) Adversarial Attacks on Probabilistic Autoregressive Forecasting Models
Adversarial Perturbation ( = attack ) : \(\boldsymbol{\delta}\)
Adversarial target values : \(\mathbf{t}_{\mathrm{adv}} \in \mathbb{R}^{m}\)
- chosen to be significantly different from \(\mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x})}\left[\chi\left(Y_{1}, \ldots, Y_{\tau}\right)\right]\)
Statistic \(\chi: \mathbb{R}^{\tau} \rightarrow \mathbb{R}^{m}\)
\(\rightarrow\) \(\underset{\boldsymbol{\delta}:\|\boldsymbol{\delta}\| \leqslant \eta}{\operatorname{argmin}}\left\|\mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x}+\boldsymbol{\delta})}\left[\chi\left(Y_{1}, \ldots, Y_{\tau}\right)\right]-\mathbf{t}_{\mathrm{adv}}\right\|_{2}^{2}\)
Focus on attacking subsets of prediction outputs
- \(\chi_{H}\left(Y_{1}, \ldots, Y_{\tau}\right)=\left(Y_{h_{1}}, \ldots, Y_{h_{m}}\right)\).
- \(H\) : subset of prediction indices
Adversary searches for a minimal norm perturbation \(\boldsymbol{x}^{\prime}=\boldsymbol{x}+\boldsymbol{\delta}\),
- for which the subset of perturbed forecasts is SIGNIFICANTLY DIFFERENT from the original forecasts
Constrained Optimization \(\rightarrow\) Regularized Optimization
- \(\min _{\boldsymbol{\delta}} L(\boldsymbol{\delta}):=\|\boldsymbol{\delta}\|^{2}+\lambda \cdot\left\|\mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x}+\boldsymbol{\delta})}\left[Y_{H}\right]-\boldsymbol{t}_{\mathrm{adv}}\right\|_{2}^{2}\)
- compute via reparameterization trick
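A toy sketch of this regularized attack for an assumed linear-Gaussian forecaster, where the reparameterization \(Y = w^{\top}(x+\delta) + \sigma\epsilon\), \(\epsilon \sim \mathcal{N}(0,1)\), lets the gradient of the Monte Carlo estimate of \(\mathbb{E}[Y]\) flow through \(\delta\). The model, weights, and hyperparameters are all illustrative, not the paper's setup:

```python
import numpy as np

def attack(x, w, t_adv, lam=10.0, sigma=0.1, n_samples=64, steps=200, lr=0.05, seed=0):
    """Minimize ||delta||^2 + lam * (E[Y | x+delta] - t_adv)^2 by gradient descent."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n_samples)  # noise fixed across iterations (reparameterization)
    delta = np.zeros_like(x)
    for _ in range(steps):
        # Reparameterized Monte Carlo estimate of E[Y | x + delta]
        mean_y = np.mean(w @ (x + delta) + sigma * eps)
        # Gradient of the objective: 2*delta + 2*lam*(E[Y] - t_adv) * dE[Y]/ddelta
        grad = 2 * delta + 2 * lam * (mean_y - t_adv) * w
        delta -= lr * grad
    return delta

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.2, 0.3, 0.5])
delta = attack(x, w, t_adv=5.0)
print(float(w @ x), float(w @ (x + delta)))  # attacked forecast mean moves toward t_adv = 5
```

With an NN forecaster, the same loop would use automatic differentiation instead of the hand-written gradient; the key point is that sampling is written as a deterministic function of fixed noise, so the expectation is differentiable in \(\boldsymbol{\delta}\).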
3. Defining Robustness for Probabilistic TS forecasting
Adversarial attacks :
- proposed only in terms of additive input
- there can be more distinct types of perturbation
\(\rightarrow\) generalize the notion of adversarial input perturbations
(1) Generalized Input Perturbations
Notation :
- Input Perturbation : \(T_{\mathcal{X}}: \mathcal{X} \rightarrow \mathcal{X}\)
- Output Transformation : \(T_{\mathcal{Y}}: \mathcal{Y} \rightarrow \mathcal{Y}\)
- forecast output ( under input perturbation ) : \(f\left(T_{\mathcal{X}}(\boldsymbol{x})\right)\)
- Original forecast output ( under output transformation ) : \(\left(T_{\mathcal{Y}}\right)_{\#} f(\boldsymbol{x})\)
Goal : \(f \circ T_{\mathcal{X}} \approx T_{\mathcal{Y}} \circ f\)
2 example classes of perturbations
- (1) additive adversarial attacks
- (2) time shift with new noisy observations
(a) Additive Adversarial Perturbation
Deceives the forecaster into deviating from its original forecasts on the subset \(H\)
\(T_{\mathcal{X}}(\boldsymbol{x})=\boldsymbol{x}+\boldsymbol{\delta}^{\star}(\boldsymbol{x})\).
- where \(\boldsymbol{\delta}^{\star}(\boldsymbol{x})=\underset{\|\boldsymbol{\delta}\| \leqslant \eta}{\operatorname{argmax}}\left\|\mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x}+\boldsymbol{\delta})}\left[\mathbf{Y}_{H}\right]-\mathbb{E}_{f(\boldsymbol{y} \mid \boldsymbol{x})}\left[\mathbf{Y}_{H}\right]\right\|^{2}\)
\(f \circ T_{\mathcal{X}} \approx T_{\mathcal{Y}} \circ f\) reduces to…
- \(f\left(\boldsymbol{x}+\boldsymbol{\delta}^{\star}(\boldsymbol{x})\right) \approx f(\boldsymbol{x})\).
want our forecaster to be insensitive to perturbation!
(b) Time Shift with New Noisy Observation
Notation
- input TS : \(\boldsymbol{x}=\left(x_{1}, \ldots, x_{T}\right)\)
- \(k \ll \tau\) new observations : \(\left\{\tilde{x}_{T+1}, \ldots, \tilde{x}_{T+k}\right\}\)
Want (a) & (b) to be consistent
- (a) \(f(\boldsymbol{x})=\left(Y_{1}, Y_{2}, \ldots, Y_{k+1}, Y_{k+2}, \ldots\right)\).
- (b) \(f\left(\boldsymbol{x} ; \tilde{x}_{T+1}, \ldots, \tilde{x}_{T+k}\right)=\left(Y_{k+1}^{\prime}, Y_{k+2}^{\prime}, \ldots\right)\).
That is, the transformations are…
- (1) \(T_{\mathcal{X}}(\boldsymbol{x})=\left(\boldsymbol{x} ; \tilde{x}_{T+1}, \ldots, \tilde{x}_{T+k}\right)\).
- (2) \(T_{\mathcal{Y}}\left(y_{1}, y_{2}, \ldots, y_{k+1}, y_{k+2}, \ldots\right)=\left(y_{k+1}, y_{k+2}, \ldots\right)\).
\(f \circ T_{\mathcal{X}} \approx T_{\mathcal{Y}} \circ f\) reduces to…
- \(Y_{k+1} \approx Y_{k+1}^{\prime}, Y_{k+2} \approx Y_{k+2}^{\prime}, \ldots\).
example) new observation (de-)amplified relative to the ground truth
- \(\tilde{x}_{T+1}:=(1+\rho) x_{T+1}\), where adversarial parameter \(\rho > -1\)
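This consistency condition can be sketched as a check: append \(k\) amplified observations and compare the overlapping forecasts \(Y_{k+1:\tau}\) vs \(Y^{\prime}_{k+1:\tau}\). The `forecaster` below is a hypothetical stand-in returning a mean forecast path, purely for illustration:

```python
import numpy as np

def shift_consistency_gap(forecaster, x, x_true_next, k=1, rho=0.5, tau=4):
    """Max deviation between overlapping forecasts before/after a noisy time shift."""
    y = forecaster(x, tau)                                   # (Y_1, ..., Y_tau)
    x_tilde = (1 + rho) * x_true_next[:k]                    # amplified new observations
    y_shift = forecaster(np.concatenate([x, x_tilde]), tau)  # (Y'_{k+1}, Y'_{k+2}, ...)
    # Robustness asks that Y_{k+1} ~ Y'_{k+1}, Y_{k+2} ~ Y'_{k+2}, ...
    return np.max(np.abs(y[k:] - y_shift[: tau - k]))

# Toy mean forecaster: repeat the last observation tau times.
naive = lambda x, tau: np.repeat(x[-1], tau)
gap = shift_consistency_gap(naive, np.array([1.0, 2.0, 3.0]), np.array([4.0]), rho=0.5)
print(gap)  # → 3.0
```

The naive last-value forecaster is maximally sensitive to the amplified observation (the whole forecast jumps with it); a robust forecaster should keep this gap small.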
(2) Formal Mathematical Definition of Robustness
pass