Channel-Aware Low-Rank Adaptation in Time Series Forecasting


0. Abstract

Channel-aware low-rank adaptation

Goal: balance the channel-independent (CI) and channel-dependent (CD) strategies
How? Condition CD models on identity-aware individual components
Plug-in solution

1. Introduction

a) Limitation of existing approaches

Group-aware embedding [18]

Inverted embedding [9] for Transformers

Leading indicator estimation [22]

Channel clustering [1]

$\rightarrow$ These approaches are limited to specific types of backbone model

b) Our solution

Background) Low-rank adaptation [5]

Proposal) Channel-aware low-rank adaptation (C-LoRA)

Trade-off between the two strategies
Provide an alternative in a parameter-efficient way

Parameterize each channel with a low-rank factorized adapter to give it individual treatment

Specialized channel adaptation is conditioned on the series information to form an identity-aware embedding

c) Contribution

Plug-in solution
- Adaptable to a wide range of SOTA TS models
- No changes to the existing architecture
Extensive experiments
- Improve the performance of both CD and CI backbones
- High efficiency, flexible transfer across datasets, and enhanced channel identity

2. Methodology

(1) Backbone

General forecasting template

for both the CI and CD models

Step 1) $\overline{\mathbf{X}}=\operatorname{Normalization}(\mathbf{X})$.

ex) RevIN: addresses the nonstationarity of TS
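As a rough illustration of the normalization step, here is a minimal NumPy sketch of RevIN-style instance normalization. It assumes one lookback window of shape `(T, C)` and omits RevIN's learnable affine parameters. The function names and the sizes `T=96`, `C=7` are hypothetical.

```python
import numpy as np

def instance_normalize(x, eps=1e-5):
    """Normalize each channel of one window x with shape (T, C).

    Simplified RevIN-style step: statistics are computed per channel
    over the time axis; the learnable affine part is left out.
    """
    mean = x.mean(axis=0, keepdims=True)             # (1, C)
    scale = np.sqrt(x.var(axis=0, keepdims=True) + eps)  # (1, C)
    return (x - mean) / scale, mean, scale

def instance_denormalize(y_norm, mean, scale):
    """Map normalized values back to the original scale of each channel."""
    return y_norm * scale + mean

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=3.0, size=(96, 7))    # hypothetical window
x_bar, mean, scale = instance_normalize(x)
x_back = instance_denormalize(x_bar, mean, scale)
```

In a forecaster the stored `mean` and `scale` would typically be reused to denormalize the prediction, which is what makes the normalization reversible.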

Step 2) $\mathbf{z}_c^{(0)}=\operatorname{TokenEmbedding}\left(\overline{\mathbf{X}}_{:, c}\right)$

$\forall c=1, \ldots, C$.
usually implemented by MLPs to process temporal features

Step 3) (Optional) $\mathbf{Z}^{(\ell+1)}=\operatorname{ChannelMixing}\left(\mathbf{Z}^{(\ell)}\right)$

$\forall \ell=0, \ldots, L$.
Used only by CD models; implemented by Transformer blocks or MLPs.

Step 4) $\widehat{\mathbf{Y}}=\operatorname{Projection}\left(\mathbf{Z}^{(L+1)}\right)$.

usually implemented by MLPs to process temporal features
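The four steps can be sketched end to end as follows. This is only a shape-level illustration with random weights, assuming a single linear-plus-ReLU token embedding, a linear residual channel mixer, and a linear projection. All names and the sizes `T`, `C`, `D`, `H`, `L` are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C, D, H, L = 96, 7, 16, 24, 1   # lookback, channels, hidden, horizon, last layer index

W_embed = rng.normal(scale=0.1, size=(T, D))                      # shared across channels
W_mix = [rng.normal(scale=0.1, size=(C, C)) for _ in range(L + 1)]  # one per layer 0..L
W_proj = rng.normal(scale=0.1, size=(D, H))

def normalization(x):
    # Step 1: per-channel standardization over time, (T, C) -> (T, C)
    return (x - x.mean(axis=0, keepdims=True)) / (x.std(axis=0, keepdims=True) + 1e-5)

def token_embedding(x_bar):
    # Step 2: embed each channel column X[:, c] with shared weights, (T, C) -> (C, D)
    return np.maximum(x_bar.T @ W_embed, 0.0)

def channel_mixing(z, w):
    # Step 3 (CD models only): mix information across the channel axis, (C, D) -> (C, D)
    return z + w @ z

def projection(z):
    # Step 4: map each channel's hidden state to the horizon, (C, D) -> (H, C)
    return (z @ W_proj).T

x = rng.normal(size=(T, C))
z0 = token_embedding(normalization(x))
z = z0
for w in W_mix:            # a CI model would skip this loop
    z = channel_mixing(z, w)
y_hat = projection(z)
```

Skipping the mixing loop leaves every channel processed independently by the same weights, which is how the template covers both CI and CD models.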

(2) C-LoRA

Revisiting the two strategies

a) CI strategy

Individual models for each channel

Instantiate the TokenEmbedding with a series of mappings, e.g., different MLPs: $\mathbf{z}_c^{(0)}=\operatorname{MLP}_c\left(\overline{\mathbf{X}}_{:, c} ; \theta_c\right), \forall c=1, \ldots, C$.
Hypothesis class of all individuals
- $\mathcal{H}_{\mathrm{CI}}=\left\{\operatorname{MLP}_c\left(\cdot ; \theta_c\right) \mid \theta_c \in \Theta, c=1, \ldots, C\right\}$.
$\rightarrow$ However, such a hypothesis class is computationally expensive, since it needs a separate parameter set $\theta_c$ for each of the $C$ channels

( + Pure CI models fail to exploit multivariate correlational structures )
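To make the cost of the CI hypothesis class concrete, the following sketch counts embedding parameters. It assumes each $\operatorname{MLP}_c$ is a single linear layer from a lookback of length `T` to a hidden size `D`. The helper names and all numeric values are illustrative assumptions, not figures from the paper.

```python
def linear_param_count(t_in: int, d_out: int) -> int:
    """Weights plus biases of one linear layer mapping t_in -> d_out."""
    return t_in * d_out + d_out

def ci_param_count(num_channels: int, t_in: int, d_out: int) -> int:
    """A CI model keeps one independent embedding per channel."""
    return num_channels * linear_param_count(t_in, d_out)

T, D = 96, 512                      # hypothetical lookback and hidden size
shared = linear_param_count(T, D)   # a single encoder shared by all channels
for C in (7, 21, 321):              # hypothetical channel counts
    print(f"C={C}: CI params={ci_param_count(C, T, D)}, shared params={shared}")
```

Under this assumption the CI count grows linearly with the number of channels, while the shared encoder stays constant.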

b) CD strategy

Expressive by modeling channel interactions

either explicitly with ChannelMixing or implicitly by optimizing the global loss in Eq. (1).

Limitation

(1) Difficulty capturing individual channel patterns with a shared encoder $\operatorname{MLP}(\overline{\mathbf{X}} ; \theta)$
(2) The ChannelMixing (CM) operation
- can generate mixed channel identity information

  $\rightarrow$ Causes an indistinguishment issue [11]

c) Combine CI + CD

Channel-wise adaptation in a CD model

Model individual channels in a parameter-efficient way

$\rightarrow$ Low-rank adapter

Specialized for each channel: $\phi^{(c)} \in \mathbb{R}^{r \times D}$, where $r \ll D$ is the intrinsic rank.

How? Condition on another low-rank matrix

$\widetilde{\phi}^{(c)}=\operatorname{ReLU}\left(\phi^{(c), \mathrm{T}} \mathbf{W}\right) \in \mathbb{R}^{D \times d}$.
- $\mathbf{W} \in \mathbb{R}^{r \times d}$,
- $d$ : adaptation dimension.
- $\widetilde{\phi}^{(c)}$ : channel-specific parameters
  - Needs to be aware of the series information to consider the channel identity.
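The conditioning step $\widetilde{\phi}^{(c)}=\operatorname{ReLU}(\phi^{(c), \mathrm{T}} \mathbf{W})$ can be checked for shapes with a small NumPy sketch. The random initialization, the helper name `condition_adapter`, and the sizes `C`, `D`, `r`, `d` are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
C, D, r, d = 7, 16, 4, 8          # hypothetical sizes, with r much smaller than D

phi = rng.normal(size=(C, r, D))  # phi[c] is the channel-specific adapter, r x D
W = rng.normal(size=(r, d))       # low-rank matrix shared by all channels

def condition_adapter(phi_c, W):
    """ReLU(phi_c^T W): (D, r) @ (r, d) -> (D, d)."""
    return np.maximum(phi_c.T @ W, 0.0)

# One conditioned adapter per channel, stacked to shape (C, D, d)
phi_tilde = np.stack([condition_adapter(phi[c], W) for c in range(C)])
```

Only `phi` is channel-specific here; `W` is shared, so the per-channel cost is the `r * D` entries of each adapter.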

Result: $\mathbf{z}_{c, \phi}^{(0)}=\mathbf{z}_c^{(0), \mathrm{T}} \widetilde{\phi}^{(c)} \in \mathbb{R}^d$
- where $\mathbf{z}_c^{(0)}=\operatorname{MLP}\left(\overline{\mathbf{X}}_{:, c} ; \theta\right)$ is obtained by a CD model shared by all channels

Aggregate all channel adaptations: $\mathbf{Z}_\phi^{(0)}=\left\{\mathbf{z}_{c, \phi}^{(0)}\right\}_{c=1}^C \in \mathbb{R}^{C \times d}$
- Incorporate it into the global CD model

Final C-LoRA: $\mathbf{Z}^{(0)}=\left[\operatorname{MLP}(\overline{\mathbf{X}} ; \theta) \| \mathbf{Z}_\phi^{(0)}\right] \in \mathbb{R}^{C \times(D+d)}$.

Summary
- Balances CD and CI models
- Efficiently integrates global and local components
- Adapts to individual channels with the specialized channel adaptation $\mathbf{z}_{c, \phi}^{(0)}$ and preserves multivariate interactions through the shared $\operatorname{MLP}\left(\overline{\mathbf{X}}_{:, c} ; \theta\right)$
- Reduced hypothesis class: $\mathcal{H}_{\mathrm{C} \text {-LoRA }}=\left\{\operatorname{MLP}\left(\cdot ; \theta, \phi^{(c)}\right) \mid \theta \in \Theta, \phi^{(c)} \in \mathbb{R}^{r \times D}\right\}$
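Putting the pieces together, the C-LoRA embedding can be sketched as below. This assumes the shared MLP is a single linear-plus-ReLU layer and uses random weights. The function names and all sizes are hypothetical, and the parameter comparison at the end relies on the same single-layer assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C, D, r, d = 96, 7, 16, 4, 8                  # hypothetical sizes

W_embed = rng.normal(scale=0.1, size=(T, D))     # shared encoder weights (theta)
b_embed = np.zeros(D)
phi = rng.normal(scale=0.1, size=(C, r, D))      # channel-specific adapters
W = rng.normal(scale=0.1, size=(r, d))           # shared low-rank matrix

def shared_mlp(x_bar):
    # MLP(X; theta) applied to every channel column: (T, C) -> (C, D)
    return np.maximum(x_bar.T @ W_embed + b_embed, 0.0)

def c_lora_embedding(x_bar):
    z = shared_mlp(x_bar)                                            # (C, D)
    phi_tilde = np.maximum(np.einsum("crk,rd->ckd", phi, W), 0.0)    # (C, D, d)
    z_phi = np.einsum("ck,ckd->cd", z, phi_tilde)                    # z_c^T phi_tilde_c
    return np.concatenate([z, z_phi], axis=1)                        # (C, D + d)

x_bar = rng.normal(size=(T, C))
Z0 = c_lora_embedding(x_bar)

# Extra parameters added by the adapters vs. a fully CI embedding
adapter_params = C * r * D + r * d
ci_params = C * (T * D + D)
```

With these illustrative sizes the adapters add far fewer parameters than one embedding per channel would, which is the parameter-efficiency argument in miniature.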


Seunghan Lee
