TSCMamba: Mamba Meets Multi-View Learning for Time Series Classification

1. Brief Summary

(1) Motivation
(2) Two domains in TS
(3) Proposal
(4) Tango Scanning

(1) Motivation

Problems with existing TSC (TS Classification) methods?

(1) Key properties such as shift equivariance and inversion invariance are not fully exploited
(2) “Long-range” dependencies are handled poorly
(3) “Spectral–temporal” fusion is also insufficient

(2) Two domains in TS

(1) Spectral domain

with CWT (Continuous Wavelet Transform)
Obtains shift-equivariant time–frequency features

(2) Temporal domain

with ROCKET: extracts “local” temporal patterns at various scales
with MLP: extracts sequence-level “global” context
A switch mechanism selects either the local or the global features depending on the data

(3) Proposal

Multi-view learning = a + b + c

a) Spectral features (via CWT)
b) Local temporal features (via ROCKET)
c) Global temporal features (via MLP)

(4) Tango Scanning

Mamba: efficient and scalable long-range dependency modeling.
- More efficient than Transformers thanks to linear-time complexity.
Inversion invariance
- Learns patterns stably even on time-reversed sequences.
- Tango Scanning is introduced for this purpose.

  → Processes the forward and the reversed sequence in a single Mamba block!

2. Introduction

(1) Limitations of existing methods
(2) Why a CWT-based approach is needed
(3) Local vs. Global temporal features
(4) Why multi-view learning is needed
(5) Proposal: Tango Scanning
(6) Main Contribution

(1) Limitations of existing methods

Family 1) CNN, RNN
- Time-domain focused (O)
- Use of frequency/spectral information (X)
Family 2) Transformer
- Quadratic complexity → inefficient on long sequences.
Key properties that existing methods have not sufficiently considered?
- a) Shift Equivariance
- b) Inversion Invariance
- c) Long-range dependency
- d) Insufficient fusion of time–frequency representations

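(Note) Shift Equivariance

Concept: if the input TS is shifted along the time axis, the output features shift by the same amount.
- i.e., an essential property when the “relative position” of a pattern matters rather than its “absolute position”.
Advantage: robust to temporal misalignment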

Limitations of CNNs & spectral representations

[CNN] Structurally shift-equivariant, but...
- Short receptive field → weak at long-range dependencies.
[DFT/DWT-based spectral representations]
- [DFT] Not shift-equivariant
- [DWT] Not shift-equivariant because of downsampling
$\rightarrow$ Hence shift-equivariant features are needed in the spectral domain as well!
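A tiny numerical check of the two claims above, as a sketch assuming circular shifts (`np.roll`) and a hand-written one-level Haar approximation band (`haar_approx` is a hypothetical helper, not from the paper). A one-sample shift yields DWT coefficients that are not a shifted copy of the originals, and the DFT turns the shift into a phase change instead of a shift of the coefficients.

```python
import numpy as np

def haar_approx(x):
    # One level of the Haar DWT (approximation band): pairwise sums, then downsampling by 2
    return (x[0::2] + x[1::2]) / np.sqrt(2)

x = np.array([1.0, 2.0, 0, 0, 0, 0, 0, 0])
x_shift1 = np.roll(x, 1)  # input shifted by one sample
x_shift2 = np.roll(x, 2)  # input shifted by two samples

a, a1, a2 = haar_approx(x), haar_approx(x_shift1), haar_approx(x_shift2)
# DWT: the 1-sample shift matches no shifted copy of the coefficients ...
print(any(np.allclose(a1, np.roll(a, k)) for k in range(len(a))))  # False
# ... while the 2-sample shift happens to (shift by one coefficient)
print(np.allclose(a2, np.roll(a, 1)))  # True

# DFT: a time shift multiplies each coefficient by a phase factor
X, X1 = np.fft.fft(x), np.fft.fft(x_shift1)
print(any(np.allclose(X1, np.roll(X, k)) for k in range(len(x))))  # False
print(np.allclose(np.abs(X1), np.abs(X)))  # True: only the phase changed
```

Only even shifts survive the factor-2 downsampling intact, which is why the DWT output cannot track arbitrary shifts.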

(2) Why a CWT-based approach is needed

CWT = produces a time–frequency localized representation

Advantage: (with a real-valued mother wavelet) shift equivariance can be obtained.

Disadvantage: weak at capturing global patterns.
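A minimal NumPy sketch of a CWT with the real-valued Morlet wavelet $\psi(t) = e^{-t^2/2}\cos(5t)$, computed per scale as a circular convolution in the FFT domain. The periodic boundary, the scale range 1 to 32, and the helper names are assumptions for illustration, not the paper's implementation; with periodic boundaries the shift equivariance is exact, while other boundary handling makes it approximate near the edges.

```python
import numpy as np

def morlet_real(t):
    # Real-valued Morlet mother wavelet: exp(-t^2 / 2) * cos(5 t)
    return np.exp(-t ** 2 / 2) * np.cos(5 * t)

def cwt_circular(x, scales):
    """Scalogram of shape [len(scales), len(x)] via circular convolution."""
    n = len(x)
    t = np.arange(n) - n // 2                  # time grid centred on the middle sample
    X = np.fft.fft(x)
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        psi = morlet_real(t / s) / np.sqrt(s)  # dilated, energy-normalised wavelet
        psi = np.fft.ifftshift(psi)            # move the wavelet centre to index 0
        out[i] = np.real(np.fft.ifft(X * np.fft.fft(psi)))
    return out

rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * np.arange(64) / 16) + 0.1 * rng.normal(size=64)
scales = np.arange(1, 33)

S = cwt_circular(x, scales)                     # [32, 64] scalogram
S_shift = cwt_circular(np.roll(x, 5), scales)   # CWT of the shifted series

# Shift equivariance: shifting the input shifts the scalogram along the time axis
print(np.allclose(S_shift, np.roll(S, 5, axis=1)))  # True
```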

(3) Local vs. Global temporal features

Problem with “local” features?

CNN/ROCKET: built around local receptive fields → lack global contextual info.

Problem with “global” features?

MLP: can handle global dependencies but lacks local sensitivity.

(4) Why multi-view learning is needed

a) The strengths and weaknesses of each feature type are complementary

Spectral features (CWT): shift-equivariant, capture time–frequency information.
Local temporal features (ROCKET): capture diverse temporal scales.
Global features (MLP): capture the global pattern of the whole sequence.

b) Proposed direction

Combine Spectral + Local + Global in a multi-view learning framework.
An input-dependent Switch Gate selects either the Local or the Global features.

(5) Proposal: Tango Scanning

Limitations of existing bi-directional Mamba

Typical bi-directional designs use 2 blocks → higher cost
Many reversal operations → structural redundancy

Tango Scanning = A) + B)

A) Introduces inversion invariance
- Class information is preserved whether the TS is read forward or backward
- Example) In ECG, climate, rotational signals, etc., the temporal direction may carry no meaning.
- Advantages
  - Effectively a 2× data augmentation
  - Improved noise robustness
  - Can learn direction-invariant patterns
B) Introduces Mamba
- Linear complexity → can handle long sequences.
- Based on selective state spaces (SSM) → selectively updates only the important information.

Tango Scanning characteristics

A single Mamba block processes both the forward and the reversed sequence
The outputs of both directions and the inputs are all fused element-wise
Forms an inversion-invariant representation
Memory footprint stays almost the same

(6) Main Contribution

(1) Multi-view shift-equivariant TSC
- a) CWT-based spectral features
- b) Local/Global temporal features (ROCKET / MLP)
- c) Adaptive fusion via the switch gate
(2) Stronger sequence modeling with Mamba
- Leverages linear complexity + selective SSM.
(3) Tango Scanning
- Captures both forward and backward dependencies with a single Mamba block.
(4) SoTA performance
- Outperforms recent models such as TimesNet and TSLANet by a large margin on 30 datasets.

3. Methodology

TSCMamba consists of the following 5 stages

Spectral Representation (CWT 기반)
Temporal Feature Extraction (Local: ROCKET / Global: MLP)
Multi-View Fusion (Switch Gate + λ-weighted fusion)
Sequence Modeling (Mamba + Tango Scanning)
Classification Head (Depth-wise Pooling + MLP)

(1) Spectral Representation (CWT)

[Goal] Extract shift-equivariant spectral (time–frequency) features

[How] Convert each channel of the MTS into a CWT-based 2D representation

[Details]

Uses a real-valued Morlet wavelet
Each channel's TS is transformed by CWT:
- L × L → L1 × L1 (64×64 in the paper)
- producing a 2D scalogram

Patch Embedding

Conv2D(patch size=8) → flattened patch → FFN
Result: $W \in \mathbb{R}^{B \times D \times X}$

(B=batch, D=channel, X=patch feature dimension)
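A shape-level sketch of the patch embedding, assuming a single-input-channel Conv2D per scalogram with kernel size = stride = 8 (so it acts as a linear map on non-overlapping 8×8 patches) and an FFN reduced to one linear layer plus ReLU. The filter count `C`, the output width `X`, the missing biases, and the random weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D, H = 2, 3, 64   # batch, channels, scalogram side (64x64 as in the notes)
P, C, X = 8, 4, 32   # patch size 8 (from the notes); C and X are hypothetical

scalograms = rng.normal(size=(B, D, H, H))   # one CWT image per channel

# Conv2D with kernel = stride = P is a linear map on non-overlapping P x P patches
W_conv = rng.normal(size=(P * P, C)) * 0.1               # hypothetical conv weights
W_ffn = rng.normal(size=((H // P) ** 2 * C, X)) * 0.1    # hypothetical FFN weights

def patch_embed(img):
    b, d, h, w = img.shape
    n = h // P
    # [B, D, H, W] -> [B, D, n, P, n, P] -> [B, D, n, n, P, P] -> [B, D, n*n, P*P]
    patches = (img.reshape(b, d, n, P, n, P)
                  .transpose(0, 1, 2, 4, 3, 5)
                  .reshape(b, d, n * n, P * P))
    tokens = patches @ W_conv               # [B, D, n*n, C]  (Conv2D step)
    flat = tokens.reshape(b, d, -1)         # [B, D, n*n*C]   (flattened patches)
    return np.maximum(flat @ W_ffn, 0.0)    # [B, D, X]       (FFN: linear + ReLU)

W_spec = patch_embed(scalograms)            # plays the role of W in the notes
print(W_spec.shape)  # (2, 3, 32)
```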

Properties

(1) Preserves shift equivariance in the spectral domain
(2) Captures localized time–frequency information
(3) Robust to nonstationary signals

(2) Temporal Feature Extraction

Spectral features ($W$) alone lack temporal structure!
Two complementary temporal views are added:
- a) Local temporal features (ROCKET)
- b) Global temporal features (MLP)

a) Local temporal features (ROCKET)

(1) Model: random convolution kernels
(2) Role: extracts multi-scale local patterns with kernels of diverse receptive fields
(3) Approach: unsupervised, independent of the downstream classifier
(4) Result: $V_L \in \mathbb{R}^{B \times D \times X}$
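A small sketch of the ROCKET recipe: random kernel length in {7, 9, 11}, mean-centred normal weights, uniform bias, exponentially sampled dilation, random padding, and two pooled features per kernel (PPV, the proportion of positive values, and max). Real ROCKET uses on the order of 10,000 kernels; the 16 kernels here, giving $2 \times 16 = 32 = X$ features per channel, are an assumption chosen only to match the $B \times D \times X$ shape, since these notes do not say how TSCMamba maps ROCKET features to $X$.

```python
import numpy as np

def generate_kernels(input_length, num_kernels, rng):
    """Random kernels following the ROCKET recipe."""
    kernels = []
    for _ in range(num_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(0.0, 1.0, length)
        weights -= weights.mean()                      # mean-centred weights
        bias = rng.uniform(-1.0, 1.0)
        max_exp = np.log2((input_length - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0, max_exp))   # exponential dilation -> multi-scale
        padding = ((length - 1) * dilation) // 2 if rng.integers(2) else 0
        kernels.append((weights, bias, dilation, padding))
    return kernels

def apply_kernel(x, weights, bias, dilation, padding):
    """Dilated 1D convolution summarised by PPV and max."""
    xp = np.pad(x, padding)
    span = (len(weights) - 1) * dilation
    out = np.array([weights @ xp[i:i + span + 1:dilation] + bias
                    for i in range(len(xp) - span)])
    return (out > 0).mean(), out.max()

def rocket_features(series, kernels):
    """series: [B, D, L] -> features: [B, D, 2 * num_kernels]."""
    B, D, _ = series.shape
    feats = np.empty((B, D, 2 * len(kernels)))
    for b in range(B):
        for d in range(D):
            for k, kern in enumerate(kernels):
                feats[b, d, 2 * k:2 * k + 2] = apply_kernel(series[b, d], *kern)
    return feats

rng = np.random.default_rng(0)
series = rng.normal(size=(2, 3, 64))                    # [B, D, L]
kernels = generate_kernels(series.shape[-1], num_kernels=16, rng=rng)
V_L = rocket_features(series, kernels)
print(V_L.shape)  # (2, 3, 32)
```

The kernels are drawn once and never trained, which is what makes this view independent of the downstream classifier.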

b) Global temporal features (MLP)

(1) Model: a 1-layer MLP applied to each channel
(2) Role: compresses the whole sequence into a global pattern, without relying on local dependencies
(3) Result: $V_G \in \mathbb{R}^{B \times D \times X}$
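A sketch of the global view, assuming one linear layer over the time axis (length $L$ to $X$) shared by all channels and followed by ReLU; whether TSCMamba uses per-channel weights or an activation here is not stated in these notes. The point is that each output coordinate mixes all $L$ time steps at once.

```python
import numpy as np

rng = np.random.default_rng(0)
B, D, L, X = 2, 3, 64, 32
series = rng.normal(size=(B, D, L))

# Hypothetical single linear layer over the time axis, shared by all channels
W_time = rng.normal(size=(L, X)) / np.sqrt(L)
b_time = np.zeros(X)

# Each channel's whole length-L sequence is mapped to X values at once, so every
# output coordinate depends on every time step (global mixing, no locality bias)
V_G = np.maximum(series @ W_time + b_time, 0.0)   # [B, D, X]
print(V_G.shape)  # (2, 3, 32)
```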

(3) Fusing Multi-View Representations

Switch Gate

Learnable mask
- Picks a single view rather than a weighted mixture
Selects either $V = V_G$ or $V = V_L$ as the temporal feature

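Fusion method

The selected temporal feature $V$ is combined element-wise with the spectral feature $W$

Additive or multiplicative

$V_W = \lambda V + (2-\lambda) W$ (additive).
$V_W = (\lambda V) \odot ((2-\lambda) W)$ (multiplicative).
λ is learnable (initialized to 1.0)
Automatically balances the contributions of V and W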

Final multi-view tensor

$U = W \parallel V_W \parallel V$.

$W$: Spectral
$V_W$: Fused spectral–temporal
$V$: Raw temporal (Local/Global)

$\rightarrow$ A multi-view representation formed by concatenating the 3 feature maps along the feature (last) axis, giving $B \times D \times 3X$!
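A sketch of the switch, fusion, and concatenation steps. The hard switch driven by one scalar `gate_logit` is a hypothetical simplification of the learnable mask (how the hard choice is trained is not covered here); `fuse` implements the two fusion formulas with $\lambda$ at its initial value 1.0.

```python
import numpy as np

def switch_gate(v_local, v_global, gate_logit):
    # Hypothetical hard switch: one scalar picks a single view (no mixing)
    return v_local if gate_logit > 0 else v_global

def fuse(v, w, lam=1.0, mode="add"):
    # Element-wise fusion with a learnable lambda (initialised to 1.0)
    if mode == "add":
        return lam * v + (2.0 - lam) * w
    return (lam * v) * ((2.0 - lam) * w)   # multiplicative variant

rng = np.random.default_rng(0)
B, D, X = 2, 3, 8
W = rng.normal(size=(B, D, X))     # spectral view
V_L = rng.normal(size=(B, D, X))   # local temporal view
V_G = rng.normal(size=(B, D, X))   # global temporal view

V = switch_gate(V_L, V_G, gate_logit=0.3)   # picks V_L here
V_W = fuse(V, W, lam=1.0)                   # with lambda = 1 this is simply V + W
U = np.concatenate([W, V_W, V], axis=-1)    # U = W || V_W || V along the last axis
print(U.shape)  # (2, 3, 24)
```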

(4) Inferring with Time-Channel Tango Scanning (Mamba)

Token construction

Shape of $U$: $B \times D \times 3X$

Two sequences

Time-wise token sequence: length=$3X$, dim=$D$
Channel-wise token sequence: length=$D$, dim=$3X$

Token sequences are built along both axes (time, channel),

and each is modeled with Mamba.

a) Vanilla Mamba Block

Input-dependent gating ($g_k$) updates only the important tokens
Linear-time long-range modeling
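A simplified selective-scan sketch in the style of Mamba's S6 layer: the step size $\Delta_k$ and the matrices $B_k$, $C_k$ are computed from the current token, so the state update depends on the input. It omits the rest of the Mamba block (input/output projections, the short convolution, the SiLU gate branch) and uses random hypothetical weights. Reading $\Delta_k$ as the input-dependent gate $g_k$ above is an interpretation.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)

def selective_scan(x, p):
    """Simplified selective SSM over tokens x: [T, d] -> [T, d].

    One left-to-right pass, so the cost is linear in the sequence length T.
    """
    T, d = x.shape
    h = np.zeros_like(p["A"])                 # hidden state [d, N]
    y = np.empty((T, d))
    for k in range(T):
        xk = x[k]
        delta = softplus(xk @ p["W_delta"] + p["b_delta"])  # input-dependent step [d]
        Bk = xk @ p["W_B"]                                  # input-dependent B [N]
        Ck = xk @ p["W_C"]                                  # input-dependent C [N]
        A_bar = np.exp(delta[:, None] * p["A"])             # discretised decay in (0, 1)
        h = A_bar * h + (delta[:, None] * Bk[None, :]) * xk[:, None]
        y[k] = h @ Ck
    return y

rng = np.random.default_rng(0)
d, N, T = 4, 8, 20
params = {
    "A": -np.exp(rng.normal(size=(d, N))),    # negative real diagonal A
    "W_delta": rng.normal(size=(d, d)) * 0.1,
    "b_delta": np.zeros(d),
    "W_B": rng.normal(size=(d, N)) * 0.1,
    "W_C": rng.normal(size=(d, N)) * 0.1,
}
x = rng.normal(size=(T, d))
y = selective_scan(x, params)
print(y.shape)  # (20, 4)
```

A large $\Delta_k$ resets the state toward the current token, while a small one lets the state pass through almost unchanged, which is the "selective update" behavior.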

b) Tango Scanning

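Forward & reversed inputs

→ are fed one after the other into the same Mamba block

→ and the two outputs and the two inputs are summed element-wise!

$s^{(o)} = v \oplus a \oplus v^{(r)} \oplus a^{(r)}$

($v$: input token sequence, $v^{(r)}$: its time-reversed copy, $a$ and $a^{(r)}$: outputs of the shared Mamba block for $v$ and $v^{(r)}$, $\oplus$: element-wise addition)

Achieves inversion invariance
Models both forward and backward dependencies
Lighter and more memory-efficient than BiMamba
Maximizes coverage of pairwise interactions between tokens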
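A sketch of the fusion rule $s^{(o)} = v \oplus a \oplus v^{(r)} \oplus a^{(r)}$ with one shared block. The `toy_block` gated recurrence is a hypothetical stand-in for the Mamba block, and the code follows the formula literally (no re-reversal of $a^{(r)}$ before the sum, which these notes do not specify). Under that literal reading, feeding the reversed sequence produces the same four terms, so the output is exactly invariant to the input's direction.

```python
import numpy as np

def toy_block(x, w):
    """Stand-in for a Mamba block: gated causal recurrence h = (1 - g) h + g x.

    x: [T, d]; w: [d, d] hypothetical gate weights. Not the real Mamba block.
    """
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for k in range(len(x)):
        g = 1.0 / (1.0 + np.exp(-(x[k] @ w)))   # input-dependent gate in (0, 1)
        h = (1.0 - g) * h + g * x[k]
        out[k] = h
    return out

def tango_scan(v, block):
    """s = v + a + v_r + a_r, with ONE block shared by both directions."""
    v_r = v[::-1]        # time-reversed input
    a = block(v)         # block output on the forward sequence
    a_r = block(v_r)     # same block (same weights) on the reversed sequence
    return v + a + v_r + a_r

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
v = rng.normal(size=(10, 4))

def block(seq):
    return toy_block(seq, w)

s = tango_scan(v, block)
s_rev = tango_scan(v[::-1], block)
print(np.allclose(s, s_rev))  # True: the fused output ignores the input's direction
```

Only one set of block weights exists, which is where the saving over a two-block bi-directional design comes from.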

a) Time-wise Tango Scanning

$[B, D, 3X]$ → token sequence of length $3X$ along the time dimension
Output size: $[B, 3X, D]$

b) Channel-wise Tango Scanning

$[B, D, 3X]$ → token sequence of length $D$ along the channel dimension
Output size: $[B, D, 3X]$

a + b) Time + Channel fusion

The two scanning outputs are combined: $z = (s^{(t)})^T \oplus s^{(c)}$
Produces the final sequence representation.
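A shape-tracking sketch of time-wise and channel-wise Tango scanning and their fusion. `causal_mean` is a hypothetical, parameter-free stand-in for a Mamba block; real blocks would need separate weights for the two scans, since their token dimensions differ ($D$ vs. $3X$).

```python
import numpy as np

def tango(tokens, block):
    """Tango fusion on [B, T, F] token sequences (reversal along the token axis)."""
    rev = tokens[:, ::-1, :]
    return tokens + block(tokens) + rev + block(rev)

def causal_mean(x):
    # Hypothetical stand-in for a Mamba block: running mean over the token axis
    return np.cumsum(x, axis=1) / np.arange(1, x.shape[1] + 1)[None, :, None]

rng = np.random.default_rng(0)
B, D, X3 = 2, 5, 12                      # X3 stands for 3X
U = rng.normal(size=(B, D, X3))

s_t = tango(U.transpose(0, 2, 1), causal_mean)  # time-wise: 3X tokens of dim D -> [B, 3X, D]
s_c = tango(U, causal_mean)                     # channel-wise: D tokens of dim 3X -> [B, D, 3X]
z = s_t.transpose(0, 2, 1) + s_c                # z = (s^(t))^T + s^(c) -> [B, D, 3X]
print(s_t.shape, s_c.shape, z.shape)  # (2, 12, 5) (2, 5, 12) (2, 5, 12)
```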

(5) Output Class Representation

Depth-wise pooling

Max/Average pooling
Reduces the channel dimension: $\bar{z} \in \mathbb{R}^{3X}$

2-layer MLP

$\bar{z}\to z^{(1)} \to z^{(2)}$.
The final $z^{(2)}$ gives the class logits
Loss: Cross-Entropy
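A sketch of the classification head: average pooling over the channel axis, a 2-layer MLP, and softmax cross-entropy. The hidden width `H`, the class count `K`, the ReLU activation, and the random weights are assumptions.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

rng = np.random.default_rng(0)
B, D, X3, H, K = 2, 5, 12, 16, 3   # H (hidden width) and K (classes) are hypothetical
z = rng.normal(size=(B, D, X3))    # fused representation from the scanning stage
labels = np.array([0, 2])

z_bar = z.mean(axis=1)             # depth-wise average pooling over D -> [B, 3X]
                                   # (max pooling would be z.max(axis=1))

W1, b1 = rng.normal(size=(X3, H)) * 0.1, np.zeros(H)
W2, b2 = rng.normal(size=(H, K)) * 0.1, np.zeros(K)
z1 = np.maximum(z_bar @ W1 + b1, 0.0)   # z^(1): hidden layer (ReLU assumed)
z2 = z1 @ W2 + b2                       # z^(2): class logits [B, K]

loss = -log_softmax(z2)[np.arange(B), labels].mean()   # cross-entropy
print(z_bar.shape, z2.shape)  # (2, 12) (2, 3)
```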


Seunghan Lee