TempoPFN: Synthetic Pre-training of Linear RNNs for Zero-shot Time Series Forecasting

https://arxiv.org/pdf/2510.25502

Abstract

1. Motivation

Zero-shot TSFM

Inefficient for long-horizon prediction
Low reproducibility
Synthetic-only pretraining approaches underperform on difficult benchmarks

2. Proposal: TempoPFN

Overview
- (1) Univariate TSFM
- (2) Linear RNN 기반
- (3) Pre-trained on synthetic data only
Architecture
- (1) GatedDeltaProduct
- (2) State-weaving
  - Fully parallelizable training across the sequence length
- (3) Tracks the TS state directly, without windowing or summarization
Synthetic data
- Unifies diverse generators
  - Stochastic Differential Equations (SDEs)
  - Gaussian Processes (GPs)
  - Audio synthesis
- Includes novel augmentations

3. Performance & Efficiency & Reproducibility

**Performance**
- Evaluated on the Gift-Eval, fev-bench, and Chronos-ZS benchmarks
- Outperforms all existing synthetic-only methods
- Also outperforms most models trained on real-world data
Efficiency
- Both training and inference are fully parallelizable
- More compute-efficient than existing foundation baselines
Reproducibility
- Synthetic data generation pipeline is released
- Training code is released
- Provides a basis for reproducible TS foundation model research

1. Introduction

a) Background

TSFM: enables zero-shot prediction without fine-tuning

Limitations of existing TSFMs:

(1) Transformer
- Quadratic complexity
- Error accumulation over long horizons
(2) Non-linear RNN (e.g., TiRex)
- Can maintain a temporal state
- Sequential processing limits scalability
(3) Synthetic-only pre-trained models
- Fall short of SoTA on Gift-Eval
- TabPFN-TS performs well, but its synthetic data is not public → lacks reproducibility

b) Proposal: TempoPFN

PFN framework
Arch:
- Linear RNN
- GatedDeltaProduct recurrence
Fully parallelizable training & inference

c) Findings

[1] Architecture: non-linear RNNs are not essential for TS forecasting
- Linear RNN + GatedDeltaProduct alone provides sufficient state tracking
  - DeltaProduct is based on orthogonal rotations → retains state better than diagonal SSMs
[2] Synthetic pre-training strategy
- Diverse synthetic generators + novel augmentations
- No real-world data is used → prevents benchmark leakage
- The full pipeline and code are released as open source

d) Contributions

(1) Architecture
- First univariate TSFM based on a linear RNN
- Predicts all future horizons in parallel, without windowing or patching
- State-weaving strengthens information flow across horizons
(2) Data pipeline
- Fully synthetic, leakage-free, reproducible
(3) Performance
- Top-tier zero-shot performance on Gift-Eval, fev-bench, and Chronos-ZS
- Outperforms most models trained on real-world data

2. Background & Related Works

(1) Time Series Forecasting

Classical methods
- ARIMA, Exponential Smoothing → focused on point estimates
Probabilistic forecasting
- Models the predictive distribution \(p(y_{T+1:T+H} \mid y_{1:T})\)
Deep learning-based
- Transformers, modern RNNs

(2) Zero-shot forecasting

Predicts unseen TS without fine-tuning
Mainstream approaches are transformer-based
- Chronos, TimesFM, MOIRAI
MOIRAI(-MOE) is the SOTA on Gift-Eval

(3) PFNs & Synthetic Data

A PFN learns the inference algorithm itself rather than a task solver
Datasets are sampled from a synthetic prior in order to
- Approximate the posterior predictive distribution
In-context learning = fast approximate Bayesian inference
Performance is determined by the expressiveness of the prior
Prior examples: TabPFN, ForecastPFN, TimePFN, TabPFN-TS
Proposal) TempoPFN
- Uses its own synthetic temporal-dynamics pipeline as the prior
- Enables zero-shot probabilistic forecasting on unseen TS

(4) Linear RNNs and State-Space Models

RNNs have recently regained attention for long-horizon TS forecasting
TiRex
- Non-linear RNN (xLSTM) + uses some real data
Proposal) TempoPFN
- Linear RNN + GatedDeltaProduct
- Fully parallelizable training & inference
- Synthetic-only pretraining → no leakage
Advantages of linear RNNs
- Allow chunk-wise parallelization / associative scan
General form
- State update: \(H_i = A(x_i)H_{i-1} + B(x_i)\)
- Output: \(\hat{y}_i = \text{dec}(H_i, x_i)\)
Diagonal transition family
- Mamba, GLA, mLSTM
More expressive linear RNNs
- DeltaNet, TTT-Linear, RWKV-7, Titans
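The general form above, \(H_i = A(x_i)H_{i-1} + B(x_i)\), is an affine map of the previous state, and composing affine maps is associative. That is what makes an associative scan possible. The following is a minimal NumPy sketch for the scalar (diagonal) case only; the function names are hypothetical, and this is not the kernel used by TempoPFN or any particular library.

```python
import numpy as np

def sequential_scan(a, b, h0=0.0):
    """Reference loop: h_i = a_i * h_{i-1} + b_i."""
    h = np.empty_like(b)
    prev = h0
    for i in range(len(b)):
        prev = a[i] * prev + b[i]
        h[i] = prev
    return h

def associative_scan(a, b, h0=0.0):
    """Same recurrence via a log-depth (Hillis-Steele) inclusive scan.

    Each pair (a_i, b_i) represents the affine map h -> a_i * h + b_i.
    Applying (a1, b1) and then (a2, b2) gives (a1 * a2, a2 * b1 + b2),
    and this composition is associative.
    """
    A, B = a.copy(), b.copy()
    n, shift = len(b), 1
    while shift < n:
        # Partial results `shift` positions earlier; pad with the identity map.
        A_prev = np.concatenate([np.ones(shift), A[:-shift]])
        B_prev = np.concatenate([np.zeros(shift), B[:-shift]])
        # Right-hand side is evaluated with the old A and B before assignment.
        A, B = A_prev * A, A * B_prev + B
        shift *= 2
    return A * h0 + B

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, size=16)   # per-step transition coefficients
b = rng.normal(size=16)              # per-step inputs
h_loop = sequential_scan(a, b, h0=0.3)
h_scan = associative_scan(a, b, h0=0.3)
```

Every iteration of the `while` loop is a vectorized operation over all positions, so the sequence is processed in a logarithmic number of parallel steps instead of one step per token.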

3. TempoPFN

(1) Architecture

Overall Goal

Designed to forecast a univariate TS over the entire prediction horizon in a single forward pass
Processes every time step directly, without windowing or summarization

a) Input Representation

History (time steps + values) & Future (time steps) \(\rightarrow\) concatenated into a single token sequence
- (1) Historical steps
  - The value \(y_i\) is embedded with a linear projection
  - Missing values use a learnable NaN embedding
  - The value embedding and time-feature embedding are combined additively
- (2) Future steps
  - Only the time-feature embedding is used
Information exchange among future time steps is allowed, which encourages coherent predictions
Each time step uses GluonTS time features (seasonality, day-of-week, index, etc.)
Unlike TiRex, no window-based presummarization is used
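The input construction described above can be sketched roughly as follows. This is only an illustration under assumptions: the dimensions, the parameter names (`W_value`, `W_time`, `nan_embedding`), and the use of a plain linear projection for the time features are hypothetical, and the parameters are random here rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_time_feats = 8, 4  # hypothetical sizes

# Hypothetical parameters (randomly initialised here; learned in practice).
W_value = rng.normal(size=(1, d_model))            # linear projection of y_i
nan_embedding = rng.normal(size=d_model)           # learnable NaN embedding
W_time = rng.normal(size=(n_time_feats, d_model))  # time-feature projection

def embed(history_values, history_feats, future_feats):
    """Build one token sequence: history tokens followed by future tokens."""
    missing = np.isnan(history_values)
    filled = np.where(missing, 0.0, history_values)
    value_emb = filled[:, None] @ W_value                # shape (T, d_model)
    value_emb[missing] = nan_embedding                   # replace missing steps
    history_tokens = value_emb + history_feats @ W_time  # additive combination
    future_tokens = future_feats @ W_time                # time features only
    return np.concatenate([history_tokens, future_tokens], axis=0)

history = np.array([1.0, 2.0, np.nan, 4.0, 5.0])  # T = 5 with one missing value
hist_feats = rng.normal(size=(5, n_time_feats))   # stand-in for time features
fut_feats = rng.normal(size=(3, n_time_feats))    # H = 3 future steps
tokens = embed(history, hist_feats, fut_feats)
```

The resulting `tokens` array has one row per time step (history first, then future), which the backbone can consume without any windowing step.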

b) Backbone

Consists of 10 encoder layers
- Each layer is based on a Gated DeltaProduct block
Main components
- Gated DeltaProduct recurrence + short 1D convolution (kernel 16–32)
- Pre-normalization (before the recurrent unit)
- Gated MLP for channel-wise transformation
Two advantages
- (1) Parallelization of the linear recurrence
- (2) Combined with the expressiveness of the convolution and MLP
DeltaProduct expresses the hidden state transition as a product of Householder transformations
Increasing the number of heads improves
- Length extrapolation
- State tracking
- Sequence modeling
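To make the "product of Householder transformations" concrete, here is a small NumPy sketch. It assumes the generalized Householder form \(I - \beta k k^\top\) with a unit-norm key \(k\), as used in the DeltaNet/DeltaProduct line of work; gating and the input term are omitted, the function names are hypothetical, and the exact parameterization inside TempoPFN may differ.

```python
import numpy as np

def generalized_householder(k, beta):
    """Return I - beta * k k^T for a unit-normalised key k."""
    k = k / np.linalg.norm(k)
    return np.eye(len(k)) - beta * np.outer(k, k)

def transition(keys, betas):
    """Product of generalized Householder matrices, one per (k, beta) pair."""
    A = np.eye(keys.shape[1])
    for k, beta in zip(keys, betas):
        A = generalized_householder(k, beta) @ A
    return A

rng = np.random.default_rng(0)
keys = rng.normal(size=(2, 4))

# beta = 2 gives exact reflections; two reflections compose into a rotation.
A_rot = transition(keys, betas=[2.0, 2.0])

# beta in [0, 2] keeps every singular value <= 1, so the state cannot blow up.
A_contract = transition(keys, betas=[0.5, 1.5])
```

Unlike a diagonal transition, `A_rot` mixes state dimensions while preserving norms, which is one way to read the claim that such transitions retain state better than diagonal SSMs.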

c) Non-causality via State Weaving

Exploits the fact that full-horizon forecasting does not require causal masking
Each layer's final hidden state \(H_t^i\) is added to the next layer's learnable initial state \(H_0^{i+1}\)
Achieves bidirectional information flow without extra parameters or an explicit bidirectional structure
Gives access to the entire history and future
Avoids the information bottleneck that causal RNNs face at the prediction stage
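A toy illustration of state weaving, assuming a very simple recurrence \(h_i = 0.9\,h_{i-1} + x_i\) in place of the Gated DeltaProduct block. The function names are hypothetical and the "learnable" initial state is just zeros here; the point is only to show how adding one layer's final state to the next layer's initial state lets early positions see late inputs.

```python
import numpy as np

def linear_rnn_layer(x, h0, decay=0.9):
    """Toy causal linear recurrence h_i = decay * h_{i-1} + x_i."""
    h, states = h0, []
    for x_i in x:
        h = decay * h + x_i
        states.append(h)
    return np.stack(states), h  # all hidden states and the final state

def forward(x, n_layers=2, weave=True):
    """Stack layers; with weaving, each final state seeds the next layer."""
    d = x.shape[1]
    carried = np.zeros(d)
    for _ in range(n_layers):
        learnable_h0 = np.zeros(d)  # stands in for a learned parameter
        h0 = learnable_h0 + carried if weave else learnable_h0
        x, final_state = linear_rnn_layer(x, h0)
        carried = final_state
    return x

rng = np.random.default_rng(0)
x_a = rng.normal(size=(6, 3))
x_b = x_a.copy()
x_b[-1] += 1.0  # perturb only the last token

woven_a, woven_b = forward(x_a, weave=True), forward(x_b, weave=True)
causal_a, causal_b = forward(x_a, weave=False), forward(x_b, weave=False)
```

With weaving, the output at the first position changes when the last token changes (`woven_a[0]` differs from `woven_b[0]`), whereas the purely causal stack leaves it untouched.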

(2) Synthetic Data Generation

Overall Strategy

Uses 10 synthetic generators for TSFM pretraining
Combines existing methods with novel generators

\(\rightarrow\) Covers a broader space of temporal patterns

a) Existing Generators

ForecastPFN Generator
- Multiplicative composition of trend and seasonality
- Linear, exponential growth + sinusoidal harmonics
- Includes Weibull noise, time warping, magnitude scaling, and spike injection
- Applies filtering to prevent extreme values
KernelSynth
- Samples univariate TS from a Gaussian Process prior
- Combines periodic, stationary, and noise kernels additively or multiplicatively
- Produces smooth yet diverse trajectories
Extended Gaussian Process Generator
- Extends KernelSynth with a wider variety of kernel combinations
- Broadens the range of stationary and non-stationary patterns
CauKer
- Generator based on a Structural Causal Model
- Introduces causal dependencies via a random DAG
- Generates a multivariate TS, then uses each channel as an independent univariate TS
- Indirectly reflects interdependent dynamics
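As a rough illustration of the GP-based generators above, the sketch below samples one series from a Gaussian Process whose kernel combines a periodic, a stationary (RBF), and a noise kernel both multiplicatively and additively. The kernel choices, hyperparameters, and function names are assumptions for illustration, not the actual KernelSynth kernel bank.

```python
import numpy as np

def rbf(t, length_scale):
    """Stationary squared-exponential kernel."""
    d = t[:, None] - t[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def periodic(t, period, length_scale=1.0):
    """Exp-sine-squared (periodic) kernel."""
    d = np.abs(t[:, None] - t[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / length_scale**2)

def sample_gp_series(n=128, seed=0):
    """Draw one univariate series from a zero-mean GP prior."""
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)
    K = periodic(t, period=24.0) * rbf(t, length_scale=60.0)  # multiplicative
    K = K + 0.5 * rbf(t, length_scale=15.0)                   # additive
    K = K + 1e-2 * np.eye(n)                                  # white-noise kernel
    L = np.linalg.cholesky(K)
    return L @ rng.standard_normal(n)

series = sample_gp_series()
```

Randomizing which kernels are combined, and their hyperparameters, per sample is what would turn this single draw into a diverse prior over trajectories.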

b) Novel Generators

Sawtooth
- Upward or downward ramp patterns
- A weak trend and low-amplitude seasonality prevent over-idealization
Step Function
- Piecewise constant TS based on changepoints
- Includes Gaussian smoothing, noise, seasonality, and anomalies
Anomaly
- Inserts spikes on top of a baseline signal, either periodically or in clusters
- Spike magnitude follows a constant, trending, cyclical, or random regime
Spikes
- Sharp event-driven spikes placed on a flat baseline
- V, inverted V, and plateau shapes
- Bursty or evenly spaced patterns
Sine Wave
- Basic oscillatory signal with controllable period, amplitude, phase, and noise
- Provides a basic pattern for learning periodicity
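Three of the generators above are simple enough to sketch directly. The versions below are simplified illustrations with hypothetical function names and default parameters; they omit the smoothing, seasonality, and anomaly options the notes mention.

```python
import numpy as np

def sawtooth(n, period, upward=True, trend=0.002, noise=0.02, seed=0):
    """Repeating ramp with a weak linear trend and Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    ramp = (t % period) / period  # rises from 0 towards 1 in each period
    if not upward:
        ramp = 1.0 - ramp
    return ramp + trend * t + noise * rng.standard_normal(n)

def step_function(n, n_changepoints=3, noise=0.05, seed=0):
    """Piecewise-constant series with random changepoints and levels."""
    rng = np.random.default_rng(seed)
    cps = np.sort(rng.choice(np.arange(1, n), size=n_changepoints, replace=False))
    levels = rng.normal(size=n_changepoints + 1)
    # Segment index = number of changepoints at or before each time step.
    segment = np.searchsorted(cps, np.arange(n), side="right")
    return levels[segment] + noise * rng.standard_normal(n)

def spikes(n, spacing=25, width=5, height=1.0):
    """Evenly spaced triangular (inverted-V) spikes on a flat baseline."""
    y = np.zeros(n)
    half = width // 2
    for center in range(spacing, n - half, spacing):
        for off in range(-half, half + 1):
            y[center + off] = height * (1.0 - abs(off) / (half + 1))
    return y

saw = sawtooth(48, period=12, trend=0.0, noise=0.0)
steps = step_function(100, n_changepoints=3, noise=0.0)
spk = spikes(100)
```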

c) Audio-Inspired Generators

Applies procedural audio synthesis techniques to TS
Models event-driven, highly complex dynamics
Examples
- Stochastic Rhythms (event data)
- Financial Volatility (shock, clustering)
- Network Topology (traffic burst, congestion)
- Multi-Scale Fractals (self-similarity)

d) SDE Generator (Core Contribution)

Based on a regime-switching, time-inhomogeneous Ornstein–Uhlenbeck process
Mean, volatility, and mean-reversion speed
- Depend on time \(t\) and the latent regime \(r_t\)
Regimes transition according to a Markov chain
Parameters vary in polynomial, sinusoidal, logistic, or piecewise-linear forms
Seasonality is injected additively into the mean and volatility
Long-memory dynamics are also supported via fractional Brownian motion
Rescaling, shifting, and measurement noise are added to increase realism
Unifies regime shifts, non-stationarity, and periodicity in a single probabilistic framework
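A minimal sketch of the core idea, assuming an Euler-Maruyama discretization of \(dX_t = \theta_{r_t}\,(\mu_{r_t}(t) - X_t)\,dt + \sigma_{r_t}\,dW_t\) with two regimes and a sinusoidal seasonal term in the mean. All parameter values and the function name are hypothetical; fractional Brownian motion, time-varying volatility, and the rescaling and measurement-noise steps are left out.

```python
import numpy as np

def regime_switching_ou(n=500, dt=1.0, seed=0):
    """Simulate a two-regime OU process with a seasonal, regime-dependent mean."""
    rng = np.random.default_rng(seed)
    # Hypothetical per-regime parameters.
    theta = np.array([0.05, 0.30])    # mean-reversion speed
    sigma = np.array([0.10, 0.40])    # volatility
    base_mean = np.array([0.0, 2.0])  # long-run mean level
    P = np.array([[0.98, 0.02],       # Markov transition matrix for regimes
                  [0.05, 0.95]])

    x = np.empty(n)
    regimes = np.empty(n, dtype=int)
    x_prev, r = 0.0, 0
    for i in range(n):
        t = i * dt
        r = rng.choice(2, p=P[r])  # regime transition
        mu_t = base_mean[r] + 0.5 * np.sin(2 * np.pi * t / 50.0)  # seasonal mean
        drift = theta[r] * (mu_t - x_prev) * dt
        diffusion = sigma[r] * np.sqrt(dt) * rng.standard_normal()
        x_prev = x_prev + drift + diffusion
        x[i], regimes[i] = x_prev, r
    return x, regimes

path, regimes = regime_switching_ou()
```

Even this reduced version shows how a single generator can mix regime shifts, a drifting mean, and periodicity in one sample path.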

4. Experiments


Seunghan Lee
