How Can Time Series Analysis Benefit From Multiple Modalities? A Survey and Outlook - Part 1
Contents
- Abstract
- Introduction
- Background and Taxonomy
- Taxonomy
- Background
- TimeAsX: Reusing Foundation Models of Other Modalities for Efficient TSA
- Time As Text
- Time As Image
- Time As Other Modalities
- Domain-Specific TS Works
Abstract
Newly emerging field: Multiple Modalities for TSA (MM4TSA)
Q) How can TSA benefit from multiple modalities?
Discuss three benefits:
- (1) Reusing foundation models of other modalities for efficient TSA
- (2) Multimodal extension for enhanced TSA
- (3) Cross-modality interaction for advanced TSA
Group the works by the introduced modality type
- e.g., text, images, audio, tables, and others
1. Introduction
(1) Multimodal + TS
- (1) Modality reusing
- (2) Multimodal enhancement
- (3) Cross-modal interaction
Existing works: mainly focus on (1) Reusing LLMs
\(\rightarrow\) Only a sub-sub-branch!
(2) Overview
This survey paper
= First review of emerging MM4TSA field
Systematically identifies three key approaches (Figure 1)
- (1) TimeAsX: Reusing Foundation Models from Other Modalities for Efficient TSA
- (2) Time+X: Multimodal Extensions for Enhanced TSA
- (3) Time2X and X2Time: Cross-Modality Interaction for Advanced TSA
Comprehensively cover multiple modalities
- e.g., language, vision, tables, and audio
Representative studies from specific domains
- e.g., finance, medical, and spatio-temporal
Identify key gaps associated with each approach:
- (1) Which modality to reuse?
- (2) How to handle heterogeneous modality combinations?
- (3) How to generalize to unseen tasks?
2. Background and Taxonomy
(1) Background
TS: \(X_{1: T}=\left\{x_1, x_2, \ldots, x_T\right\}\).
TS tasks:
- (Basic) Forecasting, classification, anomaly detection, imputation
- (Extension) Videos, event streams
\(\rightarrow\) These data are inherently multimodal!
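A minimal sketch of the notation above, assuming a univariate NumPy series; the history/horizon split is only there to illustrate the basic forecasting task and is not part of the survey's formalism.

```python
import numpy as np

# X_{1:T} = {x_1, ..., x_T}: a univariate time series of length T
T = 48
X = np.sin(np.arange(T) * 2 * np.pi / 12) + 0.1 * np.random.randn(T)

# Forecasting as an example task: predict the next H values from the first T - H
H = 12
history, target = X[:-H], X[-H:]
print(history.shape, target.shape)  # (36,) (12,)
```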
(2) Taxonomy
(Existing surveys mainly focus on reusing LLMs for TSA, i.e., our Time As Text)
3. TimeAsX: Reusing Foundation Models of Other Modalities for Efficient TSA
(Compared to TS), NLP & Vision have richer data and deeper exploration
\(\rightarrow\) Have led to many advanced foundation models
- e.g., GPT, DeepSeek, Llama, and Qwen
Question) Can we reuse these off-the-shelf foundation models from “rich” modalities for efficient TSA?
=> “TimeAsX”
- Reused modalities: text, image, audio, and table
(1) Time As Text
Recent works have explored the usage of LLMs for TSA tasks
Motivation: Both language and TS have a sequence structure
Divided into three groups
- a) Direct alignment without training
- b) Training for alignment under an existing vocabulary
- c) Training for alignment with an expanded vocabulary
a) Direct Alignment without Training
Key concepts
- Do not need any update to LLM
- Mainly focus on how to feed TS data as text input
- Two categories
- (1) Better Tokenization
- (2) Task-specific Prompting
(1) Better Tokenization
LLMTime
[58]: Inputs the TS data as text to the LLM
- Title: Large Language Models Are Zero-Shot Time Series Forecasters
- https://arxiv.org/pdf/2310.07820
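A rough sketch of the time-as-text idea: serialize numeric values as a string, let a frozen LLM continue it, and parse the completion back into numbers. The helper names and formatting choices are illustrative; LLMTime's actual serialization (scaling and digit-level handling) differs in detail.

```python
import numpy as np

def serialize_series(values, precision=2, sep=", "):
    """Render a numeric series as plain text so an off-the-shelf LLM
    can continue it token by token (no model update needed)."""
    return sep.join(f"{v:.{precision}f}" for v in values)

def deserialize_series(text, sep=","):
    """Parse the LLM's textual continuation back into floats,
    stopping at the first chunk that is not a number."""
    out = []
    for tok in text.split(sep):
        try:
            out.append(float(tok))
        except ValueError:
            break
    return np.array(out)

history = np.array([0.51, 0.62, 0.58, 0.71, 0.69])
prompt = serialize_series(history)           # "0.51, 0.62, 0.58, 0.71, 0.69"
# The prompt would be sent to a frozen LLM; its completion is parsed back:
print(deserialize_series("0.74, 0.70, 0.66"))
```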
(2) Task-specific Prompting
- Carefully prompting the LLMs to provide context about the TS domain and task.
PromptCast
[204]: Provides specific text prompts to LLMs
- Prompts contain context such as the domain
- e.g., time step, task, and past time-series values
- Title: PromptCast: A New Prompt-based Learning Paradigm for Time Series Forecasting
- https://arxiv.org/pdf/2210.08964
LSTPrompt
[124]: Uses more sophisticated CoT prompting
- Provides a useful sequence of steps for LLMs to reason about TS
- Focuses on different trends and patterns for long- and short-term forecasting
- Title: LSTPrompt: Large Language Models as Zero-Shot Time Series Forecasters by Long-Short-Term Prompting
- https://arxiv.org/pdf/2402.16132
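A toy prompt builder in the spirit of the task-specific prompting above; the template wording, the `domain` field, and the optional reasoning steps are assumptions, not the exact PromptCast / LSTPrompt templates.

```python
def build_forecast_prompt(domain, values, horizon, reason_steps=False):
    """Assemble a task-specific forecasting prompt for a frozen LLM.
    Domain wording and reasoning instructions are illustrative placeholders."""
    history = ", ".join(f"{v:.1f}" for v in values)
    prompt = (
        f"The following are hourly {domain} readings: {history}. "
        f"Forecast the next {horizon} values."
    )
    if reason_steps:
        # CoT-style guidance in the spirit of long/short-term prompting
        prompt += (
            " First describe the overall trend, then the short-term pattern "
            "over the last few steps, and finally give the numeric forecast."
        )
    return prompt

print(build_forecast_prompt("electricity load", [3.1, 3.4, 3.2, 3.8], 2, True))
```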
b) Training for Alignment Under Existing Vocabulary
Aligns TS with LLMs by treating TS as a sentence in a known vocabulary
Background
- Dimensions of both the language vocabulary and TS patches are high!
- \(\rightarrow\) Simplify the training procedure by tying it to objectives from specific TS tasks
Solution: LLMs are treated as TS models by adding time-to-text transformation modules (and text-to-time modules in the reverse direction)
Categorized into…
- (1) Embedding alignment
- (2) Prototype alignment
- (3) Context alignment
(1) Embedding Alignment
- TS is aligned to the existing vocabulary by training the initial and final layers as projections from TS to language vocabularies (or vice versa)
aLLM4TS [217], OneFitsAll [234]
- Use a frozen LLM backbone
- Generate patch embeddings of TS datasets & feed them to the LLM
- Embedding layer & output layer \(\rightarrow\) fine-tuned on the TS data
- [217] Title: Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning
- [217] https://arxiv.org/pdf/2402.04852
- [234] Title: One Fits All: Power General Time Series Analysis by Pretrained LM
- [234] https://arxiv.org/pdf/2302.11939
LLM4TS
[32]: Additionally fine-tunes the LayerNorm (LN) layers
- Title: LLM4TS: Aligning Pre-Trained LLMs as Data-Efficient Time-Series Forecasters
- https://arxiv.org/pdf/2308.08469
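A minimal PyTorch sketch of this embedding-alignment recipe: only the patch-embedding input layer and the output head are trained, while the backbone stays frozen. The small `nn.TransformerEncoder` merely stands in for a pre-trained LLM backbone; patch length and dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class FrozenBackboneForecaster(nn.Module):
    """Embedding-alignment sketch: trainable patch embedding + output head
    around a frozen backbone that stands in for a pre-trained LLM."""
    def __init__(self, patch_len=16, d_model=128, horizon=24):
        super().__init__()
        self.patch_len = patch_len
        self.embed = nn.Linear(patch_len, d_model)            # trainable input layer
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.backbone.parameters():                  # frozen "LLM"
            p.requires_grad = False
        self.head = nn.Linear(d_model, horizon)               # trainable output layer

    def forward(self, x):                                     # x: (batch, length)
        patches = x.unfold(1, self.patch_len, self.patch_len) # (B, N, patch_len)
        h = self.backbone(self.embed(patches))
        return self.head(h[:, -1])                            # forecast from last patch token

model = FrozenBackboneForecaster()
x = torch.randn(8, 96)                                        # batch of length-96 series
print(model(x).shape)                                         # torch.Size([8, 24])
```

An LLM4TS-style variant would additionally leave the backbone's LayerNorm parameters trainable.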
(2) Prototype Alignment
- Train input modules to map TS values into fixed embeddings (prototypes) that lie closer to the embedding space of the pre-trained distribution
TimeLLM
[141]: Combines text-based prompting with patch-based input
- Patches are reprogrammed to generate output embeddings via a trainable layer.
- Title: Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
- https://arxiv.org/pdf/2310.01728
ChatTS
[196]: Uses fixed attribute descriptors that capture important TS properties
- e.g., trends, noise, periodicity, and local variance.
- Title: ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning
- https://arxiv.org/abs/2412.03104
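A rough sketch of the reprogramming idea behind prototype alignment: TS patches cross-attend to a small set of "prototype" embeddings so that their representations land in a space the frozen LLM already understands. Treating the prototypes as a freely trainable matrix (rather than ones distilled from the LLM's word embeddings, as in Time-LLM) is a simplification; sizes are arbitrary.

```python
import torch
import torch.nn as nn

class PatchReprogrammer(nn.Module):
    """Prototype-alignment sketch: TS patch embeddings cross-attend over a
    small set of prototype embeddings before entering the frozen LLM."""
    def __init__(self, patch_len=16, d_llm=768, n_prototypes=100):
        super().__init__()
        # Assumption: prototypes as a trainable matrix (not distilled word embeddings)
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, d_llm))
        self.query = nn.Linear(patch_len, d_llm)
        self.attn = nn.MultiheadAttention(d_llm, num_heads=8, batch_first=True)

    def forward(self, patches):                  # patches: (B, N, patch_len)
        q = self.query(patches)                  # (B, N, d_llm)
        kv = self.prototypes.unsqueeze(0).expand(patches.size(0), -1, -1)
        out, _ = self.attn(q, kv, kv)            # cross-attention to prototypes
        return out                               # would be fed into the frozen LLM

patches = torch.randn(4, 6, 16)
print(PatchReprogrammer()(patches).shape)        # torch.Size([4, 6, 768])
```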
(3) Context Alignment
FSCA
[68]: Identifies the need to more intricately align text context with TS when they are fed together as input embeddings
- Employs a GNN to model the interaction between TS & text
- Trained together with the embedding and LN layers of the LLM during pre-training
- Title: Context-Alignment: Activating and Enhancing LLM Capabilities in Time Series
- https://arxiv.org/abs/2501.03747
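A toy illustration of modeling TS-text interaction with a graph layer, assuming a dense, similarity-based adjacency over concatenated TS-patch and text-token embeddings; this is a sketch of the general idea, not FSCA's actual architecture.

```python
import torch
import torch.nn as nn

class TSTextGraphLayer(nn.Module):
    """Toy message-passing step over a graph whose nodes are TS patch
    embeddings and text token embeddings (dense adjacency for brevity)."""
    def __init__(self, d_model=128):
        super().__init__()
        self.update = nn.Linear(2 * d_model, d_model)

    def forward(self, ts_emb, text_emb):
        # Nodes: concatenate TS patches and text tokens along the sequence dim
        nodes = torch.cat([ts_emb, text_emb], dim=1)            # (B, N_ts + N_txt, d)
        # Assumption: fully connected, similarity-based, softmax-normalized adjacency
        adj = torch.softmax(nodes @ nodes.transpose(1, 2), dim=-1)
        messages = adj @ nodes                                  # aggregate neighbors
        return self.update(torch.cat([nodes, messages], dim=-1))

layer = TSTextGraphLayer()
out = layer(torch.randn(2, 6, 128), torch.randn(2, 10, 128))
print(out.shape)                                                # torch.Size([2, 16, 128])
```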
c) Training for Alignment with Expanded Vocabulary
Expand LLM vocabulary to align with TS datasets
Treat TS data as sentences in a foreign language
\(\rightarrow\) Adapt the LLMs toward such a language!
Methods differ in how they design the adaptor functions mapping TS to the expanded vocabulary.
Chronos
[7]: Quantizes the normalized input TS values into discrete tokens
- Title: Chronos: Learning the Language of Time Series
- https://arxiv.org/pdf/2403.07815
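A simplified sketch of quantizing a series into discrete tokens via mean scaling and uniform binning; the bin count and value range here are arbitrary choices, not Chronos' exact configuration.

```python
import numpy as np

def quantize_series(x, n_bins=1024, low=-15.0, high=15.0):
    """Map a real-valued series to discrete token ids: normalize by the
    mean absolute value, then bucket into uniform bins (sketch only)."""
    scale = np.mean(np.abs(x)) + 1e-8
    edges = np.linspace(low, high, n_bins - 1)
    tokens = np.digitize(x / scale, edges)        # ids in [0, n_bins - 1]
    return tokens, scale, edges

def dequantize_tokens(tokens, scale, edges):
    """Approximately invert quantization by mapping tokens to bin centers."""
    centers = np.concatenate([[edges[0]], (edges[:-1] + edges[1:]) / 2, [edges[-1]]])
    return centers[tokens] * scale

x = np.array([10.0, 12.0, 9.5, 11.0, 13.0])
tokens, scale, edges = quantize_series(x)
print(tokens, np.round(dequantize_tokens(tokens, scale, edges), 2))
```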
ChatTime
[180]: Introduces additional tokens for quantized values of the input TS (as well as NaN or missing values)
- Title: ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data
- https://arxiv.org/abs/2412.11376
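A minimal sketch of expanding a (stand-in) vocabulary with value tokens plus a missing-value token; the token names, bin count, and value range are assumptions. In practice the LLM's embedding matrix would also be resized and the model fine-tuned on the new tokens.

```python
import numpy as np

# Stand-in base LLM vocabulary (assumption: a tiny dict instead of a real tokenizer)
base_vocab = {"the": 0, "series": 1, "rises": 2}
n_bins = 256
new_tokens = [f"<ts_{i}>" for i in range(n_bins)] + ["<ts_nan>"]
vocab = dict(base_vocab)
vocab.update({tok: len(base_vocab) + i for i, tok in enumerate(new_tokens)})

def encode_value(v, low=-3.0, high=3.0):
    """Map a (normalized) value, possibly missing, to one of the new tokens."""
    if np.isnan(v):
        return vocab["<ts_nan>"]
    bin_id = int(np.clip((v - low) / (high - low) * n_bins, 0, n_bins - 1))
    return vocab[f"<ts_{bin_id}>"]

print([encode_value(v) for v in [0.1, np.nan, 2.7]])
```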