Towards Time-Series Reasoning with LLMs


Contents

  0. Abstract

  1. Introduction

  2. Related Work
    1. TS forecasting with LLMs
    2. TS QA with LLMs
  3. Methodology
    1. Architecture
    2. Training


0. Abstract

Multi-modal LLMs (MLLMs)

\(\rightarrow\) Strong understanding & reasoning abilities … but not many works in the TS domain


Proposal: Novel multi-modal TS LLM approach,

  • that learns generalizable information across various domains
  • with powerful zero-shot performance


Procedure

  • Step 1) Train lightweight TS encoder (on top of an LLM)
  • Step 2) Finetune with chain-of-thought augmented TS tasks


1. Introduction

MLLMs

  • Enabled numerous advances in reasoning
  • Generalization ability to unseen tasks

\(\rightarrow\) But underexplored in TS!


Application of TS reasoning

  • health coaching
  • financial investing
  • environmental monitoring


Three essential steps to achieving TS reasoning:

  • (1) Perception: Understanding key characteristics in TS
  • (2) Contextualization: Extracting task-relevant features
  • (3) Deductive reasoning: Drawing conclusions

Issue with (1): Existing TS MLLMs suffer from a perception bottleneck

  • They usually convert TS into text tokens \(\rightarrow\) loss in their ability to recognize temporal patterns
  • Reasoning over such text-converted TS only emerges in LARGE models (not in SMALL models)


To effectively recognize temporal patterns:

  • Step 1) Train lightweight TS encoder (on top of an LLM)
    • Encode various features (e.g., frequency, magnitude)
    • To address (1)
  • Step 2) Finetune with chain-of-thought augmented TS tasks
    • Promotes the learning of reasoning process
    • To address (2), (3)


2. Related Work

(1) TS forecasting with LLMs

Pretrained LLMs are typically used as the backbones

  • ( + modules attached to capture properties of TS & align them with LLM )

\(\rightarrow\) These models do not retain the “language modeling head” & are not designed to “output text”


(2) TS QA with LLMs

(Modality conversion) Representing TS as text

\(\rightarrow\) Results in a loss of information


Previous works

  • [13] TS2TS > TS2Text
  • [4] Develop a dataset to assess general TS reasoning


3. Methodology

(1) Architecture

( Figure 2: overall model architecture )


Component 1) for TS

  • 1-1) Encoder: MHSA (multi-head self-attention)
  • 1-2) Linear projection: to match the dimension of the LLM’s word embeddings
  • Etc
    • Non-overlapping patches
    • (Mean & std) are prepended in front of the TS tokens ( see the sketch below )
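A minimal PyTorch sketch of this TS branch (the patch length, hidden sizes, and the single prepended (mean, std) token are assumptions for illustration, not the paper's exact implementation):

```python
import torch
import torch.nn as nn

class TSEncoder(nn.Module):
    """Non-overlapping patching -> MHSA encoder -> linear projection to the LLM dim."""
    def __init__(self, patch_len=32, enc_dim=256, llm_dim=4096, n_heads=8, n_layers=2):
        super().__init__()
        self.patch_len = patch_len
        self.patch_embed = nn.Linear(patch_len, enc_dim)            # embed each patch
        layer = nn.TransformerEncoderLayer(enc_dim, n_heads, batch_first=True)
        self.mhsa = nn.TransformerEncoder(layer, n_layers)          # 1-1) MHSA encoder
        self.proj = nn.Linear(enc_dim, llm_dim)                     # 1-2) projection to LLM embedding dim
        self.stat_embed = nn.Linear(2, llm_dim)                     # (mean, std) prefix token (assumed single token)

    def forward(self, ts):                                          # ts: (batch, length)
        mean, std = ts.mean(-1, keepdim=True), ts.std(-1, keepdim=True)
        ts = (ts - mean) / (std + 1e-6)                             # normalize; stats carried separately
        patches = ts.unfold(-1, self.patch_len, self.patch_len)     # non-overlapping patches
        tokens = self.proj(self.mhsa(self.patch_embed(patches)))    # (batch, n_patches, llm_dim)
        stats = self.stat_embed(torch.cat([mean, std], dim=-1)).unsqueeze(1)
        return torch.cat([stats, tokens], dim=1)                    # prepend mean & std in front of TS tokens
```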


Component 2) for Text

  • Concatenate (text embedding) & (TS embedding)
  • LLM backbone: Mistral-7B


The above design offers significant flexibility in handling varying input formats

  • (1) Freely interleaving TS & text
  • (2) Handling multiple different TS
    • e.g.) 3-channel TS: feed the channels sequentially & add text embeddings describing the channel index ( see the sketch below )
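A sketch of how the interleaved input could be assembled, assuming a HuggingFace-style Mistral-7B model/tokenizer and the TSEncoder sketch above (build_inputs and x are illustrative names, not from the paper):

```python
import torch

def build_inputs(llm, tokenizer, ts_encoder, segments):
    """segments: ordered list of ("text", str) or ("ts", 1-D tensor) items."""
    embeds = []
    for kind, value in segments:
        if kind == "text":
            ids = tokenizer(value, return_tensors="pt").input_ids
            embeds.append(llm.get_input_embeddings()(ids))       # (1, n_text, llm_dim)
        else:
            embeds.append(ts_encoder(value.unsqueeze(0)))        # (1, 1 + n_patches, llm_dim)
    return torch.cat(embeds, dim=1)                              # fed to the LLM as inputs_embeds

# e.g.) a 3-channel TS `x` of shape (3, length): feed the channels sequentially,
# each preceded by a text embedding describing the channel index
inputs_embeds = build_inputs(llm, tokenizer, ts_encoder, [
    ("text", "Channel 1:"), ("ts", x[0]),
    ("text", "Channel 2:"), ("ts", x[1]),
    ("text", "Channel 3:"), ("ts", x[2]),
    ("text", "Question: which channel looks anomalous?"),
])
```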


(2) Training

a) Dataset: Public TS datasets + Synthetic TS

b) Language model tasks: generated from a mix of predefined templates or via GPT-4o

  • each task’s instruction includes 10-20 paraphrases ( toy example below )
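A toy illustration of template-based task generation (the instruction texts and helper below are invented for illustration; the paper's actual templates and GPT-4o prompts are not given in these notes):

```python
import random

# illustrative paraphrase pool for one captioning-style task;
# the paper uses 10-20 paraphrases per task, from templates or GPT-4o
CAPTION_INSTRUCTIONS = [
    "Describe the overall pattern of the following time series.",
    "What trend does this time series show?",
    "Summarize how the values change over time.",
]

def make_caption_example(ts, caption):
    # one supervised example: randomly paraphrased instruction + TS + target text
    return {"instruction": random.choice(CAPTION_INSTRUCTIONS),
            "time_series": ts,
            "answer": caption}
```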

c) Two-stage training approach

  • Stage 1) Encoder warm-up
  • Stage 2) Supervised fine-tuning on reasoning tasks


c-1) Encoder warm-up

  • Train “encoder & projection layer” from scratch ( Freeze the LLM )

  • Task: Next-token prediction ( see the sketch after this list )

  • Employ “curriculum learning”

    ( As convergence is challenging when training from scratch )

    • Step 1) Train on simple multiple-choice QA on synthetic TS
    • Step 2) Captioning task on synthetic data
    • Step 3) Captioning task on real data
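A rough sketch of one stage-1 update with HuggingFace-style interfaces (warmup_step, the batch layout, and the loss masking are assumptions; only "freeze the LLM, train encoder & projection with next-token prediction" comes from the notes). The curriculum only changes which dataset the batches come from (synthetic MCQ, then synthetic captions, then real captions):

```python
import torch

def warmup_step(llm, tokenizer, ts_encoder, optimizer, batch):
    """batch: {"ts": (1, length) tensor, "answer": str} -- single-example batch for simplicity."""
    llm.requires_grad_(False)                                    # freeze the LLM backbone
    ts_tokens = ts_encoder(batch["ts"])                          # trainable encoder + projection
    ans = tokenizer(batch["answer"], return_tensors="pt")
    ans_embeds = llm.get_input_embeddings()(ans.input_ids)
    inputs = torch.cat([ts_tokens, ans_embeds], dim=1)
    # next-token prediction, scored on the answer tokens only (-100 masks the TS positions)
    labels = torch.cat([torch.full(ts_tokens.shape[:2], -100, dtype=torch.long),
                        ans.input_ids], dim=1)
    loss = llm(inputs_embeds=inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```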


c-2) Supervised fine-tuning on reasoning tasks

  • Finetune “encoder, projection layer, LLM” end-to-end
  • Use LoRA on a mixture of downstream tasks ( see the sketch below )
    • Augmented with GPT-generated CoT text
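A sketch of how the stage-2 setup could look with the peft library (llm and ts_encoder carry over from the sketches above; the LoRA rank, alpha, target modules, and learning rate are assumptions, not values reported in the paper):

```python
import torch
from peft import LoraConfig, get_peft_model

# attach LoRA adapters to the LLM; rank/alpha/target modules are illustrative defaults
lora_cfg = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM",
                      target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
llm = get_peft_model(llm, lora_cfg)

# unlike stage 1, the TS encoder, projection layer, and LoRA weights are all updated;
# the training data is the downstream-task mixture with GPT-generated CoT appended to the answers
optimizer = torch.optim.AdamW(
    list(ts_encoder.parameters()) + [p for p in llm.parameters() if p.requires_grad],
    lr=2e-4)
```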

