LIMA: Less Is More for Alignment

Zhou, Chunting, et al. "Lima: Less is more for alignment." NeurIPS 2023

( https://arxiv.org/pdf/2305.11206 )

Reference:

  • https://aipapersacademy.com/lima/


Contents

  1. Abstract
  2. LLM Training Stages
  3. How can LIMA improve the LLM training process?
  4. Small dataset with 1,000 samples
  5. Experiments


1. Abstract

LIMA = Less Is More for Alignment (by Meta AI)

  • Fine-tune the LLaMA model on only 1,000 curated samples

    \(\rightarrow\) Achieves competitive results against top large language models (such as GPT-4, Bard, and Alpaca)


2. LLM Training Stages

Stage 1) Pre-training stage: next-token prediction

Stage 2) Alignment stage

  • The pretrained model alone is not very good at the concrete tasks LLMs are typically used for

    \(\rightarrow\) Needs alignment!

  • The pretrained model is fine-tuned on a task-specific dataset

    • e.g., supervised fine-tuning on an instruction dataset, and/or reinforcement learning from human feedback (RLHF)

[Figure: LLM training stages]
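The supervised fine-tuning step above can be sketched as next-token prediction where only the response tokens contribute to the loss. Below is a minimal toy illustration in pure Python; the prompt-masking scheme is a common instruction-tuning convention assumed here, not a detail taken from the paper:

```python
import math

def sft_loss(token_probs, labels, loss_mask):
    """Average cross-entropy over a sequence, counting only positions
    where loss_mask is 1 (response tokens, not prompt tokens)."""
    total, count = 0.0, 0
    for p, y, m in zip(token_probs, labels, loss_mask):
        if m:
            total += -math.log(p[y])  # negative log-likelihood of the true token
            count += 1
    return total / count

# Toy example: 2 prompt tokens (masked out) + 2 response tokens.
# token_probs[i] is the model's predicted distribution at position i.
token_probs = [
    {0: 0.9, 1: 0.1},   # prompt token: excluded from the loss
    {0: 0.2, 1: 0.8},   # prompt token: excluded from the loss
    {0: 0.5, 1: 0.5},   # response token: included
    {0: 0.25, 1: 0.75}, # response token: included
]
labels    = [0, 1, 1, 1]
loss_mask = [0, 0, 1, 1]

print(round(sft_loss(token_probs, labels, loss_mask), 4))  # → 0.4904
```

Masking the prompt means the model is only penalized for how it continues the instruction, which matches the "learn the correct format" role that alignment plays.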


3. How can LIMA improve the LLM training process?

Proposal: the alignment stage can be replaced with a much more lightweight process of fine-tuning on just a small dataset

\(\rightarrow\) Still achieves remarkable, competitive results!


Why does it work well?

\(\rightarrow\) Superficial alignment hypothesis


Superficial Alignment Hypothesis

Key point: a model acquires almost all of its knowledge and capabilities during the pretraining stage!

Thus, the alignment stage is simple! It only needs to teach the model:

  • Which part of its knowledge to use
  • What the correct output format is


\(\rightarrow\) SHORT fine-tuning disturbs less of the pretraining knowledge ( & helps avoid catastrophic forgetting )


4. Small dataset with 1,000 samples

[Figure: composition of the 1,000 fine-tuning examples]


5. Experiments

[Figure: human preference evaluation results]


  • Alpaca = LLaMA + fine-tuning on a LARGE instruction dataset
  • LIMA = LLaMA + fine-tuning on a SMALL instruction dataset
  • DaVinci003 = (based on InstructGPT) trained with RLHF
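The comparison behind these results is a pairwise human preference study: annotators see LIMA's response next to a baseline's response for the same prompt and pick a winner (or declare a tie). A minimal sketch of tallying such judgments (the labels and counts below are hypothetical, not the paper's numbers):

```python
from collections import Counter

def preference_rates(judgments):
    """Given pairwise labels ('lima', 'baseline', 'tie'),
    return each outcome's share of all judgments."""
    counts = Counter(judgments)
    total = len(judgments)
    return {k: counts[k] / total for k in ("lima", "baseline", "tie")}

# Hypothetical annotator labels for LIMA vs. one baseline model.
judgments = ["lima", "tie", "baseline", "lima",
             "lima", "tie", "baseline", "lima"]
rates = preference_rates(judgments)
print(rates)  # → {'lima': 0.5, 'baseline': 0.25, 'tie': 0.25}
```

Win/tie/loss rates like these are what the figure above reports for each LIMA-vs-baseline pairing.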

