LIMA: Less Is More for Alignment
Zhou, Chunting, et al. "LIMA: Less Is More for Alignment." NeurIPS 2023
( https://arxiv.org/pdf/2305.11206 )
Reference:
- https://aipapersacademy.com/lima/
Contents
- Abstract
- LLM Training Stages
- How can LIMA improve the LLM training process?
- A small dataset of 1,000 samples
- Experiments
1. Abstract
LIMA = Less Is More for Alignment (by Meta AI)
- Fine-tune the LLaMA model on only 1,000 samples \(\rightarrow\) achieve results competitive with top large language models (such as GPT-4, Bard, and Alpaca)
2. LLM Training Stages
Stage 1) Pre-training stage: next-token prediction on a massive text corpus (see the sketch after this list)
Stage 2) Alignment stage
- The pretrained model alone is not very good at the concrete tasks LLMs are often used for \(\rightarrow\) it needs alignment!
- The pretrained model is fine-tuned on a task-specific dataset, e.g., an instruction dataset, and/or trained with human feedback via reinforcement learning (RLHF)
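Since nearly all of the model's capability comes from Stage 1, it is worth seeing that objective concretely. Below is a minimal PyTorch sketch of the next-token prediction loss; the tiny embedding-plus-linear "model" and all sizes are toy assumptions standing in for a real transformer.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for a causal language model (a real LLM is a deep transformer);
# vocab_size and all dimensions here are made-up assumptions for illustration.
vocab_size = 100
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 32),
    torch.nn.Linear(32, vocab_size),
)

tokens = torch.randint(0, vocab_size, (1, 16))  # one sequence of 16 token ids
logits = model(tokens)                          # shape: (1, 16, vocab_size)

# Next-token prediction: logits at position t are scored against the actual
# token at position t + 1, i.e., inputs and targets are shifted by one.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions for positions 0..14
    tokens[:, 1:].reshape(-1),               # targets are tokens 1..15
)
print(f"next-token loss: {loss.item():.3f}")
```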
 

3. How can LIMA improve the LLM training process?
Proposal: the alignment stage can be replaced with a much more lightweight process of fine-tuning on just a small dataset
\(\rightarrow\) and still achieve remarkable, competitive results!
Why does it work well?
\(\rightarrow\) Superficial alignment hypothesis
Superficial Alignment Hypothesis
Key point: the model acquires almost all of its knowledge during the pretraining stage!
Thus, the alignment stage is simple! It only requires learning:
- Which part of its knowledge to use
- What the correct output format is
\(\rightarrow\) a SHORT fine-tuning run disturbs less of the pretrained knowledge (& avoids catastrophic forgetting)
4. A small dataset of 1,000 samples
The 1,000 training examples are curated for quality and diversity: roughly 750 prompt-response pairs are sampled from community sources (Stack Exchange, wikiHow, Reddit), and the remainder are written manually by the authors, with responses kept in a uniform, helpful-assistant style.
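To make this "lightweight process" concrete, here is a minimal supervised fine-tuning sketch using HuggingFace transformers. Only the 15 epochs and the 1e-5 AdamW learning rate follow the setup reported in the paper; the checkpoint name, data format, and batch size are illustrative assumptions, not the paper's actual code.

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "huggyllama/llama-65b"  # placeholder checkpoint; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Stand-in for the ~1,000 curated examples; real data would be loaded from disk.
pairs = [{"prompt": "How do I sort a list in Python?",
          "response": "Use the built-in sorted() function ..."}]

def encode(pair):
    # Concatenate prompt and response and train with the standard LM loss
    # (labels == input_ids), so the model learns the response style/format.
    text = pair["prompt"] + "\n" + pair["response"] + tokenizer.eos_token
    ids = tokenizer(text, truncation=True, max_length=2048)["input_ids"]
    return {"input_ids": ids, "labels": ids}

args = TrainingArguments(
    output_dir="lima-sft",
    num_train_epochs=15,            # 15 epochs, as reported in the paper
    learning_rate=1e-5,             # AdamW at 1e-5 (decayed to 1e-6 in the paper)
    per_device_train_batch_size=1,  # batch size here is an assumption
)
Trainer(model=model, args=args,
        train_dataset=[encode(p) for p in pairs]).train()
```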

5. Experiments

- Alpaca = LLaMA + fine-tune on a LARGE instruction dataset (52K examples)
- LIMA = LLaMA + fine-tune on a SMALL instruction dataset (the 1,000 examples above)
- DaVinci003 = (based on InstructGPT) trained with RLHF

Result: in pairwise human evaluations, LIMA's responses are preferred over Alpaca's and DaVinci003's, even though it was trained on 52x less instruction data than Alpaca.
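The paper's evaluation shows human annotators a test prompt with two responses (LIMA vs. one baseline) and asks which they prefer. A minimal sketch of how such pairwise judgments aggregate into win/tie/loss rates, using made-up labels rather than the paper's data:

```python
from collections import Counter

# One judgment per test prompt: which response the annotator preferred.
# These labels are fabricated example data, not LIMA's actual results.
judgments = ["lima", "baseline", "tie", "lima", "tie"]

counts = Counter(judgments)
total = len(judgments)
for outcome in ("lima", "tie", "baseline"):
    print(f"{outcome:>8}: {counts[outcome] / total:.0%}")
```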
