ReFT: Representation Finetuning for Language Models

Wu, Zhengxuan, et al. "ReFT: Representation Finetuning for Language Models." NeurIPS 2024.

References:

1. Introduction

ReFT

Finetuning a Pre-trained Transformer is expensive

\(\rightarrow\) Parameter-efficient finetuning (PEFT)

Parameter-efficient finetuning (PEFT)

Only update a small number of weights!
e.g., LoRA
- Add small adapter weights to the model layers
- Only update the added weights
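The two bullets above can be sketched numerically. This is a minimal illustration, not the LoRA library API: the sizes `d_in`, `d_out`, `rank` and the names `W0`, `A`, `B`, `lora_forward` are assumptions chosen for the example. The pre-trained weight stays frozen and only the two small adapter matrices would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 16, 16, 2  # hypothetical sizes, for illustration only

W0 = rng.normal(size=(d_out, d_in))        # frozen pre-trained weight
A = rng.normal(size=(rank, d_in)) * 0.01   # trainable adapter (down-projection)
B = np.zeros((d_out, rank))                # trainable adapter (up-projection), zero-initialized

def lora_forward(x):
    # Output of the adapted layer: frozen path plus low-rank update B @ A
    return W0 @ x + B @ (A @ x)

x = rng.normal(size=d_in)
trainable = A.size + B.size   # parameters that would receive gradients
total = W0.size               # parameters in the frozen weight
print(f"trainable: {trainable} / frozen: {total}")
```

Because `B` starts at zero, the adapted layer initially reproduces the pre-trained layer exactly; training then moves only `A` and `B`.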

ReFT = Representation Fine-Tuning

LoRA weights are merged into the Transformer's layers

\(\rightarrow\) The hidden representations are changed by the added LoRA weights

(i.e., they are no longer the original representations produced by the pre-trained Transformer)

Why not edit the representations directly?

\(\rightarrow\) via Intervention

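\(\Phi_{\text{LoReFT}}(\mathbf{h})=\mathbf{h}+\mathbf{R}^{\top}(\mathbf{W}\mathbf{h}+\mathbf{b}-\mathbf{R}\mathbf{h})\)

- \(\mathbf{h} \in \mathbb{R}^{d}\): hidden representation
- \(\mathbf{R} \in \mathbb{R}^{r \times d}\) (orthonormal rows), \(\mathbf{W} \in \mathbb{R}^{r \times d}\), \(\mathbf{b} \in \mathbb{R}^{r}\): learned parameters, with \(r \leq d\)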
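A small NumPy sketch of the LoReFT formula, assuming \(\mathbf{R}\) is an \(r \times d\) matrix with orthonormal rows and \(\mathbf{W}\), \(\mathbf{b}\) are an \(r \times d\) matrix and an \(r\)-vector. The sizes and random values are placeholders; in practice the parameters are learned. The check at the end shows what the edit does: inside the subspace spanned by the rows of \(\mathbf{R}\), the representation is replaced by \(\mathbf{W h}+\mathbf{b}\), and everything orthogonal to that subspace is left untouched.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and intervention rank (hypothetical values)

# Build R (r x d) with orthonormal rows from a reduced QR decomposition
Q, _ = np.linalg.qr(rng.normal(size=(d, r)))  # Q: d x r, orthonormal columns
R = Q.T
W = rng.normal(size=(r, d))  # placeholder for the learned projection
b = rng.normal(size=r)       # placeholder for the learned bias

def loreft(h):
    # Phi(h) = h + R^T (W h + b - R h)
    return h + R.T @ (W @ h + b - R @ h)

h = rng.normal(size=d)
h_new = loreft(h)

# Inside the subspace: R h_new equals W h + b
print(np.allclose(R @ h_new, W @ h + b))
# Outside the subspace: the orthogonal complement is unchanged
P_perp = np.eye(d) - R.T @ R
print(np.allclose(P_perp @ h_new, P_perp @ h))
```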

Example)

Train interventions on the prefix and suffix token positions
- The exact sizes of the prefix and suffix are hyperparameters

Intervention parameters:
- Either shared or not shared across token positions within the same layer
- Different across layers
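The position scheme above can be sketched as follows, for a single layer with one intervention shared across positions. The helper names `make_loreft`, `positions`, and `intervene`, and the sizes `p=2`, `s=1`, are hypothetical and only illustrate the idea; the intervention parameters here are random rather than trained.

```python
import numpy as np

def make_loreft(rng, d, r):
    # One LoReFT intervention with random (untrained) parameters
    Q, _ = np.linalg.qr(rng.normal(size=(d, r)))
    R, W, b = Q.T, rng.normal(size=(r, d)), rng.normal(size=r)
    return lambda h: h + R.T @ (W @ h + b - R @ h)

def positions(seq_len, p, s):
    # Indices of the first p and last s tokens, without duplicates
    prefix = range(min(p, seq_len))
    suffix = range(max(seq_len - s, 0), seq_len)
    return sorted(set(prefix) | set(suffix))

def intervene(H, p, s, phi):
    # Apply phi to the selected positions of H (seq_len x d); leave the rest as-is
    out = H.copy()
    for i in positions(len(H), p, s):
        out[i] = phi(H[i])
    return out

rng = np.random.default_rng(0)
n, d, r = 6, 8, 2
H = rng.normal(size=(n, d))              # hidden states at one layer
phi_shared = make_loreft(rng, d, r)      # shared across positions in this layer
H_new = intervene(H, p=2, s=1, phi=phi_shared)
print(positions(n, 2, 1))
```

For the non-shared variant, `phi` would be looked up per position instead of reused; for multiple layers, each layer would hold its own intervention(s), for example in a dict keyed by layer index.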