OPRO: Large Language Models As Optimizers

References:

  • https://aipapersacademy.com/large-language-models-as-optimizers/
  • https://arxiv.org/pdf/2309.03409


Contents

  1. Improving the Prompt
  2. OPRO
    1. OPRO Overview for Prompt Optimization
    2. Meta-prompt Structure
    3. Summary
  3. Experiments


1. Improving the Prompt

OPRO (Optimization by PROmpting), by Google DeepMind

  • A new approach that leverages LLMs as optimizers


LLMs: (Input) Prompt \(\rightarrow\) (Output) Response

Extension: Better performance can be achieved by extending the prompt with a carefully human-crafted instruction

  • e.g., “let’s think step by step”

\(\rightarrow\) Manually crafting the prompt can be tedious!
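As a tiny illustration (the helper below is my own, not from the paper), “extending” a prompt just means attaching such an instruction to the task text:

```python
# Sketch: extending a task prompt with a hand-crafted instruction.
# `build_prompt` and the instruction placement are illustrative choices.

def build_prompt(question: str, instruction: str) -> str:
    """Prepend an instruction to a task question (one common placement)."""
    return f"{instruction}\n\nQ: {question}\nA:"

print(build_prompt(
    "If I have 3 apples and buy 2 more, how many do I have?",
    "Let's think step by step.",
))
```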


2. OPRO

OPRO = An approach to improve the prompt “automatically”

(1) OPRO Overview for Prompt Optimization

(a) Goal = Maximize the accuracy over a dataset of prompts and responses

  • e.g., GSM8K dataset: math word problems


(b) How? By automatically generating an instruction that is added to the prompts in the dataset

  • e.g., “let’s think step by step” or “break it down” …

[Figure: OPRO framework for prompt optimization]
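In symbols (notation mine, following the paper’s setup): given a training set \(D\) of question–answer pairs and a scorer LLM \(S\), OPRO searches for the instruction \(\rho\) that maximizes training accuracy:

\[ \rho^{*} = \arg\max_{\rho} \; \frac{1}{|D|} \sum_{(q,\,a) \in D} \mathbb{1}\left[ S(\rho, q) = a \right] \]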


a) Optimizer LLM

(Input) Meta-prompt

  • The meta-prompt instructs the optimizer LLM to generate a few candidate instructions

(Output)

  • Yields 8 candidate instructions per optimization step (see the sketch below)
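A minimal sketch of this step, assuming a hypothetical `call_llm(prompt, temperature)` wrapper around whatever optimizer model is used (the function and its signature are placeholders, not a real API):

```python
# Sketch: one optimizer step. `call_llm` is a placeholder for an actual
# LLM API call; sampling with temperature > 0 yields diverse candidates.
from typing import List

def call_llm(prompt: str, temperature: float = 1.0) -> str:
    raise NotImplementedError  # plug in a real model call here

def generate_candidates(meta_prompt: str, n: int = 8) -> List[str]:
    """Sample n candidate instructions from the optimizer LLM."""
    return [call_llm(meta_prompt, temperature=1.0) for _ in range(n)]
```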


b) Scorer LLM

Can be the same as the optimizer LLM or a different one

(Input) Instructions

  • The candidate instructions created by the optimizer LLM

(Output) Scores

  • Get 8 accuracy scores, one per candidate instruction

    \(\rightarrow\) Add the (instruction, score) pairs to the meta-prompt again!
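A sketch of the scoring step (again with placeholder model calls; the top-20 cutoff follows the paper’s choice of keeping only the highest-scoring instructions in the meta-prompt):

```python
# Sketch: score candidates on the dataset and grow the trajectory.
# `scorer_llm` / `extract_answer` are placeholders for a model call
# and answer parsing; the dataset is a list of (question, answer) pairs.
from typing import List, Tuple

def scorer_llm(prompt: str) -> str:
    raise NotImplementedError  # plug in a real model call here

def extract_answer(response: str) -> str:
    return response.strip()    # naive parsing, for illustration only

def score(instruction: str, dataset: List[Tuple[str, str]]) -> float:
    """Accuracy of the scorer LLM on the dataset under this instruction."""
    hits = sum(
        extract_answer(scorer_llm(f"{instruction}\n\nQ: {q}\nA:")) == a
        for q, a in dataset
    )
    return hits / len(dataset)

def update_trajectory(trajectory, candidates, dataset, keep=20):
    """Append (instruction, score) pairs; keep the `keep` best, ascending."""
    trajectory += [(c, score(c, dataset)) for c in candidates]
    trajectory.sort(key=lambda pair: pair[1])  # best instruction last
    return trajectory[-keep:]
```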


If we no longer observe any improvement in the accuracy …

\(\rightarrow\) Stop, and keep the optimized instruction!
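Putting the pieces together, a sketch of the outer loop (reusing the helpers above plus the `build_meta_prompt` helper sketched in the next subsection; the patience-based stopping rule is my own simplification of “no improvement anymore”):

```python
# Sketch of the full OPRO loop: stop once the best accuracy has not
# improved for `patience` consecutive steps.

def optimize(dataset, steps: int = 200, patience: int = 10) -> str:
    trajectory = []                       # (instruction, accuracy) pairs
    best, stale = 0.0, 0
    for _ in range(steps):
        meta_prompt = build_meta_prompt(trajectory, exemplars=dataset[:3])
        candidates = generate_candidates(meta_prompt, n=8)
        trajectory = update_trajectory(trajectory, candidates, dataset)
        top = trajectory[-1][1]           # best score so far (sorted ascending)
        stale = 0 if top > best else stale + 1
        best = max(best, top)
        if stale >= patience:             # accuracy has plateaued -> stop
            break
    return trajectory[-1][0]              # the optimized instruction
```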


(2) Meta-prompt Structure

How is the meta-prompt constructed?

[Figure: meta-prompt structure]
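Per the paper, the meta-prompt combines (1) the optimization trajectory: past (instruction, score) pairs sorted ascending so the best ones appear last, (2) a few exemplar problems from the dataset, and (3) meta-instructions asking for a new, better instruction. A sketch (the wording is illustrative, not the paper’s exact template):

```python
# Sketch: assembling the meta-prompt from trajectory + exemplars.
from typing import List, Tuple

def build_meta_prompt(trajectory: List[Tuple[str, float]],
                      exemplars: List[Tuple[str, str]]) -> str:
    history = "\n".join(
        f"text: {inst}\nscore: {acc:.0%}"
        for inst, acc in sorted(trajectory, key=lambda pair: pair[1])
    )
    examples = "\n".join(f"Q: {q}\nA: {a}" for q, a in exemplars)
    return (
        "I have some texts along with their corresponding scores.\n\n"
        f"{history}\n\n"
        "Here are some example problems:\n"
        f"{examples}\n\n"
        "Write a new text that is different from the old ones and "
        "achieves a score as high as possible."
    )
```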


(3) Summary

[Figure: OPRO summary]


3. Experiments

[Figure: accuracy comparison with hand-crafted prompts]

  • The OPRO-optimized instructions outperform the hand-crafted prompts!


[Figure: accuracy over optimization iterations]

  • Accuracy increases as the optimization iterations progress
