OPRO: Large Language Models As Optimizers
References:
- https://aipapersacademy.com/large-language-models-as-optimizers/
- https://arxiv.org/pdf/2309.03409
Contents
- Improving the Prompt
- OPRO
- OPRO Overview for Prompt Optimization
- Meta-prompt Structure
- Summary
- Experiments
1. Improving the Prompt
OPRO (Optimization by PROmpting), by Google DeepMind
- New approach to leverage LLMs as optimizers
LLMs: (Input) Prompt \(\rightarrow\) (Output) Response
Extension: better performance can be obtained by extending the prompt with a carefully human-crafted instruction
- e.g., “let’s think step by step”
\(\rightarrow\) Manually crafting the prompt can be tedious!
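For concreteness, a minimal sketch of this prompt extension (the question is a GSM8K-style example; the formatting is illustrative, not the exact template from the paper):

```python
# Minimal sketch: manually appending a hand-crafted instruction
# such as "Let's think step by step" to a task prompt.
question = (
    "A robe takes 2 bolts of blue fiber and half that much white fiber. "
    "How many bolts does it take in total?"
)
instruction = "Let's think step by step."

# The instruction is simply concatenated with the question;
# OPRO automates the search for this instruction.
prompt = f"Q: {question}\nA: {instruction}"
print(prompt)
```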
2. OPRO
OPRO = an approach to improve the prompt “automatically”
(1) OPRO Overview for Prompt Optimization
(a) Goal = Maximize the accuracy over a dataset of prompts and responses (formalized below)
- e.g., GSM8K dataset: word math problems
(b) How? By automatically generating an instruction that is added to the prompts in the dataset
- e.g., “let’s think step by step” or “break it down” …
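The goal in (a) can be written as finding the instruction \(p\) that maximizes the scorer's accuracy over the dataset \(D\) (the notation below is mine, not taken from the paper):

\[
p^{*} = \arg\max_{p} \; \frac{1}{|D|} \sum_{(q,\, a) \in D} \mathbb{1}\big[\mathrm{LLM}(q \oplus p) = a\big]
\]

where \(q \oplus p\) denotes the question \(q\) with the instruction \(p\) inserted, and \(a\) is the ground-truth answer.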
a) Optimizer LLM
(Input) Meta-prompt
- The meta-prompt instructs the optimizer LLM to generate a few new candidate instructions
(Output)
- Yields 8 candidate instructions
b) Scorer LLM
Can be the same LLM as the optimizer or a different one
(Input) Instructions
- Created by Optimizer LLM
(Output) Scores
- Get 8 accuracy scores
\(\rightarrow\) Add (instruction, scores) to the meta-prompt again!
If the accuracy no longer improves …
\(\rightarrow\) End with the optimized instruction! (a sketch of this loop follows below)
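A compact sketch of this loop, assuming hypothetical wrappers `optimizer_llm` and `scorer_llm` around LLM calls and a `build_meta_prompt` helper (sketched in the next section); the actual sampling parameters and prompt formats in the paper differ:

```python
# Sketch of the OPRO loop for prompt optimization (not the authors' code).
# `dataset` is a list of (question, answer) pairs, e.g. a GSM8K training subset.

def evaluate(instruction, dataset, scorer_llm):
    """Accuracy of the scorer LLM when `instruction` is added to each prompt."""
    correct = 0
    for question, answer in dataset:
        prediction = scorer_llm(f"Q: {question}\nA: {instruction}")
        correct += int(prediction.strip() == answer)
    return correct / len(dataset)

def opro(dataset, optimizer_llm, scorer_llm, num_steps=20, per_step=8):
    trajectory = []        # (instruction, accuracy) pairs fed back into the meta-prompt
    best_instruction, best_score = "", 0.0
    for _ in range(num_steps):
        meta_prompt = build_meta_prompt(trajectory, dataset)    # see the template in the next section
        candidates = optimizer_llm(meta_prompt, n=per_step)     # 8 candidate instructions per step
        improved = False
        for instruction in candidates:
            score = evaluate(instruction, dataset, scorer_llm)  # 8 accuracy scores per step
            trajectory.append((instruction, score))
            if score > best_score:
                best_instruction, best_score = instruction, score
                improved = True
        if not improved:   # accuracy stopped improving -> end with the optimized instruction
            break
    return best_instruction, best_score
```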
(2) Meta-prompt Structure
How is the meta-prompt constructed?
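In the paper, the meta-prompt combines (i) the optimization trajectory so far (previously generated instructions paired with their accuracies, sorted in ascending order of score so the best appear last), and (ii) a description of the task with a few exemplar problems showing where the generated instruction (marked <INS>) is inserted, followed by a meta-instruction asking for a new instruction that scores higher. A rough sketch of such a template (the wording is illustrative, not the paper's exact text):

```python
# Illustrative meta-prompt builder; the exact wording in the OPRO paper differs.

def build_meta_prompt(trajectory, dataset, num_exemplars=3):
    """Assemble a meta-prompt from past (instruction, score) pairs plus task exemplars."""
    # (i) Optimization trajectory, sorted ascending so the best instructions appear last.
    parts = ["Here are previous instructions with their scores (higher is better):"]
    for instruction, score in sorted(trajectory, key=lambda pair: pair[1]):
        parts.append(f"text: {instruction}\nscore: {score:.1%}")

    # (ii) A few exemplar problems; <INS> marks where the generated instruction is inserted.
    parts.append("Below are example problems. <INS> marks where your instruction will be placed:")
    for question, answer in dataset[:num_exemplars]:
        parts.append(f"Q: {question}\nA: <INS>\nGround truth: {answer}")

    # Meta-instruction: ask for a new, different instruction that beats the previous scores.
    parts.append(
        "Write a new instruction, different from the ones above, "
        "that would achieve a higher score."
    )
    return "\n\n".join(parts)
```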
(3) Summary
- OPRO iteratively asks an optimizer LLM for new candidate instructions, scores them with a scorer LLM on the dataset, and feeds the (instruction, score) pairs back into the meta-prompt until the accuracy stops improving.
3. Experiments
- The optimized instructions outperform the hand-crafted prompts!
- The accuracy increases as the optimization iterations progress