Orca 2: Teaching Small Language Models How to Reason

Mitra, Arindam, et al. "Orca 2: Teaching small language models how to reason." arXiv preprint arXiv:2311.11045 (2023).

References:

https://aipapersacademy.com/orca-2/
https://arxiv.org/pdf/2311.11045

Orca 1 Recap
1. Imitation learning
2. Explanation tuning
Orca 2 Improvements
1. Use the right tool for the job
2. Cautious reasoning
Training Orca 2
Experiments

1. Orca 1 Recap

(1) Imitation learning

Fine-tuning the base model on a dataset,

where the dataset is created using responses from ChatGPT or GPT-4

(2) Explanation tuning

(Before Orca 1) Student models mostly learn to imitate the teacher model's style, rather than its reasoning process

\(\because\) The responses used for fine-tuning were mostly simple and short

Orca 1: Explanation tuning

The student model learns the thought process of the teacher model
How is the Orca 1 dataset constructed?
- Add a “system instruction” to each query
  
  \(\rightarrow\) Provides guidelines for GPT-4 regarding how it should generate the response
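
The construction above can be sketched in a few lines. This is only an illustration: `TrainingExample`, `build_example`, and `fake_teacher` are hypothetical names, and `fake_teacher` stands in for a real call to GPT-4.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    system_instruction: str   # guideline on how the teacher should respond
    user_query: str           # the original task or question
    teacher_response: str     # the teacher's (ideally detailed) answer


def build_example(system_instruction, user_query, teacher):
    """Query the teacher with a system instruction and keep the full triple.

    `teacher` is any callable (system_instruction, user_query) -> str.
    """
    response = teacher(system_instruction, user_query)
    return TrainingExample(system_instruction, user_query, response)


def fake_teacher(system_instruction, user_query):
    # Hypothetical stand-in for the teacher model; returns a canned string.
    return f"[answer to: {user_query} | following: {system_instruction}]"


example = build_example(
    "Think step by step and justify your answer.",
    "What is 2 + 2?",
    fake_teacher,
)
print(example)
```

The student is then fine-tuned on such triples, so it sees the explanation and not only the final answer.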

2. Orca 2 Improvements

Orca 2: the model weights are released

Two contributions

1. Use the right tool for the job
2. Learn to use the right tool for the job ( = cautious reasoning)

(1) Use the right tool for the job

There are various types of system instructions

ex) step-by-step
ex) recall-then-generate
ex) explain-then-answer
ex) direct-answer
…

Each system instruction guides the model to use a specific solution strategy that helps the model to reach the correct answer.

Observation: Not every system instruction matches every user instruction!

Orca 2: Map properly between solution strategies and user-instruction types

\(\rightarrow\) The responses the model is trained on will be more accurate!

Then, how is a strategy assigned to each task type?

\(\rightarrow\) Run Orca 1 on examples for a certain task type, and see which system instructions perform better on that task type.
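
A minimal sketch of this selection step, assuming the evaluation results are already available as `(task_type, strategy, is_correct)` records. The function name and the toy data are hypothetical, and the notes do not specify how the comparison was actually carried out.

```python
from collections import defaultdict


def best_strategy_per_task(results):
    """Return the strategy with the highest accuracy for each task type.

    `results` is an iterable of (task_type, strategy, is_correct) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # (task, strategy) -> [correct, total]
    for task_type, strategy, is_correct in results:
        entry = counts[(task_type, strategy)]
        entry[0] += int(is_correct)
        entry[1] += 1

    best = {}  # task_type -> (strategy, accuracy)
    for (task_type, strategy), (correct, total) in counts.items():
        accuracy = correct / total
        if task_type not in best or accuracy > best[task_type][1]:
            best[task_type] = (strategy, accuracy)
    return {task_type: strategy for task_type, (strategy, _) in best.items()}


# Toy evaluation records (made up for illustration only).
toy_results = [
    ("math", "step-by-step", True),
    ("math", "step-by-step", True),
    ("math", "direct-answer", False),
    ("math", "direct-answer", False),
    ("trivia", "direct-answer", True),
    ("trivia", "direct-answer", True),
    ("trivia", "step-by-step", True),
    ("trivia", "step-by-step", False),
]
print(best_strategy_per_task(toy_results))
```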

(2) Cautious reasoning

Cautious reasoning

= The model learns to use the “right tool” for the job.

= Deciding which “solution strategy” to choose for a given task

Given a user instruction, Orca 2, as a cautious reasoner, should be able to choose the proper reasoning technique for that instruction,

( even without a system instruction that guides it toward that strategy )

\(\rightarrow\) via Prompt Erasing

Prompt Erasing

(At training time) Replace the task-specific system instruction \(\rightarrow\) generic system instruction.
By seeing the response without the original system instruction during training…

\(\rightarrow\) Orca 2 learns to decide which solution strategy to use for each task type

Generic System Instruction

You are Orca, an AI language model created by Microsoft. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior.

\(\rightarrow\) Does not contain specific details about how to generate the response!!
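
Prompt erasing can be illustrated with a small sketch. The dictionary layout and the name `erase_prompt` are assumptions made for this example; only the generic instruction text is taken from the notes above.

```python
GENERIC_SYSTEM_INSTRUCTION = (
    "You are Orca, an AI language model created by Microsoft. "
    "You are a cautious assistant. You carefully follow instructions. "
    "You are helpful and harmless and you follow ethical guidelines "
    "and promote positive behavior."
)


def erase_prompt(example):
    """Swap the task-specific system instruction for the generic one.

    `example` is a dict with keys "system", "user", "response" (assumed layout).
    The user query and the teacher response are kept unchanged.
    """
    return {**example, "system": GENERIC_SYSTEM_INSTRUCTION}


teacher_example = {
    "system": "Solve the problem step by step, then state the final answer.",
    "user": "If a train travels 60 km in 1.5 hours, what is its speed?",
    "response": "Speed = distance / time = 60 / 1.5 = 40 km/h.",
}
student_example = erase_prompt(teacher_example)
print(student_example["system"])
```

The response still reflects the step-by-step strategy, but the student no longer sees the instruction that asked for it, so it has to infer the strategy from the task itself.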

3. Training Orca 2

Step 1) Start with the LLaMA-2 base models (7B and 13B)
Step 2) Continue training on data from:
- (1) FLAN-v2 dataset
  - Dataset from the first Orca paper
- (2) New dataset
  - Created for Orca 2, based on the two ideas mentioned above

4. Experiments



Seunghan Lee