Orca 2: Teaching Small Language Models How to Reason

Introduction

Orca 2, from Microsoft Research, marks a notable step in language model development: rather than relying on scale, it focuses on improving the reasoning abilities of smaller models. This post explores two of Orca 2's key training techniques, "Cautious Reasoning" and "Prompt Erasing," and details their impact on how small models learn to reason.

Core Concepts in Orca 2

Cautious Reasoning

"Cautious Reasoning" in Orca 2 refers to the model's ability to select the solution strategy best suited to each task, whether that means answering directly, reasoning step by step, or explaining before answering, rather than applying one fixed approach. This enhances flexibility and adaptability in problem-solving.

Prompt Erasing

"Prompt Erasing" complements Cautious Reasoning: during training, the detailed system instructions that elicited the teacher's reasoning strategy are replaced with a generic prompt. Deprived of explicit cues, the model must internalize the strategy itself, focusing on the essence of the task rather than the wording of the prompt. This enables Orca 2 to apply its reasoning strategies across varied contexts.
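A minimal sketch of how prompt erasing could work when constructing the student's training pairs (the function, field names, and example text below are illustrative, not taken from the Orca 2 codebase):

```python
# Illustrative sketch of "Prompt Erasing": the detailed system prompt that
# elicited the teacher's reasoning strategy is replaced with a generic one
# before the example is added to the student's training set.

GENERIC_PROMPT = "You are Orca, an AI language model. Answer the question."

def erase_prompt(example: dict) -> dict:
    """Replace the strategy-specific system prompt with a generic one.

    `example` is assumed to hold the detailed prompt used to query the
    teacher, the user task, and the teacher's full response.
    """
    return {
        "system": GENERIC_PROMPT,          # strategy cues removed
        "user": example["user"],           # task text unchanged
        "assistant": example["assistant"], # teacher's strategic answer kept
    }

teacher_example = {
    "system": "Think step by step, then verify each step before answering.",
    "user": "If a train travels 60 km in 45 minutes, what is its speed in km/h?",
    "assistant": "Step 1: 45 minutes is 0.75 hours. Step 2: 60 / 0.75 = 80 km/h.",
}

student_example = erase_prompt(teacher_example)
```

Because the student sees only the generic prompt paired with a strategically reasoned answer, it must learn *when* to reason step by step rather than being told to.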

Training Orca 2: Embracing Diversity

Overview of Training

Orca 2 is trained on approximately 817,000 instances drawn from varied sources, ensuring broad coverage of tasks and instructions. Training proceeds via progressive learning, with the model adapting to an increasingly diverse range of tasks.

Role of Varied Prompts in Training

Training with a variety of prompts is crucial to Orca 2's development. Exposure to different reasoning strategies, from direct answers to detailed, step-by-step problem-solving, enriches the model's understanding and adaptability.

Demonstrative Example from Flan-CoT Collection

To illustrate Cautious Reasoning in practice, consider this task from the Flan-CoT Collection:

Instructions:

You're given a short story of five sentences in the wrong order. The task is to return the correct order to create a coherent story.

Teacher responses to this task vary in their reasoning approach, from a bare reordered sequence to a step-by-step justification of where each sentence belongs. Trained on such diverse responses, Orca 2 learns to adapt its reasoning to the complexity and requirements of different tasks.
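To make this concrete, here is a hypothetical instance of such a reordering task, with a small helper for applying a proposed order (the story sentences and the answer are invented for illustration, not drawn from the Flan-CoT Collection):

```python
# Hypothetical five-sentence story, shuffled (invented for illustration).
shuffled = {
    1: "She finally found her keys under the couch.",
    2: "Maria searched every room of the house.",
    3: "Maria realized she had lost her keys.",
    4: "Relieved, she left for work only ten minutes late.",
    5: "She retraced her steps from the night before.",
}

def apply_order(sentences, order):
    """Reorder the sentences according to a proposed answer like [3, 2, 5, 1, 4]."""
    return [sentences[i] for i in order]

# A "direct" teacher response gives only the order; a "step-by-step" response
# would justify each placement (e.g. sentence 3 introduces the problem, so it
# must come first) before arriving at the same order.
direct_answer = [3, 2, 5, 1, 4]
story = apply_order(shuffled, direct_answer)
```

Both response styles encode the same solution, but the step-by-step version teaches the student *how* the ordering was found, not just *what* it is.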

Real-Time Response Selection in Orca 2

Orca 2's ability to dynamically select the most appropriate response format for each user query is a direct outcome of its diverse training. Depending on the query, Orca 2 can give a straightforward answer, provide a detailed explanation, or work through the problem step by step.
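Conceptually, this behavior resembles a learned policy mapping each query to a response style. A toy rule-based stand-in can convey the idea; the heuristics below are invented for illustration and are not Orca 2's actual mechanism, which is learned implicitly during training:

```python
def pick_style(query: str) -> str:
    """Toy heuristic stand-in for Orca 2's learned strategy selection."""
    q = query.lower()
    # Queries demanding justification get step-by-step reasoning.
    if any(w in q for w in ("prove", "derive", "why")):
        return "step_by_step"
    # Open-ended queries get a detailed explanation.
    if any(w in q for w in ("explain", "compare", "describe")):
        return "detailed_explanation"
    # Everything else gets a direct answer.
    return "direct_answer"
```

The key difference is that Orca 2 needs no such hand-written rules: its training on erased prompts teaches it to infer the appropriate style from the query itself.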

Conclusion

Orca 2's focus on "Cautious Reasoning" and "Prompt Erasing," combined with its progressive training strategy, significantly enhances its reasoning capabilities. This development challenges the notion that only larger models can handle complex reasoning tasks, and paves the way for more efficient and versatile language models.

Created 2023-11-29T15:14:31-08:00