Exploring the "System 2 Attention" in AI: Innovations and Variations

Introduction

This blog post delves into the key concepts of the "System 2 Attention" (S2A) mechanism, introduced in a recent paper by Jason Weston and Sainbayar Sukhbaatar of Meta, along with its implementation and the variations explored in the paper.

Key Concepts

S2A is inspired by the dual-process theory in human cognition, aiming to emulate the more deliberate and attentive reasoning of human 'System 2' thinking. Traditional Large Language Models (LLMs) often struggle with reasoning and can be misled by irrelevant context, a challenge S2A seeks to address.

Challenges with Current LLMs

Current LLMs, despite their vast knowledge, often make mistakes in reasoning and can be influenced by irrelevant context or exhibit sycophantic tendencies.

Soft Attention vs. System 2 Attention

S2A critiques the standard soft attention mechanism in Transformers, which tends to spread attention over large parts of the context, including irrelevant sections. S2A, in contrast, rewrites the context so that only the relevant information remains for the model to attend to.

Implementation and Experiments

S2A regenerates the input context so that it contains only the relevant parts, using an instruction-tuned LLM prompted in a zero-shot fashion, and then answers from that regenerated context. It was evaluated on tasks such as factual question answering, longform generation, and math word problems.
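As a rough sketch, the two steps can be expressed as a pair of prompts: one to rewrite the context and one to answer from the rewritten context. The prompt wording below is paraphrased rather than taken verbatim from the paper, and `generate` is a placeholder for whatever call queries your instruction-tuned LLM.

```python
# Sketch of the S2A two-step procedure. `generate` is a placeholder for an
# instruction-tuned LLM call; the prompt wording is paraphrased, not exact.

def generate(prompt: str) -> str:
    """Query an instruction-tuned LLM and return its text output."""
    raise NotImplementedError("wire this up to your model or API of choice")


REWRITE_PROMPT = (
    "Given the following text by a user, extract the part that is unbiased "
    "and relevant, so that using that text alone gives good context for "
    "answering the question it contains. Label the two parts 'Context:' "
    "and 'Question:'.\n\nText: {original}"
)

ANSWER_PROMPT = (
    "{context}\n\nAnswer the following question using only the context "
    "above:\n{question}"
)


def s2a_answer(original_prompt: str) -> str:
    # Step 1: regenerate the context, dropping irrelevant or opinionated parts.
    rewritten = generate(REWRITE_PROMPT.format(original=original_prompt))
    context, _, question = rewritten.partition("Question:")
    context = context.replace("Context:", "").strip()
    # Step 2: answer using only the regenerated context and question.
    return generate(
        ANSWER_PROMPT.format(context=context, question=question.strip())
    )
```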

Results

Across these tasks, S2A showed significant improvements in factuality and objectivity over standard prompting.

Limitations and Future Work

The method requires extra computation, since the context must be regenerated before the final answer is produced, and its effectiveness depends heavily on the quality of the zero-shot prompt used for that regeneration.

Variations of System 2 Attention

No Context/Question Separation

This variant does not separate the regenerated context from the question; the model rewrites the whole input in one pass, which suits shorter inputs where copying the full context is straightforward.
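A minimal sketch of how this variant might look, reusing the `generate` placeholder from the earlier snippet (the prompt wording is illustrative):

```python
# Variant sketch: regenerate the whole input (context and question together)
# in one pass, instead of splitting it into labeled parts.
REWRITE_NO_SEPARATION = (
    "Rewrite the following text so that it keeps only the relevant, unbiased "
    "parts, including the question itself:\n\n{original}"
)


def s2a_no_separation(original_prompt: str) -> str:
    rewritten = generate(REWRITE_NO_SEPARATION.format(original=original_prompt))
    # The rewritten text already contains the question, so answer it directly.
    return generate(rewritten)
```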

Keep Original Context

Here, the regenerated context is appended to the original, allowing the model to access both the original context and its reinterpretation.
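A sketch of this variant, again reusing the placeholder `generate` and the prompt templates from the first snippet:

```python
# Variant sketch: append the regenerated context to the original prompt so
# the model can still see the unedited input when answering.
def s2a_keep_original(original_prompt: str) -> str:
    rewritten = generate(REWRITE_PROMPT.format(original=original_prompt))
    context, _, question = rewritten.partition("Question:")
    context = context.replace("Context:", "").strip()
    combined = f"{original_prompt}\n\n{context}"  # original + reinterpretation
    return generate(
        ANSWER_PROMPT.format(context=combined, question=question.strip())
    )
```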

Instructed Prompting

This variant adds an explicit instruction asking for an unopinionated response. It is compared against a baseline in which a similar instruction is appended to the original prompt without performing S2A's context regeneration.
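The baseline side of that comparison might look like the following sketch (instruction wording is illustrative, and `generate` is the placeholder from earlier):

```python
# Baseline sketch: append an "ignore opinions" instruction to the original
# prompt with no regeneration step.
def instructed_prompting(original_prompt: str) -> str:
    instruction = (
        "Answer in an unbiased way and do not be swayed by any opinions "
        "expressed in the text above."
    )
    return generate(f"{original_prompt}\n\n{instruction}")
```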

Emphasize Relevance/Irrelevance

This variation instructs the model to emphasize relevance versus irrelevance when regenerating the context, rather than focusing, as standard S2A does, on removing opinion to increase objectivity and reduce sycophancy.
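In sketch form, only the rewrite prompt changes (wording illustrative):

```python
# Variant sketch: the rewrite prompt filters for relevance to the question
# instead of removing opinionated text.
REWRITE_RELEVANCE = (
    "From the text below, keep only the sentences that are relevant to "
    "answering the question it contains, and discard everything else. "
    "Label the two parts 'Context:' and 'Question:'.\n\nText: {original}"
)
```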

Example Implementation
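Below is a self-contained sketch that wires the two S2A steps to a chat model through the OpenAI Python SDK. The prompt wording, model choice, and sample input are illustrative assumptions rather than the paper's exact setup.

```python
# Self-contained S2A sketch using the OpenAI Python SDK (>= 1.0).
# Prompt wording, model choice, and the sample input are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def chat(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any instruction-tuned chat model works here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content


REWRITE_PROMPT = (
    "Given the following text by a user, extract the part that is unbiased "
    "and relevant, so that using that text alone gives good context for "
    "answering the question it contains. Label the two parts 'Context:' "
    "and 'Question:'.\n\nText: {original}"
)

ANSWER_PROMPT = (
    "{context}\n\nAnswer the following question using only the context "
    "above:\n{question}"
)


def s2a_answer(original_prompt: str) -> str:
    # Step 1: regenerate the context without irrelevant or opinionated parts.
    rewritten = chat(REWRITE_PROMPT.format(original=original_prompt))
    context, _, question = rewritten.partition("Question:")
    context = context.replace("Context:", "").strip()
    # Step 2: answer from the regenerated context only.
    return chat(ANSWER_PROMPT.format(context=context, question=question.strip()))


if __name__ == "__main__":
    # A prompt containing an opinion that tends to sway a sycophantic model.
    prompt = (
        "I think the answer is Sydney, but I'm really not sure. "
        "What is the capital of Australia?"
    )
    print(s2a_answer(prompt))
```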

Conclusion

"System 2 Attention" introduces a novel approach to enhance the reasoning capabilities of AI models. By focusing on relevant information and emulating human-like attentive processing, it opens new avenues for more accurate and reliable AI systems. However, the increased computational demand and reliance on prompt quality are areas for future improvement.

Reference

Jason Weston and Sainbayar Sukhbaatar. "System 2 Attention (is something you might need too)." Meta, 2023.

Created 2023-11-27T17:47:59-08:00