This blog post delves into the key concepts of the "System 2 Attention" (S2A) mechanism, introduced in a recent paper by Jason Weston and Sainbayar Sukhbaatar of Meta, covering its implementation and the variations explored in the paper.
S2A is inspired by dual-process theory in human cognition: it aims to emulate the more deliberate, attentive reasoning of human 'System 2' thinking. Despite their vast knowledge, current Large Language Models (LLMs) often make reasoning mistakes; they can be misled by irrelevant context and exhibit sycophantic tendencies, agreeing with whatever opinion the prompt expresses. These are the failures S2A seeks to address.
The paper critiques the standard soft attention mechanism in Transformers, which tends to spread attention across large parts of the context, including irrelevant sections. S2A, in contrast, restricts the model's input to the relevant information only.
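As a reminder, a standard attention layer computes (usual Transformer notation; this formula is background, not taken from the S2A paper):

$$
\mathrm{Attention}(Q, K, V) = \operatorname{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
$$

Because the softmax is strictly positive, every token in the context receives a nonzero weight, relevant or not. That is exactly the failure mode S2A sidesteps by deleting irrelevant text before attention ever sees it.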
S2A works in two steps: an instruction-tuned LLM is first prompted to regenerate the input context so that it contains only the relevant parts, and the model then answers from that regenerated context instead of the original. The paper evaluates the method on factual question answering, longform generation, and math word problems, reporting significant improvements in factuality and objectivity.
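A minimal sketch of that two-step pipeline, assuming a generic `llm` helper for the model call; the prompt wording paraphrases the paper's default S2A prompt rather than quoting it:

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to an instruction-tuned LLM
    (an API or a local model); not part of the paper."""
    raise NotImplementedError

# Step 1 prompt: regenerate the context, keeping only relevant,
# unbiased text. Wording paraphrases the paper's default S2A prompt.
S2A_PROMPT = """Given the following text by a user, extract the part
that is unbiased and not their opinion, so that using that text alone
would be good context for providing an unbiased answer to the question
portion of the text. Please include the actual question the user is asking.

Text by user: {input}

Separate this into two parts labeled:
Unbiased text context (includes all content except user's bias):
Question (does not include user bias/preference):"""

def system2_attention(user_input: str) -> str:
    # Step 1: regenerate the context.
    regenerated = llm(S2A_PROMPT.format(input=user_input))
    # Step 2: answer using only the regenerated context and question.
    return llm(regenerated)
```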
The trade-offs: S2A requires an extra generation pass over the context, and its quality depends heavily on the zero-shot prompt used for regeneration.
Beyond the default recipe, the paper explores several variants. The first does not separate the regenerated context from the question, which suits shorter contexts or scenarios where copying the full context, question included, is straightforward; a sketch follows below.
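One way this variant's regeneration prompt might look (illustrative wording, not the paper's exact prompt):

```python
# Variant 1: no context/question separation. The model rewrites the
# whole input as a single block rather than labeling the two parts.
S2A_NO_SEPARATION_PROMPT = """Given the following text by a user, rewrite it
so that it keeps all factual content and the user's question but removes
any opinions or bias. Output only the rewritten text.

Text by user: {input}"""
```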
A second variant keeps the original context: the regenerated context is appended to the original, letting the model attend to both the raw input and its reinterpretation (see the sketch below).
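Reusing the `llm` and `S2A_PROMPT` placeholders from the earlier sketch, this variant might look like:

```python
def s2a_keep_original(user_input: str) -> str:
    # Step 1: regenerate the context as in standard S2A.
    regenerated = llm(S2A_PROMPT.format(input=user_input))
    # Step 2: answer from the original and regenerated context together.
    return llm(f"{user_input}\n\n{regenerated}")
```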
A third variant adds instructed prompting: the S2A pipeline explicitly asks for an unopinionated response, and it is compared against a baseline in which the same extra instruction is added to the original context without performing S2A (sketched below).
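A sketch of both arms of that comparison, again with illustrative wording:

```python
# Explicit de-opinionation instruction (wording is illustrative).
INSTRUCTION = "Answer in an unbiased way, ignoring any opinions in the text."

def s2a_instructed(user_input: str) -> str:
    regenerated = llm(S2A_PROMPT.format(input=user_input))
    return llm(f"{regenerated}\n\n{INSTRUCTION}")

def baseline_instructed(user_input: str) -> str:
    # Baseline: same instruction, original context, no S2A step.
    return llm(f"{user_input}\n\n{INSTRUCTION}")
```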
A fourth variant emphasizes relevance versus irrelevance: the regeneration prompt asks the model to keep what is relevant to the question rather than what is unbiased, diverging from standard S2A's focus on objectivity and on reducing sycophancy (see the sketch below).
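The relevance-oriented prompt might be phrased like this (my own approximation, not the paper's verbatim prompt):

```python
# Variant 4: extract relevant text instead of unbiased text.
S2A_RELEVANCE_PROMPT = """Given the following text by a user, extract the
parts that are relevant to answering the question and remove anything
irrelevant. Please include the actual question the user is asking.

Text by user: {input}"""
```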
"System 2 Attention" introduces a novel approach to enhance the reasoning capabilities of AI models. By focusing on relevant information and emulating human-like attentive processing, it opens new avenues for more accurate and reliable AI systems. However, the increased computational demand and reliance on prompt quality are areas for future improvement.
Paper: Jason Weston and Sainbayar Sukhbaatar, "System 2 Attention (is something you might need too)", arXiv:2311.11829, 2023.
Created 2023-11-27T17:47:59-08:00