Reflexion is a novel framework proposed by Shinn et al. for reinforcing language agents through linguistic feedback rather than traditional weight updates. The key idea is to have agents verbally reflect on feedback signals, maintain the reflective text in an episodic memory buffer, and use this to guide better decision making in subsequent trials.
The Reflexion process involves three main components: an Actor that generates text and actions, an Evaluator that scores the Actor's outputs, and a Self-Reflection model that converts those evaluations into verbal feedback for future trials.
The Actor's policy is parameterized by the LLM's weights as well as an episodic memory that stores the reflective feedback. This memory provides additional context to help the Actor make better decisions over time.
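To make the loop concrete, here is a minimal Python sketch of one Reflexion run, assuming the three components are supplied as plain callables. The signatures are illustrative, not taken from the paper's code:

```python
from typing import Callable

def reflexion(
    task: str,
    act: Callable[[str, list[str]], str],              # Actor
    evaluate: Callable[[str, str], tuple[bool, str]],  # Evaluator
    reflect: Callable[[str, str, str], str],           # Self-Reflection
    max_trials: int = 5,
) -> str:
    """Run Reflexion trials: act, evaluate, reflect, retry."""
    memory: list[str] = []  # episodic memory of reflective text
    trajectory = ""
    for _ in range(max_trials):
        # The Actor is conditioned on the task plus all prior reflections.
        trajectory = act(task, memory)
        success, feedback = evaluate(task, trajectory)
        if success:
            break
        # Turn the feedback signal into a verbal lesson and persist it so
        # it shapes the Actor's behavior on the next trial.
        memory.append(reflect(task, trajectory, feedback))
    return trajectory
```

The key design choice is that learning happens entirely in `memory`: the LLM's weights never change, so improvement comes from richer context rather than gradient updates.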
Reflexion is flexible enough to incorporate various types of feedback signals (scalar values or free-form language) from different sources (external or internally simulated). It was evaluated on three types of tasks: sequential decision making (AlfWorld), reasoning (HotPotQA), and programming (HumanEval, MBPP, and LeetcodeHardGym).
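Because the Self-Reflection model ultimately reasons over natural language, both scalar rewards and free-form critiques can be funneled through the same interface. A minimal sketch (the helper name is hypothetical, not from the paper):

```python
def feedback_to_text(feedback: float | str) -> str:
    """Render heterogeneous feedback as text for the reflection prompt."""
    if isinstance(feedback, str):  # free-form language feedback
        return feedback
    # scalar signal, e.g. an environment reward or a unit-test pass rate
    return f"The last trial scored {feedback:.2f}."
```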
Across all tasks, Reflexion agents achieved significant improvements over strong baselines. Most notably:
- 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 result of 80%
- an absolute 22% improvement on AlfWorld decision-making tasks over 12 iterative learning steps
- a 20% improvement on HotPotQA reasoning questions
Ablation studies showed that test case generation and self-reflection are both critical to Reflexion's strong code generation performance.
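As a rough illustration of how self-generated tests can serve as the Evaluator for code generation: in the sketch below, `tests` would come from an LLM call in a real pipeline, and the unsandboxed `exec` is a simplification, not the paper's actual harness:

```python
def run_self_tests(program: str, tests: list[str]) -> tuple[bool, str]:
    """Run candidate code against assert-style tests and return
    (all_passed, feedback) for the self-reflection step."""
    namespace: dict = {}
    try:
        exec(program, namespace)  # define the candidate function(s)
    except Exception as err:
        return False, f"Program failed to load: {err!r}"
    for test in tests:
        try:
            exec(test, namespace)  # e.g. "assert add(2, 2) == 4"
        except Exception as err:
            return False, f"Failing test {test!r}: {err!r}"
    return True, "All self-generated tests passed."
```

The returned error string is exactly the kind of grounded, specific signal the Self-Reflection model can turn into an actionable lesson for the next attempt.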
The authors conclude that reinforcing language agents through self-reflection and persistent memory is a promising paradigm as language models continue to improve. Capturing experiences in natural language enables explicit credit assignment and provides more informative guidance for future trials than traditional scalar RL rewards do.