Reflexion
also known as Cross-Episode Lesson Writing, Verbal Reinforcement Learning
Have the agent write linguistic lessons from past failures and consult them in future episodes.
This pattern helps complete certain larger patterns —
- specialises Reflection ★★: Have the model review its own output and produce a revised version in one or more passes.
Context
A team operates an agent that attempts many similar tasks over time, such as a coding agent solving one programming problem after another or a research assistant answering successive user queries on related topics. Each task is a separate episode, and the agent forgets everything between them. The team would like the agent to stop repeating the kinds of mistakes it has made before, but they cannot afford to fine-tune model weights with reinforcement learning every time a new failure mode shows up.
Problem
A stateless agent repeats the same mistakes across episodes because it has no memory of having made them before. The information about what went wrong exists briefly at the end of each episode and is then thrown away with the conversation. Full reinforcement learning would in principle close the loop, but for most teams it is too expensive to run for every failure, and a weight change is harder to undo than small everyday corrections warrant. The team needs a way to carry lessons from one episode to the next without touching model weights, yet a naive 'remember everything' store quickly accumulates noise that misguides future runs more than it helps.
Forces
- Lesson quality is bounded by the model's self-critique ability.
- Lesson retrieval (which lesson applies?) is a search problem.
- Lesson rot: outdated lessons may misguide once the world changes.
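The retrieval and rot forces can be made concrete with a small ranking sketch. It is not a prescribed method: it assumes lessons are plain strings tagged with the episode number at which they were written, scores each one by word overlap with the new task, and discounts older lessons with an exponential decay. The names `rank_lessons`, `half_life` and `top_k` are hypothetical, and a real system might use embeddings or a task-type index instead.

```python
import re


def tokens(text):
    """Lower-cased set of alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rank_lessons(task, lessons, now_episode, half_life=50, top_k=3):
    """Return up to top_k lesson texts, most relevant first.

    lessons is a list of (text, written_at_episode) pairs.
    half_life (in episodes) is an assumed tuning knob: a lesson that is
    half_life episodes old counts half as much as a fresh one.
    """
    task_tokens = tokens(task)
    scored = []
    for text, written_at in lessons:
        lesson_tokens = tokens(text)
        union = task_tokens | lesson_tokens
        # Jaccard overlap: a crude stand-in for "which lesson applies?"
        overlap = len(task_tokens & lesson_tokens) / len(union) if union else 0.0
        # Exponential decay: a crude guard against lesson rot.
        age = now_episode - written_at
        freshness = 0.5 ** (age / half_life)
        score = overlap * freshness
        if score > 0:
            scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]
```

Lessons that share no words with the task are dropped rather than ranked last, so an unrelated lesson never reaches the prompt. Decay only down-weights old lessons; it does not replace explicit retirement.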
Example
An agent solving programming-contest problems repeatedly trips over off-by-one errors in inclusive ranges. After each episode it writes a one-paragraph lesson keyed to 'range parsing' and stores it in long-term memory. On the next problem that mentions inclusive bounds, the relevant lesson is retrieved and prepended to the prompt. The model is the same and nothing has been fine-tuned, yet the pass rate on that error class climbs because the agent now reads its own past lessons before writing code.
Solution
Therefore:
After each episode, the agent reflects on what succeeded or failed and writes a verbal lesson. Lessons are stored in long-term memory keyed by task type. Future episodes retrieve the relevant lessons and prepend them to the context.
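The loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `attempt`, `evaluate` and `reflect` are hypothetical callables standing in for model calls and a test harness, memory is a plain in-process dictionary keyed by task type, and the three `stub_*` functions exist only to make the sketch runnable without a model.

```python
from collections import defaultdict


def build_prompt(task, lessons):
    """Prepend retrieved lessons to the task text."""
    if not lessons:
        return task
    bullet_list = "\n".join(f"- {lesson}" for lesson in lessons)
    return f"Lessons from earlier episodes:\n{bullet_list}\n\nTask:\n{task}"


def run_episode(task, task_type, memory, attempt, evaluate, reflect):
    """One episode: retrieve lessons, act, evaluate, reflect, store."""
    prompt = build_prompt(task, memory[task_type])
    output = attempt(prompt)
    passed, feedback = evaluate(task, output)
    lesson = reflect(task, output, passed, feedback)
    if lesson:
        memory[task_type].append(lesson)  # append only, never overwrite
    return passed


# Hypothetical stubs standing in for the model and the test harness.
def stub_attempt(prompt):
    # Pretend the model only gets the bound right after reading the lesson.
    return "range(lo, hi + 1)" if "hi + 1" in prompt else "range(lo, hi)"


def stub_evaluate(task, output):
    passed = "hi + 1" in output
    feedback = "" if passed else "last element of the range was skipped"
    return passed, feedback


def stub_reflect(task, output, passed, feedback):
    # This stub writes a lesson only on failure; a real reflector
    # might also record what worked.
    if passed:
        return None
    return "For inclusive bounds, iterate with range(lo, hi + 1)."


memory = defaultdict(list)  # task type -> list of lesson strings
```

Calling `run_episode` twice with the same task and the stubs above fails on the first episode and passes on the second, because the second prompt carries the lesson written after the first.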
What this pattern forbids. Overwriting a lesson in place, or silently deleting one. Lessons are only ever appended, and old lessons are explicitly retired rather than removed.
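One way to honour the append-and-retire rule is a store that exposes no update or delete operation at all. The sketch below is an assumption about how that might look; `LessonStore`, `Lesson` and `retired_reason` are hypothetical names, and a production store would persist the log rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Lesson:
    lesson_id: int
    task_type: str
    text: str
    retired_reason: Optional[str] = None  # None means the lesson is active


@dataclass
class LessonStore:
    """Append-only lesson log: no method edits text or removes an entry."""
    _log: List[Lesson] = field(default_factory=list)

    def append(self, task_type: str, text: str) -> int:
        lesson = Lesson(len(self._log), task_type, text)
        self._log.append(lesson)
        return lesson.lesson_id

    def retire(self, lesson_id: int, reason: str) -> None:
        """Mark a lesson as retired, recording why; the entry stays in the log."""
        if not reason:
            raise ValueError("retirement must be explicit: give a reason")
        self._log[lesson_id].retired_reason = reason

    def active(self, task_type: str) -> List[Lesson]:
        """Lessons eligible for retrieval in future episodes."""
        return [
            lesson for lesson in self._log
            if lesson.task_type == task_type and lesson.retired_reason is None
        ]

    def history(self) -> List[Lesson]:
        """Every lesson ever written, retired ones included."""
        return list(self._log)
```

A correction is then expressed as two steps, retiring the old lesson with a reason and appending a new one, so the record of what the agent once believed remains available for audit.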
The smaller patterns that complete this one —
- generalises Agentic Context Engineering Playbook ·: Treat the agent's system prompt and long-lived memory as a structured, item-addressable playbook that evolves through small delta updates from a Generator/Reflector/Curator loop, so accumulated tactics resist the context collapse that monolithic rewrites cause.
And the patterns that stand alongside it, or against it —
- complements Episodic Summaries ★★: Compress past episodes into summaries that preserve gist while shedding token cost.
- alternative-to Darwin-Gödel Self-Rewrite ·: An agent rewrites its own source code, archives every successful variant, and samples mutation parents from the archive rather than the latest version, using archive diversity as stepping-stones to escape local optima.