Self-Refine
also known as Iterative Self-Feedback
Iterate generate → feedback (same model) → refine until a stop criterion fires, with no separate critic model.
This pattern helps complete certain larger patterns —
- specialises Reflection ★★: Have the model review its own output and produce a revised version in one or more passes.
Context
A team runs a generation task (a piece of writing, a code snippet, a dialogue response) on a single large language model and has no second, independent model available to act as a critic. The team has, however, an explicit improvement target for the task: a short checklist, a quality rubric, or a definition of what 'better' means in this domain. The same model is capable of producing useful feedback against that target when given the draft and the checklist.
Problem
Running the model in one shot leaves quality on the table, but simply asking the same model in a follow-up prompt 'is this any good?' tends to produce vague praise that does not improve the draft. Without a clear separation between generating, critiquing, and revising, the model collapses the three jobs into one and ends up either making the draft worse with random rewrites or declaring it fine on the second look. A loop without a stop criterion runs forever; a loop with no structure produces drift instead of refinement. The team needs the same model to play three distinct roles in sequence, bounded by a clear termination condition.
Forces
- Same-model critique inherits the model's blind spots.
- The termination criterion is a design decision in its own right: a loop without one runs forever.
- Cost grows linearly with iterations.
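To make the cost force concrete, here is a rough count of model calls. It assumes each role (generate, feedback, refine) costs exactly one call, which the pattern itself does not prescribe. The name `model_calls` is illustrative only.

```python
def model_calls(refine_rounds: int, ended_on_no_op: bool) -> int:
    """Estimate model calls for one Self-Refine run.

    Assumption: one call per role. That gives 1 generate call, 2 calls
    (feedback + refine) per completed round, and 1 extra feedback call
    when the loop ends because the feedback found nothing to fix.
    """
    if refine_rounds < 0:
        raise ValueError("refine_rounds must be non-negative")
    return 1 + 2 * refine_rounds + (1 if ended_on_no_op else 0)


# A run that hits a cap of three rounds, versus one that stops after one round.
print(model_calls(3, ended_on_no_op=False))  # 7
print(model_calls(1, ended_on_no_op=True))   # 4
```

The count is linear in the number of rounds. Token cost per call may grow as well, since each feedback and refine prompt carries the current draft.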
Example
A coding agent writes a function that compiles but uses an awkward API surface. The team runs it through a Self-Refine loop: the same model produces concrete improvement points against a checklist (clarity, names, error handling), then refines the draft, and the second pass yields a noticeably cleaner function. The team caps the loop at three iterations and stops earlier on a no-op feedback signal, accepting that self-critique catches only surface issues, not deep correctness bugs.
Solution
Therefore:
Three roles, one model. (1) Generate: the model produces an initial output. (2) Feedback: the same model returns concrete improvement points against a fixed target. (3) Refine: the same model rewrites the output using that feedback. Repeat steps 2 and 3 until the feedback reports 'no more issues' or the maximum number of iterations is reached.
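The three roles and the two stop conditions can be sketched as a short loop. This is a minimal sketch under stated assumptions, not a reference implementation: `model` stands in for any callable that maps a prompt string to a reply string, and the prompt wording, the `NO_ISSUES` sentinel, and the default cap of three are all hypothetical choices.

```python
from typing import Callable, List, Tuple

# Hypothetical sentinel the feedback prompt asks for when nothing needs fixing.
NO_ISSUES = "NO MORE ISSUES"


def self_refine(
    task: str,
    checklist: List[str],
    model: Callable[[str], str],  # assumed interface: prompt in, text out
    max_iters: int = 3,
) -> Tuple[str, int]:
    """Run generate -> feedback -> refine with one model.

    Returns the final draft and the number of refine rounds performed.
    """
    # Role 1: generate the initial draft.
    draft = model(f"Task: {task}\nWrite a first draft.")

    for rounds_done in range(max_iters):
        # Role 2: feedback against the fixed target (the checklist).
        feedback = model(
            "Review the draft against each checklist item and list concrete "
            f"improvement points. If nothing needs changing, reply exactly: "
            f"{NO_ISSUES}\n"
            f"Checklist: {', '.join(checklist)}\n"
            f"Draft:\n{draft}"
        )
        # Stop criterion 1: the feedback role reports nothing left to fix.
        if feedback.strip() == NO_ISSUES:
            return draft, rounds_done

        # Role 3: refine using only the most recent feedback.
        draft = model(
            "Rewrite the draft so that it addresses every feedback point.\n"
            f"Feedback:\n{feedback}\n"
            f"Draft:\n{draft}"
        )

    # Stop criterion 2: the iteration cap was reached.
    return draft, max_iters
```

Keeping the three prompts separate is what stops the roles from collapsing into one. The cap guarantees termination even if the feedback role never emits the sentinel.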
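What this pattern forbids. Feedback that strays from the chosen target, and revisions that ignore the most recent feedback. Every feedback point must map to the target, and every revision must address the latest feedback.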
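The first half of this rule can be checked mechanically if the feedback is requested in a structured form. The sketch below assumes a hypothetical line format, `criterion: suggestion`, which the pattern does not mandate, and rejects any point whose criterion is not on the checklist. Checking that a revision actually addressed the feedback is harder and is not attempted here.

```python
from typing import Dict, List


def parse_feedback(feedback: str, checklist: List[str]) -> Dict[str, List[str]]:
    """Group feedback points by checklist criterion; reject off-target points.

    Assumed format (hypothetical): one point per line, written as
    "criterion: suggestion", optionally prefixed with a hyphen bullet.
    """
    allowed = {item.lower() for item in checklist}
    points: Dict[str, List[str]] = {}

    for raw in feedback.splitlines():
        line = raw.strip().lstrip("-").strip()
        if not line:
            continue  # skip blank lines
        criterion, sep, suggestion = line.partition(":")
        key = criterion.strip().lower()
        if not sep or key not in allowed:
            raise ValueError(f"off-target feedback: {line!r}")
        points.setdefault(key, []).append(suggestion.strip())

    return points


checklist = ["clarity", "names", "error handling"]
feedback = "- names: rename tmp to retry_count\n- error handling: catch TimeoutError"
print(parse_feedback(feedback, checklist))
```

A loop could re-ask for feedback, or drop the offending point, when `ValueError` is raised. Which reaction is right depends on the task and is left open here.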
And the patterns that stand alongside it, or against it —
- alternative-to Evaluator-Optimizer ★★: One LLM generates; another evaluates and feeds back; loop until criteria are met.
- conflicts-with Same-Model Self-Critique ✕: Anti-pattern: have the same model both produce an answer and critique it, expecting independence.
- alternative-to Agentic Context Engineering Playbook ·: Treat the agent's system prompt and long-lived memory as a structured, item-addressable playbook that evolves through small delta updates from a Generator/Reflector/Curator loop, so accumulated tactics resist the context collapse that monolithic rewrites cause.
- alternative-to Darwin-Gödel Self-Rewrite ·: An agent rewrites its own source code, archives every successful variant, and samples mutation parents from the archive rather than the latest version, using archive diversity as stepping-stones to escape local optima.