Methodology · Agent Construction · emerging · verified

Plan-Reason-Evaluate-Feedback Loop

also known as PREF loop, planning-reasoning-evaluation-feedback construction

Applies to: agent, coding-agent

Tags: control-loop, planning, reasoning, evaluation, feedback

Build the agent's control logic as a loop with four stages. Plan drafts a candidate approach. Reason fills it in using chain-of-thought or tree-of-thoughts. Evaluate scores the result, using self-consistency or a judge. Feedback hands the lessons back to Plan for the next round. Make each stage its own step with its own metrics, so the team can tune them one at a time. The thing to avoid is one giant prompt trying to do all four jobs and doing each one badly.

Methodology process overview

flowchart LR
    task[Task specification] --> plan[Plan stage]
    evalDef[Evaluator definition] --> eval[Evaluate stage]
    plan --> reason[Reason stage]
    reason --> eval
    eval --> fb[Feedback stage]
    fb -- accept --> done[Stop / emit result]
    fb -- replan --> plan
    fb -- reject with reasons --> reason
    plan -.-> tel[Per-stage metrics]
    reason -.-> tel
    eval -.-> tel
    fb -.-> tel
    budget[Step budget] --> fb

Intent. Split the agent's control loop into Plan, Reason, Evaluate, and Feedback so each one can be written, tested, and tuned on its own instead of crammed into a single prompt.

When to apply. Use this when the task is hard enough to need real planning and self-checking: research agents, coding agents, and multi-step problem solvers. It helps most when one big ReAct prompt has stopped improving. Do not apply it to single-shot generators or simple tool callers, where four stages on a trivial task are overkill. Skip it too when your latency budget cannot absorb the extra round-trips.

Example scenario

A platform team is building an internal research-assistant agent that produces market-sizing memos for the strategy department. Their first attempt, one big ReAct prompt, stalled at 6/10 for memo quality on the strategy team's rubric, so they split the control loop into PREF stages.

The inputs are a task spec ('produce a market-sizing memo for product X in region Y') and an evaluator: a 12-criterion rubric scored by a separate Claude model with the strategy team's example memos in context. The Plan stage emits two candidate approaches using multi-path plan generation. The Reason stage fills in each one with tree-of-thoughts, including reasoning about source quality. The Evaluate stage scores both against the rubric, running the scoring three times for self-consistency. The Feedback stage accepts the winner, rejects it with reasons tied to specific rubric criteria (such as 'sizing assumption unsupported'), or asks Plan to try a different angle. The iteration budget is three loops.

What they learned: most of the gain came from making Evaluate independent. The same-model self-critique they started with always rated the memo at 8/10, no matter the quality. Switching to a different model with the rubric in context lowered the average rated score but raised the real quality, as judged by humans, from 6/10 to 8.4/10. The per-stage metrics showed that Reason was the slowest stage, but Plan was where the biggest quality jumps came from. That sent their tuning effort to the right place.
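
The Evaluate stage in this scenario can be sketched roughly as follows. This is a minimal illustration, not the team's implementation: the `Judge` signature, the per-criterion score dictionary, the median aggregation, and the 6.0 threshold in `weak_criteria` are all assumptions made for the example. The scenario only says the rubric is scored three times for self-consistency.

```python
from statistics import median
from typing import Callable, Dict, List, Tuple

# Hypothetical judge signature: takes a draft memo and returns one score
# (0-10) per rubric criterion. In the scenario this would wrap a call to a
# separate model with the rubric and example memos in context.
Judge = Callable[[str], Dict[str, float]]


def evaluate_with_self_consistency(draft: str, judge: Judge, runs: int = 3) -> Dict[str, float]:
    """Score one draft `runs` times and keep the per-criterion median."""
    samples = [judge(draft) for _ in range(runs)]
    return {c: median(s[c] for s in samples) for c in samples[0]}


def pick_winner(drafts: List[str], judge: Judge, runs: int = 3) -> Tuple[str, Dict[str, float]]:
    """Return the draft with the highest mean criterion score, plus its scores."""
    scored = [(d, evaluate_with_self_consistency(d, judge, runs)) for d in drafts]
    return max(scored, key=lambda pair: sum(pair[1].values()) / len(pair[1]))


def weak_criteria(scores: Dict[str, float], threshold: float = 6.0) -> List[str]:
    """List the criteria scoring below `threshold`, so Feedback can cite them."""
    return sorted(c for c, v in scores.items() if v < threshold)
```

Taking the median rather than the mean keeps one outlier judge run from swinging the result. Returning per-criterion scores, instead of one number, is what lets the Feedback stage reject with reasons tied to specific rubric criteria.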

Inputs

Task specification — What the agent must do, in a form you can break into sub-steps.
Evaluator definition — The judge, rubric, or self-consistency check that will score what Reason produces.

Outputs

Four-stage control loop — Plan, Reason, Evaluate, and Feedback as four separate stages, each with its own metrics.
Stage-level metrics — A success or failure signal for each stage that your telemetry can graph and alert on.
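
One way to get a success or failure signal per stage is to wrap each stage call in a small recorder. The sketch below is an assumption-laden illustration: `StageMetrics`, its in-memory storage, and the `failure_rate` helper are hypothetical names, and a real system would forward these values to whatever telemetry backend it already uses.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StageMetrics:
    """In-memory sink for per-stage durations and outcomes (illustrative only)."""
    durations_s: Dict[str, List[float]] = field(
        default_factory=lambda: defaultdict(list))
    outcomes: Dict[str, Dict[str, int]] = field(
        default_factory=lambda: defaultdict(lambda: {"ok": 0, "error": 0}))

    @contextmanager
    def stage(self, name: str):
        """Time the wrapped block and count it as ok or error under `name`."""
        start = time.perf_counter()
        try:
            yield
            self.outcomes[name]["ok"] += 1
        except Exception:
            self.outcomes[name]["error"] += 1
            raise  # never swallow the stage's own failure
        finally:
            self.durations_s[name].append(time.perf_counter() - start)

    def failure_rate(self, name: str) -> float:
        """Fraction of recorded runs of `name` that raised."""
        counts = self.outcomes[name]
        total = counts["ok"] + counts["error"]
        return counts["error"] / total if total else 0.0
```

Usage would look like `with metrics.stage("plan"): ...` around each stage call, so the four stages never share a counter.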

Steps (6)

Author the Plan stage
Draft a candidate plan or approach. Use single-path or multi-path plan generation depending on cost and risk.
Uses: Plan-and-Execute, Single-Path Plan Generator, Multi-Path Plan Generator
Author the Reason stage
Fill in the plan with chain-of-thought or tree-of-thoughts. Reasoning generates new detail. Do not fold it back into Plan.
Uses: Chain of Thought, Tree of Thoughts
Author the Evaluate stage
Score what Reason produced. Use self-consistency, an LLM judge, or a rubric. Evaluate must be independent enough to disagree.
Uses: Self-Consistency, Agent-as-a-Judge, Evaluator-Optimizer
Author the Feedback stage
Turn the score into a clear next step for the planner: accept, reject with reasons, replan, or escalate. Feedback is what closes the loop.
Uses: Reflexion, Evaluator-Optimizer
Instrument each stage independently
Emit traces and metrics scoped to each stage. You cannot tune the four stages if they share one set of telemetry.
Bound iterations
Add a max-iterations budget and a test for when to stop. Otherwise the loop can bounce between Plan and Feedback forever.
Uses: Step Budget
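
Taken together, the six steps can be sketched as a small driver that keeps the four stages as separate callables and enforces the budget. This is a minimal sketch under stated assumptions: `run_pref_loop`, `Verdict`, `Feedback`, `LoopResult`, and the stage signatures are hypothetical names chosen for the example, and each stage function would wrap its own prompt or model call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, List, Optional


class Verdict(Enum):
    ACCEPT = "accept"
    REJECT = "reject"      # redo Reason on the same plan, with reasons
    REPLAN = "replan"      # send the lessons back to Plan
    ESCALATE = "escalate"  # hand off, e.g. to a human


@dataclass
class Feedback:
    verdict: Verdict
    reasons: List[str]


@dataclass
class LoopResult:
    output: Optional[str]
    iterations: int
    stopped_because: str


def run_pref_loop(
    task: str,
    plan: Callable[[str, List[str]], str],
    reason: Callable[[str, List[str]], str],
    evaluate: Callable[[str], Any],
    feedback: Callable[[Any], Feedback],
    max_iterations: int = 3,
) -> LoopResult:
    """Run Plan -> Reason -> Evaluate -> Feedback until accepted or out of budget."""
    notes: List[str] = []            # lessons carried between rounds
    current_plan: Optional[str] = None
    for i in range(1, max_iterations + 1):
        if current_plan is None:     # first round, or Feedback asked for a replan
            current_plan = plan(task, notes)
        draft = reason(current_plan, notes)
        score = evaluate(draft)
        fb = feedback(score)
        if fb.verdict is Verdict.ACCEPT:
            return LoopResult(draft, i, "accepted")
        if fb.verdict is Verdict.ESCALATE:
            return LoopResult(None, i, "escalated")
        notes = notes + fb.reasons   # feedback must reach the next round
        if fb.verdict is Verdict.REPLAN:
            current_plan = None
    return LoopResult(None, max_iterations, "budget_exhausted")
```

Keeping the stages as plain callables means each one can be unit-tested with stubs and swapped independently, and the explicit `stopped_because` field makes budget exhaustion visible instead of silent.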

Principles

Plan, Reason, Evaluate, and Feedback are four jobs. Give each its own prompt and metric.
The evaluator must be able to disagree. Same-model self-critique is a failure mode, not a method.
Feedback closes the loop, or it is not feedback.
Bound the iterations. Set a budget and a test for when to stop.
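
The second principle can be checked before trusting an evaluator. The sketch below is one possible sanity check, not a method from the source: it assumes you have a few outputs already labelled good and bad by humans, that the judge returns a single overall score, and that the `min_gap` and `min_spread` thresholds (hypothetical values) suit your scale.

```python
from statistics import mean, pstdev
from typing import Callable, Sequence


def judge_can_disagree(
    judge: Callable[[str], float],
    good_samples: Sequence[str],
    bad_samples: Sequence[str],
    min_gap: float = 1.0,
    min_spread: float = 0.5,
) -> bool:
    """Return True if the judge separates known-good from known-bad outputs.

    A judge that gives nearly the same score to everything has a spread near
    zero and fails this check, whatever its average score is.
    """
    good = [judge(s) for s in good_samples]
    bad = [judge(s) for s in bad_samples]
    spread = pstdev(good + bad)     # how much the scores vary at all
    gap = mean(good) - mean(bad)    # whether good outranks bad on average
    return spread >= min_spread and gap >= min_gap
```

A judge that returns 8/10 for every input fails on both counts, which is the behaviour to rule out before wiring the evaluator into the loop.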
