II · Planning & Control Flow · Emerging

Tool-Result Reinforcement

also known as Reinforcement (tool-return), Enriched Tool Return, Tool-Return Re-grounding

Append a goal reminder, current task status, and failure or next-step hints to each tool return so the agent is re-grounded through the action channel it already reads.

Context

An agent runs a long tool-call loop in which most of what the model reads back is the result of its own actions: a dumped file, a query result, an HTTP body, an error trace. Across dozens of turns these raw returns dominate the active window, while the standing goal, the current position in the plan, and the lessons of earlier failures sit far up the history. The harness, however, owns the function that hands each tool result back to the model, so it can decide exactly what text that result carries.

Problem

A tool return that carries only the raw result tells the model what happened but not why it was doing it or what to do next, and after a failed call it often returns just the raw error with no steer away from the dead end. Restating orientation only in the system prompt or an injected block leaves the goal competing with a wall of tool output the model is actively reading, and re-reading the whole history each turn is expensive. The orientation needs to ride the very channel the model attends to most: the tool result itself.

Forces

  • The tool return is the freshest, highest-attention text the model reads on an action turn, yet a raw result spends none of that attention on orientation.
  • Padding every return with reminders, status, and hints costs tokens on each call, while a bare return costs a drifted plan or a repeated dead-end after a failure.
  • A return that restates the whole plan reintroduces the bloat the loop was meant to avoid, while one too terse omits the next step the model needs.
  • Reminders derived from live run state stay honest, whereas a static reminder copied onto every return goes stale as the task advances.
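The budget tension in the second and third forces can be made concrete with a small helper that holds the block to a fixed size by keeping the most important lines first. This is a minimal sketch, assuming a character budget as a stand-in for a token budget. `fit_to_budget` and the priority numbers are hypothetical, not part of any library.

```python
def fit_to_budget(lines: list[tuple[int, str]], max_chars: int) -> list[str]:
    """Keep the highest-priority lines that fit within max_chars.

    lines: (priority, text) pairs; a lower number means more important.
    Characters stand in for tokens here (an assumption for this sketch).
    A line that does not fit is skipped, so a shorter lower-priority
    line may still be kept. Kept lines stay in their original order.
    """
    # Visit lines in priority order, remembering their original position.
    ranked = sorted(enumerate(lines), key=lambda item: item[1][0])
    kept: set[int] = set()
    used = 0
    for index, (_, text) in ranked:
        cost = len(text) + 1  # +1 for the newline that joins the lines
        if used + cost <= max_chars:
            kept.add(index)
            used += cost
    return [text for i, (_, text) in enumerate(lines) if i in kept]


block = [
    (0, "goal: get the suite green"),
    (1, "status: 6 of 9 files passing"),
    (2, "hint: fix the import path in billing before rerunning"),
]
print(fit_to_budget(block, max_chars=60))
# ['goal: get the suite green', 'status: 6 of 9 files passing']
```

Ranking the goal above the status and the status above the hint is one possible choice; a harness that cares most about escaping dead ends might rank the failure hint higher.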

Example

An agent is asked to fix the failing tests across a service. Each test-runner call returns a long failure log; on its own, that log tells the model what broke but not what it set out to do. With tool-result reinforcement, every run-tests return ends with a short block (goal: get the suite green; status: 6 of 9 files passing; last failure: import error in billing, try fixing the path before rerunning), so the agent keeps working through the queue and stops re-running the same broken call.

Solution

Therefore:

Route every tool result through a wrapper the harness controls before it reaches the model. The wrapper keeps the raw output and appends a small reinforcement block derived from the live run: the standing goal in one line, a one-line status of progress so far, and, on a failed call, a hint about what did not work and what to try next. On a successful call the block carries the goal and the immediate next step instead. The block is recomputed from the current run on each return rather than copied from the previous one, so it tracks progress, and it is held to a small, fixed budget so it never swamps the result. Because the reminder rides the tool return, the model meets it at the exact point where it is reasoning about what to do next, without a separate injection slot or a re-read of the history.

What this pattern forbids. The reinforcement block appended to a tool return may not alter or replace the raw result, and it must be re-derived from the live run on each return; a reminder carried over verbatim from a previous turn or one that overwrites the tool's actual output is a harness bug.
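The two rules in this paragraph are mechanical enough to check in a harness test. This is a minimal sketch, assuming the wrapper appends its block after a fixed delimiter and that the test can recompute the expected block from the live run. `check_reinforced` and the `[reinforcement]` delimiter are hypothetical.

```python
DELIMITER = "\n\n[reinforcement]\n"  # hypothetical separator used by the wrapper


def check_reinforced(raw: str, returned: str, fresh_block: str) -> None:
    """Raise AssertionError if a reinforced return breaks the pattern's rules.

    raw:         the tool's actual output
    returned:    the text the harness handed to the model
    fresh_block: the block recomputed from the live run right now
    """
    # Rule 1: the raw result is kept intact, ahead of the block.
    if not returned.startswith(raw + DELIMITER):
        raise AssertionError("raw tool output was altered or replaced")
    # Rule 2: the block matches a fresh derivation, not a stale copy.
    block = returned[len(raw) + len(DELIMITER):]
    if block != fresh_block:
        raise AssertionError("reinforcement block is stale or not re-derived")


raw = "3 passed, 1 failed"
fresh = "goal: get the suite green\nstatus: 7 of 9 files passing"
check_reinforced(raw, raw + DELIMITER + fresh, fresh)  # passes silently

stale = "goal: get the suite green\nstatus: 6 of 9 files passing"
try:
    check_reinforced(raw, raw + DELIMITER + stale, fresh)
except AssertionError as err:
    print(err)  # reinforcement block is stale or not re-derived
```

Comparing against a fresh derivation flags a copied block only when the run state has changed since it was written, which is exactly when a stale block would mislead the model.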

And the patterns that stand alongside it, or against it:

  • complements · Standing State Injection · Recompute a compact task-state snapshot each turn and inject it as a fresh system block before the model reasons, so a long tool-call loop stays oriented on the goal.
  • alternative-to · Action Selector Pattern · Eliminate the feedback channel from tool outputs back into the agent's reasoning step by having the agent select actions from a fixed catalog rather than generating free-form over tool output.
  • complements · Tool Output Trusted Verbatim · Anti-pattern: trust whatever tools return without validation, schema enforcement, or trust labels.
  • complements · Now-Anchoring · Ground the agent's reasoning in the current absolute time without requiring tool calls, so every reply is implicitly time-aware.

Provenance