Tool-Result Reinforcement

also known as Reinforcement (tool-return), Enriched Tool Return, Tool-Return Re-grounding

Append a goal reminder, current task status, and failure or next-step hints to each tool return so the agent is re-grounded through the action channel it already reads.

Context

An agent runs a long tool-call loop where most of the content the model reads back is the result of its own actions: a file dumped, a query answered, an HTTP body, an error trace. Across dozens of turns these raw returns dominate the active window while the standing goal, the place in the plan, and the lessons of earlier failures sit far up the history. The harness, however, owns the function that hands each tool result back to the model, so it can decide exactly what text that result carries.

Problem

A tool return that carries only the raw result tells the model what happened but not why it was doing it or what to do next. After a failed call it often carries just the raw error, with nothing to steer the model away from the dead end. Restating orientation only in the system prompt or an injected block leaves the goal competing with a wall of tool output the model is actively reading, and re-reading the whole history each turn is expensive. The orientation needs to ride the very channel the model attends to most: the tool result itself.

Forces

The tool return is the freshest, highest-attention text the model reads on an action turn, yet a raw result spends none of that attention on orientation.
Padding every return with reminders, status, and hints costs tokens on each call, while a bare return risks a drifted plan or a repeated dead end after a failure.
A return that restates the whole plan reintroduces the bloat the loop was meant to avoid, while one too terse omits the next step the model needs.
Reminders derived from live run state stay honest, whereas a static reminder copied onto every return goes stale as the task advances.

Example

An agent is asked to fix the failing tests across a service. Each test-runner call returns a long failure log; on its own that log tells the model what broke but not what it set out to do. With tool-result reinforcement, every run-tests return ends with a short block (goal: get the suite green; status: 6 of 9 files passing; last failure: import error in billing, try fixing the path before rerunning), so the agent keeps working through the queue instead of re-running the same broken call.
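A minimal sketch of how the block in this example might be derived, assuming the harness keeps a per-file pass/fail map from the most recent test run. The function `render_block`, the file names, and the shape of the map are hypothetical and chosen only for illustration; they do not come from any particular test runner.

```python
from typing import Dict, Optional


def render_block(
    goal: str,
    passed: Dict[str, bool],
    last_failure: Optional[str] = None,
) -> str:
    """Build the reminder from the current results, never from an earlier block."""
    passing = sum(1 for ok in passed.values() if ok)
    lines = [
        f"goal: {goal}",
        f"status: {passing} of {len(passed)} files passing",
    ]
    # The hint is only present when the harness recorded a failure.
    if last_failure:
        lines.append(f"last failure: {last_failure}")
    return "\n".join(lines)


# Hypothetical run state: six passing files and three failing ones.
passed = {f"test_mod{i}.py": True for i in range(6)}
passed.update({"test_billing.py": False, "test_orders.py": False, "test_users.py": False})

print(render_block(
    "get the suite green",
    passed,
    "import error in billing, try fixing the path before rerunning",
))
```

Because the status line is counted from the map on each call, it changes as soon as another file goes green, with no separate bookkeeping for the reminder text.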

Diagram

flowchart TD
    C[Model issues tool call] --> D[Dispatch -> raw result]
    D --> W[Wrapper: append goal + status + failure/next-step hint, derived from live run]
    W --> E[Enriched return: raw result + reinforcement block]
    E --> M[Model reads result and reminder together]
    M -->|next action| C

Solution

Therefore:

Route every tool result through a wrapper the harness controls before it reaches the model. The wrapper keeps the raw output and appends a small reinforcement block derived from the live run: the standing goal in one line, a one-line status of progress so far, and, on a failed call, a hint about what did not work and what to try next. On a successful call the block carries the goal and the immediate next step instead. The block is recomputed from the current run on each return rather than copied from the previous one, so it tracks progress, and it stays to a fixed small budget so it never swamps the result. Because the reminder rides the tool return, the model meets it at the exact point it is reasoning about what to do next, without a separate injection slot or a re-read of the history.
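The wrapper described above can be sketched as follows. This is one possible shape under stated assumptions, not a reference implementation: `RunState`, `ToolResult`, `wrap_tool`, the `[harness reminder]` label, and the 300-character budget are all hypothetical, and a character count stands in for a token budget.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_BLOCK_CHARS = 300  # assumed budget; characters stand in for tokens here


@dataclass
class RunState:
    """Hypothetical live run state that the harness updates as the task advances."""
    goal: str
    status: str
    next_step: str = ""
    failure_hint: Optional[str] = None


@dataclass
class ToolResult:
    """Hypothetical raw result shape: the tool's text plus a success flag."""
    output: str
    ok: bool = True


def reinforcement_block(state: RunState, ok: bool) -> str:
    """Goal and status always; next step on success, failure hint on failure."""
    lines = [f"goal: {state.goal}", f"status: {state.status}"]
    if ok and state.next_step:
        lines.append(f"next: {state.next_step}")
    elif not ok and state.failure_hint:
        lines.append(f"hint: {state.failure_hint}")
    return "\n".join(lines)[:MAX_BLOCK_CHARS]


def wrap_tool(tool: Callable[..., ToolResult], state: RunState) -> Callable[..., str]:
    """Return a callable that appends a freshly computed block to each raw result."""
    def wrapped(*args, **kwargs) -> str:
        result = tool(*args, **kwargs)
        # State is read at return time, so the block tracks progress.
        block = reinforcement_block(state, result.ok)
        # The raw output is kept as-is; the block is only appended.
        return f"{result.output}\n\n[harness reminder]\n{block}"
    return wrapped


state = RunState(
    goal="get the suite green",
    status="6 of 9 files passing",
    failure_hint="import error in billing, try fixing the path before rerunning",
)
run_tests = wrap_tool(lambda: ToolResult("3 failed, 6 passed", ok=False), state)
print(run_tests())
```

Passing the same `state` object into every wrapped tool is a deliberate choice in this sketch: the harness mutates it between calls, and each return picks up whatever it holds at that moment.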

What it gives you

Orientation reaches the model on the highest-attention text of an action turn, cutting goal drift without a separate injection block.
A failure hint travels with the error, so the model is steered off a dead end on the same turn instead of retrying the same call.
The reinforcement block is a few dozen tokens regardless of history length, so the orientation cost stays flat as the run grows.

What it costs you

A reminder derived from a wrong reading of run state confidently misdirects the model, which trusts the text attached to its own result.
Wrapping every return adds a small fixed cost to each tool call and lengthens the context the model must read.
If the appended block crowds or visually merges with the raw output, the model may confuse the harness reminder for tool-produced content.
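One way to reduce the risk of the block merging with the raw output is to fence it with a marker that cannot occur in that output. The sketch below is an assumption-laden illustration: `make_fence`, `attach_block`, and the marker wording are hypothetical, and whether a given model reliably treats the fenced text as harness content is not something this pattern guarantees.

```python
import secrets


def make_fence(raw_output: str) -> str:
    """Pick a marker that does not occur in the raw output."""
    while True:
        fence = f"harness-note-{secrets.token_hex(4)}"
        if fence not in raw_output:
            return fence


def attach_block(raw_output: str, block: str) -> str:
    """Append the block between labelled markers, leaving the raw output untouched."""
    fence = make_fence(raw_output)
    return (
        f"{raw_output}\n"
        f"<<{fence}: added by harness, not tool output>>\n"
        f"{block}\n"
        f"<<end {fence}>>"
    )


print(attach_block("ImportError: No module named 'billing.rates'", "goal: get the suite green"))
```

Choosing the marker per return, rather than escaping the raw output, keeps the tool's text byte-for-byte intact while still giving the boundary a form the tool could not have produced.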

What this pattern forbids. The reinforcement block appended to a tool return must not alter or replace the raw result, and it must be re-derived from the live run on each return. A reminder carried over verbatim from a previous turn, or one that overwrites the tool's actual output, is a harness bug.
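Both rules can be checked mechanically before a return reaches the model. The guard below is a sketch that assumes the block is appended directly after the raw output; `check_enriched_return`, `ReinforcementError`, and the zero-argument `render_block` callable are hypothetical names introduced for illustration.

```python
from typing import Callable


class ReinforcementError(Exception):
    """Raised when an enriched return breaks one of the pattern's two rules."""


def check_enriched_return(
    raw_output: str,
    enriched: str,
    render_block: Callable[[], str],
) -> None:
    """Validate an enriched return before it is handed to the model.

    `render_block` renders the block from the current run state.
    """
    # Rule 1: the raw result must survive unaltered at the start of the return.
    if not enriched.startswith(raw_output):
        raise ReinforcementError("raw tool output was altered or replaced")
    # Rule 2: the appended text must match a fresh render of live state,
    # which rejects a block cached from an earlier turn once state has moved on.
    appended = enriched[len(raw_output):].strip()
    if appended != render_block().strip():
        raise ReinforcementError("reinforcement block is stale")
```

A block that happens to equal the previous one passes this check as long as the state behind it has not changed; what it catches is a block that no longer matches the run.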

And the patterns that stand alongside it, or against it —

complements Standing State Injection ★: Recompute a compact task-state snapshot each turn and inject it as a fresh system block before the model reasons, so a long tool-call loop stays oriented on the goal.
alternative-to Action Selector Pattern ★: Eliminate the feedback channel from tool outputs back into the agent's reasoning step by having the agent select actions from a fixed catalog rather than free-form generation over tool output.
complements Tool Output Trusted Verbatim ✕: Anti-pattern: trust whatever tools return without validation, schema enforcement, or trust labels.
complements Now-Anchoring ·: Ground the agent's reasoning in the current absolute time without requiring tool calls, so every reply is implicitly time-aware.