Tool Output Trusted Verbatim

also known as Untyped Tool Returns, No Tool Output Validation

Anti-pattern: trust whatever tools return without validation, schema enforcement, or trust labels.

Context

A team is building an agent that calls tools and then feeds their output back into the model as if it were a fact. The implementation accepts whatever the tool returns at face value: no schema validation, no size limit, no trust labelling, no escape pass over instruction-shaped content. The implicit assumption is that the tool is honest, returns well-formed JSON, and stays within content limits.

Problem

Real-world tools do not behave that way. They return errors as HTTP 200 OK with a JSON body of {"error": ...} that the agent confuses for a successful result. They return multi-megabyte responses that blow the context window. They return HTML with embedded scripts, or text with embedded prompt-injection payloads instructing the agent to ignore its previous instructions. By trusting every byte of tool output verbatim, the agent loses control over both its context budget and its safety boundary, and a misbehaving or hijacked tool can quietly redirect the agent.

Forces

Validation feels like duplicate work when typed function calls exist.
Schema enforcement requires per-tool work.
Size limits are tool-specific.

Example

A team's agent treats every tool response as trusted gospel, with schema validation off, size cap off, no trust labels. Real tools then do what real tools do: a 200 OK with `{error: 'rate limit'}`, a 12MB HTML blob with embedded scripts, a JSON field whose 'description' contains a prompt-injection payload. The agent ingests it all and misbehaves. They stop doing this and validate, cap, sanitise, and apply tool-output-poisoning defenses at the boundary.

Diagram

flowchart TD T[Tool returns response] --> Tr{Trust verbatim?} Tr -- yes anti-pattern --> Ing[Ingest into context as-is] Ing --> Bad[200 OK errors / oversized blob /<br/>injected instructions / scripts] Bad --> Mis[Agent misbehaves silently] Tr -.fix.-> Val[Validate against schema] Val --> Cap[Cap size + sanitise] Cap --> Lab[Attach trust label]

Solution

Therefore:

Don't. Validate every tool result against a schema. Cap response size. Sanitise HTML. Apply tool-output-poisoning defenses. See tool-output-poisoning, structured-output, input-output-guardrails.

What this pattern forbids. Avoiding it imposes a trust boundary at every tool return: results must not flow into context unvalidated; each one is schema-checked, size-capped, sanitised, and labeled with its trust level before the model reads it.

The patterns that counter or replace it —

alternative-toTool Output Poisoning Defense★— Treat tool output as untrusted content and apply instruction-stripping plus per-tool trust labels.
alternative-toStructured Output★★— Constrain the model's output to conform to a JSON Schema (or similar typed shape).
alternative-toInput/Output Guardrails★★— Validate inputs before they reach the model and outputs before they reach the user.
complementsMemo-As-Source Confusion✕— Anti-pattern: the agent cites its own past memos as ground truth instead of re-verifying them against the artifacts they describe, accumulating false confidence in stale summaries.
complementsGoal Hijacking✕— Anti-pattern: let agent objectives be redirectable through any input the agent reads — direct prompts, retrieved documents, tool output, memory writes.
complementsControl-Flow Integrity★— Treat the agent's planned step sequence as a trusted control-flow graph that tool outputs, retrieved content, and user-supplied data cannot redirect at runtime.
complementsFalse Resolution✕— The agent proposes a compromise that addresses each constraint individually but subtly violates one in joint interpretation, shipping as success but discovered as failure at audit.
complementsSilent External-Source Rot✕— Anti-pattern: an agent keeps reporting success while a wrapped external source has silently changed structure, so its tool returns valid-but-empty or degraded output that nothing watches.
complementsTool-Output Arithmetic Trust✕— Anti-pattern: the agent compares, ranks, or sums correctly returned tool data in its own head instead of offloading the computation to a deterministic tool, emitting confident wrong aggregates.
complementsTool-Result Reinforcement★— Append a goal reminder, current task status, and failure or next-step hints to each tool return so the agent is re-grounded through the action channel it already reads.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.

References

OWASP LLM01: Prompt Injection
spec

Provenance

Source: patterns/tool-output-trusted-verbatim.md on GitHub · commit 4fa1213 · view history
Added to catalog: 2026-04-30
Last updated: 2026-05-21
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.