Tool Output Trusted Verbatim
also known as Untyped Tool Returns, No Tool Output Validation
Anti-pattern: trust whatever tools return without validation, schema enforcement, or trust labels.
Context
A team is building an agent that calls tools and then feeds their output back into the model as if it were a fact. The implementation accepts whatever the tool returns at face value: no schema validation, no size limit, no trust labelling, no escape pass over instruction-shaped content. The implicit assumption is that the tool is honest, returns well-formed JSON, and stays within content limits.
Problem
Real-world tools do not behave that way. They return errors as HTTP 200 OK with a JSON body of {"error": ...} that the agent confuses for a successful result. They return multi-megabyte responses that blow the context window. They return HTML with embedded scripts, or text with embedded prompt-injection payloads instructing the agent to ignore its previous instructions. By trusting every byte of tool output verbatim, the agent loses control over both its context budget and its safety boundary, and a misbehaving or hijacked tool can quietly redirect the agent.
Forces
- Validation feels like duplicate work when typed function calls exist.
- Schema enforcement requires per-tool work.
- Size limits are tool-specific.
Example
A team's agent treats every tool response as trusted gospel, with schema validation off, size cap off, no trust labels. Real tools then do what real tools do: a 200 OK with `{error: 'rate limit'}`, a 12MB HTML blob with embedded scripts, a JSON field whose 'description' contains a prompt-injection payload. The agent ingests it all and misbehaves. They stop doing this and validate, cap, sanitise, and apply tool-output-poisoning defenses at the boundary.
Diagram
Solution
Therefore:
Don't. Validate every tool result against a schema. Cap response size. Sanitise HTML. Apply tool-output-poisoning defenses. See tool-output-poisoning, structured-output, input-output-guardrails.
What this pattern forbids. By definition, this anti-pattern imposes no useful constraint; the missing validation is the failure.
And the patterns that stand alongside it, or against it —
- alternative-toTool Output Poisoning Defense★— Treat tool output as untrusted content and apply instruction-stripping plus per-tool trust labels.
- alternative-toStructured Output★★— Constrain the model's output to conform to a JSON Schema (or similar typed shape).
- alternative-toInput/Output Guardrails★★— Validate inputs before they reach the model and outputs before they reach the user.
- complementsMemo-As-Source Confusion★— Anti-pattern: the agent cites its own past memos as ground truth instead of re-verifying them against the artifacts they describe, accumulating false confidence in stale summaries.
- complementsGoal Hijacking✕— Anti-pattern: let agent objectives be redirectable through any input the agent reads — direct prompts, retrieved documents, tool output, memory writes.
- complementsControl-Flow Integrity★— Treat the agent's planned step sequence as a trusted control-flow graph that tool outputs, retrieved content, and user-supplied data cannot redirect at runtime.
- complementsFalse Resolution✕— The agent proposes a compromise that addresses each constraint individually but subtly violates one in joint interpretation, shipping as success but discovered as failure at audit.
Neighbourhood
Click any neighbour to follow the language. Scroll to zoom, drag to pan.