Anti-Patterns

Goal Hijacking

Anti-pattern: let agent objectives be redirectable through any input the agent reads — direct prompts, retrieved documents, tool output, memory writes.

Problem

When the model decides which inputs count as instructions, an attacker who controls any reachable input — a webpage the agent fetches, a comment in a document, an email it summarises — can plant an instruction that redirects the agent's goal. The tool-equipped autonomy that makes the agent useful becomes the foothold: a hijacked goal now has API keys, write access, and the operator's trust.

Solution

Don't. Adopt explicit goal-isolation: only the principal's signed prompt can set or change the agent's goal. Treat all retrieved content, tool output, and memory reads as data, not as instructions. Apply prompt-injection-defense, dual-llm-pattern (a privileged planner that never reads untrusted content), and capability-bounded-execution. See also memory-poisoning for the persistent variant.

When to use

Cite this entry in threat models to label objective redirection through any channel the agent reads.
You are exposed if retrieved content, tool output, or memory writes can change what the agent is trying to do.
Defend with prompt-injection-defense, dual-llm-pattern, and least-privilege tool scopes so a hijack has bounded blast radius.

Open the full interactive page →

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Problem

Solution

When to use

Related