Anti-Patterns

Goal Hijacking

Anti-pattern: let agent objectives be redirectable through any input the agent reads — direct prompts, retrieved documents, tool output, memory writes.

Problem

When the model decides which inputs count as instructions, an attacker who controls any reachable input — a webpage the agent fetches, a comment in a document, an email it summarises — can plant an instruction that redirects the agent's goal. The tool-equipped autonomy that makes the agent useful becomes the foothold: a hijacked goal now has API keys, write access, and the operator's trust.

Solution

Don't. Adopt explicit goal-isolation: only the principal's signed prompt can set or change the agent's goal. Treat all retrieved content, tool output, and memory reads as data, not as instructions. Apply prompt-injection-defense, dual-llm-pattern (a privileged planner that never reads untrusted content), and capability-bounded-execution. See also memory-poisoning for the persistent variant.

When to use

  • Never. Cite to label the failure mode in threat models.
  • Use prompt-injection-defense and dual-llm-pattern to separate goal channel from data channel.
  • Enforce least-privilege tool scopes so a hijack has bounded blast radius.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related