MAESTRO Threat Modeling
also known as agent threat modeling with MAESTRO, MAESTRO security pass
Run an agent-specific security review before you ship, using the MAESTRO categories. They are: risks from the model itself, threats to the data, attacks from inside and outside the system, and red-team probing. This takes classic threat modeling, such as STRIDE and LINDDUN, and adapts it to how agents get attacked. That includes prompt injection, memory poisoning, tool misuse, data theft, and agent-on-agent attacks. The output is a list of threats, each one tied to a concrete defence.
Methodology process overview
Intent. Replace a generic security review with an agent-aware one that lists the attack types specific to agents and pairs each with a concrete defence before you ship.
When to apply. Run this before any agent goes to production, above all when it has tool access, a free-running loop, multi-agent coordination, or sensitive data. Run it again after every real change in capability. Do not use it instead of normal application security. MAESTRO adds to AppSec, it does not replace it. Don't apply it when the system really is a single-turn chat with no tools, no memory, and no access to private data. There is no agent attack surface to model.
Inputs
- Agent architecture — The model, tools, memory, loop, and integrations. Together they make up the attack surface.
- Data classification — What data the agent reads, writes, or sends, and how sensitive each piece is.
- Adversary model — Who would attack this agent and what they want: an outside attacker, a malicious user, a compromised tool, or a hostile agent.
Outputs
- Threat catalogue — A list of threats, grouped by MAESTRO category, each with a severity and a likelihood.
- Defensive control map — For each high-priority threat, the specific defence you put in place: a pattern, a guardrail, or an isolation boundary.
- Red-team report — What you found when you attacked the agent yourself against the listed threats.
Steps (6)
Enumerate foundation-model threats
List the risks that come from the model itself: jailbreaks, prompt injection, hallucination, sycophancy, and pulling data out of training. Decide which ones you must defend against.
Enumerate data-protection threats
Find where sensitive data enters, sits in, or leaves the agent. Track PII, secrets, and proprietary content across context, memory, tool calls, and outputs.
Enumerate internal threats
List threats from inside the agent system: memory poisoning, agent scheming, hidden mode switching, role drift, and agent-on-agent attacks.
Enumerate external threat vectors
List threats from outside: a compromised tool or model in the supply chain, hostile tool outputs, malicious documents in a retrieval corpus, and network attacks.
Map defensive controls to each threat
For each high-priority threat, name the defence: input filtering, output filtering, a sandbox, a dual-LLM split, an allowlist, a human in the loop, or a policy gate.
usesDual LLM PatternPolicy-Gated Agent Action (KRITIS)Human-in-the-Loop
Red-team the agent against the catalogue
Attack the live agent with hostile prompts, malicious tool outputs, and planted memory entries. Write down where the defences held and where they broke.
Framework-specific instructions
Pick a framework and generate a framework-targeted rewrite of this methodology's steps.
Choose framework
AI-generated for Agent Development Kit (ADK) (Google) — verify against official docs.
Principles
- Threat modeling for agents is not generic AppSec. It must list agent-specific threats.
- Every listed threat is paired with a concrete defence or a written note that you accept the risk.
- Red-team probing is part of the method, not an optional follow-up.
- Re-run the pass after every real change in capability: new tools, new data, more freedom.
Known failure modes (4)
- ✕Memory Poisoning
Memory threat was acknowledged in the catalogue but not paired with an authentication or quorum control.
- ✕Agent Privilege Escalation
Tool-access threat was modelled but no policy-gating control was added before deployment.
- ✕Agentic Supply Chain Compromise
External-threat enumeration missed the tool-provider supply chain.
- ✕Tool Output Trusted Verbatim
Tool outputs treated as trustworthy because no defensive control was tied to that specific threat.
Related patterns (8)
- ★Prompt Injection Defense
Tag user-supplied or tool-supplied content as untrusted and refuse to follow instructions found inside it.
- ★★Input/Output Guardrails
Validate inputs before they reach the model and outputs before they reach the user.
- ★Dual LLM Pattern
Split agent work between a privileged model that holds tool access and a quarantined model that reads untrusted content, exchanging only opaque references between them.
- ★Policy-Gated Agent Action (KRITIS)
Each agent action passes through a policy gate (NIS2, EU the agent Act, BSI rules) and is tagged with Run ID + Model Digest + Policy Hash for WORM-audit reconstruction.
- ★★Sandbox Isolation
Run agent-emitted code or actions in a contained environment with restricted filesystem, network, and process privileges.
- ★Subagent Isolation
Run subagents in isolated workspaces so their writes do not collide and parallelism is safe.
- ★★Human-in-the-Loop
Require explicit human approval at defined points before the agent performs an action.
- ★Red-Team Sandbox Reproduction
Routinely re-reproduce canonical alignment-failure modes inside a sealed sandbox per release; treat the alignment regression suite as a deployment gate.
Related compositions (1)
Related methodologies (2)
- Agentic Workflow Construction★★
Make agent authors name the four parts and the freedom level before they code, so a failure points to one part instead of smearing across a vague agent.
- Deferential Agent Design★
Build agents whose goal is to satisfy human preferences they only partly know, not to chase a fixed proxy, so they stay deferential and correctable by default.
Sources (2)
Building Applications with AI Agents
Ch 12 'Protecting Agentic Systems' “Emerging threat vectors specific to agentic systems ... Red Teaming and threat modeling with MAESTRO ... Data privacy, encryption, and provenance ... Internal and external safeguards”
Agentic AI Threat Modeling Framework: MAESTRO
“MAESTRO (Multi-Agent Environment, Security, Threat, Risk, & Outcome) ... a novel threat modeling framework designed specifically for the unique challenges of Agentic AI”
Provenance
- Added to catalog:
- Last updated:
- Verification status: verified