Methodology · Agent Constructionemergingverified

MAESTRO Threat Modeling

also known as agent threat modeling with MAESTRO, MAESTRO security pass

Applies to: agentmulti-agent-system

Tags: securitythreat-modelingmaestrored-teamagent-safety

Run an agent-specific security review before you ship, using the MAESTRO categories. They are: risks from the model itself, threats to the data, attacks from inside and outside the system, and red-team probing. This takes classic threat modeling, such as STRIDE and LINDDUN, and adapts it to how agents get attacked. That includes prompt injection, memory poisoning, tool misuse, data theft, and agent-on-agent attacks. The output is a list of threats, each one tied to a concrete defence.

Methodology process overview

flowchart TD arch[Agent architecture] --> s1[Enumerate foundation-model threats] cls[Data classification] --> s2[Enumerate data-protection threats] adv[Adversary model] --> s3[Enumerate internal threats] arch --> s4[Enumerate external threat vectors] s1 --> s5[Map defensive controls to each threat] s2 --> s5 s3 --> s5 s4 --> s5 s5 --> s6[Red-team agent against catalogue] s6 --> out1[Threat catalogue] s6 --> out2[Defensive control map] s6 --> out3[Red-team report]

Intent. Replace a generic security review with an agent-aware one that lists the attack types specific to agents and pairs each with a concrete defence before you ship.

When to apply. Run this before any agent goes to production, above all when it has tool access, a free-running loop, multi-agent coordination, or sensitive data. Run it again after every real change in capability. Do not use it instead of normal application security. MAESTRO adds to AppSec, it does not replace it. Don't apply it when the system really is a single-turn chat with no tools, no memory, and no access to private data. There is no agent attack surface to model.

Example scenario

A SaaS company is getting ready to ship its first agent feature. It is an autonomous email-triage assistant that reads customer support inboxes, drafts replies, and can update Zendesk tickets. Security agreed to ship only after a MAESTRO pass. Inputs: the four-component agent architecture (a GPT-4 model; three tools, gmail.read, gmail.search, and zendesk.update; pgvector memory of past tickets; an autonomous loop), a data classification noting that 12% of tickets contain PII and 0.5% contain payment data, and an adversary model covering outside attackers who craft malicious customer emails, malicious vendors in the supply chain, and curious internal staff. Model threats: prompt injection through crafted ticket bodies, and the model leaking its system prompt. They handled these with input filtering and a dual-LLM split. Data threats: PII leaving the model and landing in Zendesk drafts. They paired this with output filtering and a redaction step. Internal threats: memory poisoning, where a malicious customer plants false 'preference' facts that bias future replies. They handled this with signed write provenance on the memory store and a human in the loop on any reply that cites a memory entry younger than 24h. External threats: a supply-chain compromise of the embedding provider. They pinned the vendor and added a fallback. Red-team probe: they sent 40 adversarial emails through a sandboxed instance. 3 managed to extract hidden data, and all were closed before launch. The team learned that 'we have AppSec' was not enough. The memory-poisoning class of threats simply did not exist on their earlier systems and needed brand-new controls.

Inputs

Agent architecture — The model, tools, memory, loop, and integrations. Together they make up the attack surface.
Data classification — What data the agent reads, writes, or sends, and how sensitive each piece is.
Adversary model — Who would attack this agent and what they want: an outside attacker, a malicious user, a compromised tool, or a hostile agent.

Outputs

Threat catalogue — A list of threats, grouped by MAESTRO category, each with a severity and a likelihood.
Defensive control map — For each high-priority threat, the specific defence you put in place: a pattern, a guardrail, or an isolation boundary.
Red-team report — What you found when you attacked the agent yourself against the listed threats.

Steps (6)

Enumerate foundation-model threats
List the risks that come from the model itself: jailbreaks, prompt injection, hallucination, sycophancy, and pulling data out of training. Decide which ones you must defend against.
usesPrompt Injection Defense Input/Output Guardrails
Enumerate data-protection threats
Find where sensitive data enters, sits in, or leaves the agent. Track PII, secrets, and proprietary content across context, memory, tool calls, and outputs.
usesPII Redaction Secrets Handling
Enumerate internal threats
List threats from inside the agent system: memory poisoning, agent scheming, hidden mode switching, role drift, and agent-on-agent attacks.
usesSandbox Isolation Subagent Isolation
Enumerate external threat vectors
List threats from outside: a compromised tool or model in the supply chain, hostile tool outputs, malicious documents in a retrieval corpus, and network attacks.
Map defensive controls to each threat
For each high-priority threat, name the defence: input filtering, output filtering, a sandbox, a dual-LLM split, an allowlist, a human in the loop, or a policy gate.
usesDual LLM Pattern Policy-Gated Agent Action (KRITIS)Human-in-the-Loop
Red-team the agent against the catalogue
Attack the live agent with hostile prompts, malicious tool outputs, and planted memory entries. Write down where the defences held and where they broke.
usesRed-Team Sandbox Reproduction

Framework-specific instructions

Pick a framework and generate a framework-targeted rewrite of this methodology's steps.

Choose framework

AI-generated for Agent Development Kit (ADK) (Google) — verify against official docs.

Principles

Threat modeling for agents is not generic AppSec. It must list agent-specific threats.
Every listed threat is paired with a concrete defence or a written note that you accept the risk.
Red-team probing is part of the method, not an optional follow-up.
Re-run the pass after every real change in capability: new tools, new data, more freedom.

Known failure modes (4)

Related patterns (8)

Related compositions (1)

recipe · abstract shape
Safety Hardening
The minimum set of constraints to put around any production agent before it touches the world: budgets, gates, charters, kill-switches, approvals.

Related methodologies (2)

Sources (2)

Provenance

Added to catalog: 2026-05-24
Last updated: 2026-05-27
Verification status: verified

Methodology process overview

Steps (6)

Enumerate foundation-model threats

Enumerate data-protection threats

Enumerate internal threats

Enumerate external threat vectors

Map defensive controls to each threat

Red-team the agent against the catalogue