Methodology · Agent Constructionemergingverified

MAESTRO Threat Modeling

also known as agent threat modeling with MAESTRO, MAESTRO security pass

Applies to: agentmulti-agent-system

Tags: securitythreat-modelingmaestrored-teamagent-safety

Run an agent-specific security review before you ship, using the MAESTRO categories. They are: risks from the model itself, threats to the data, attacks from inside and outside the system, and red-team probing. This takes classic threat modeling, such as STRIDE and LINDDUN, and adapts it to how agents get attacked. That includes prompt injection, memory poisoning, tool misuse, data theft, and agent-on-agent attacks. The output is a list of threats, each one tied to a concrete defence.

Methodology process overview

Intent. Replace a generic security review with an agent-aware one that lists the attack types specific to agents and pairs each with a concrete defence before you ship.

When to apply. Run this before any agent goes to production, above all when it has tool access, a free-running loop, multi-agent coordination, or sensitive data. Run it again after every real change in capability. Do not use it instead of normal application security. MAESTRO adds to AppSec, it does not replace it. Don't apply it when the system really is a single-turn chat with no tools, no memory, and no access to private data. There is no agent attack surface to model.

Inputs

  • Agent architectureThe model, tools, memory, loop, and integrations. Together they make up the attack surface.
  • Data classificationWhat data the agent reads, writes, or sends, and how sensitive each piece is.
  • Adversary modelWho would attack this agent and what they want: an outside attacker, a malicious user, a compromised tool, or a hostile agent.

Outputs

  • Threat catalogueA list of threats, grouped by MAESTRO category, each with a severity and a likelihood.
  • Defensive control mapFor each high-priority threat, the specific defence you put in place: a pattern, a guardrail, or an isolation boundary.
  • Red-team reportWhat you found when you attacked the agent yourself against the listed threats.

Steps (6)

  1. Enumerate foundation-model threats

    List the risks that come from the model itself: jailbreaks, prompt injection, hallucination, sycophancy, and pulling data out of training. Decide which ones you must defend against.

    usesPrompt Injection DefenseInput/Output Guardrails

  2. Enumerate data-protection threats

    Find where sensitive data enters, sits in, or leaves the agent. Track PII, secrets, and proprietary content across context, memory, tool calls, and outputs.

    usesPII RedactionSecrets Handling

  3. Enumerate internal threats

    List threats from inside the agent system: memory poisoning, agent scheming, hidden mode switching, role drift, and agent-on-agent attacks.

    usesSandbox IsolationSubagent Isolation

  4. Enumerate external threat vectors

    List threats from outside: a compromised tool or model in the supply chain, hostile tool outputs, malicious documents in a retrieval corpus, and network attacks.

  5. Map defensive controls to each threat

    For each high-priority threat, name the defence: input filtering, output filtering, a sandbox, a dual-LLM split, an allowlist, a human in the loop, or a policy gate.

    usesDual LLM PatternPolicy-Gated Agent Action (KRITIS)Human-in-the-Loop

  6. Red-team the agent against the catalogue

    Attack the live agent with hostile prompts, malicious tool outputs, and planted memory entries. Write down where the defences held and where they broke.

    usesRed-Team Sandbox Reproduction

Framework-specific instructions

Pick a framework and generate a framework-targeted rewrite of this methodology's steps.

Choose framework

AI-generated for Agent Development Kit (ADK) (Google) — verify against official docs.

Principles

  • Threat modeling for agents is not generic AppSec. It must list agent-specific threats.
  • Every listed threat is paired with a concrete defence or a written note that you accept the risk.
  • Red-team probing is part of the method, not an optional follow-up.
  • Re-run the pass after every real change in capability: new tools, new data, more freedom.

Known failure modes (4)

Related patterns (8)

Related compositions (1)

Related methodologies (2)

Sources (2)

Provenance

  • Added to catalog:
  • Last updated:
  • Verification status: verified