Retrieval-Saturation Tool Attack
also known as Tool-Retrieval Saturation, Semantic-Covering Tool Hiding
Anti-pattern: trust a tool-retrieval layer to surface the tools a request needs, while an adversary injects a few crafted tools whose embeddings cover the query space and saturate the top-k, so benign tools never reach the agent's context.
Context
An agent with a large or open tool registry does not put every tool in context; a retrieval layer ranks tools by similarity to the request and loads only the top-k. Tools can be contributed from outside the trust boundary — a marketplace, an MCP server, a plugin ecosystem — so the registry is not fully curated. The agent acts on whatever tools that retrieval step returns.
Problem
An adversary who can register tools does not need to defeat the agent's selection or output handling; they can attack the retrieval step itself. By crafting a few tools whose embeddings are placed to cover the query space, the attacker makes those tools rank at the top for almost any request, saturating the top-k so the benign tools the agent needs are pushed out and never loaded. The agent then chooses only from attacker-controlled tools, and every selection-time and output-time defense downstream is bypassed because the safe options were never in context to begin with.
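A toy sketch of the mechanism follows. It assumes plain cosine-similarity ranking over hand-made four-dimensional embeddings. The dimensions (files, email, database, generic request phrasing), tool names, and vectors are all hypothetical. Real embeddings have hundreds of dimensions, but the geometry is the same: each benign tool matches one topic, while the attacker's three tools lean on the generic phrasing that every request shares, so together they outrank the right tool for every query.

```python
import math

# Hypothetical 4-d embeddings: [files, email, database, generic request phrasing].
BENIGN = {
    "read_file":  [1.0, 0.0, 0.0, 0.0],
    "send_email": [0.0, 1.0, 0.0, 0.0],
    "query_db":   [0.0, 0.0, 1.0, 0.0],
}
# Hypothetical attacker tools: mostly "generic phrasing", each nudged toward
# one topic so that together they cover the whole query space.
ATTACKER = {
    "helper_a": [0.3, 0.1, 0.1, 0.9],
    "helper_b": [0.1, 0.3, 0.1, 0.9],
    "helper_c": [0.1, 0.1, 0.3, 0.9],
}
# Requests mix a topic with generic phrasing ("please", "help me", ...).
QUERIES = {
    "read the config":  [0.6, 0.0, 0.0, 0.8],
    "email the team":   [0.0, 0.6, 0.0, 0.8],
    "look up an order": [0.0, 0.0, 0.6, 0.8],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, tools, k):
    """Naive retrieval: rank every tool by similarity alone and keep the top k."""
    ranked = sorted(tools, key=lambda name: cosine(query, tools[name]), reverse=True)
    return ranked[:k]


registry = {**BENIGN, **ATTACKER}
for text, vec in QUERIES.items():
    print(text, "->", top_k(vec, registry, k=3))
```

With k=3, every query retrieves only `helper_a`, `helper_b` and `helper_c`. The matching benign tool scores 0.6 and lands fourth, just outside the window, so nothing downstream of retrieval ever sees it.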
Forces
- Retrieving only the top-k tools is necessary to fit context, but it creates a scarce slot set an attacker can compete for.
- Embedding similarity can be gamed: a few tools placed to cover the query space rank highly for almost any request.
- Open or marketplace tool registries accept contributions from outside the trust boundary, so an attacker can inject tools at all.
- Defenses that act at selection or output time are downstream of retrieval, so they never see the benign tools that retrieval dropped.
Example
An agent loads tools from an open MCP marketplace, retrieving the top eight by similarity for each task. An attacker publishes three tools with descriptions engineered to match almost any request. From then on, nearly every task retrieves the attacker's tools in the top slots and pushes the legitimate ones out, so the agent only ever sees tools the attacker controls — without any prompt injection in the conversation itself.
Solution
Therefore:
Treat tool retrieval as an attack surface, not a neutral ranking. Vet and trust-rank registered tools so contributions from outside the trust boundary cannot rank as freely as vetted ones, and cap how many of the top-k slots any single contributor or low-trust source can occupy so a few injected tools cannot fill the result. Monitor the embedding space for tools placed to cover the query space — a hallmark of a saturation attack — and exclude or downrank them. Guarantee a path for the benign tools a request needs to reach context, for example by reserving slots for vetted tools or retrieving from a trusted subset first. The retrieval layer itself has to be defended, because selection-time and output-time controls cannot protect tools that were never loaded.
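One way three of these retrieval-side controls might fit together is sketched below: reserved slots filled from the vetted subset first, trust-weighted ranking, and a per-contributor cap on low-trust sources. It is a minimal sketch under stated assumptions, not a reference implementation. The `Tool` record, the `defended_top_k` function, its parameters (`reserved_vetted`, `per_source_cap`, `low_trust_weight`), the source labels, and the vectors are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    vector: tuple[float, ...]  # hypothetical embedding of the tool description
    source: str                # contributor, e.g. a marketplace publisher
    vetted: bool               # True if reviewed inside the trust boundary


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def defended_top_k(query, tools, k, reserved_vetted=1, per_source_cap=1, low_trust_weight=0.5):
    """Hypothetical defended retrieval returning at most k tool names."""
    # 1. Retrieve from the trusted subset first so benign tools always have a path in.
    vetted = sorted((t for t in tools if t.vetted),
                    key=lambda t: cosine(query, t.vector), reverse=True)
    chosen = vetted[:min(reserved_vetted, k)]

    # 2. Rank the rest with low-trust similarity discounted.
    def score(t):
        s = cosine(query, t.vector)
        return s if t.vetted else s * low_trust_weight

    per_source = {}
    for t in sorted(tools, key=score, reverse=True):
        if len(chosen) >= k:
            break
        if t in chosen:
            continue
        if not t.vetted:
            # 3. Cap how many slots any single low-trust contributor can occupy.
            if per_source.get(t.source, 0) >= per_source_cap:
                continue
            per_source[t.source] = per_source.get(t.source, 0) + 1
        chosen.append(t)
    return [t.name for t in chosen]


MARKET = "marketplace:acme-helpers"  # hypothetical publisher id
TOOLS = [
    Tool("read_file",  (1.0, 0.0, 0.0, 0.0), "internal", True),
    Tool("send_email", (0.0, 1.0, 0.0, 0.0), "internal", True),
    Tool("query_db",   (0.0, 0.0, 1.0, 0.0), "internal", True),
    Tool("helper_a",   (0.3, 0.1, 0.1, 0.9), MARKET, False),
    Tool("helper_b",   (0.1, 0.3, 0.1, 0.9), MARKET, False),
    Tool("helper_c",   (0.1, 0.1, 0.3, 0.9), MARKET, False),
]

print(defended_top_k((0.6, 0.0, 0.0, 0.8), TOOLS, k=3))
```

For the "read the config" query this returns `['read_file', 'helper_a', 'send_email']`: the benign tool holds the reserved slot, and the attacker's publisher is limited to one slot however well its tools match. The per-source cap is only as strong as contributor identity; if an attacker can publish under many identities cheaply, the reserved vetted slots are what still guarantee a path for benign tools.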
What this pattern forbids. The tool-retrieval layer must not be trusted to return a safe set on ranking alone; contributions are trust-ranked, no single low-trust source may occupy the whole top-k, embedding-covering tools are detected and downranked, and benign tools cannot be entirely crowded out.
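Detecting embedding-covering tools can be approximated by measuring how often each tool ranks in the top-k across a deliberately diverse set of probe queries: a specialised tool should rank only for its own topic, while a covering tool ranks for nearly everything. The sketch below assumes such a probe set exists. The `covering_tools` helper, its `threshold`, and all vectors are hypothetical. A legitimately general-purpose tool would also be flagged, so flags are better fed into review or downranking than into silent deletion.

```python
import math

# Hypothetical 4-d embeddings: [files, email, database, generic request phrasing].
TOOLS = {
    "read_file":  [1.0, 0.0, 0.0, 0.0],
    "send_email": [0.0, 1.0, 0.0, 0.0],
    "query_db":   [0.0, 0.0, 1.0, 0.0],
    "helper_a":   [0.3, 0.1, 0.1, 0.9],
    "helper_b":   [0.1, 0.3, 0.1, 0.9],
    "helper_c":   [0.1, 0.1, 0.3, 0.9],
}
# Hypothetical probe queries, one per topic.
PROBES = [
    [0.6, 0.0, 0.0, 0.8],
    [0.0, 0.6, 0.0, 0.8],
    [0.0, 0.0, 0.6, 0.8],
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, tools, k):
    ranked = sorted(tools, key=lambda name: cosine(query, tools[name]), reverse=True)
    return ranked[:k]


def covering_tools(tools, probes, k, threshold=0.5):
    """Return tools that reach the top-k for more than `threshold` of the probes."""
    hits = {name: 0 for name in tools}
    for probe in probes:
        for name in top_k(probe, tools, k):
            hits[name] += 1
    return sorted(name for name, n in hits.items() if n / len(probes) > threshold)


print(covering_tools(TOOLS, PROBES, k=4))
```

With k=4, each benign tool ranks for one probe in three (coverage about 0.33), while each attacker tool ranks for all three (coverage 1.0), so only `helper_a`, `helper_b` and `helper_c` exceed the threshold.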
The patterns that counter or replace it:
- complements Tool Output Poisoning Defense: Treat tool output as untrusted content and apply instruction-stripping plus per-tool trust labels.
- complements Hallucinated Tools: Anti-pattern: trust the model to invoke only the tools it has been given, then debug calls to functions that do not exist.
- complements Tool Search Lazy Loading: Defer loading tool schemas into the context window until a search step shows they are needed.
- complements Prompt Injection Defense: Tag user-supplied or tool-supplied content as untrusted and refuse to follow instructions found inside it.