Safety & Control

Exception Handling and Recovery

Catch and react to predictable failure modes (tool errors, rate limits, validation failures) with structured recovery paths.

Problem

If the tool layer returns errors as opaque strings stuffed back into the conversation, the agent treats them as text and reacts with whatever the model invents — sometimes a retry, sometimes a confident hallucinated explanation to the user, sometimes a stall. The agent has no way to branch deterministically on a rate-limit versus a validation error, so it cannot back off correctly on the first or replan on the second. Without typed errors and named recovery branches, the team is forced to choose between blanket retries that mask real bugs and giving up on partial-failure handling altogether.

Solution

Catalogue failure modes. For each, define: detect (typed error), respond (retry / fall back / surface to user / replan), and log. The agent receives a structured error message and can react with a typed branch in its loop.

When to use

  • Tool errors, rate limits, or validation failures occur often enough that random retries waste effort.
  • Failure modes can be catalogued with typed errors and structured recovery responses.
  • The agent loop can branch on typed error messages.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related