X · Governance & ObservabilityEmerging

Agent Middleware Chain

also known as Agent Interceptor Pipeline, Pre/Post Middleware

Wrap every model call, tool call, and memory access in a composable pre/execute/post interceptor pipeline so cross-cutting concerns attach without touching agent or orchestrator code.

Context

An agent runtime accumulates cross-cutting concerns: structured logging of every model call, rate-limit enforcement on third-party APIs, PII redaction on inputs and outputs, guardrail evaluation, latency metrics, an approval gate that may pause a call. Each concern needs to fire on the same set of touchpoints — model calls, tool calls, memory reads/writes — without each concern reimplementing the wiring.

Problem

If each concern is implemented as a wrapper at the agent or orchestrator layer, the runtime accretes a deep stack of decorators, the order is implicit, and adding or removing a concern requires editing agent code. Worse, concerns differ in shape — some need to see the request before the call, some need to mutate the response, some need to catch errors. Without a uniform middleware surface, each concern carries its own ad-hoc hook code and the cross-cutting layer is no longer composable or testable in isolation.

Forces

  • Pre-execution interceptors (request modification, validation) need the request; post-execution interceptors (response logging, redaction) need the response; error handlers need the exception.
  • Ordering matters — guardrails before logging, redaction before persistence.
  • Middleware must compose at runtime so a team can add or remove a concern by configuration.
  • Each middleware must remain testable in isolation against a synthetic call.

Example

An agent runtime mounts five middlewares in order: rate-limit, PII-redact-in, guardrail-eval, metrics, approval-gate. Every model and tool call flows through the chain forward, then through the reverse chain on response. Adding a new compliance log later is a single registration in the chain config — no agent code is touched.

Diagram

Solution

Therefore:

Define a BaseMiddleware with three hooks: process_request (called before the underlying call, may modify or short-circuit), process_response (called after, may mutate the response), process_error (called on exception). A MiddlewareChain runs the chain forward through process_request, invokes the underlying call, then runs the chain in reverse through process_response. Mount the chain at the runtime layer — every model call, tool call, and memory access flows through it. Cross-cutting concerns are then registered, not coded into agents.

What this pattern forbids. Cross-cutting concerns may not be coded directly into agent or orchestrator logic; they must register through the middleware contract so order is explicit and the chain is reviewable.

The smaller patterns that complete this one —

  • usesInput/Output Guardrails★★Validate inputs before they reach the model and outputs before they reach the user.
  • usesPII Redaction★★Detect and remove personally identifiable information from inputs to and outputs from the model.
  • usesRate Limiting★★Cap the number of requests, tokens, or tool calls per user (or session) within a time window.

And the patterns that stand alongside it, or against it —

  • complementsDecision Log★★Persist the agent's reasoning trace alongside its actions so post-hoc review can explain why.
  • complementsKill SwitchProvide an out-of-band control plane to halt running agent instances without redeploy.
  • composes-withPolicy-as-Code GateEvaluate every proposed agent action against externally-managed machine-readable policies before dispatch, so compliance authorship lives outside the prompt and outside the agent code.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.