Trajectory Anomaly Monitor

also known as Trajectory Guard, Sequence-Aware Action Monitor

Run a trained, non-LLM verifier out-of-band over the agent's action trajectory at runtime to flag task-misaligned plans and malformed step sequences at millisecond latency, before the actions cause damage.

Context

An autonomous agent takes real actions in sequence — tool calls, plan steps, state changes — where a misaligned or malformed trajectory can cause damage. The team wants a runtime safety check on every step, but an LLM judge on each action is too slow and too expensive to sit in the hot path, and output-quality scoring after the fact arrives only once the action has already happened.

Problem

Per-step oversight by an LLM judge adds latency and cost that production cannot absorb on every action, and scoring an agent's final output reveals nothing about a dangerous action mid-trajectory until it is too late. What is missing is a check that reads the whole action sequence as it unfolds — recognising that a plan has drifted off the task or that the step structure is malformed — and does so fast enough to intervene before the next action lands. Output-quality monitors are not sequence-aware, and loop-shape heuristics catch only repetition, not subtler misalignment.

Forces

Per-step LLM-judge oversight is the most flexible check but is far too slow and costly for the hot path.
Output scoring is cheap but post-hoc, so it cannot stop a damaging action mid-trajectory.
A trained sequence model is fast and sequence-aware but must be built, supervised, and maintained.
The monitor must run out-of-band so that it does not add its own latency to every step the agent takes.

Example

An agent with shell and database access executes multi-step tasks for users. Running an LLM judge on every action would triple latency, and scoring the final answer would only catch damage after it happened. The team trains a lightweight sequence model on labelled trajectories (good runs and known-bad ones) and runs it out-of-band on each step, at tens of milliseconds per check. When a trajectory drifts toward an unrequested destructive command, the monitor flags it and a policy gate pauses the agent for human review before the command runs, at a fraction of the cost of judging every step with an LLM.
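The training step in this example can be sketched with a deliberately simple model. The code below is a toy stand-in, not the sequence model the pattern has in mind: it scores each action-to-action transition by a smoothed log-likelihood ratio between known-bad and good runs. The class name `TransitionVerifier`, the action names, and the training runs are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log

START = "<start>"  # marker so the first action also forms a transition


def transitions(trajectory):
    """Yield (previous_action, action) pairs along a trajectory."""
    prev = START
    for action in trajectory:
        yield (prev, action)
        prev = action


class TransitionVerifier:
    """Toy stand-in for a trained sequence verifier (hypothetical class)."""

    def __init__(self, good_runs, bad_runs, smoothing=1.0):
        self.good = Counter(t for run in good_runs for t in transitions(run))
        self.bad = Counter(t for run in bad_runs for t in transitions(run))
        self.good_total = sum(self.good.values())
        self.bad_total = sum(self.bad.values())
        # Distinct transitions seen in training, plus one slot for unseen ones.
        self.vocab = len(set(self.good) | set(self.bad)) + 1
        self.smoothing = smoothing

    def step_score(self, prev, action):
        """Log-likelihood ratio of one transition: positive leans known-bad."""
        k = self.smoothing
        p_bad = (self.bad[(prev, action)] + k) / (self.bad_total + k * self.vocab)
        p_good = (self.good[(prev, action)] + k) / (self.good_total + k * self.vocab)
        return log(p_bad / p_good)

    def score(self, trajectory):
        """Sum of step scores over the trajectory so far."""
        return sum(self.step_score(p, a) for p, a in transitions(trajectory))


# Hypothetical labelled trajectories: good runs and known-bad ones.
good_runs = [
    ["read_schema", "select_rows", "summarise"],
    ["read_schema", "select_rows", "export_csv"],
]
bad_runs = [
    ["read_schema", "drop_table"],
    ["select_rows", "rm_rf"],
]

verifier = TransitionVerifier(good_runs, bad_runs)
aligned = verifier.score(["read_schema", "select_rows", "summarise"])
drifted = verifier.score(["read_schema", "drop_table"])
print(f"aligned={aligned:.3f} drifted={drifted:.3f}")
```

With this toy data the aligned run scores below zero and the drifted run above it, so a threshold at zero would separate them. A real verifier would need far more data, a held-out evaluation, and a threshold tuned against false positives.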

Diagram

flowchart TD
    A[Agent emits next action] --> S[Action trajectory stream]
    S --> V[Trained sequence verifier, out-of-band]
    V --> D{Anomaly flagged?}
    D -->|no| Go[Execute action]
    D -->|yes| Gate[Policy gate: halt / escalate before damage]
    Gate --> H[Human or LLM-judge review of flagged case]

Solution

Therefore:

Train a dedicated verifier (a sequence model or a process-supervised classifier, not an LLM judge) on agent trajectories labelled for task alignment and structural validity. At runtime it consumes the agent's action sequence out-of-band and emits an anomaly signal at millisecond-scale latency, fast enough to gate or pause the agent before the next action executes. Reported results put such a verifier at tens of milliseconds per check, well over an order of magnitude faster than an LLM-judge baseline, with process supervision over the trajectory outperforming output-only checks. Compose it with a policy gate that halts or escalates on a flagged trajectory, and reserve LLM-judge review for the flagged cases rather than every step. The pattern is distinct from scoring final outputs and from loop-shape heuristics: the unit is the whole action sequence, and the check lands before the damage.
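A minimal sketch of the composition described above, assuming a verifier that returns a higher score for a more anomalous trajectory. Every name here (`TrajectoryGate`, `score_fn`, `review_fn`, `toy_score`, the action strings) is hypothetical; `toy_score` is a stub standing in for the trained verifier, and `deny_all` stands in for the slower human or LLM-judge review. The gate runs in-process for brevity, whereas a deployment would more plausibly host the verifier in a separate service.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Decision:
    action: str
    score: float
    flagged: bool
    executed: bool


@dataclass
class TrajectoryGate:
    """Policy gate in front of the executor (hypothetical sketch)."""

    score_fn: Callable[[List[str]], float]  # fast verifier: higher = more anomalous
    review_fn: Callable[[List[str]], bool]  # slow reviewer: True = approve
    threshold: float = 0.5
    trajectory: List[str] = field(default_factory=list)
    reviews: int = 0

    def propose(self, action: str, execute: Callable[[str], None]) -> Decision:
        """Score the trajectory including the proposed action, then gate it."""
        candidate = self.trajectory + [action]
        score = self.score_fn(candidate)
        flagged = score >= self.threshold
        approved = True
        if flagged:
            # Only flagged trajectories reach the expensive reviewer.
            self.reviews += 1
            approved = self.review_fn(candidate)
        if approved:
            execute(action)
            self.trajectory.append(action)
        return Decision(action, score, flagged, approved)


DESTRUCTIVE = {"drop_table", "rm_rf"}  # assumption: illustrative action names


def toy_score(trajectory: List[str]) -> float:
    """Stub verifier: flags a trajectory that ends in a destructive action."""
    return 1.0 if trajectory[-1] in DESTRUCTIVE else 0.0


def deny_all(trajectory: List[str]) -> bool:
    """Stub reviewer that rejects every flagged trajectory."""
    return False


executed: List[str] = []
gate = TrajectoryGate(score_fn=toy_score, review_fn=deny_all)
decisions = [
    gate.propose(a, executed.append)
    for a in ["read_schema", "select_rows", "drop_table", "summarise"]
]
print(executed, gate.reviews)
```

The reviewer is called once across four actions, which is the cost argument in miniature: the fast check sees every step and the slow check sees only what was flagged.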

What this pattern forbids. The agent may not advance to its next action while a flagged trajectory is unresolved; a step sequence the monitor judges task-misaligned or malformed is gated before execution rather than scored after the fact.
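The prohibition can be read as a small state machine: once a trajectory is flagged, the gate holds and refuses every further action until a reviewer resolves the flag. The sketch below is one way to encode that invariant, assuming review happens asynchronously. `HoldingGate`, `TrajectoryHeld`, and the `is_anomalous` callback are hypothetical names, and the lambda is a stub in place of the trained verifier.

```python
from enum import Enum


class GateState(Enum):
    OPEN = "open"
    HELD = "held"


class TrajectoryHeld(RuntimeError):
    """Raised when the agent tries to advance past an unresolved flag."""


class HoldingGate:
    """Refuses new actions while a flagged trajectory is unresolved."""

    def __init__(self, is_anomalous):
        self.is_anomalous = is_anomalous  # callable: trajectory -> bool
        self.state = GateState.OPEN
        self.trajectory = []
        self.pending = None

    def submit(self, action):
        """Return True if the action may execute now, False if it is held."""
        if self.state is GateState.HELD:
            raise TrajectoryHeld(f"unresolved flag on {self.pending!r}")
        if self.is_anomalous(self.trajectory + [action]):
            self.state = GateState.HELD
            self.pending = action
            return False
        self.trajectory.append(action)
        return True

    def resolve(self, approve):
        """Apply the reviewer's decision; return the released action or None."""
        if self.state is not GateState.HELD:
            raise ValueError("no flagged trajectory to resolve")
        action, self.pending = self.pending, None
        self.state = GateState.OPEN
        if approve:
            self.trajectory.append(action)
            return action
        return None


# Stub verifier: flags any trajectory whose latest action is "drop_table".
gate = HoldingGate(lambda trajectory: trajectory[-1] == "drop_table")
first = gate.submit("read_schema")   # allowed
second = gate.submit("drop_table")   # flagged and held
try:
    gate.submit("summarise")         # blocked while the flag is unresolved
    blocked = False
except TrajectoryHeld:
    blocked = True
released = gate.resolve(approve=False)  # reviewer rejects the held action
third = gate.submit("summarise")        # allowed again after resolution
```

Raising on a submit during a hold, rather than silently queueing the action, is a design choice made here so that the caller cannot mistake a held trajectory for progress.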

And the patterns that stand alongside it, or against it —

alternative-to · Scorer Live Monitoring ★ · Score agent outputs asynchronously in production with non-blocking scorers that observe, alert, and log but do not regenerate the output.
alternative-to · LLM-as-Judge ★★ · Use an LLM to score open-ended outputs against rubric criteria when no exact-match metric applies.
alternative-to · Typed Tool-Loop Failure Detector ★ · Lift tool-loop detection from prompt-level rules to a mechanical dispatch-boundary veto with typed failure modes and per-tool caps that returns a formatted refusal the model must consume.
alternative-to · Verifier-Aware Reward Hacking ✕ · Anti-pattern: hand the agent read access to its own grader or test harness and assume a passing score means the task was actually done.
complements · Agent-Speed Incident-Response Gap ✕ · Anti-pattern: govern an autonomous agent with incident-response and breach-reporting frameworks scaled to human reaction time, even though a compromised agent can exfiltrate data and erase its traces in seconds.
complements · Adversary-Indistinguishability Blind Spot ✕ · Anti-pattern: rely on behavioral-anomaly detection calibrated to irregular human behaviour, so an autonomous adversary acting with legitimate credentials, standard protocols, and superhuman consistency is less anomalous than a human and slips past unseen.

Provenance

Source: patterns/trajectory-anomaly-monitor.md on GitHub · commit e2386c5
Added to catalog: 2026-06-05
Last updated: 2026-06-05
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.