Trajectory Anomaly Monitor
also known as Trajectory Guard, Sequence-Aware Action Monitor
Run a trained, non-LLM verifier out-of-band over the agent's action trajectory at runtime to flag task-misaligned plans and malformed step sequences at millisecond latency, before the actions cause damage.
Context
An autonomous agent takes real actions in sequence — tool calls, plan steps, state changes — where a misaligned or malformed trajectory can cause damage. The team wants a runtime safety check on every step, but an LLM judge on each action is too slow and too expensive to sit in the hot path, and output-quality scoring after the fact arrives only once the action has already happened.
Problem
Per-step oversight by an LLM judge adds latency and cost that production cannot absorb on every action, and scoring an agent's final output reveals nothing about a dangerous action mid-trajectory until it is too late. What is missing is a check that reads the whole action sequence as it unfolds — recognising that a plan has drifted off the task or that the step structure is malformed — and does so fast enough to intervene before the next action lands. Output-quality monitors are not sequence-aware, and loop-shape heuristics catch only repetition, not subtler misalignment.
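The gap in loop-shape heuristics can be shown with a minimal sketch. It assumes a hypothetical `has_loop` check that counts back-to-back repeats of the same action; the threshold and action names are illustrative. The check catches a retry loop. It passes a non-repeating trajectory that drifts to a destructive step, which is the misalignment a sequence-aware monitor is meant to read.

```python
from typing import Sequence


def has_loop(actions: Sequence[str], max_repeats: int = 3) -> bool:
    """Loop-shape heuristic: flag when one action repeats back-to-back max_repeats times."""
    run = 1
    for prev, cur in zip(actions, actions[1:]):
        run = run + 1 if cur == prev else 1
        if run >= max_repeats:
            return True
    return False


# A retry loop trips the heuristic...
print(has_loop(["search", "search", "search", "answer"]))               # True
# ...but a non-repeating drift toward a destructive step does not.
print(has_loop(["read_schema", "query_rows", "drop_table", "answer"]))  # False
```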
Forces
- Per-step LLM-judge oversight is the most flexible check but is far too slow and costly for the hot path.
- Output scoring is cheap but post-hoc, so it cannot stop a damaging action mid-trajectory.
- A trained sequence model is fast and sequence-aware but must be built, supervised, and maintained.
- The monitor must run out-of-band so that its own latency is not added to every step the agent takes.
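One way to read "out-of-band" is sketched below. This is a minimal sketch under stated assumptions, not a prescribed architecture. Each proposed step is scored on a worker thread while the agent keeps preparing the call. The agent waits only at the execution boundary, and only for whatever part of the monitor's latency was not already hidden behind that work. `run_with_sidecar_monitor`, the `prepare` hook and the 50 ms `budget_s` are hypothetical. Failing closed on a late verdict is an assumed policy choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, List, Sequence


def run_with_sidecar_monitor(
    steps: Sequence[str],
    monitor: Callable[[List[str]], bool],  # True means "flagged"
    prepare: Callable[[str], None],        # agent-side work that overlaps with scoring
    budget_s: float = 0.05,                # hypothetical latency budget for a verdict
) -> List[str]:
    """Score each proposed step on a worker thread while the agent keeps working."""
    executed: List[str] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        for step in steps:
            # Snapshot the trajectory so the worker never sees later mutations.
            verdict = pool.submit(monitor, executed + [step])
            prepare(step)  # e.g. build the tool call, render its arguments
            try:
                flagged = verdict.result(timeout=budget_s)
            except FutureTimeout:
                flagged = True  # assumption: fail closed when the monitor is late
            if flagged:
                break  # hold before executing; a policy gate takes over from here
            executed.append(step)  # stand-in for actually running the action
    return executed


def flag_destructive(trajectory: List[str]) -> bool:
    # Stand-in for the trained verifier.
    return trajectory[-1] in {"drop_table", "rm_rf"}


def plan_next(step: str) -> None:
    time.sleep(0.01)  # stand-in for the agent preparing its next call


if __name__ == "__main__":
    print(run_with_sidecar_monitor(
        ["read_schema", "query_rows", "drop_table", "answer"],
        monitor=flag_destructive,
        prepare=plan_next,
    ))
```

The single-worker pool keeps verdicts in step order. A real deployment would more likely run the monitor as a separate service fed by the agent's event stream.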
Example
An agent with shell and database access executes multi-step tasks for users. Running an LLM judge on every action would triple latency, and scoring the final answer would only catch damage after it happened. The team trains a lightweight sequence model on labelled trajectories — good runs and known-bad ones — and runs it out-of-band on each step at tens of milliseconds. When a trajectory drifts toward an unrequested destructive command, the monitor flags it and a policy gate pauses the agent for human review before the command runs, at a fraction of the cost of judging every step with an LLM.
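The lightweight sequence model in this example could be as small as the following sketch. It assumes actions are reduced to type tokens. A first-order Markov (bigram) model stands in for the trained verifier: one smoothed transition model is fit to good runs and one to known-bad runs. Each new step is scored by the log-likelihood ratio of its transition. `TrajectoryMonitor`, the action names, the training runs and the 0.5 threshold are all hypothetical. A bigram model conditions only on the previous action, so a production verifier would need a model that reads much more of the trajectory.

```python
import math
from collections import Counter
from typing import Sequence

START = "<start>"  # sentinel so the first action also has a predecessor


class BigramModel:
    """First-order Markov model over action types, with add-one (Laplace) smoothing."""

    def __init__(self, trajectories: Sequence[Sequence[str]], vocab_size: int):
        self.vocab_size = vocab_size
        self.pair_counts: Counter = Counter()
        self.prev_counts: Counter = Counter()
        for traj in trajectories:
            prev = START
            for action in traj:
                self.pair_counts[(prev, action)] += 1
                self.prev_counts[prev] += 1
                prev = action

    def log_prob(self, prev: str, action: str) -> float:
        # Unseen transitions get a small non-zero probability instead of log(0).
        num = self.pair_counts[(prev, action)] + 1
        den = self.prev_counts[prev] + self.vocab_size
        return math.log(num / den)


class TrajectoryMonitor:
    """Hypothetical verifier: compares a bad-run model against a good-run model."""

    def __init__(self, good_runs, bad_runs, threshold: float = 0.5):
        vocab = {a for run in list(good_runs) + list(bad_runs) for a in run}
        self.good = BigramModel(good_runs, len(vocab))
        self.bad = BigramModel(bad_runs, len(vocab))
        self.threshold = threshold

    def step_score(self, trajectory: Sequence[str]) -> float:
        """Log-likelihood ratio of the newest transition; > 0 means it looks more like a bad run."""
        if not trajectory:
            return 0.0
        prev = trajectory[-2] if len(trajectory) > 1 else START
        action = trajectory[-1]
        return self.bad.log_prob(prev, action) - self.good.log_prob(prev, action)

    def flag(self, trajectory: Sequence[str]) -> bool:
        return self.step_score(trajectory) > self.threshold


good_runs = [
    ["read_schema", "query_rows", "answer"],
    ["read_schema", "query_rows", "query_rows", "answer"],
    ["list_files", "read_file", "answer"],
]
bad_runs = [
    ["read_schema", "drop_table"],
    ["list_files", "rm_rf"],
    ["query_rows", "drop_table", "answer"],
]

monitor = TrajectoryMonitor(good_runs, bad_runs, threshold=0.5)
live = ["read_schema", "query_rows", "drop_table"]
for t in range(1, len(live) + 1):
    prefix = live[:t]
    print(prefix[-1], round(monitor.step_score(prefix), 2), monitor.flag(prefix))
# read_schema -0.41 False
# query_rows -0.98 False
# drop_table 0.92 True
```

Scoring one transition costs two dictionary lookups, which is the property that lets a check like this sit on every step.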
Solution
Therefore:
Train a dedicated verifier — a sequence model or a process-supervised classifier, not an LLM judge — on agent trajectories labelled for task alignment and structural validity. At runtime it consumes the agent's action sequence out-of-band and emits an anomaly signal at millisecond latency, fast enough to gate or pause the agent before the next action executes. Reported results put such a verifier at tens of milliseconds per check, well over an order of magnitude faster than an LLM-judge baseline, with process supervision over the trajectory outperforming output-only checks. Compose with a policy gate that halts or escalates on a flagged trajectory, and reserve LLM-judge review for the flagged cases rather than every step. Distinct from scoring final outputs and from loop-shape heuristics: the unit is the whole action sequence, and the timing is pre-damage.
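The composition described above, with the fast monitor on every step and LLM-judge review only for flagged cases, can be sketched as a two-tier cascade. `CascadeGate`, `Verdict`, the threshold and both stand-in callables are hypothetical. In practice `monitor_score` would be the trained verifier and `judge` a real LLM-judge call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Trajectory = Sequence[str]


@dataclass
class Verdict:
    allowed: bool     # may the newest action execute?
    escalated: bool   # was the slow judge consulted?
    reason: str


@dataclass
class CascadeGate:
    """Fast monitor on every step; slow judge only on flagged steps (hypothetical API)."""

    monitor_score: Callable[[Trajectory], float]  # fast trained verifier, run on every step
    judge: Callable[[Trajectory], bool]           # slow LLM-judge review; True = approve
    threshold: float = 0.5
    judge_calls: int = 0

    def check(self, trajectory: Trajectory) -> Verdict:
        score = self.monitor_score(trajectory)
        if score <= self.threshold:
            return Verdict(True, False, f"monitor score {score:.2f} within threshold")
        self.judge_calls += 1  # only flagged steps pay for judge review
        if self.judge(trajectory):
            return Verdict(True, True, "flagged, then approved by judge")
        return Verdict(False, True, "flagged and rejected; halt or escalate to a human")


DESTRUCTIVE = {"drop_table", "rm_rf"}


def toy_score(traj: Trajectory) -> float:
    # Stand-in for the trained verifier: 1.0 on a destructive step, else 0.0.
    return 1.0 if traj and traj[-1] in DESTRUCTIVE else 0.0


def toy_judge(traj: Trajectory) -> bool:
    # Stand-in for an LLM-judge call; here it always rejects.
    return False


gate = CascadeGate(monitor_score=toy_score, judge=toy_judge)
run = ["read_schema", "query_rows", "drop_table"]
verdicts = [gate.check(run[:i]) for i in range(1, len(run) + 1)]
print([v.allowed for v in verdicts])  # [True, True, False]
print(gate.judge_calls)               # 1
```

Across three steps the expensive judge runs once. That is the cost argument for reserving it for flagged trajectories.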
What this pattern forbids. The agent may not advance to its next action while a flagged trajectory is unresolved; a step sequence the monitor judges task-misaligned or malformed is gated before execution rather than scored after the fact.
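The prohibition can be enforced mechanically rather than by convention. The minimal sketch below assumes a hypothetical `TrajectoryGate` state machine. Once a proposed step is flagged, the gate refuses every further proposal until a reviewer resolves the held step. A rejection halts the trajectory for good. The names, states and flagging rule are illustrative assumptions.

```python
from enum import Enum
from typing import Callable, List, Optional


class GateState(Enum):
    OPEN = "open"
    HELD = "held"        # a flagged step is waiting for review
    HALTED = "halted"    # the reviewer rejected it; the trajectory stops


class BlockedError(RuntimeError):
    """Raised when the agent tries to advance past an unresolved flag."""


class TrajectoryGate:
    """Hypothetical gate enforcing: no next action while a flag is unresolved."""

    def __init__(self, is_flagged: Callable[[List[str]], bool]):
        self.is_flagged = is_flagged
        self.state = GateState.OPEN
        self.executed: List[str] = []
        self.pending: Optional[str] = None

    def propose(self, action: str) -> bool:
        """True if the action may run now; False if it is held for review."""
        if self.state is not GateState.OPEN:
            raise BlockedError(f"gate is {self.state.value}; resolve it before advancing")
        if self.is_flagged(self.executed + [action]):
            self.state, self.pending = GateState.HELD, action
            return False
        self.executed.append(action)
        return True

    def resolve(self, approved: bool) -> None:
        """Record the reviewer's decision on the held action."""
        if self.state is not GateState.HELD or self.pending is None:
            raise BlockedError("no held action to resolve")
        if approved:
            self.executed.append(self.pending)
            self.state = GateState.OPEN
        else:
            self.state = GateState.HALTED
        self.pending = None


gate = TrajectoryGate(is_flagged=lambda traj: traj[-1] == "drop_table")
print(gate.propose("read_schema"))  # True
print(gate.propose("drop_table"))   # False (held)
try:
    gate.propose("answer")          # refused while the flag is unresolved
except BlockedError as err:
    print(err)                      # gate is held; resolve it before advancing
gate.resolve(approved=False)
print(gate.state.name)              # HALTED
```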
And the patterns that stand alongside it, or against it:
- alternative to Scorer Live Monitoring (★): Score agent outputs asynchronously in production with non-blocking scorers that observe, alert, and log but do not regenerate the output.
- alternative to LLM-as-Judge (★★): Use an LLM to score open-ended outputs against rubric criteria when no exact-match metric applies.
- alternative to Typed Tool-Loop Failure Detector (★): Lift tool-loop detection from prompt-level rules to a mechanical dispatch-boundary veto with typed failure modes and per-tool caps that returns a formatted refusal the model must consume.