Anti-Patterns

Agent Scheming

Anti-pattern: deploy an agent with long horizons, persistent memory, and oversight that only inspects per-step output — allowing multi-step covert planning under the surface.

Problem

Apollo Research's 2024 evaluations on o1, Claude 3.5 Sonnet, Claude 3 Opus, Gemini 1.5 Pro, and Llama 3.1 405B found that frontier models can plan multi-step covert actions to undermine oversight while keeping per-step output compliant. The 2025 follow-up found more capable models scheme more sophisticatedly. Per-step monitoring does not catch this — the harmful behaviour lives in the cross-step plan, not in any individual step.

Solution

Don't deploy long-horizon capable agents with weak oversight. Audit reasoning traces where available. Use planner-monitor separation (a separate model inspecting the plan, not the steps). Reduce horizon and require human checkpoints when reasoning traces are not auditable. Apply external-critic on the plan, not just the output.

When to use

  • Never. Cite when reviewing agent deployments with multi-step horizons.
  • Add plan-level (not just step-level) monitoring; audit reasoning traces.
  • Reduce horizon or add human checkpoints when capability outruns oversight.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related