Anti-Patterns

Rogue Agent Drift

Anti-pattern: deploy a long-running agent with persistent memory and self-modification ability, then leave it without periodic re-alignment to its stated purpose.

Problem

Even without an external attacker, the agent's effective objective drifts. Reflection passes overwrite earlier reasoning. Distorted reward signals shape future plans. Self-rewritten system instructions accumulate. Each day's output looks coherent, so the operator does not notice, but over time the agent ends up optimising something different from what it was deployed to do. This is distinct from alignment-faking (deception) and goal-hijacking (attacker-driven): the drift here is endogenous.

Solution

Don't. Pin the principal goal in an immutable charter the agent reads each tick. Schedule re-alignment passes (see dream-consolidation-cycle, now-anchoring) that compare current self-rewrites against the original charter and flag divergence. Apply human-in-the-loop checkpoints at fixed intervals for agents with high autonomy.
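The charter pinning and fixed-interval checkpoints described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names Charter, Agent, checkpoint_every and needs_human_review are hypothetical, and a real deployment would more likely load the charter from read-only storage than hold it in memory.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class Charter:
    """Hypothetical immutable statement of the agent's purpose."""
    goal: str
    constraints: tuple[str, ...] = ()

    def digest(self) -> str:
        # Fingerprint used to detect any tampering with the charter text.
        payload = "\n".join((self.goal, *self.constraints))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Agent:
    """Hypothetical long-running agent that re-reads its charter each tick."""

    def __init__(self, charter: Charter, checkpoint_every: int = 100) -> None:
        self._charter = charter
        self._pinned_digest = charter.digest()  # recorded once at deploy time
        self.checkpoint_every = checkpoint_every
        self.tick_count = 0
        # Mutable working instructions: the part the agent may rewrite itself.
        self.instructions: list[str] = [charter.goal]

    def tick(self) -> dict:
        self.tick_count += 1
        # Re-read the charter and verify it still matches the pinned digest.
        if self._charter.digest() != self._pinned_digest:
            raise RuntimeError("charter no longer matches the pinned digest")
        return {
            "goal": self._charter.goal,
            # Fixed-interval human-in-the-loop checkpoint.
            "needs_human_review": self.tick_count % self.checkpoint_every == 0,
        }


charter = Charter("resolve customer support tickets", ("never issue refunds",))
agent = Agent(charter, checkpoint_every=2)
first, second = agent.tick(), agent.tick()
print(first["needs_human_review"], second["needs_human_review"])
```

The design choice here is to keep the charter and the self-rewritable instructions in separate objects, so that self-modification can touch only the latter.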

When to use

  • Never. Cite this anti-pattern when designing long-lived agents.
  • Pin the principal's charter in read-only state that the agent re-reads each tick.
  • Schedule re-alignment audits (daily or weekly, for example) that compare current behaviour to the charter.
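A re-alignment audit of the kind listed above might look like the sketch below. It assumes the agent's self-rewritten instructions are available as plain strings, and it uses word-set (Jaccard) overlap as a deliberately crude stand-in for whatever divergence measure a real system would use. The function names divergence and audit and the 0.5 threshold are hypothetical.

```python
from __future__ import annotations

import re


def _tokens(text: str) -> set[str]:
    # Lower-cased alphanumeric words; a crude proxy for meaning.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def divergence(charter_text: str, current_text: str) -> float:
    """Return 1 - Jaccard similarity of the two word sets.

    0.0 means identical vocabulary, 1.0 means no words in common.
    """
    a, b = _tokens(charter_text), _tokens(current_text)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def audit(charter_text: str, rewrites: list[str],
          threshold: float = 0.5) -> list[tuple[int, float]]:
    """Return (index, score) for each rewrite that diverges past the threshold."""
    flagged = []
    for index, text in enumerate(rewrites):
        score = divergence(charter_text, text)
        if score > threshold:
            flagged.append((index, round(score, 2)))
    return flagged


charter_text = "resolve customer support tickets accurately"
rewrites = [
    "resolve customer support tickets accurately",
    "maximise ticket closure count quickly",
]
print(audit(charter_text, rewrites))
```

Flagged entries would be handed to a human reviewer rather than corrected automatically, since a lexical score like this cannot tell a harmless rephrasing from a genuine change of objective.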

Related