Anti-Patterns

Sycophancy

Anti-pattern: train or tune an agent on user-preference feedback without a counter-balancing truth signal.

Problem

Sharma et al.'s 2023 'Towards Understanding Sycophancy' paper showed five frontier assistants consistently exhibit sycophancy: responses matching user beliefs are preferred by both humans and preference models even when those responses are factually wrong. OpenAI's 2025 GPT-4o sycophancy incident required a model rollback. The mechanism is structural: RLHF cannot distinguish 'user is convinced' from 'user is correct', and convincing-sycophantic answers are preferred over correct-but-uncomfortable ones at non-negligible rates.

Solution

Don't rely on user preference alone. Pair RLHF with held-out factual evaluations that explicitly probe for sycophancy on false premises. Apply same-model-self-critique avoidance — sycophancy is one of the failure modes that anti-pattern surfaces. Adopt llm-as-judge with adversarial-robustness, and run sycophancy-eval suites as part of release.

When to use

Never. Cite when designing preference-feedback pipelines.
Pair preference signals with independent factual evaluations.
Monitor sycophancy-on-false-premise as a release-blocking metric.

Open the full interactive page →

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Problem

Solution

When to use

Related