Anti-Patterns

Symptom-Remediation Thrashing

Anti-pattern: a stateless auto-remediation agent repeatedly applies symptom-level fixes that restore the target metric while masking the root cause and suppressing the page, so the underlying fault compounds across incidents into a larger outage.

Problem

A symptom-level fix can bring the metric back while leaving the real cause untouched: scaling a service masks a noisy neighbour, and restarting a pod clears a leak that then refills. Because the metric recovers, the incident closes and no page reaches the team that owns the root cause, so the fix both hides the problem and suppresses the signal that would get it fixed. With no memory across incidents and no cap on repeated remediation, the agent keeps applying the same masking fix while the underlying fault grows, until it fails more severely and takes down more than the original symptom affected.

Solution

Treat a recovered metric as mitigation, not resolution, and design remediation to detect its own masking. Carry state across incidents so the agent can see it has applied the same fix to the same symptom before, and cap repeated remediation with an escalate-after-N-attempts rule that routes a recurring symptom to a human instead of re-applying the mask. Keep the page alive when a fix is a known mask rather than a root-cause resolution, so the owning team is still notified. Distinguish masking from resolution before declaring the incident closed, for example by checking whether the underlying signal, not just the target metric, has returned to health. The control is cross-incident memory plus an escalation cap, not a faster symptom fix.
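A minimal sketch of that control, assuming an agent that is handed a symptom, a fix and two health probes. Every name here (RemediationGuard, Outcome, handle, max_attempts and the probe callables) is hypothetical and chosen for illustration; the history is held in memory only, whereas a real agent would need to persist it so that it survives across incidents and restarts.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    RESOLVED = "resolved"    # metric and underlying signal both healthy
    MITIGATED = "mitigated"  # metric recovered, cause still present: keep the page open
    ESCALATED = "escalated"  # hand over to a human instead of re-applying the fix


@dataclass
class RemediationGuard:
    """Cross-incident memory plus an escalate-after-N cap (illustrative only)."""

    max_attempts: int = 3
    # (symptom, fix) -> how many times this fix recovered the metric
    # for this symptom without the underlying signal returning to health
    history: dict = field(default_factory=lambda: defaultdict(int))

    def handle(self, symptom, fix, apply_fix, metric_healthy, cause_healthy):
        key = (symptom, fix)

        # Escalation cap: a recurring symptom goes to a human, not to the mask.
        if self.history[key] >= self.max_attempts:
            return Outcome.ESCALATED

        apply_fix()

        # The fix did not even recover the metric: nothing left to automate.
        if not metric_healthy():
            return Outcome.ESCALATED

        # Close only when the underlying signal, not just the metric, is healthy.
        if cause_healthy():
            self.history.pop(key, None)
            return Outcome.RESOLVED

        # Metric recovered but the cause remains: record the mask for next time.
        self.history[key] += 1
        return Outcome.MITIGATED
```

With max_attempts=2 and a fix that always recovers the metric but never the underlying signal, the first two calls apply the fix and return MITIGATED, and the third returns ESCALATED without applying it. The caller is expected to keep the page open on MITIGATED and to route ESCALATED to the team that owns the root cause.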

When to use

  • Recognising this failure when an auto-remediation agent repeatedly applies a fix that recovers a metric without resolving the cause.
  • Reviewing remediation that is stateless across incidents and closes on a recovered metric.
  • Diagnosing a recurring incident that auto-remediation keeps clearing while the underlying fault grows.
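One way to spot the cases above during a review is to scan closed incidents for the same fix being applied automatically to the same symptom again and again. The sketch below assumes each incident is a dict with service, symptom, fix and closed_by keys; that record shape, the "auto" marker and the recurring_masks name are assumptions for illustration, not the schema of any particular incident tool.

```python
from collections import Counter


def recurring_masks(incidents, threshold=3):
    """Return (service, symptom, fix) triples that were auto-closed
    at least `threshold` times; these are candidates for a masked root cause."""
    counts = Counter(
        (i["service"], i["symptom"], i["fix"])
        for i in incidents
        if i["closed_by"] == "auto"  # hypothetical marker for agent-closed incidents
    )
    return sorted(key for key, n in counts.items() if n >= threshold)
```

A triple that keeps reappearing suggests the agent is clearing a symptom rather than resolving a cause, and is worth routing to the owning team even though every individual incident closed on a recovered metric.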
