Perma-Beta

Also known as: Forever Beta, Eval Vacuum.

Anti-pattern: ship the agent in 'beta' indefinitely so that quality regressions are someone else's problem.

Context

A team launches an agent product to real users under a 'beta' label, without building an evaluation harness that can measure quality regressions across releases. Months later, the product is still labelled beta, partly because the team genuinely has not measured quality, partly because removing the label would commit them to a quality bar they have no way to defend. The label has quietly shifted from a signal of active iteration to a shield against accountability.

Problem

Without an evaluation harness, every release is a guess: regressions land invisibly, model upgrades are accepted or rejected on vibe, and customer-facing quality drifts without anyone noticing until churn reveals it. Beta becomes a permanent excuse that costs nothing to keep and absorbs all accountability for unmeasured quality. Eventually a competitor ships a graduated version of a similar product and the beta team discovers, too late, that they never had a measurement story.
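To make "regressions land invisibly" concrete, here is a minimal sketch of the comparison an eval harness performs between two releases. Everything in it is an illustrative assumption rather than part of the pattern: the `EvalCase` type, the exact-match scoring, and the two canned "agents" standing in for real agent versions. A real harness would likely use a larger held-out dataset and a richer scorer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class EvalCase:
    """One held-out example: an input and the answer we expect (hypothetical schema)."""
    prompt: str
    expected: str


# An "agent" here is just a function from prompt to answer (assumption for the sketch).
Agent = Callable[[str], str]


def passes(agent: Agent, case: EvalCase) -> bool:
    """Exact-match scoring; real harnesses often need a fuzzier judge."""
    return agent(case.prompt).strip() == case.expected


def pass_rate(agent: Agent, cases: List[EvalCase]) -> float:
    """Fraction of held-out cases the agent answers correctly."""
    if not cases:
        raise ValueError("eval set must not be empty")
    return sum(passes(agent, c) for c in cases) / len(cases)


def regressions(old: Agent, new: Agent, cases: List[EvalCase]) -> List[EvalCase]:
    """Cases the old release passed and the new release fails."""
    return [c for c in cases if passes(old, c) and not passes(new, c)]


def canned_agent(answers: Dict[str, str]) -> Agent:
    """Stand-in for a real agent version: looks answers up in a table."""
    return lambda prompt: answers.get(prompt, "")


CASES = [
    EvalCase("2+2", "4"),
    EvalCase("capital of France", "Paris"),
    EvalCase("3*3", "9"),
    EvalCase("opposite of hot", "cold"),
]
V1 = canned_agent({"2+2": "4", "capital of France": "Paris", "3*3": "9"})
V2 = canned_agent({"2+2": "4", "capital of France": "Lyon", "3*3": "9",
                   "opposite of hot": "cold"})

if __name__ == "__main__":
    print("v1 pass rate:", pass_rate(V1, CASES))
    print("v2 pass rate:", pass_rate(V2, CASES))
    print("regressed prompts:", [c.prompt for c in regressions(V1, V2, CASES)])
```

In this toy data both versions have the same aggregate pass rate, yet one previously correct case now fails. That is the kind of drift that goes unnoticed when there is no per-case comparison across releases.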

Forces

Eval harnesses cost time to build.
GA promises commit to quality bars.
Beta lets product move fast.

Example

A startup launches its agent product as 'beta' and uses the label as a blanket excuse for any quality complaint. Eighteen months later the agent is still beta, there is no eval harness, and customers have started churning to a competitor that ships GA. The team names the failure perma-beta and forces an exit: build the eval suite, set quality gates, fix the regressions blocking GA, and remove the beta label. The label was hiding the fact that nobody actually knew whether the product was getting better or worse.
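The "set quality gates" step in this example could look roughly like the sketch below. The function name `release_gate`, the `GateResult` type, and the default thresholds (`floor`, `max_drop`) are hypothetical; the pattern does not prescribe specific numbers, and a team would choose its own from its eval data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    allowed: bool
    reason: str


def release_gate(baseline_rate: float, candidate_rate: float,
                 floor: float = 0.90, max_drop: float = 0.02) -> GateResult:
    """Decide whether a candidate release may ship.

    baseline_rate / candidate_rate: eval pass rates in [0, 1] for the current
    and the candidate release. floor and max_drop are illustrative defaults.
    """
    if candidate_rate < floor:
        return GateResult(
            False, f"pass rate {candidate_rate:.2f} is below the floor {floor:.2f}")
    if baseline_rate - candidate_rate > max_drop:
        return GateResult(
            False,
            f"pass rate dropped from {baseline_rate:.2f} to {candidate_rate:.2f}")
    return GateResult(True, "candidate meets the floor and does not regress")


if __name__ == "__main__":
    print(release_gate(0.95, 0.94))  # small dip within tolerance
    print(release_gate(0.99, 0.92))  # above the floor, but a large regression
    print(release_gate(0.95, 0.85))  # below the absolute floor
```

Using both an absolute floor and a relative drop limit is a design choice for the sketch: the floor is the bar a GA promise would commit to, while the drop limit catches regressions that still sit above that bar.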

Diagram

flowchart TD
    V0[v0 ship in 'beta'] --> V1[v1 still 'beta']
    V1 --> V2[v2 still 'beta']
    V2 -.no eval gate.-> Reg[Quality regressions]
    Reg -.blamed on.-> Users[Users]
    V0 -. fix .-> EH[Build eval harness] --> Exit[Exit beta]

Solution

Therefore:

Don't. Build the eval harness and exit beta. See eval-harness, llm-as-judge, shadow-canary.

What this pattern forbids. Avoiding it imposes an accountability rule: 'beta' cannot be a permanent disclaimer; a release must carry an eval harness and an explicit exit condition, or the label is hiding unowned quality regressions.
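The accountability rule above can be expressed as a check on release metadata. This is a sketch under stated assumptions: the `Release` and `ExitCondition` records, their field names, and the "N consecutive releases above a pass rate" exit criterion are all hypothetical illustrations of what an explicit exit condition might be.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExitCondition:
    """Hypothetical exit rule: N consecutive releases at or above a pass rate."""
    min_pass_rate: float
    consecutive_releases: int


@dataclass(frozen=True)
class Release:
    version: str
    label: str                               # "beta" or "ga"
    eval_pass_rate: Optional[float]          # None means no eval was run
    exit_condition: Optional[ExitCondition]  # None means no stated way out of beta


def label_violations(release: Release) -> List[str]:
    """Return the ways a beta release breaks the accountability rule."""
    problems: List[str] = []
    if release.label == "beta":
        if release.eval_pass_rate is None:
            problems.append("beta release has no eval result")
        if release.exit_condition is None:
            problems.append("beta release has no exit condition")
    return problems


def ready_to_exit(history: List[Release], cond: ExitCondition) -> bool:
    """True when the most recent releases all satisfy the exit condition."""
    if cond.consecutive_releases < 1:
        raise ValueError("consecutive_releases must be at least 1")
    recent = history[-cond.consecutive_releases:]
    if len(recent) < cond.consecutive_releases:
        return False
    return all(r.eval_pass_rate is not None and r.eval_pass_rate >= cond.min_pass_rate
               for r in recent)


if __name__ == "__main__":
    cond = ExitCondition(min_pass_rate=0.9, consecutive_releases=2)
    history = [
        Release("0.1", "beta", None, None),   # the perma-beta shape
        Release("0.2", "beta", 0.92, cond),
        Release("0.3", "beta", 0.95, cond),
    ]
    print(label_violations(history[0]))
    print(ready_to_exit(history, cond))
```

A check like `label_violations` could run in CI so that a beta release without an eval result or an exit condition fails the build instead of shipping quietly.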

The patterns that counter or replace it:

alternative-to Eval Harness ★★: Run a held-out dataset against agent versions to detect regressions and measure improvement.
alternative-to Shadow Canary ★★: Run a candidate agent version in shadow alongside the champion, comparing outputs without affecting users.
conflicts-with Eval as Contract ★★: Treat the eval suite as the contract the agent must satisfy; releases ship only if evals pass.
complements Demo-to-Production Cliff ✕: Anti-pattern: ship a demo-validated agent straight into production without a frozen eval, cost ceiling, loop-detector, or named oncall, then act surprised when accuracy drops and cost runs away.
complements Automating a Broken Process ✕: Anti-pattern: deploy agents on top of a workflow that is already dysfunctional, so the dysfunction is amplified at machine speed instead of resolved.
complements Agentic Skill Atrophy ✕: Anti-pattern: let agents take over routine architectural and debugging decisions in code until developers no longer form the implicit knowledge that lets them review the agent's output or recover when it fails.
complements Agentic Debt ✕: Anti-pattern: deploy agents on top of an unconsolidated data foundation, weak governance, or missing MLOps infrastructure, so every subsequent capability (observability, retraining, compliance retrofit) pays compounding interest on the skipped foundational work.
alternative-to Rigor Relocation ★: Relocate verification rigor from the model loop to surrounding scaffolding (evals, judges, decision logs, policy gates) so failures are caught by the wrapper rather than the agent.
complements Hidden Validation-Work Amplification ✕: Anti-pattern: an agent rollout shifts effort from doing the work to validating, monitoring, and recalibrating the agent; net productivity is negative because the hidden human evaluation burden exceeds the visible automation gain.
complements Agent Sprawl ✕: Anti-pattern: every team ships its own agents while ownership, success metrics, monitoring, and a decommissioning path stay an afterthought, so the fleet outgrows governance and most agents end up unwatched, unowned, and impossible to retire.
complements Silent Pilot-to-Production Promotion ★: Anti-pattern: let a well-performing pilot quietly expand in scope until it is a de facto production decision system, while keeping the 'pilot' label so it never trips the go-live governance gate.

References

ai-standards/ai-design-patterns (Perma-Beta), repository

Provenance

Source: patterns/perma-beta.md on GitHub · commit 4fa1213
Added to catalog: 2026-04-30
Last updated: 2026-05-22
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.