Silent Hypotheses in Generated Code

also known as Silent Hypothesis to Production, Unnamed Assumption Shipped, Hypothese Silencieuse

Anti-pattern: model-written code rests on an unstated runtime premise that passing tests and code review never surface, so the hidden assumption travels into production and fails there.

Context

A coding agent generates a function or a change and the code reads cleanly, the test suite stays green, and a reviewer approves it. To produce plausible code the model fills every gap the prompt left open with a default guess: that an input is always sorted, that a list is never empty, that a currency is the local one, that a timestamp is in the server's timezone, that an upstream service answers within a second. Each guess is a premise the code now depends on, yet none of it is written down anywhere a reader or a test can see.

Problem

Passing tests prove only that the code behaves correctly on the cases the tests cover, which are usually the cases that share the same unstated premise the code was written under. The hidden assumption is therefore invisible at exactly the two moments meant to catch defects: the green suite confirms the happy path the model already assumed, and the reviewer, reading fluent code, sees no flag because nothing names the premise. The assumption surfaces only when production sends an input that violates it, and by then the failure looks like a runtime incident with no obvious cause rather than a design choice nobody made on purpose.

Forces

To generate runnable code the model must resolve every ambiguity the prompt left open, and the cheapest resolution is a silent default rather than a question back to the developer.
A green test suite reads as proof of correctness, but it only certifies the cases tested, which tend to share the same premise the code was generated under.
Fluent, idiomatic generated code lowers a reviewer's guard precisely when the load-bearing assumption is the thing that is missing from the page.
Naming and testing every assumption is slow and pushes against the speed that made code generation attractive in the first place.

Example

A developer asks an agent to add a function that returns the average order value. The code reads cleanly and the tests pass. Nobody notices the agent assumed the order list is never empty, because every test fixture has orders in it. Three weeks later a new customer with no orders hits the endpoint, the function divides by zero, and the checkout page errors out. The bug was a silent assumption that travelled all the way to production without ever being named.

Diagram

flowchart TD A[Ambiguous prompt] --> B[Model fills gaps with silent defaults] B --> C[Code with an unnamed premise] C --> D{Test suite} D -->|green: same premise| E[Review] E -->|clean: premise invisible| F[Merged and shipped] F --> G[Production input violates premise] G --> H[Failure with no named cause]

Solution

Therefore:

The remedy is to treat passing tests as non-proof of correctness and to surface the hidden premises before they ship. When generating code, have the model emit the assumptions it made explicit alongside the code as comments, preconditions, or assertions, so every silent default becomes a named, reviewable claim. Add tests that target the violated-premise cases — empty inputs, unsorted data, foreign currency, late upstream responses — rather than only the happy path the code was written under. In review, ask of each change what it assumes about its inputs and environment and whether anything checks that assumption, treating an unnamed premise as a defect rather than a detail. Where a premise cannot be tested cheaply, encode it as a runtime guard that fails loudly instead of corrupting state quietly.

What this pattern forbids. Generated code must not be merged on a green suite and a clean review alone; each unstated premise about inputs and environment must be named as a comment, assertion, or test, and passing tests cannot be treated as proof that no hidden assumption remains.

The patterns that counter or replace it —

complementsWorkflow-Success vs Business-Validity Gap✕— Anti-pattern: a terminal success status from the agent or its workflow engine is read as proof the deliverable is business-correct, when it certifies only technical completion.
complementsPhantom Action Completion✕— Anti-pattern: the agent reports a side-effecting action as complete from its own narration, when the tool call silently failed or never ran and nothing checked that the effect occurred.
alternative-toGenerate-and-Test Strategy★— Generate multiple candidate solutions in parallel, then systematically test each against declared constraints rather than committing to the first plausible one — adapted from Langley & Simon's cognitive-science research on human expert problem-solving.
complementsEval as Contract★★— Treat the eval suite as the contract the agent must satisfy; releases ship only if evals pass.
complementsBehavior-Pinning Test Before Agent Edit★— Capture the current behaviour of agent-touchable code as golden characterization tests before an agent edits it, with load-bearing values computed deterministically and only prose left to the model, run as a regression gate.

Neighbourhood

Click any neighbour to follow the language. Scroll to zoom, drag to pan.

References

Provenance

Source: patterns/silent-hypotheses-to-production.md on GitHub · commit ad426c4 · view history
Added to catalog: 2026-06-14
Last updated: 2026-06-14
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.