XIII · Cognition & Introspection · Experimental

Hypothesis Tracking

also known as Hypothesis Ledger, Provisional-Answer Store

Persist the agent's provisional answers as a typed ledger of records, each carrying a summary, confidence, status, and next-test, so that guesses survive across sessions and stay distinguishable from open questions.

Context

A long-running agent maintains an open-question ledger (unresolved pulls of curiosity) and observes patterns of evidence that point toward provisional answers. Once the agent commits enough weight to a guess to act on it, that guess stops being a question and becomes a hypothesis: something it would defend until disconfirmed. Without a place to put hypotheses, they live only in the current prompt window and dissolve at the end of the turn.

Problem

An agent that holds candidate answers only implicitly must re-derive them each time the topic resurfaces, with no continuity of confidence: a guess held strongly in one session evaporates by the next, and a guess that was once disconfirmed quietly re-emerges as if it were new. Storing hypotheses on the same surface as open questions is no better. The ledger then conflates 'still wondering' with 'tentatively believes', and the agent loses the move that actually matters for inquiry: comparing yesterday's provisional answer against today's new evidence.

Forces

Hypotheses are different from questions: questions pull, hypotheses commit.
Confidence must be a graded scalar, not a binary, because the agent revises its beliefs rather than flipping them.
Each hypothesis needs a falsifiable next-test or it rots into untestable belief.
Hypothesis state must survive across sessions, because evidence accumulates over weeks.
Status transitions (active → confirmed | disconfirmed | superseded | abandoned) must be cheap and visible.
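
One way to keep status transitions cheap and visible is a small lookup table. The sketch below is illustrative only: the names `Status`, `ALLOWED`, `can_transition`, and `clamp_confidence` are assumptions, not part of the pattern. It treats every status other than active as terminal, which matches the state diagram in this entry.

```python
from enum import Enum


class Status(str, Enum):
    ACTIVE = "active"
    CONFIRMED = "confirmed"
    DISCONFIRMED = "disconfirmed"
    SUPERSEDED = "superseded"
    ABANDONED = "abandoned"


# Assumption: only an active hypothesis may change status;
# the other four statuses are terminal.
ALLOWED = {
    Status.ACTIVE: {
        Status.CONFIRMED,
        Status.DISCONFIRMED,
        Status.SUPERSEDED,
        Status.ABANDONED,
    },
}


def can_transition(current: Status, target: Status) -> bool:
    """Return True if moving from `current` to `target` is permitted."""
    return target in ALLOWED.get(current, set())


def clamp_confidence(value: float) -> float:
    """Keep confidence a graded scalar inside the closed range 0..1."""
    return max(0.0, min(1.0, float(value)))
```

A confidence revision on an active record is not a status change, so it does not pass through `can_transition`; only the four exits from active do.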

Example

An agent maintains a small store of open questions ("why does request latency spike between 02:00 and 04:00 UTC?"). After a week of incidents, the agent commits to a guess: "the spike is correlated with a vendor's scheduled embedding-index rebuild." It opens a hypothesis with confidence 0.6, status active, next-test "observe whether the next spike correlates with the vendor's announced rebuild window." Two weeks later the test fires positive; the hypothesis transitions to confirmed and the question is closed. A separate guess about GC pauses, which had reached confidence 0.4, transitions to superseded.

Diagram

stateDiagram-v2
    [*] --> active : commit
    active --> active : new evidence, confidence shift
    active --> confirmed : next-test fires positive
    active --> disconfirmed : next-test fires negative
    active --> superseded : better hypothesis subsumes
    active --> abandoned : sweep, not tested in window
    confirmed --> [*]
    disconfirmed --> [*]
    superseded --> [*]
    abandoned --> [*]

Solution

Therefore:

Maintain a hypothesis store keyed by short id. Each record has: a one-line summary; a numeric confidence (0..1); a status drawn from {active, confirmed, disconfirmed, superseded, abandoned}; a next-test sentence stating what observation would move the confidence; and an evidence list of short notes with sources. When the agent commits to a guess, write a new record with status active. When evidence arrives, append it and adjust the confidence. If the next-test fires, transition to confirmed or disconfirmed; if a better hypothesis subsumes the record, transition to superseded; if a sweep finds that the record was not tested within its window, transition to abandoned. Render the active records into the agent's daily working context so it sees what it currently believes.
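
The store described above can be sketched as a small JSON-backed class. This is a minimal illustration under stated assumptions, not a reference implementation: the class and method names (`HypothesisStore`, `commit`, `add_evidence`, `transition`, `render_active`), the single-file JSON layout, and the rendering format are all hypothetical choices. The caller supplies the revised confidence, because the pattern does not prescribe an update rule.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

# Every status except "active" is treated as terminal (an assumption
# consistent with the state diagram).
TERMINAL = {"confirmed", "disconfirmed", "superseded", "abandoned"}


def _clamp(value: float) -> float:
    return max(0.0, min(1.0, float(value)))


@dataclass
class Hypothesis:
    id: str
    summary: str
    confidence: float
    next_test: str
    status: str = "active"
    evidence: list = field(default_factory=list)  # [{"note": ..., "source": ...}]


class HypothesisStore:
    def __init__(self, path):
        self.path = Path(path)
        self.records = {}
        if self.path.exists():  # reload records written by an earlier session
            for raw in json.loads(self.path.read_text(encoding="utf-8")):
                self.records[raw["id"]] = Hypothesis(**raw)

    def _save(self):
        payload = [asdict(h) for h in self.records.values()]
        self.path.write_text(json.dumps(payload, indent=2), encoding="utf-8")

    def _active(self, hid):
        h = self.records[hid]
        if h.status != "active":
            raise ValueError(f"{hid} is already {h.status}")
        return h

    def commit(self, hid, summary, confidence, next_test):
        """Write a new record with status active."""
        if hid in self.records:
            raise ValueError(f"{hid} already exists")
        self.records[hid] = Hypothesis(hid, summary, _clamp(confidence), next_test)
        self._save()

    def add_evidence(self, hid, note, source, confidence):
        """Append a sourced note and store the caller's revised confidence."""
        h = self._active(hid)
        h.evidence.append({"note": note, "source": source})
        h.confidence = _clamp(confidence)
        self._save()

    def transition(self, hid, status):
        """Move an active record to one of the terminal statuses."""
        if status not in TERMINAL:
            raise ValueError(f"unknown terminal status: {status}")
        self._active(hid).status = status
        self._save()

    def render_active(self):
        """Render active records as lines for the agent's working context."""
        return "\n".join(
            f"[{h.id}] {h.summary} | confidence {h.confidence:.2f} | next-test: {h.next_test}"
            for h in self.records.values()
            if h.status == "active"
        )
```

Saving after every write keeps the file as the single source of truth, so a new session that constructs `HypothesisStore` on the same path sees the previous session's confidences and statuses. A real deployment would likely want atomic writes or a database; that is outside the scope of this sketch.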

What this pattern forbids. The agent must not store provisional answers on the same surface as open questions. Conflating the two ledgers is forbidden because they support different moves: pulling for inquiry versus revising belief.
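
The separation can also be enforced mechanically. The following is a hedged sketch, assuming two hypothetical record types (`OpenQuestion`, `Hypothesis`) and a hypothetical `Ledger` class that accepts exactly one of them. The optional `question_id` link is likewise an assumption about how a hypothesis might point back to the question it answers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OpenQuestion:
    id: str
    text: str


@dataclass(frozen=True)
class Hypothesis:
    id: str
    summary: str
    confidence: float
    next_test: str
    question_id: Optional[str] = None  # hypothetical back-link to the question


class Ledger:
    """A store that accepts exactly one record type."""

    def __init__(self, record_type):
        self.record_type = record_type
        self.records = {}

    def add(self, record):
        # Reject anything that is not this ledger's own record type, so a
        # hypothesis can never be filed among the open questions.
        if type(record) is not self.record_type:
            raise TypeError(
                f"{self.record_type.__name__} ledger cannot hold "
                f"{type(record).__name__}"
            )
        self.records[record.id] = record
```

With `questions = Ledger(OpenQuestion)` and `hypotheses = Ledger(Hypothesis)`, committing to a guess adds a record to `hypotheses` while the originating question stays in `questions` until the hypothesis is resolved.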

And the patterns that stand alongside it, or against it —

complements Open-Question Tension Store ★: Persist the agent's unresolved questions as a typed ledger so they drive its next inquiry instead of dissolving when the prompt ends.
complements Confidence Reporting ★: Surface the agent's uncertainty about its answer alongside the answer itself.
complements Chain of Verification ★: Reduce hallucination by drafting an answer, generating independent verification questions, answering them in isolation, and revising.
complements Self-Archaeology ·: Synthesize the agent's past thought history into time-layered trajectory notes so it can articulate how its understanding evolved without recomputing the narrative each time.
complements BDI Agent ★★: Agent maintains explicit Beliefs about the world, Desires (goals), and Intentions (committed plans), and reasons by reconciling the three.