Cognition & Introspection

Partial-Output Salvage

Stream every model token to a tmp-plus-atomic-replace partial file so crashes mid-inference leave a consistent salvage, then promote partials at startup with a typed recovery marker the model can see.

Problem

When a hard kill arrives mid-stream, the partial output exists only in in-process memory and is lost completely. The next run sees no record that anything was happening, so it neither finishes the work nor warns the user about the gap. Worse, the agent may later return to the same topic with no awareness that a prior attempt died mid-sentence, and confidently begin again with no acknowledgement that a partial result might exist somewhere. Per-chunk fsync would solve durability but is too expensive to do on every token.

Solution

Mechanical finite-state machine. On stream start: open `partial.tmp`, write a start marker with thought-id, timestamp, model id. On each chunk: append to tmp, periodically `os.rename(tmp, partial)` for atomicity. On normal stream end: rename to the canonical thought path, delete partial. On startup: scan for orphan `partial.*` files, finalize each with a typed RecoveryStatus enum (RECOVERED_FROM_PARTIAL for hard kill, TIMEOUT_PARTIAL for watchdog timeout). The next prompt's system context includes `last_partial_recovery: <status>` so the model can adjust.

When to use

  • The runtime can SIGKILL the agent mid-stream and that loses meaningful work.
  • Inference is long enough per call that a partial stream has real value.
  • Filesystem supports atomic rename in the working directory.

Open the full interactive page

Diagram, neighbourhood map, code examples, related patterns and full provenance.

Related