Partial-Output Salvage
also known as Crash-Safe Streaming, Tmp-Replace Thought Recovery, Recovered-Partial Marker
Stream every model token to a tmp-plus-atomic-replace partial file so crashes mid-inference leave a consistent salvage, then promote partials at startup with a typed recovery marker the model can see.
Context
A team is running a long-lived agent on hardware that occasionally crashes: the out-of-memory killer takes the process, a watchdog timer issues a hard kill signal, a deploy restarts the container mid-stream. Per-call inference is long enough that losing a stream halfway through costs minutes of model time and meaningful context. Separately, the agent already has a resumption pattern for process state, but that pattern only restores what was durably written before the crash, not the tokens that were still streaming when the crash landed.
Problem
When a hard kill arrives mid-stream, the partial output exists only in in-process memory and is lost completely. The next run sees no record that anything was happening, so it neither finishes the work nor warns the user about the gap. Worse, the agent may later return to the same topic with no awareness that a prior attempt died mid-sentence, and confidently begin again with no acknowledgement that a partial result might exist somewhere. Per-chunk fsync would solve durability but is too expensive to do on every token.
Forces
- Per-chunk fsync is expensive; tmp-plus-rename is the affordable compromise.
- Recovery should be visible to the model, not silent — surprise about a partial is itself signal.
- A partial-thought stub must not be treated as a finished thought.
- Recovery markers must be typed (timeout vs hard crash) so triage is meaningful.
Example
A long-running personal agent runs on a machine where the OOM killer occasionally takes the process. A four-minute reasoning trace gets killed at the three-minute mark and the entire stream is lost — the agent has no idea anything happened on the next run. The team adds Partial-Output Salvage: each chunk streams to `partial.tmp` with periodic atomic rename. On startup, orphan partials are finalized with a RECOVERED_FROM_PARTIAL marker that appears in the next prompt's system context. The agent sees the salvage, knows it was reading a partial, and decides whether to continue or restart the line of thought.
Solution
Therefore:
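The mechanism is a small, mechanical finite-state machine:
- On stream start: open `partial.tmp` and write a start marker with thought-id, timestamp, and model id.
- On each chunk: append to the tmp file, and periodically promote it to `partial` with an atomic rename. In Python, `os.replace(tmp, partial)` is the safer call, because `os.rename` fails on Windows when the destination already exists. After a promotion the tmp path no longer exists, so the writer must recreate it (for example by rewriting the accumulated text) before the next promotion.
- On normal stream end: publish the finished text to the canonical thought path with one more atomic rename, and remove any remaining partial.
- On startup: scan for orphan `partial.*` files and finalize each with a typed RecoveryStatus enum (RECOVERED_FROM_PARTIAL for a hard kill, TIMEOUT_PARTIAL for a watchdog timeout).
The next prompt's system context includes `last_partial_recovery: <status>` so the model can adjust.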
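The lifecycle above can be sketched in Python. This is a minimal illustration under stated assumptions, not the pattern's reference implementation: `PartialWriter`, `recover_orphans`, `publish_every`, `timed_out_ids`, the JSON start marker, and the file names `partial.<id>`, `partial.<id>.tmp`, `thought.<id>` and `recovered.<id>` are all hypothetical. It also swaps in a simplification: each publish rewrites the whole accumulated text to the tmp file before the atomic replace, rather than appending to it.

```python
import enum
import json
import os
import time
from pathlib import Path


class RecoveryStatus(enum.Enum):
    RECOVERED_FROM_PARTIAL = "RECOVERED_FROM_PARTIAL"  # hard kill
    TIMEOUT_PARTIAL = "TIMEOUT_PARTIAL"                # watchdog timeout


class PartialWriter:
    """Streams chunks and publishes consistent snapshots to partial.<thought_id>."""

    def __init__(self, directory, thought_id, model_id, publish_every=16):
        self.dir = Path(directory)
        self.thought_id = thought_id
        self.partial = self.dir / f"partial.{thought_id}"
        self.tmp = self.dir / f"partial.{thought_id}.tmp"
        self.publish_every = publish_every  # assumed knob: chunks per publish
        marker = {"thought_id": thought_id, "started_at": time.time(),
                  "model_id": model_id}
        self.parts = [json.dumps(marker) + "\n"]  # start marker is the first line
        self.pending = 0
        self._publish()

    def _publish(self):
        # Write the full snapshot to tmp, then swap it in atomically.
        # No fsync here: this survives a killed process, not a power loss.
        self.tmp.write_text("".join(self.parts), encoding="utf-8")
        os.replace(self.tmp, self.partial)
        self.pending = 0

    def write_chunk(self, chunk):
        self.parts.append(chunk)
        self.pending += 1
        if self.pending >= self.publish_every:
            self._publish()

    def finish(self):
        # Normal end: publish the last chunks, then move to the canonical path.
        self._publish()
        final = self.dir / f"thought.{self.thought_id}"
        os.replace(self.partial, final)
        return final


def recover_orphans(directory, timed_out_ids=frozenset()):
    """Finalize orphan partials at startup; returns {thought_id: RecoveryStatus}.

    timed_out_ids is an assumption: ids the watchdog recorded before killing.
    """
    results = {}
    for path in sorted(Path(directory).glob("partial.*")):
        if path.name.endswith(".tmp"):
            path.unlink()  # possibly torn snapshot; the published partial is the salvage
            continue
        thought_id = path.name[len("partial."):]
        status = (RecoveryStatus.TIMEOUT_PARTIAL if thought_id in timed_out_ids
                  else RecoveryStatus.RECOVERED_FROM_PARTIAL)
        with path.open("a", encoding="utf-8") as f:
            f.write(f"\n[{status.value}]\n")  # typed marker travels with the text
        os.replace(path, path.with_name(f"recovered.{thought_id}"))
        results[thought_id] = status
    return results
```

With this shape, a crash loses at most the chunks written since the last publish, so `publish_every` trades salvage freshness against write cost. Rewriting the full snapshot keeps the sketch short; a production writer would more likely append and copy, or size the publish interval to the expected stream length.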
What this pattern forbids. Partial thought files cannot be silently consumed; every salvaged partial carries a typed recovery marker that propagates into the next prompt, and the model is not allowed to treat a recovered partial as if it were a completed thought.
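One way to make this prohibition hard to bypass is to carry the recovery status in the thought's type, so code that wants a completed thought has to ask for it explicitly. The sketch below is an assumption-laden illustration: `Thought`, `completed_text` and `render_for_prompt` are hypothetical names, and only the `last_partial_recovery: <status>` line and the two status values come from the pattern itself.

```python
import enum
from dataclasses import dataclass
from typing import Optional


class RecoveryStatus(enum.Enum):
    RECOVERED_FROM_PARTIAL = "RECOVERED_FROM_PARTIAL"  # hard kill
    TIMEOUT_PARTIAL = "TIMEOUT_PARTIAL"                # watchdog timeout


@dataclass(frozen=True)
class Thought:
    thought_id: str
    text: str
    recovery: Optional[RecoveryStatus] = None  # None: the stream ended normally

    def completed_text(self) -> str:
        # Refuse to hand out a salvaged partial as if it were finished.
        if self.recovery is not None:
            raise ValueError(
                f"thought {self.thought_id} is a partial ({self.recovery.value})"
            )
        return self.text


def render_for_prompt(thought: Thought) -> str:
    """Render a thought for the next prompt, surfacing any recovery marker."""
    if thought.recovery is None:
        return thought.text
    return (
        f"last_partial_recovery: {thought.recovery.value}\n"
        f"[partial thought {thought.thought_id}, not completed]\n"
        f"{thought.text}"
    )
```

Callers that only ever reach the text through `completed_text` or `render_for_prompt` cannot consume a partial silently: the first raises, the second always prepends the typed marker.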
And the patterns that stand alongside it, or against it —
- complements Agent Resumption ★★: Persist agent execution state so a long-running run survives restarts, deploys, or user disconnects.
- composes-with Append-Only Thought Stream ★: Make the agent's thought log append-only so the agent cannot rewrite its own history.