Test-Time Memorization (Titans)
also known as Inference-Time Memory, Titans Memory Module
Memory module that learns at inference time by incorporating recent inputs into its parameters during the session rather than relying solely on pre-trained weights.
Context
A long-running agent task generates new information that should influence later decisions in the same task, but that information arrives after training has finished. Standard models either lose it at session end (no learning) or need expensive retraining cycles to incorporate it.
Problem
Pre-trained-only models can't learn within a session. Retraining is too slow and expensive to do per-session. RAG retrieves but doesn't internalize. The agent needs a way to memorize within a session that's faster than retraining but more integrated than retrieval.
Forces
- Test-time training adds inference-time compute cost.
- Memory module design affects what's memorizable and at what fidelity.
- Concurrency: multiple sessions writing to the same module would interfere with one another.
Example
A research-agent session processes 200 papers over 6 hours. With a standard model, the content of early papers fades by paper 150. With Titans test-time memorization, each processed paper updates the memory module, so by paper 150 the model effectively recalls patterns from paper 5 without RAG retrieval. End-of-session synthesis improves because it can draw on the whole session rather than only the most recent papers.
Solution
Therefore:
The pattern follows the Titans architecture (Behrouz et al. 2024). A neural memory module sits alongside the main model; during a session, inputs trigger updates to the module's parameters through gradient steps taken at inference time. Later steps in the same session benefit from this in-session learning. Module state is per-session and ephemeral. Pair with episodic-memory, agentic-memory, landmark-attention, and agent-resumption.
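The in-session update can be illustrated with a toy sketch. This is not the Titans implementation: it assumes a single linear memory matrix instead of a deep neural module, a squared-error recall loss, and plain Python lists instead of a tensor library. The names `LinearSessionMemory`, `lr`, `momentum`, and `decay` are hypothetical; the momentum and decay terms are only loosely modeled on the surprise-with-momentum and forgetting ideas the Titans paper describes.

```python
class LinearSessionMemory:
    """Toy associative memory M (dim x dim), updated by gradient steps at inference time.

    Assumption: a linear map stands in for the neural memory module.
    """

    def __init__(self, dim, lr=0.1, momentum=0.0, decay=0.0):
        self.dim = dim
        self.lr = lr              # step size of each inference-time update
        self.momentum = momentum  # carries part of the previous update forward
        self.decay = decay        # fraction of old memory forgotten per update
        self.M = [[0.0] * dim for _ in range(dim)]  # memory parameters
        self.S = [[0.0] * dim for _ in range(dim)]  # running update (momentum buffer)

    def recall(self, key):
        """Read from memory: return M @ key."""
        return [sum(row[j] * key[j] for j in range(self.dim)) for row in self.M]

    def loss(self, key, value):
        """Squared recall error ||M @ key - value||^2."""
        return sum((p - v) ** 2 for p, v in zip(self.recall(key), value))

    def memorize(self, key, value):
        """Take one gradient step on the recall loss for this (key, value) pair."""
        err = [p - v for p, v in zip(self.recall(key), value)]
        for i in range(self.dim):
            for j in range(self.dim):
                grad = 2.0 * err[i] * key[j]  # d loss / d M[i][j]
                self.S[i][j] = self.momentum * self.S[i][j] - self.lr * grad
                self.M[i][j] = (1.0 - self.decay) * self.M[i][j] + self.S[i][j]


# Usage: an early association survives later, unrelated updates.
mem = LinearSessionMemory(dim=2)
early_key, early_value = [1.0, 0.0], [0.0, 1.0]
late_key, late_value = [0.0, 1.0], [1.0, 0.0]

for _ in range(50):
    mem.memorize(early_key, early_value)
for _ in range(50):
    mem.memorize(late_key, late_value)

print(mem.loss(early_key, early_value))  # stays small: the early pair is still recalled
```

The two keys here are orthogonal, so the later updates do not overwrite the earlier association. With overlapping keys or a non-zero `decay`, earlier content would be partly displaced, which is the fidelity trade-off listed under Forces.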
What this pattern forbids. Memory module parameter updates may not persist beyond session end without explicit promotion to LTM; no cross-session bleed of in-session learned state is allowed by default.
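One way to enforce this rule is to keep memory state in a per-session store that discards everything on close unless a value was explicitly promoted. The sketch below is a minimal illustration under that assumption; `SessionMemoryStore` and its methods are hypothetical names, and the parameters are plain lists standing in for real module weights.

```python
import copy


class SessionMemoryStore:
    """Keeps one isolated memory state per session, plus an explicit long-term store."""

    def __init__(self):
        self._sessions = {}  # session_id -> {name: parameters}
        self.long_term = {}  # written only by promote()

    def open(self, session_id):
        if session_id in self._sessions:
            raise ValueError(f"session {session_id!r} is already open")
        self._sessions[session_id] = {}

    def write(self, session_id, name, params):
        # Copy on write so two sessions never share a mutable object.
        self._sessions[session_id][name] = copy.deepcopy(params)

    def read(self, session_id, name):
        # Returns None when this session has not written the name.
        return copy.deepcopy(self._sessions[session_id].get(name))

    def promote(self, session_id, name):
        # The only path by which in-session state reaches long-term memory.
        self.long_term[name] = copy.deepcopy(self._sessions[session_id][name])

    def close(self, session_id):
        # Ephemeral by default: unpromoted state is dropped here.
        del self._sessions[session_id]


store = SessionMemoryStore()
store.open("session-a")
store.open("session-b")
store.write("session-a", "memory", [0.1, 0.2])

print(store.read("session-b", "memory"))  # None: no cross-session bleed
store.close("session-a")
print(store.long_term)                    # {}: nothing persisted without promotion
```

Keeping promotion as a separate, explicit call means persistence is a deliberate decision rather than a side effect of ending a session.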
And the patterns that stand alongside it, or against it —
- complements Episodic Memory ★★: Record past events as time-stamped first-person experiences the agent can recall later, separately from extracted facts (semantic) and learned how-to (procedural).
- complements Agentic Memory ★: Expose memory management as first-class tool actions (ADD, UPDATE, DELETE, RETRIEVE, SUMMARY, FILTER) the LLM chooses at every step, trained end-to-end so short-term and long-term memory live under one learned policy.
- complements Landmark Attention ·: Long-context attention mechanism placing sparse landmark tokens across very long inputs so the model jumps directly to relevant sections via landmark lookup rather than scanning linearly.
- complements Agent Resumption ★★: Persist agent execution state so a long-running run survives restarts, deploys, or user disconnects.
- complements Large Reasoning Model (LRM) Paradigm ★: Route reasoning-heavy tasks to a reasoning-tuned model that trades inference time for deliberation, rather than to a fast LLM that exhibits premature-closure.