V · Memory · Experimental

Landmark Attention

also known as Random-Access Long-Context Attention

Long-context attention mechanism placing sparse landmark tokens across very long inputs so the model jumps directly to relevant sections via landmark lookup rather than scanning linearly.

Context

A model processes very long inputs (entire books, long-form documents, massive logs). Standard transformer attention scales quadratically with sequence length and suffers from lost-in-the-middle positional bias. The team needs a mechanism that lets the model navigate long inputs efficiently.

Problem

Standard attention's quadratic cost limits the practical context length, and positional bias means content in the middle of the context is retrieved less reliably than content at either end. Naive truncation loses information; sliding-window attention loses long-range structure.
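
The scaling argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not a model implementation: it counts query-key pairs for dense attention and for a sliding window, and checks which positions a causal window can still reach. The function names and the window size of 1,024 are illustrative assumptions, and causal masking is ignored in the pair counts.

```python
def full_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by dense (non-causal) attention: n * n."""
    return n_tokens * n_tokens


def sliding_window_pairs(n_tokens: int, window: int) -> int:
    """Upper bound on pairs scored when each query sees at most `window` keys."""
    return n_tokens * min(window, n_tokens)


def window_can_reach(query_pos: int, key_pos: int, window: int) -> bool:
    """Causal sliding window: a query sees itself and the window - 1 tokens before it."""
    return 0 <= query_pos - key_pos < window


if __name__ == "__main__":
    # Doubling the sequence length quadruples the dense cost,
    # while the windowed cost only doubles.
    for n in (8_000, 16_000, 32_000):
        print(n, full_attention_pairs(n), sliding_window_pairs(n, 1_024))

    # The last query of a 32k sequence cannot see the first token at all,
    # which is the loss of long-range structure described above.
    print(window_can_reach(31_999, 0, 1_024))
```

The window keeps cost linear in sequence length, but only by making distant tokens unreachable; that trade-off is what the landmark approach tries to avoid.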

Forces

  • Landmark-aware architectures require model-side changes (training or fine-tuning).
  • Landmark placement heuristics affect retrieval quality.
  • Backward compatibility with standard transformers is only partial.
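
To illustrate the placement force, here is a minimal sketch of two placement heuristics: one landmark after every fixed-size block, and one landmark after every section. The `<landmark>` token string and both function names are hypothetical, and tokens are plain strings rather than the IDs a real tokenizer would produce. Placement alone does nothing for an untrained model; this only shows how the input would be prepared.

```python
from typing import List, Sequence

LANDMARK = "<landmark>"  # hypothetical special token, not a real vocabulary entry


def insert_landmarks_fixed(tokens: Sequence[str], block_size: int) -> List[str]:
    """Append one landmark after every block of `block_size` tokens.

    The final, possibly shorter, block also gets a landmark so that
    every token belongs to exactly one indexed block.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    out: List[str] = []
    for start in range(0, len(tokens), block_size):
        out.extend(tokens[start:start + block_size])
        out.append(LANDMARK)
    return out


def insert_landmarks_at_sections(sections: Sequence[Sequence[str]]) -> List[str]:
    """Append one landmark after every non-empty section."""
    out: List[str] = []
    for section in sections:
        if section:  # skip empty sections so landmarks never sit back to back
            out.extend(section)
            out.append(LANDMARK)
    return out


if __name__ == "__main__":
    print(insert_landmarks_fixed(["a", "b", "c", "d", "e"], block_size=2))
    print(insert_landmarks_at_sections([["clause", "1"], [], ["clause", "2"]]))
```

Fixed-size blocks give uniform block lengths, while section boundaries keep each landmark aligned with a topical unit. Which of the two retrieves better is exactly the open question this force names.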

Example

A legal research agent processes a 200-page contract. A standard transformer with a 32k context fails to retrieve content from the middle pages. A Landmark Attention model places landmark tokens at section boundaries; the agent's queries land first on the relevant landmark, then read the surrounding pages. Retrieval accuracy from middle sections climbs from 41% to 88%.

Solution

Therefore:

Following Mohtashami & Jaggi (2023), augment the input with landmark tokens at topic, section, or chunk boundaries. The model's attention learns to use the landmarks as a sparse index, which enables random-access lookup across very long contexts and significantly extends the effective context length. Pair this pattern with Information Chunking for Agent Memory and Context Window Packing; it addresses Lost in the Middle (Positional Bias).
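
The lookup step can be sketched as follows, under strong simplifying assumptions: each block is summarized by a single landmark key vector, a query is scored against those keys by dot product, and only the top-scoring blocks are read in full. In a trained landmark model this selection is learned inside the attention layers; the function names, vectors, and `top_k` parameter below are illustrative and do not reproduce the published implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_blocks(
    query: Sequence[float],
    landmark_keys: Sequence[Sequence[float]],
    top_k: int,
) -> Tuple[List[int], List[float]]:
    """Score the query against one landmark key per block and keep the best blocks.

    Returns the indices of the chosen blocks in document order, plus the
    softmax weight assigned to every landmark.
    """
    scores = [dot(query, key) for key in landmark_keys]
    weights = softmax(scores)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:top_k]), weights


if __name__ == "__main__":
    # Three blocks, each summarized by a toy 2-dimensional landmark key.
    keys = [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]
    chosen, weights = select_blocks(query=[1.0, 0.0], landmark_keys=keys, top_k=2)
    print(chosen, weights)
```

Only the tokens inside the chosen blocks would then be attended to in full, so the per-query cost depends on the number of landmarks plus the size of the retrieved blocks rather than on the whole sequence.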

What this pattern forbids. The model must be trained to use landmark tokens; standard transformers do not benefit from naively inserted landmarks.

And the patterns that stand alongside it, or against it —

  • complements: Information Chunking for Agent Memory ★★. Structure inputs into digestible topical segments (chunks) before feeding them to short-term memory rather than throwing the full input at the model; this reduces overload and increases accuracy (~40% improvement observed in a customer-service deployment).
  • complements: Lost in the Middle (Positional Bias). LLM accuracy on retrieving information from long contexts drops sharply when relevant content sits in the middle of the prompt rather than at the start or end.
  • complements: Context Window Packing ★★. Choose what fits in the context window each turn given a fixed token budget.
  • complements: Test-Time Memorization (Titans). Memory module that learns at inference time by incorporating recent inputs into its parameters during the session rather than relying solely on pre-trained weights.
  • complements: MemGPT-Style Paging. Treat the LLM context window as RAM and external storage as disk, with the model issuing tool calls to page memory in and out.
