Lost in the Middle (Positional Bias)
also known as Long-Context Positional Bias, U-Curve Attention
LLM accuracy on retrieving information from long contexts drops sharply when relevant content sits in the middle of the prompt rather than at the start or end.
Context
A team puts a long context in front of the model (RAG with many chunks, long documents, multi-turn conversation history). Quality on retrieval-style queries depends on where the relevant content sits in the prompt. The team doesn't know about the positional bias and is surprised when middle-of-prompt content gets ignored.
Problem
The model exhibits a U-shaped attention curve over prompt position: content at the start (primacy) and end (recency) of the prompt is retrieved well, while content in the middle is often missed. When the team packs RAG chunks in descending order of relevance, many of the relevant chunks land in the middle of the prompt, and the model misses them. This differs from context-fragmentation, which concerns holding several interconnected constraints at once: the failure here is positional, not relational.
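To check whether a given model and prompt format show this bias, one option is a positional probe: place a single relevant chunk (the "needle") at each slot among irrelevant chunks and record whether the answer is correct at each slot. The sketch below is a minimal version under stated assumptions. `positional_accuracy` is a hypothetical helper, `ask_model` stands for whatever wrapper the team has around its model client, and `toy_model` is a deliberately crude stand-in that only reads the first and last chunk. Its output therefore illustrates the U-shape; it does not measure any real model.

```python
from typing import Callable, Dict, List


def positional_accuracy(
    ask_model: Callable[[str], str],  # hypothetical wrapper around a model client
    needle: str,
    expected: str,
    question: str,
    distractors: List[str],
    trials_per_position: int = 1,
) -> Dict[int, float]:
    """Place `needle` at every slot among `distractors` and return, per slot,
    the fraction of trials whose answer contains `expected`."""
    results: Dict[int, float] = {}
    for pos in range(len(distractors) + 1):
        chunks = distractors[:pos] + [needle] + distractors[pos:]
        context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
        prompt = f"{context}\n\nQuestion: {question}"
        hits = sum(
            expected.lower() in ask_model(prompt).lower()
            for _ in range(trials_per_position)
        )
        results[pos] = hits / trials_per_position
    return results


def toy_model(prompt: str) -> str:
    # Crude stand-in for a real model (assumption): it only "reads" the first
    # and last chunk of the context, a caricature of the U-shaped curve.
    chunks = prompt.split("\n\nQuestion:")[0].split("\n\n")
    visible = chunks[0] + " " + chunks[-1]
    return "Paris" if "Paris" in visible else "unknown"


distractors = [
    "The Nile flows north into the Mediterranean.",
    "Mount Everest is in the Himalayas.",
    "Water boils at 100 C at sea level.",
    "Octopuses have three hearts.",
]
results = positional_accuracy(
    toy_model,
    needle="The capital of France is Paris.",
    expected="Paris",
    question="What is the capital of France?",
    distractors=distractors,
)
print(results)  # {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0, 4: 1.0}
```

Against a real model, the same loop would be run with several different needles and more than one trial per slot; lower accuracy in the middle slots than at either end is the signature this pattern describes.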
Forces
- Positional bias is a property of the attention architecture; it cannot be removed through prompt wording, only worked around.
- Reordering content so the most relevant material sits at the ends adds a preprocessing step.
- Some content, such as instructions, must stay in a known position and cannot be reordered freely.
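One way to respect the last force is to give the prompt fixed slots, so that any reordering step only permutes the retrieved chunks. The layout below (instructions first, chunks in the middle, question last) is an assumption for illustration, and `build_prompt` is a hypothetical helper, not part of any library.

```python
from typing import List


def build_prompt(instructions: str, chunks: List[str], question: str) -> str:
    # Hypothetical helper. Slot layout (an assumption for illustration):
    #   1. instructions - pinned at the very start, never reordered
    #   2. chunks       - the only region a reordering step may permute
    #   3. question     - pinned at the very end
    body = "\n\n".join(f"[Document {i + 1}]\n{c}" for i, c in enumerate(chunks))
    return f"{instructions}\n\n{body}\n\nQuestion: {question}"


prompt = build_prompt(
    "Answer using only the documents below.",
    ["alpha-chunk text", "beta-chunk text"],
    "What does the second document say?",
)
print(prompt)
```

Because the pinned slots occupy both ends, the chunks always sit between them; a reordering step then decides which chunks sit closest to those ends.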
Example
A RAG-based research agent retrieves 20 chunks for each query and packs them into the prompt in order of relevance score. Queries that should be answered from chunks 7-12 (the middle) fail; queries answered from chunks 1-3 or 17-20 succeed. The team first concludes that retrieval is wrong, but that diagnosis is mistaken: retrieval returned the right chunks, and the model did not attend to the ones in the middle. Fix: reorder so the highest-relevance chunks land at the start and end of the prompt, and drop the rest.
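The fix in the example can be sketched as a small reordering function. This is a minimal sketch, assuming chunks arrive as `(text, score)` pairs where a higher score means more relevant; `reorder_for_edges` and its `keep` parameter are hypothetical names, and the source does not say how many chunks to keep. The function alternates the top-ranked chunks between the front and the back of the list, so the weakest of the kept chunks sit in the middle.

```python
from typing import List, Sequence, Tuple


def reorder_for_edges(scored: Sequence[Tuple[str, float]], keep: int) -> List[str]:
    """Keep the `keep` highest-scoring chunks, then alternate them between the
    front and the back so the weakest kept chunks end up in the middle."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)[:keep]
    front: List[str] = []
    back: List[str] = []
    for rank, (chunk, _score) in enumerate(ranked):
        if rank % 2 == 0:
            front.append(chunk)  # 1st, 3rd, 5th best fill from the start
        else:
            back.append(chunk)  # 2nd, 4th best fill from the end
    return front + back[::-1]  # reverse so the 2nd best is the very last item


# 20 retrieved chunks, as in the example; chunk-1 has the highest score.
retrieved = [(f"chunk-{i}", float(20 - i)) for i in range(1, 21)]
order = reorder_for_edges(retrieved, keep=5)
print(order)  # ['chunk-1', 'chunk-3', 'chunk-5', 'chunk-4', 'chunk-2']
```

Dropping the low-scoring chunks also shortens the middle of the prompt, which is the other half of the fix; a suitable `keep` value would have to be tuned empirically for the model and task.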
Solution
Therefore:
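Treat the bias as a property of the model's architecture and design the prompt around it: place the content that matters most at the start and end of the prompt, and drop what would otherwise pad the middle. Pair with landmark-attention (an architectural mitigation; requires model support), information-chunking-memory (a preprocessing mitigation), context-window-packing (positional design), and context-window-dumb-zone (a related utilization limit).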
What this pattern forbids. It imposes no useful constraint; the constraint that is missing is awareness, during prompt design, that retrieval quality depends on position.
And the patterns that stand alongside it, or against it —
- alternative to Landmark Attention: Long-context attention mechanism placing sparse landmark tokens across very long inputs so the model jumps directly to relevant sections via landmark lookup rather than scanning linearly.
- alternative to Information Chunking for Agent Memory (★★): Structure inputs into digestible topical segments (chunks) before feeding to short-term memory rather than throwing the full input at the model; reduces overload and increases accuracy (~40% improvement observed in customer-service deployment).
- alternative to Context Window Packing (★★): Choose what fits in the context window each turn given a fixed token budget.
- complements Context Window Dumb-Zone Cap (★): Hold context-window utilization below a working threshold (~40%) to keep the model out of the 'dumb zone' where it begins ignoring earlier instructions and hallucinating.
- complements Context Fragmentation (✕): Anti-pattern: the LLM cannot hold multiple interconnected constraints in mind simultaneously the way human working memory can; it processes each constraint locally and loses the cross-constraint view.