Cross-Encoder Reranking

also known as Reranker, Two-Stage Retrieval, Retrieve-Then-Rerank

After cheap bi-encoder or BM25 retrieval, rescore top-N candidates with a cross-encoder that jointly attends over (query, candidate).

Context

A team is using a two-stage retrieval pipeline. The first stage is a fast bi-encoder that embeds the query and each document independently and compares their vectors; an approximate nearest-neighbour index returns a top-k candidate set from a large corpus. Because the encoder sees query and document separately, it cannot model fine-grained interactions between them, and because the index is tuned for recall, the top-k list mixes truly relevant candidates with topically similar but unhelpful ones.

Problem

Feeding the entire top-k list into the downstream generator wastes its context window on irrelevant candidates and lets the loudest distractor mislead the answer. The team needs a way to re-order or filter the candidate set so that the most relevant items rise to the top, but they cannot afford to run a heavy joint scoring model over the whole corpus on every query. They need a scorer that is expensive per candidate yet runs only over the cheap retriever's shortlist and re-sorts it by genuine query-document relevance.

Forces

Cross-encoder cost is one model call per candidate.
Latency budget caps N (typically 20-100).
Fine-tuning a custom reranker is a separate effort.
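The first two forces reduce to simple arithmetic, which the sketch below makes explicit. It assumes sequential scoring at a constant cost per candidate; the function name `max_candidates` and all the millisecond figures are hypothetical and chosen only for illustration.

```python
def max_candidates(budget_ms: float, first_stage_ms: float, ms_per_candidate: float) -> int:
    """Largest N whose reranking fits in the latency left after stage 1.

    Assumption: one cross-encoder call per candidate at a constant cost.
    Batching several pairs per call can amortize that cost, so treat the
    result as a conservative cap rather than a measured limit.
    """
    if ms_per_candidate <= 0:
        raise ValueError("ms_per_candidate must be positive")
    remaining = budget_ms - first_stage_ms
    # Floor division gives the number of whole candidates that fit; never negative.
    return max(int(remaining // ms_per_candidate), 0)


# Hypothetical numbers: 300 ms total budget, 50 ms first-stage retrieval,
# 5 ms per (query, candidate) pair.
n_cap = max_candidates(budget_ms=300, first_stage_ms=50, ms_per_candidate=5)
print(n_cap)  # 50
```

Under these assumed figures the cap lands inside the 20-100 range quoted above; real per-candidate cost should be measured on the target hardware and model.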

Example

A legal-research agent retrieves 100 candidate paragraphs from a corpus of contracts that mention 'force majeure'. Many are off-topic. Before showing them to the LLM, a small cross-encoder model scores each candidate against the user's exact question, picks the top 5, and discards the rest. The LLM only ever reads the sharpest results.

Diagram

flowchart TD
    Q[Query] --> Retr[Bi-encoder retrieval, top-100]
    Retr --> CE[Cross-encoder scores query against each candidate]
    CE --> Rank[Rerank by score]
    Rank --> Top[Top-5 to LLM]

Solution

Therefore:

Two-stage retrieval. Stage 1: a cheap retriever (BM25, dense, or hybrid) returns the top-N candidates. Stage 2: a cross-encoder scores each (query, candidate) pair jointly. Only the top-K, with K << N, is returned to the generator.

What this pattern forbids. The generator sees only the reranker's top-K; pre-rerank candidates are not used.
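The two stages and the top-K cut-off can be sketched as a single function. This is a minimal illustration, not a reference implementation: the `Retriever` and `PairScorer` interfaces, the function `retrieve_then_rerank`, and the toy corpus, retriever, and scorer are all hypothetical stand-ins so the example runs without a model. In a real pipeline the scorer would be an actual cross-encoder and the retriever an actual index.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces, named here only for illustration.
Retriever = Callable[[str, int], List[str]]                      # (query, n) -> candidates
PairScorer = Callable[[Sequence[Tuple[str, str]]], List[float]]  # [(query, doc)] -> scores


def retrieve_then_rerank(
    query: str,
    retrieve: Retriever,
    score_pairs: PairScorer,
    n: int = 100,
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Stage 1 fetches top-N cheaply; stage 2 rescores jointly and keeps top-K."""
    if k > n:
        raise ValueError("k should not exceed n")
    candidates = retrieve(query, n)
    if not candidates:
        return []
    # One (query, candidate) pair per candidate: the scorer sees both texts together.
    scores = score_pairs([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    # Only the top-K survive; pre-rerank candidates never reach the generator.
    return ranked[:k]


# --- Toy stand-ins so the sketch runs without any model ---
CORPUS = [
    "Force majeure excuses performance during natural disasters.",
    "The supplier shall deliver goods within thirty days.",
    "Notice of a force majeure event must be given within ten days.",
    "Payment terms are net sixty from invoice date.",
]


def toy_retrieve(query: str, n: int) -> List[str]:
    # Placeholder for BM25 / dense / hybrid retrieval: returns the first n documents.
    return CORPUS[:n]


def toy_score_pairs(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    # Placeholder for a cross-encoder: fraction of query words found in the document.
    scores = []
    for query, doc in pairs:
        q_words = set(query.lower().split())
        d_words = set(doc.lower().replace(".", "").split())
        scores.append(len(q_words & d_words) / max(len(q_words), 1))
    return scores


top = retrieve_then_rerank(
    "notice period for force majeure", toy_retrieve, toy_score_pairs, n=4, k=2
)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

Passing the retriever and scorer as callables keeps the pattern independent of any particular library; swapping the toy scorer for a real cross-encoder should only require a function that maps a list of (query, document) pairs to a list of scores.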

And the patterns that stand alongside it, or against it —

composes-with Naive RAG ★★: Condition the generator on top-k chunks retrieved from an external dense index so knowledge lives outside parameters.
composes-with Hybrid Search ★★: Combine sparse lexical retrieval (BM25) with dense vector retrieval and fuse the results.
composes-with Agentic RAG ★★: Replace static retrieve-then-generate with autonomous agents that plan, choose sources, retrieve iteratively, reflect, and re-query.
composes-with Contextual Retrieval ★: Prepend a short LLM-generated description to each chunk before embedding so the chunk carries its situating context.
composes-with HyDE ★: Have the LLM write a hypothetical answer document, embed it, and use it as the retrieval query.
composes-with Query Rewriting ★★: Use an LLM to generate several alternative formulations of the user's query, retrieve documents for each, and rank-fuse the results so recall does not depend on one phrasing.
composes-with HippoRAG ★: Build an LLM-extracted schemaless knowledge graph from the corpus and run Personalized PageRank seeded on the query's key concepts so multi-hop retrieval completes in a single pass.
composes-with Modular RAG ★: Decompose RAG into a typed three-layer hierarchy of Module Types, Modules, and Operators so the pipeline (routing, scheduling, fusion, retrieval, post-retrieval, generation) can be rearranged per query rather than running a fixed linear retrieve-then-generate.
composes-with Hierarchical Retrieval ★★: Route a query through a multi-level cascade (coarse source or index selection, then per-source narrower retrieval, then chunk-level) so each retrieval decision is pushed to the cheapest tier that can answer it.

Used in recipes

Production RAG (core)

References

Passage Re-ranking with BERT (paper)

Provenance

Source: patterns/cross-encoder-reranking.md on GitHub · commit 4fa1213
Added to catalog: 2026-04-30
Last updated: 2026-05-21
Contribute: open an issue or PR at github.com/agentpatternscatalog/patterns.