Reranking
What is reranking?
Vector search uses a bi-encoder: the query and each document are embedded separately into fixed-size vectors, then ranked by cosine similarity. This scales to billions of documents but loses fine-grained relevance because the encoder never sees the query and document together. A reranker uses a cross-encoder: it scores each (query, document) pair jointly. This is far more accurate but roughly 100x more expensive, so you run it only on the top candidates from the bi-encoder. This is the standard two-stage retrieval pipeline: a fast vector search over the whole corpus, followed by a reranking pass over the top candidates.
The Reranker interface
All four built-in providers implement the same interface so you can swap them freely.
RerankDocument
Accepts either a plain string or an object with optional id and metadata fields.
RerankOptions
RerankResult
The index field is the most important detail: it lets you trace each result back to the original input array without comparing strings.
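To illustrate why the index field matters, here is a minimal sketch. The type shapes are assumptions for illustration only: this page names `id`, `metadata`, and `index`, while the `text` and `score` field names and the `resolveResults` helper are hypothetical.

```typescript
// Hypothetical shapes for illustration; the library's real definitions may differ.
type RerankDocument =
  | string
  | { text: string; id?: string; metadata?: Record<string, unknown> };

interface RerankResult {
  index: number; // position of the document in the ORIGINAL input array
  score: number; // relevance score from the reranker (assumed field name)
}

// Map reranked results back to the original inputs using `index`,
// without ever comparing document text.
function resolveResults(
  docs: RerankDocument[],
  results: RerankResult[],
): { doc: RerankDocument; score: number }[] {
  return results.map((r) => ({ doc: docs[r.index], score: r.score }));
}

const docs: RerankDocument[] = [
  "How to bake bread",
  { text: "Reranking with cross-encoders", id: "doc-42" },
  "How to bake bread", // duplicate text: string comparison could not tell these apart
];

// Pretend a reranker returned these results, best first.
const results: RerankResult[] = [
  { index: 1, score: 0.93 },
  { index: 2, score: 0.12 },
];

const resolved = resolveResults(docs, results);
console.log(resolved);
```

Because the lookup is by position, duplicate documents (inputs 0 and 2 here) stay distinguishable, which a text comparison could not guarantee.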
Built-in providers
CohereReranker
npm install cohere-ai (optional peer dep).
Model options:
| Model | Languages | Notes |
|---|---|---|
| rerank-v3.5 | 100+ | Default. Best balance of quality + cost. |
| rerank-multilingual-v3.5 | 100+ | Optimized for non-English. |
| rerank-english-v3.5 | English only | Slightly faster on English. |
VoyageReranker
Uses the fetch API directly. Throws Error("VoyageReranker: missing API key") if neither apiKey nor the VOYAGE_API_KEY environment variable is set.
Model options:
| Model | Notes |
|---|---|
| rerank-2 | Default. General purpose. |
| rerank-2-lite | ~5x cheaper, slightly lower recall. |
JinaReranker
ColbertReranker (local, no API key)
npm install @xenova/transformers (optional peer dep).
Runs a Hugging Face cross-encoder model entirely in-process via WASM/ONNX. The first call after construction lazy-loads the model (~50MB download for MiniLM-L-6-v2); subsequent calls are local-only.
Important: the default MiniLM-L-6-v2 is a classic cross-encoder, not true ColBERT v2 late interaction. For production-grade ColBERT (~3x better quality, similar latency), point this at a dedicated endpoint such as JinaAI ColBERT or self-host ColBERTv2/PLAID. The class name “ColbertReranker” refers to the role (late-interaction reranker), not the model itself.
Wiring into a vector store
Every VectorStore in @agentium/core accepts a rerank option.
How rerankMultiplier works
When a reranker is set:
- The vector backend fetches topK * rerankMultiplier candidates from the underlying ANN index.
- The reranker scores each one against the original query.
- The reranker returns the top topK by its own score.
rerankMultiplier defaults to 3. Larger values give the reranker more candidates to choose from (better recall) at the cost of latency + reranker tokens. topK=5, rerankMultiplier=10 is a sensible “high quality” setting.
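The three steps above can be sketched as a single function. This is a minimal illustration, not the library's implementation: `searchWithRerank`, `AnnSearchFn`, `RerankFn`, and the toy search and reranker are hypothetical names introduced here so the example runs without network access.

```typescript
// Hypothetical minimal types; not the library's actual definitions.
interface Scored {
  index: number; // position in the candidate array passed to the reranker
  score: number;
}
type AnnSearchFn = (query: string, limit: number) => Promise<string[]>;
type RerankFn = (query: string, docs: string[]) => Promise<Scored[]>;

async function searchWithRerank(
  query: string,
  topK: number,
  annSearch: AnnSearchFn,
  rerank: RerankFn,
  rerankMultiplier = 3,
): Promise<{ doc: string; score: number }[]> {
  // 1. Over-fetch: ask the ANN index for topK * rerankMultiplier candidates.
  const fetchK = topK * rerankMultiplier;
  const candidates = await annSearch(query, fetchK);

  // 2. Score every candidate against the original query.
  const scored = await rerank(query, candidates);

  // 3. Keep the topK by reranker score, highest first.
  return [...scored]
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => ({ doc: candidates[s.index], score: s.score }));
}

// Toy stand-ins: a fixed "ANN" ordering and a substring-match "reranker".
const corpus = ["apple pie", "banana bread", "apple tart", "cherry cake", "apple juice", "plum jam"];
const toyAnn: AnnSearchFn = async (_query, limit) => corpus.slice(0, limit);
const toyRerank: RerankFn = async (query, docs) =>
  docs.map((d, index) => ({ index, score: d.includes(query) ? 1 : 0 }));

searchWithRerank("apple", 2, toyAnn, toyRerank).then((hits) => console.log(hits));
```

With a multiplier of 1 the toy reranker only ever sees the first two candidates, so a relevant document ranked third by the vector stage can never surface; a larger multiplier gives it that chance at the cost of more reranker work.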
Query types the reranker sees
The reranker requires a text query. The vector backend hands it whatever it can extract:
| Original query | What the reranker gets |
|---|---|
| string | the string verbatim |
| ContentPart[] with at least one text part | concatenated text parts joined with spaces |
| ContentPart[] with no text (e.g. image-only) | reranker is skipped, vector ranking is used |
| number[] (precomputed vector) | reranker is skipped, vector ranking is used |
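The table above maps naturally onto a small extraction function. The sketch below is an assumption-laden illustration: the `ContentPart` shape, the `Query` union, and the `extractRerankQuery` helper are hypothetical and only mirror the four rows of the table.

```typescript
// Hypothetical ContentPart shape; the real type may have more variants and fields.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; url: string };

type Query = string | ContentPart[] | number[];

// A non-empty array whose first element is a number is treated as a precomputed vector.
function isVector(q: ContentPart[] | number[]): q is number[] {
  return q.length > 0 && typeof q[0] === "number";
}

// Returns the text the reranker should see, or null when reranking is skipped
// and the plain vector ranking is used instead.
function extractRerankQuery(query: Query): string | null {
  if (typeof query === "string") return query; // string: passed verbatim
  if (isVector(query)) return null; // number[]: no text to rerank against

  // ContentPart[]: concatenate the text parts with spaces.
  const texts: string[] = [];
  for (const part of query) {
    if (part.type === "text") texts.push(part.text);
  }
  return texts.length > 0 ? texts.join(" ") : null; // image-only: skip reranking
}

console.log(extractRerankQuery("red shoes"));
console.log(
  extractRerankQuery([
    { type: "text", text: "red" },
    { type: "image", url: "https://example.com/shoe.png" },
    { type: "text", text: "shoes" },
  ]),
);
```

Returning null rather than an empty string makes the "skip the reranker" case explicit for the caller.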
Backend-by-backend behavior
All four built-in backends call the same BaseVectorStore.applyRerank() chokepoint, so behavior is identical:
- InMemoryVectorStore: fetches topK * multiplier from the local cosine ranking.
- PgVectorStore: adjusts the SQL LIMIT to the larger fetch size; doesn't apply minScore until after rerank.
- QdrantVectorStore: sets limit: fetchK and omits score_threshold when a reranker is set (rerank handles thresholding).
- MongoDBVectorStore: applies to both the Atlas $vectorSearch path and the in-process brute-force fallback.
minScore interaction
When you pass minScore with rerank:
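The backend notes above say that PgVectorStore does not apply minScore until after rerank and that QdrantVectorStore omits score_threshold when a reranker is set. The sketch below assumes this means the threshold is compared against the reranker's score rather than the vector similarity. That reading, along with the `Hit` shape and the `applyMinScore` helper, is an assumption and not confirmed library behavior.

```typescript
// Hypothetical hit shape carrying both scores, for illustration only.
interface Hit {
  id: string;
  vectorScore: number; // similarity from the vector stage
  rerankScore?: number; // present only when a reranker ran
}

// Apply the threshold AFTER reranking, against whichever score produced the final order.
function applyMinScore(hits: Hit[], minScore: number | undefined, reranked: boolean): Hit[] {
  if (minScore === undefined) return hits;
  return hits.filter((h) => {
    const score = reranked ? (h.rerankScore ?? -Infinity) : h.vectorScore;
    return score >= minScore;
  });
}

const hits: Hit[] = [
  { id: "a", vectorScore: 0.9, rerankScore: 0.2 },
  { id: "b", vectorScore: 0.6, rerankScore: 0.8 },
];

console.log(applyMinScore(hits, 0.5, true).map((h) => h.id));
console.log(applyMinScore(hits, 0.5, false).map((h) => h.id));
```

If that reading holds, a threshold tuned for cosine similarity would likely need retuning, since reranker scores are generally on a different scale.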
Standalone usage
A reranker also works without a vector store, e.g. to reorder a BM25 candidate list or to score a set of LLM-generated options.
Composing rerankers
You can stack rerankers cheaply by calling them in sequence.
Performance characteristics
| Provider | Latency (50 docs) | Cost per 1K reranks | Notes |
|---|---|---|---|
| Cohere rerank-v3.5 | ~200ms | $1.00 (Cohere pricing) | HTTPS round-trip + model |
| Voyage rerank-2 | ~250ms | $0.50 | Comparable quality |
| Voyage rerank-2-lite | ~150ms | $0.05 | Great for large batches |
| Jina jina-reranker-v2 | ~300ms | $0.10 | Multilingual focus |
| Local MiniLM-L-6-v2 | ~50ms per doc, batched | free | First call: 50MB model download |
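The cost and latency spread in this table is one reason to stack rerankers as described under Composing rerankers: an inexpensive stage narrows the list, and a more expensive stage scores only the survivors. The following is a sketch under stated assumptions. The `RerankFn` signature, the `cascade` helper, and both toy scoring functions are hypothetical and stand in for real providers.

```typescript
interface Scored {
  index: number;
  score: number;
}
// Hypothetical reranker signature; results index into the `docs` array it was given.
type RerankFn = (query: string, docs: string[]) => Promise<Scored[]>;

interface Stage {
  rerank: RerankFn;
  keep: number; // how many documents survive this stage
}

// Run stages in sequence, always reporting indices into the ORIGINAL docs array.
async function cascade(query: string, docs: string[], stages: Stage[]): Promise<Scored[]> {
  let alive = docs.map((_, i) => i); // original indices still in play
  let last: Scored[] = alive.map((index) => ({ index, score: 0 }));

  for (const stage of stages) {
    const subset = alive.map((i) => docs[i]);
    const scored = await stage.rerank(query, subset);
    // Stage results index into `subset`; translate back to original positions.
    last = [...scored]
      .sort((a, b) => b.score - a.score)
      .slice(0, stage.keep)
      .map((s) => ({ index: alive[s.index], score: s.score }));
    alive = last.map((s) => s.index);
  }
  return last;
}

// Toy stages: a cheap word-overlap scorer, then a stricter exact-phrase scorer.
const cheap: RerankFn = async (q, ds) =>
  ds.map((d, index) => ({ index, score: q.split(" ").filter((w) => d.includes(w)).length }));
const precise: RerankFn = async (q, ds) =>
  ds.map((d, index) => ({ index, score: d.includes(q) ? 1 / d.length : 0 }));

const menu = ["red apple pie recipe", "green apple", "banana split", "apple pie", "car repair"];
cascade("apple pie", menu, [
  { rerank: cheap, keep: 3 },
  { rerank: precise, keep: 1 },
]).then((best) => console.log(best));
```

The index translation between stages is the part that is easy to get wrong: each stage only sees a subset, so its indices must be mapped back before the next stage or the final result uses them.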
Errors and edge cases
| Situation | Behavior |
|---|---|
| Empty documents array | Returns [] immediately without calling the API |
| apiKey missing AND env var missing | Constructor succeeds; first rerank() call throws "missing API key" |
| cohere-ai not installed (Cohere provider) | Constructor throws "cohere-ai is required..." with install hint |
| HTTP 429 / 500 / 502 / 503 | Auto-retry up to 2 times with exponential backoff |
| HTTP 400 / 401 / 403 / 404 | Throws immediately (no retry) |
| Reranker returns more results than topK | Truncated to topK |
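The retry rows in the table can be sketched as a small wrapper. This is an illustration of the stated policy, not the library's code: the `withRetry` and `httpError` helpers are hypothetical, and the 200ms base delay is an assumed value since this page does not state the actual backoff timings.

```typescript
// Statuses the table lists as retryable; everything else fails immediately.
const RETRYABLE = new Set([429, 500, 502, 503]);

// Hypothetical helper: an Error tagged with the HTTP status that caused it.
function httpError(status: number): Error & { status: number } {
  return Object.assign(new Error(`HTTP ${status}`), { status });
}

function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const s = (err as { status: unknown }).status;
    return typeof s === "number" ? s : undefined;
  }
  return undefined;
}

async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 2, // "up to 2 times", per the table
  baseDelayMs = 200, // assumed value, not taken from the library
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = statusOf(err);
      const retryable = status !== undefined && RETRYABLE.has(status);
      if (!retryable || attempt >= maxRetries) throw err;
      // Exponential backoff: baseDelayMs, then 2 * baseDelayMs.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Usage: a fake call that fails twice with 503 and then succeeds.
let calls = 0;
withRetry(async () => {
  calls++;
  if (calls < 3) throw httpError(503);
  return "ok";
}).then((value) => console.log(value, "after", calls, "calls"));
```

Passing `sleep` as a parameter keeps the backoff testable without real delays.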
See also
- Semantic Tool Selection reuses Reranker to pick the best tools when an agent has many.
- GraphRAG / HybridRetriever composes vector + graph + rerank.
- Embeddings: the first stage of two-stage retrieval.