Reranking

What is reranking?

Vector search uses a bi-encoder: the query and each document are embedded separately into fixed-size vectors, then ranked by cosine similarity. It scales to billions of documents but loses fine-grained relevance because the encoder never sees the query and document together. A reranker uses a cross-encoder: it scores each (query, document) pair jointly. This is far more accurate but roughly 100x more expensive per document, so you run it only on the top candidates from the first stage. The standard two-stage retrieval pipeline:

                   ┌─────────────────────┐
   user query ───▶ │ Vector / BM25 search │ ─▶ top 30 candidates
                   └─────────────────────┘
                              │
                              ▼
                   ┌─────────────────────┐
                   │      Reranker        │ ─▶ top 5 final results
                   └─────────────────────┘
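
The two-stage shape can be sketched in a few lines. This is only an illustration: the scorers below are toy token-overlap functions standing in for a real bi-encoder and cross-encoder, and every name (`twoStageRetrieve`, `overlapScore`, `densityScore`) is hypothetical rather than part of @agentium/core.

```typescript
// Toy stand-ins only: neither scorer is a real bi-encoder or cross-encoder.
type Scorer = (query: string, doc: string) => number;

const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/\W+/).filter(Boolean);

// Cheap stage-1 score: number of document tokens that also occur in the query.
const overlapScore: Scorer = (query, doc) => {
  const queryTokens = new Set(tokenize(query));
  return tokenize(doc).filter((t) => queryTokens.has(t)).length;
};

// Stand-in for the expensive stage-2 score: overlap normalized by document length.
const densityScore: Scorer = (query, doc) =>
  overlapScore(query, doc) / Math.max(tokenize(doc).length, 1);

// Score every document once, then keep the n best (highest score first).
function topN(docs: string[], n: number, score: (doc: string) => number): string[] {
  return docs
    .map((doc) => ({ doc, s: score(doc) }))
    .sort((a, b) => b.s - a.s)
    .slice(0, n)
    .map((x) => x.doc);
}

function twoStageRetrieve(
  query: string,
  corpus: string[],
  candidates = 30, // size of the stage-1 shortlist
  finalK = 5,      // size of the final result list
  cheap: Scorer = overlapScore,
  precise: Scorer = densityScore,
): string[] {
  // Stage 1 touches the whole corpus, so it must be cheap.
  const shortlist = topN(corpus, candidates, (d) => cheap(query, d));
  // Stage 2 only ever sees the shortlist, so it can afford to be slow.
  return topN(shortlist, finalK, (d) => precise(query, d));
}
```

The point of the structure is that the expensive scorer runs `candidates` times per query regardless of corpus size.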

Published results from Cohere, Voyage, Jina, and the ColBERTv2/PLAID papers report roughly 10–30% improvement in nDCG@10 over vector-only retrieval.

The `Reranker` interface

All four built-in providers implement the same interface so you can swap them freely.

interface Reranker {
  /** Provider identifier for logs/telemetry (e.g. "cohere", "voyage"). */
  readonly providerId: string;

  /**
   * Reorder documents by their relevance to `query`.
   * Returns results sorted by score descending.
   */
  rerank(
    query: string,
    documents: RerankDocument[],
    options?: RerankOptions,
  ): Promise<RerankResult[]>;
}
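
Because the interface is small, a custom provider is easy to write. The sketch below restates the types from this page so it compiles on its own, then implements a deliberately crude `KeywordOverlapReranker`. That class is hypothetical (it is not one of the built-in providers) and its substring scoring is only a placeholder for a real model.

```typescript
// Types restated from this page so the sketch is self-contained.
type RerankDocument =
  | string
  | { id?: string; content: string; metadata?: Record<string, unknown> };

interface RerankOptions {
  topK?: number;
  minScore?: number;
}

interface RerankResult {
  index: number;
  score: number;
  content: string;
  id?: string;
  metadata?: Record<string, unknown>;
}

interface Reranker {
  readonly providerId: string;
  rerank(
    query: string,
    documents: RerankDocument[],
    options?: RerankOptions,
  ): Promise<RerankResult[]>;
}

// Hypothetical provider: score = fraction of query words found in the document.
class KeywordOverlapReranker implements Reranker {
  readonly providerId = "keyword-overlap";

  async rerank(
    query: string,
    documents: RerankDocument[],
    options: RerankOptions = {},
  ): Promise<RerankResult[]> {
    const words = query.toLowerCase().split(/\W+/).filter(Boolean);
    const results = documents.map((doc, index): RerankResult => {
      // Normalize the string form to the object form.
      const fields = typeof doc === "string" ? { content: doc } : doc;
      const text = fields.content.toLowerCase();
      // Crude substring match; a real provider would call a model here.
      const hits = words.filter((w) => text.includes(w)).length;
      return { ...fields, index, score: words.length ? hits / words.length : 0 };
    });
    const { minScore, topK } = options;
    return results
      .filter((r) => minScore === undefined || r.score >= minScore)
      .sort((a, b) => b.score - a.score) // score descending, as the contract requires
      .slice(0, topK ?? results.length);
  }
}
```

Anything that accepts a `Reranker` can take an instance of this class, which makes it handy as a deterministic stand-in for a hosted provider in tests.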

`RerankDocument`

Accepts either a plain string or an object with an optional id and metadata:

type RerankDocument =
  | string
  | {
      id?: string;                          // preserved through the reranker
      content: string;                      // the text to score
      metadata?: Record<string, unknown>;   // arbitrary payload, returned unchanged
    };

`RerankOptions`

interface RerankOptions {
  /** Maximum results to return. If omitted, returns all reranked docs. */
  topK?: number;

  /** Drop any result with `score < minScore`. */
  minScore?: number;
}

`RerankResult`

interface RerankResult {
  index: number;                          // original index in the input array
  score: number;                          // relevance score, higher = more relevant
  content: string;                        // doc text as fed to the reranker
  id?: string;                            // copied from input if it had one
  metadata?: Record<string, unknown>;     // copied from input
}

The index field is the most important detail: it lets you trace each result back to its position in the original input array without comparing strings.
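
As a small illustration of that lookup, the sketch below joins reranked results back to richer application records by position. `attachSources` is a hypothetical helper, and only the `RerankResult` fields it needs are restated.

```typescript
// Only the fields this sketch needs, restated from RerankResult.
interface RankedItem {
  index: number;
  score: number;
  content: string;
}

// Hypothetical helper: pair each result with the original record it came from.
function attachSources<T>(
  results: RankedItem[],
  originals: T[],
): Array<{ source: T; score: number }> {
  return results.map((r) => {
    const source = originals[r.index];
    if (source === undefined) {
      throw new RangeError(`result index ${r.index} is outside the input array`);
    }
    return { source, score: r.score };
  });
}
```

This works even when two inputs have identical text, which is where string comparison would break down.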

Built-in providers

`CohereReranker`

import { CohereReranker } from "@agentium/core";

const reranker = new CohereReranker({
  apiKey: process.env.COHERE_API_KEY,  // defaults to COHERE_API_KEY env
  model: "rerank-v3.5",                // defaults to "rerank-v3.5"
});

Requires: npm install cohere-ai (optional peer dep). Model options:

Model	Languages	Notes
`rerank-v3.5`	100+	Default. Best balance of quality + cost.
`rerank-multilingual-v3.0`	100+	Optimized for non-English.
`rerank-english-v3.0`	English only	Slightly faster on English.

Retry behavior: automatic on HTTP 429 / 500 / 502 / 503 with exponential backoff (1s → 2s → fail), up to 2 retries.
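
The retry policy described above could be implemented along these lines. This is a sketch, not the library's actual code: `withRetry` and `statusOf` are hypothetical names, and it assumes failed requests surface as thrown objects carrying a numeric `status` field.

```typescript
// Statuses this page lists as retryable.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503]);

// Assumption: HTTP failures are thrown as objects with a numeric `status`.
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const value = (err as { status: unknown }).status;
    return typeof value === "number" ? value : undefined;
  }
  return undefined;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  call: () => Promise<T>,
  maxRetries = 2,
  baseDelayMs = 1000,
  wait: (ms: number) => Promise<void> = sleep, // injectable so tests need not sleep
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const status = statusOf(err);
      const retryable = status !== undefined && RETRYABLE_STATUSES.has(status);
      // Non-retryable errors and exhausted retries propagate to the caller.
      if (!retryable || attempt >= maxRetries) throw err;
      await wait(baseDelayMs * 2 ** attempt); // 1s, then 2s
    }
  }
}
```

With the defaults, a persistently failing call is attempted three times in total (one initial attempt plus two retries) before the last error is rethrown.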

`VoyageReranker`

import { VoyageReranker } from "@agentium/core";

const reranker = new VoyageReranker({
  apiKey: process.env.VOYAGE_API_KEY,   // defaults to VOYAGE_API_KEY env
  model: "rerank-2",                    // defaults to "rerank-2"
  baseURL: "https://api.voyageai.com/v1", // override if you self-host
});

No SDK install required: it uses the global fetch API directly. If neither apiKey nor the VOYAGE_API_KEY env var is set, the constructor still succeeds and the first rerank() call throws Error("VoyageReranker: missing API key"). Model options:

Model	Notes
`rerank-2`	Default. General purpose.
`rerank-2-lite`	~5x cheaper, slightly lower recall.

`JinaReranker`

import { JinaReranker } from "@agentium/core";

const reranker = new JinaReranker({
  apiKey: process.env.JINA_API_KEY,
  model: "jina-reranker-v2-base-multilingual",  // default
  baseURL: "https://api.jina.ai/v1",
});

No SDK install required. Same fetch-based pattern as Voyage.

`ColbertReranker` (local, no API key)

import { ColbertReranker } from "@agentium/core";

const reranker = new ColbertReranker({
  model: "Xenova/ms-marco-MiniLM-L-6-v2", // default - small + fast
  prewarm: true,                          // load model on construction
});

Requires: npm install @xenova/transformers (optional peer dep). Runs a HuggingFace cross-encoder model entirely in process via WASM/ONNX. With prewarm: true the model is loaded at construction; otherwise the first rerank() call lazy-loads it (~50MB download for MiniLM-L-6-v2). Subsequent calls are local-only.

Important: the default MiniLM-L-6-v2 is a classic cross-encoder, not true ColBERTv2 late interaction. For production-grade ColBERT (~3x better quality, similar latency), point this at a dedicated endpoint such as Jina AI's ColBERT or self-host ColBERTv2/PLAID. The class name “ColbertReranker” refers to the intended role (a slot for a late-interaction reranker), not to the default model.

Wiring into a vector store

Every VectorStore in @agentium/core accepts a rerank option:

import { CohereReranker, InMemoryVectorStore, OpenAIEmbedding } from "@agentium/core";

const embedder = new OpenAIEmbedding();
const store = new InMemoryVectorStore(embedder);
const reranker = new CohereReranker();

const results = await store.search("docs", "Tell me about cats.", {
  topK: 5,
  rerank: reranker,
  rerankMultiplier: 3,        // fetch 5*3=15 candidates, rerank down to 5
});

How `rerankMultiplier` works

When a reranker is set:

1. The vector backend fetches topK * rerankMultiplier candidates from the underlying ANN index.
2. The reranker scores each one against the original query.
3. The reranker returns the top topK by its own score.

rerankMultiplier defaults to 3. Larger values give the reranker more candidates to choose from (better recall) at the cost of latency + reranker tokens. topK=5, rerankMultiplier=10 is a sensible “high quality” setting.
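
The over-fetch-then-rerank flow described in this section can be sketched as a single function. All names here (`searchWithRerank`, `Candidate`, `RerankFn`) are hypothetical, and the vector search and reranker are passed in as plain functions so the sketch does not depend on any real backend.

```typescript
interface Candidate {
  id: string;
  content: string;
}

interface Scored {
  index: number; // position in the array handed to the reranker
  score: number;
}

type VectorSearchFn = (query: string, limit: number) => Promise<Candidate[]>;
type RerankFn = (
  query: string,
  docs: string[],
  opts: { topK: number },
) => Promise<Scored[]>;

async function searchWithRerank(
  vectorSearch: VectorSearchFn,
  rerank: RerankFn,
  query: string,
  topK: number,
  rerankMultiplier = 3,
): Promise<Candidate[]> {
  // Over-fetch from the vector index.
  const candidates = await vectorSearch(query, topK * rerankMultiplier);
  // The reranker scores every candidate and keeps its own top topK.
  const ranked = await rerank(query, candidates.map((c) => c.content), { topK });
  // Map reranker positions back to the candidates they refer to.
  return ranked
    .map((r) => candidates[r.index])
    .filter((c): c is Candidate => c !== undefined);
}
```

With topK = 5 and the default multiplier, the vector stage is asked for 15 candidates; raising the multiplier to 10 makes that 50, all of which the reranker must score.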

Query types the reranker sees

The reranker requires a text query. The vector backend hands it whatever it can extract:

Original query	What the reranker gets
`string`	the string verbatim
`ContentPart[]` with at least one text part	concatenated text parts joined with spaces
`ContentPart[]` with no text (e.g. image-only)	reranker is skipped, vector ranking is used
`number[]` (precomputed vector)	reranker is skipped, vector ranking is used

This matters for multimodal indexes: if you want reranking on an image query, supply a text caption alongside the image part.
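
The table above amounts to a small extraction function. The sketch below is one way to express it; the `ContentPart` shape (a `type` discriminator with `text` or `url`) is an assumption made for illustration and may not match the library's actual type, and `rerankQueryText` is a hypothetical name.

```typescript
// Assumed shape for illustration; the real ContentPart type may differ.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image"; url: string };

type SearchQuery = string | ContentPart[] | number[];

// Returns the text the reranker should see, or null when reranking must be skipped.
function rerankQueryText(query: SearchQuery): string | null {
  if (typeof query === "string") return query;
  const texts: string[] = [];
  for (const part of query) {
    // Numbers mean a precomputed vector: there is no text to extract.
    if (typeof part !== "number" && part.type === "text") texts.push(part.text);
  }
  return texts.length > 0 ? texts.join(" ") : null;
}
```

A `null` return corresponds to the "reranker is skipped, vector ranking is used" rows.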

Backend-by-backend behavior

All four built-in backends call the same BaseVectorStore.applyRerank() chokepoint, so reranking itself behaves identically; only the way each backend fetches the larger candidate set differs:

InMemoryVectorStore — fetches topK * multiplier from the local cosine ranking.
PgVectorStore — adjusts the SQL LIMIT to the larger fetch size; doesn’t apply minScore until after rerank.
QdrantVectorStore — sets limit: fetchK and omits score_threshold when reranker is set (rerank handles thresholding).
MongoDBVectorStore — applies to both the Atlas $vectorSearch path and the in-process brute-force fallback.

`minScore` interaction

When you pass minScore with rerank:

await store.search("docs", query, { topK: 5, minScore: 0.7, rerank });

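The threshold is applied by the reranker, not by the vector backend, because the two scores are not comparable: cosine similarity reflects embedding geometry, while a reranker's relevance score comes from a separate model with its own distribution (Cohere's, for example, is normalized to 0–1 but is not calibrated against cosine values). Choose minScore against the reranker's scores.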
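
One way to picture the routing is as a plan that decides which stage receives the threshold. The sketch below is illustrative only; `planStages` and its types are hypothetical and not part of the library.

```typescript
interface SearchOptions {
  topK: number;
  minScore?: number;
  rerankMultiplier?: number;
  hasReranker: boolean;
}

interface StagePlan {
  vector: { limit: number; minScore?: number };
  rerank: { topK: number; minScore?: number } | null;
}

function planStages(o: SearchOptions): StagePlan {
  if (!o.hasReranker) {
    // No reranker: the vector backend applies both the limit and the threshold.
    return { vector: { limit: o.topK, minScore: o.minScore }, rerank: null };
  }
  const multiplier = o.rerankMultiplier ?? 3;
  return {
    // With a reranker the vector stage over-fetches and applies no threshold.
    vector: { limit: o.topK * multiplier },
    // The threshold moves to the rerank stage, where the scores it refers to live.
    rerank: { topK: o.topK, minScore: o.minScore },
  };
}
```

For the call above, this yields a vector stage with limit 15 and no threshold, and a rerank stage with topK 5 and minScore 0.7.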

Standalone usage

A reranker also works without a vector store, e.g. to reorder a BM25 candidate list or to score a set of LLM-generated options:

const ranked = await reranker.rerank(
  "Which big cat lives in Asia?",
  [
    { id: "1", content: "Tigers roam Asian forests." },
    { id: "2", content: "Lions live in African savannahs." },
    { id: "3", content: "Snow leopards inhabit the Himalayas." },
  ],
  { topK: 2 },
);
// ranked[0] => { index: 0, score: ~0.92, id: "1", content: "Tigers..." }

Composing rerankers

You can stack rerankers cheaply by calling them in sequence:

// Stage 2a: fast lite reranker narrows 100 -> 30
const stage2a = await new VoyageReranker({ model: "rerank-2-lite" })
  .rerank(query, candidates, { topK: 30 });

// Stage 2b: expensive top-tier reranker scores the final 30 -> 5
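const stage2b = await new CohereReranker().rerank(
  query,
  // Forward metadata too. Note: stage2b's `index` refers to positions in
  // stage2a, not in the original `candidates` array.
  stage2a.map((r) => ({ id: r.id, content: r.content, metadata: r.metadata })),
  { topK: 5 },
);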
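
When stages are chained like this, each reranker reports `index` relative to its own input. If you need indexes relative to the original candidate list, a small wrapper can remap them after every stage. The following is a sketch under that assumption; `chainRerank`, `Doc`, `Ranked`, and `Stage` are hypothetical names, and each stage is modeled as a plain async function rather than a `Reranker` instance.

```typescript
interface Doc {
  id?: string;
  content: string;
  metadata?: Record<string, unknown>;
}

interface Ranked extends Doc {
  index: number;
  score: number;
}

// A stage is any async function that reorders (and usually truncates) its input.
type Stage = (query: string, docs: Doc[]) => Promise<Ranked[]>;

async function chainRerank(
  query: string,
  documents: Doc[],
  stages: Stage[],
): Promise<Ranked[]> {
  // Seed with the original positions so they can be carried through every stage.
  let current: Ranked[] = documents.map((d, index) => ({ ...d, index, score: 0 }));
  for (const stage of stages) {
    const previous = current;
    const ranked = await stage(query, previous);
    // Translate each stage-local index back to the original array position.
    current = ranked.map((r) => {
      const origin = previous[r.index];
      if (origin === undefined) throw new RangeError(`bad index ${r.index}`);
      return { ...r, index: origin.index };
    });
  }
  return current;
}
```

The scores in the final list come from the last stage only, so they are comparable with each other but not with scores from earlier stages.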

Performance characteristics

Provider	Latency (50 docs)	Cost per 1K reranks	Notes
Cohere `rerank-v3.5`	~200ms	$1.00 (Cohere pricing)	HTTPS round-trip + model
Voyage `rerank-2`	~250ms	$0.50	Comparable quality
Voyage `rerank-2-lite`	~150ms	$0.05	Great for large batches
Jina `jina-reranker-v2-base-multilingual`	~300ms	$0.10	Multilingual focus
Local `MiniLM-L-6-v2`	~50ms per doc, batched	free	First call: 50MB model download

(Numbers are rough; benchmark your own workload.)

Errors and edge cases

Situation	Behavior
Empty `documents` array	Returns `[]` immediately without calling the API
`apiKey` missing AND env var missing	Constructor succeeds; first `rerank()` call throws `"missing API key"`
`cohere-ai` not installed (Cohere provider)	Constructor throws `"cohere-ai is required..."` with install hint
HTTP 429 / 500 / 502 / 503	Auto-retry up to 2 times with exponential backoff
HTTP 400 / 401 / 403 / 404	Throws immediately (no retry)
Reranker returns more results than `topK`	Truncated to `topK`
