Best reranker model / MTEB reranking leaderboard SOTA
The best reranker is the one that wins your second-stage retrieval test.
Use the live MTEB reranking leaderboard to find current SOTA candidates, then test them on your own corpus. Public MTEB reranking scores are useful for shortlisting, but they do not guarantee that one model is the best reranker for every RAG, search, or question-answering system.
Reranking vs embeddings
Embeddings and rerankers solve adjacent parts of the retrieval stack. Confusing them leads to slow systems or weak relevance.
| Axis | Embeddings | Reranking |
|---|---|---|
| Primary job | Create vectors for broad candidate retrieval. | Re-score a small candidate set for final ordering. |
| Input shape | Usually one text at a time: query or document. | Query-document pairs, often scored one pair at a time or in batches. |
| Where it fits | Stage 1 retrieval, clustering, deduplication, semantic search. | Stage 2 ranking for RAG, search, QA, and recommendation candidates. |
| Main tradeoff | Fast and scalable, but may miss subtle relevance ordering. | More precise ordering, but higher latency and compute per candidate. |
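The input-shape difference in the table can be illustrated with a minimal sketch. The functions `embed`, `cosine`, and `rerank_score` below are hypothetical toy stand-ins (bag-of-words counts and bigram overlap), not real models or any library's API. They only show that an embedding is computed from one text at a time, while a reranker sees the query and the document together.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: one text in, one vector out."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query: str, doc: str) -> float:
    """Toy stand-in for a reranker: scores a (query, document) pair jointly.

    Here the score is the share of query bigrams found in the document,
    so word order matters, unlike in the bag-of-words vector above.
    """
    q = query.lower().split()
    d = doc.lower().split()
    q_bigrams = set(zip(q, q[1:]))
    d_bigrams = set(zip(d, d[1:]))
    return len(q_bigrams & d_bigrams) / max(len(q_bigrams), 1)


docs = [
    "reset your password from the login page",
    "page login from password your reset the",  # same words, scrambled order
]
query = "reset your password"

# Embeddings: document vectors can be computed once, before any query arrives.
doc_vectors = [embed(d) for d in docs]
query_vector = embed(query)
embedding_scores = [cosine(query_vector, v) for v in doc_vectors]

# Reranking: every score needs the query and the document at the same time.
pair_scores = [rerank_score(query, d) for d in docs]

print(embedding_scores)  # both documents score the same
print(pair_scores)       # only the correctly ordered document matches
```

In this toy setup the two documents are indistinguishable to the vector comparison, while the pairwise scorer separates them. Real embedding models are far less crude than word counts, so treat this as an illustration of the interface difference, not of model quality.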
Decision guide
Use this guide before asking which model is SOTA. The right reranker depends on the retrieval architecture.
| Situation | Choose | Why |
|---|---|---|
| You need the best possible answer order | Use a strong reranker after first-stage embedding retrieval. | Rerankers compare the query with each candidate passage directly, so they usually improve final ordering when the candidate set already contains the relevant passages. |
| You need low-latency search over millions of documents | Use embeddings first, then rerank only the top 20-200 candidates. | Embedding search is built for fast approximate nearest-neighbor lookup; reranking every document is normally too expensive. |
| You serve regulated or private workloads | Prefer an auditable open-weight reranker or a private deployment path. | MTEB does not answer data-governance, retention, logging, or deployment-control questions. |
| You handle long technical documents | Check context length, chunking behavior, and domain evaluation before trusting a leaderboard rank. | A high reranking score can hide practical limits around long passages, tables, citations, or code-heavy text. |
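The "embeddings first, then rerank only the top candidates" pattern from the table can be sketched as a two-stage function. Everything here is illustrative: `two_stage_search`, `term_overlap`, and `phrase_match` are hypothetical names, the scorers are toy stand-ins for an embedding search and a reranker, and a production first stage would typically query an approximate nearest-neighbor index instead of sorting the whole corpus.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]


def two_stage_search(
    query: str,
    corpus: List[str],
    first_stage_score: Scorer,
    pair_score: Scorer,
    retrieve_k: int = 100,
    final_k: int = 10,
) -> List[str]:
    """Retrieve broadly with a cheap scorer, then rerank only the top candidates."""
    # Stage 1: cheap score over everything, keep only the top retrieve_k.
    candidates = sorted(
        corpus, key=lambda doc: first_stage_score(query, doc), reverse=True
    )[:retrieve_k]
    # Stage 2: the expensive scorer runs once per candidate, never per corpus document.
    reranked = sorted(
        candidates, key=lambda doc: pair_score(query, doc), reverse=True
    )
    return reranked[:final_k]


def term_overlap(query: str, doc: str) -> float:
    """Toy first-stage scorer: number of shared terms."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def phrase_match(query: str, doc: str) -> float:
    """Toy reranker stand-in: 1.0 if the exact query phrase appears in the document."""
    return 1.0 if query.lower() in doc.lower() else 0.0


calls = {"n": 0}


def counted_pair_score(query: str, doc: str) -> float:
    """Wrapper that counts how often the expensive scorer is invoked."""
    calls["n"] += 1
    return phrase_match(query, doc)


corpus = [
    "how to rotate api keys safely",
    "keys api rotate",
    "rotate api keys every ninety days",
    "baking bread at home",
    "api rate limits explained",
]

results = two_stage_search(
    "rotate api keys", corpus, term_overlap, counted_pair_score,
    retrieve_k=3, final_k=2,
)
print(results)
print(calls["n"])  # reranker calls are bounded by retrieve_k, not corpus size
```

The design point is the cap: reranking cost grows with `retrieve_k`, so the 20-200 range in the table is a latency and compute knob to tune against your own budget.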
The MTEB reranking caveat
MTEB reranking scores are benchmark evidence, not production truth. They compress many datasets into public comparisons, but they cannot know your corpus, your query distribution, your chunking, your languages, or your latency ceiling.
Because public leaderboards change, this page avoids precise current scores. For exact ranks, use the live MTEB leaderboard and the broader CodeSOTA MTEB overview.
Practical shortlist rules
- Treat the MTEB reranking leaderboard as a shortlist generator, not a final procurement answer.
- Validate on your own query logs, negative examples, document lengths, languages, and latency budget.
- Compare a pure embedding baseline against embedding plus reranking; the lift matters more than the absolute public rank.
- When scores are close, or when you have not verified the exact current numbers, weigh qualitative robustness above tiny leaderboard gaps.
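The baseline-versus-reranking comparison in the rules above can be measured with any ranking metric. The sketch below uses mean reciprocal rank (MRR) as one reasonable choice; the query IDs, document IDs, and rankings are made-up placeholders, and in practice they would come from your own query logs and relevance labels.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Return 1 / position of the first relevant document, or 0.0 if none is found."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    runs: Dict[str, List[str]], labels: Dict[str, Set[str]]
) -> float:
    """Average the reciprocal rank over every labeled query."""
    return sum(reciprocal_rank(runs[q], labels[q]) for q in labels) / len(labels)


# Hypothetical relevance labels: query ID -> set of relevant document IDs.
labels = {"q1": {"d3"}, "q2": {"d7"}}

# Hypothetical rankings from an embedding-only baseline and from embedding plus reranking.
baseline = {"q1": ["d1", "d3", "d2"], "q2": ["d7", "d8", "d9"]}
reranked = {"q1": ["d3", "d1", "d2"], "q2": ["d7", "d9", "d8"]}

baseline_mrr = mean_reciprocal_rank(baseline, labels)
reranked_mrr = mean_reciprocal_rank(reranked, labels)
lift = reranked_mrr - baseline_mrr

print(f"baseline MRR: {baseline_mrr:.2f}")
print(f"reranked MRR: {reranked_mrr:.2f}")
print(f"lift: {lift:+.2f}")
```

Running the same comparison for each shortlisted reranker, on the same labeled queries, gives a lift figure that reflects your corpus and not the public benchmark mix.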