Text Ranking

Text ranking is the invisible backbone of every search engine and RAG pipeline. The field was transformed first by ColBERT (2020), which introduced late interaction, and then by instruction-tuned embedding models such as E5-Mistral and GTE-Qwen that turned general-purpose LLMs into retrieval engines. MS MARCO and BEIR remain the standard battlegrounds, but the real test is zero-shot transfer: can a model trained on web search generalize to legal documents, scientific papers, and code? The gap between supervised and zero-shot performance has shrunk from 15+ points to under 3 in two years.
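Late interaction keeps one embedding per token and scores a query-document pair with "MaxSim": each query token takes its best match among the document tokens, and the per-token maxima are summed. A minimal sketch with toy vectors standing in for real model embeddings:

```python
# ColBERT-style late interaction ("MaxSim") scoring.
# The 2-d token embeddings below are illustrative, not real model output.

def maxsim_score(query_vecs, doc_vecs):
    """Sum over query tokens of the max dot product with any doc token."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

query = [[1.0, 0.0], [0.0, 1.0]]   # two query-token embeddings
doc_a = [[0.9, 0.1], [0.2, 0.8]]   # tokens aligned with the query
doc_b = [[0.1, 0.2], [0.3, 0.1]]   # tokens unrelated to the query

assert maxsim_score(query, doc_a) > maxsim_score(query, doc_b)
```

Because document token embeddings are independent of the query, they can be precomputed and indexed; only the cheap MaxSim aggregation happens at query time.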

2 datasets · 8 results · canonical metric: nDCG
Canonical Benchmark

BEIR

Heterogeneous information retrieval benchmark across 18 datasets

Primary metric: nDCG@10

Top 10

Leading models on BEIR.

| Rank | Model | nDCG@10 | Year | Source |
|------|-------|---------|------|--------|
| 1 | NV-Embed-v2 | 62.6 | 2024 | paper |
| 2 | GTE-Qwen2-7B-instruct | 60.3 | 2024 | paper |
| 3 | E5-Mistral-7B-instruct | 56.9 | 2024 | paper |
| 4 | ColBERTv2 | 49.4 | 2022 | paper |
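The top three models in the table are single-vector dense retrievers: each text is compressed to one embedding and documents are ranked by cosine similarity to the query embedding. A minimal sketch with toy vectors standing in for real model output:

```python
import math

# Single-vector dense retrieval: one embedding per text, ranked by
# cosine similarity. The 2-d vectors are illustrative placeholders.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query = [0.6, 0.8]
corpus = {"doc_a": [0.5, 0.9], "doc_b": [0.9, -0.1]}

ranked = sorted(corpus, key=lambda d: cosine(query, corpus[d]), reverse=True)
print(ranked)  # doc_a first: it points in nearly the same direction
```

This is the trade-off behind the table: single-vector models are cheaper to index and search than late-interaction models like ColBERTv2, which store one vector per token but capture finer-grained matches.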

All datasets

2 datasets tracked for this task.


Run Inference

Looking to run a model? Hugging Face hosts inference for this task type.
