Codesota · Benchmark · BEIR

BEIR.

Heterogeneous information retrieval benchmark across 18 datasets

§ 01 · SOTA history

[Figure: year-over-year SOTA history for BEIR.]

§ 02 · Leaderboard

Results by metric.


nDCG@10

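nDCG@10 (normalized discounted cumulative gain over the top 10 retrieved documents) is the primary evaluation metric for BEIR. Codesota tracks published model scores on this metric so that readers can compare state-of-the-art results across sources and model families. Scores are shown on a 0 to 100 scale. Note that the entries below average over different numbers of datasets (15 or 18), so their averages are not strictly comparable.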

Higher is better
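To make the metric concrete, here is a minimal Python sketch of nDCG@10 for a single query. It assumes the linear-gain form (gain = graded relevance, discount = log2(rank + 1)); some implementations use an exponential gain (2^rel - 1) instead, which gives different numbers when judgments are graded. The function names are illustrative and are not part of any BEIR tooling. Per-query values fall in [0, 1]; a dataset-level score is the mean over that dataset's queries, multiplied by 100 for the scale used here.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k positions.

    gains[i] is the graded relevance of the document at rank i + 1.
    Linear-gain form: sum of rel / log2(rank + 1).
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query (illustrative sketch).

    ranked_doc_ids: retrieved document IDs, best first.
    qrels: mapping doc_id -> graded relevance; unjudged docs count as 0.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # Ideal ranking: every judged relevance sorted from highest to lowest.
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    # A query with no relevant documents gets 0 rather than dividing by zero.
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0
```

For example, `ndcg_at_k(["d1", "d2", "d3"], {"d1": 1, "d3": 1})` scores the relevant `d3` at rank 3 instead of rank 2, giving roughly 0.92 instead of the 1.0 a perfect ranking would earn.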

Trust tiers for nDCG@10: verified · paper · vendor · community · unverified
| Rank | Model | Trust | nDCG@10 | Year | Link | Notes |
|---|---|---|---|---|---|---|
| 1 | NV-Embed-v2 | verified | 62.65 | 2024 | Source | Average nDCG@10 on BEIR (15 datasets). Ranked #1 in retrieval on the MTEB leaderboard. |
| 2 | GTE-Qwen2-7B-instruct | verified | 60.25 | 2024 | Source | Average nDCG@10 on BEIR (MTEB retrieval subtask), from the Hugging Face model card. |
| 3 | E5-Mistral-7B-instruct | verified | 56.9 | 2024 | Source | Average nDCG@10 on 15 BEIR datasets, from Table 1 of the paper. |
| 4 | ColBERTv2 | verified | 49.4 | 2022 | Paper | Average nDCG@10 on 18 BEIR datasets, from the original BEIR / ColBERTv2 papers. |
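The scores in this table are averages across BEIR datasets. A minimal sketch, assuming an unweighted (macro) mean over per-dataset nDCG@10 as in MTEB-style reporting, shows why a 15-dataset average should not be compared directly with an 18-dataset one: the helper refuses to compare two averages unless they cover exactly the same datasets. The function names `beir_average` and `compare_averages` are hypothetical, as are the per-dataset numbers in the usage note.

```python
from statistics import mean


def beir_average(per_dataset_ndcg):
    """Unweighted (macro) mean of per-dataset nDCG@10 scores (0 to 100 scale).

    per_dataset_ndcg: mapping dataset name -> nDCG@10 on that dataset.
    """
    if not per_dataset_ndcg:
        raise ValueError("no datasets given")
    return mean(per_dataset_ndcg.values())


def compare_averages(a, b):
    """Return average(a) - average(b), but only over identical dataset sets.

    Raises ValueError when the dataset sets differ (e.g. 15 vs 18 datasets),
    because the two averages are then not directly comparable.
    """
    if set(a) != set(b):
        differing = sorted(set(a) ^ set(b))
        raise ValueError(f"dataset sets differ: {differing}")
    return beir_average(a) - beir_average(b)
```

With hypothetical scores `a = {"scifact": 70.0, "nfcorpus": 35.0}` and `b = {"scifact": 65.0, "nfcorpus": 37.0}`, `compare_averages(a, b)` returns 1.5, while comparing either against a dict covering a different set of datasets raises `ValueError`.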

nDCG@10 (separate listing)

This result was recorded under a second label ("Ndcg 10") for the same nDCG@10 metric on BEIR, so it is ranked separately from the table above.

Higher is better

Trust tiers for nDCG@10: verified · paper · vendor · community · unverified
| Rank | Model | Trust | nDCG@10 | Year | Links |
|---|---|---|---|---|---|
| 1 | ModernBERT (large) | unverified | 44 | 2024 | Paper, Code |