Codesota · Natural Language Processing · Feature Extraction · MTEB Leaderboard
Feature Extraction · benchmark dataset · 2022 · EN

MTEB Leaderboard.

Massive Text Embedding Benchmark across 8 task categories

§ 01 · Leaderboard

Best published scores.

44 results indexed across 2 metrics. Shaded row marks current SOTA; ties broken by submission date.


Primary metric: accuracy · higher is better
All metrics: avg-score, mteb-score
avg-score · 6 rows

| # | Model | Org | Submitted | Paper / code | avg-score |
|---|---|---|---|---|---|
| 01 | NV-Embed-v2 (Open) | NVIDIA | Sep 2024 | NV-Embed: Improved Techniques for Training LLMs as Gener… | 72.31 |
| 02 | GTE-Qwen2-7B-instruct (Open) | Alibaba | Jun 2024 | arxiv | 72.05 |
| 03 | voyage-3-large | Voyage AI | Jan 2025 | arxiv | 70.32 |
| 04 | E5-Mistral-7B-instruct (Open) | Microsoft | Jan 2024 | Improving Text Embeddings with Large Language Models | 66.63 |
| 05 | jina-embeddings-v3 (Open) | Jina AI | Sep 2024 | jina-embeddings-v3: Multilingual Embeddings With Task Lo… | 65.18 |
| 06 | text-embedding-3-large | OpenAI | Jan 2024 | arxiv | 64.60 |
mteb-score · 38 rows

| # | Model | Org | Submitted | Paper / code | mteb-score |
|---|---|---|---|---|---|
| 01 | QZhou-Embedding | | Aug 2025 | QZhou-Embedding Technical Report | 75.97 |
| 02 | Qwen3-Embedding-8B | | Jun 2025 | Qwen3 Embedding: Advancing Text Embedding and Reranking … · code | 75.23 |
| 03 | Jasper-Token-Compression-600M | | Nov 2025 | Jasper-Token-Compression-600M Technical Report · code | 74.75 |
| 04 | Qwen3-Embedding-4B | | Jun 2025 | Qwen3 Embedding: Advancing Text Embedding and Reranking … · code | 74.61 |
| 05 | LGAI-Embedding-Preview | | Jun 2025 | LG-ANNA-Embedding technical report | 74.12 |
| 06 | F2LLM-4B | | Oct 2025 | F2LLM Technical Report: Matching SOTA Embedding Performa… · code | 73.67 |
| 07 | gemini-embedding-001 | | Mar 2025 | Gemini Embedding: Generalizable Embeddings from Gemini | 73.30 |
| 08 | F2LLM-v2-14B | | Mar 2026 | F2LLM-v2: Inclusive, Performant, and Efficient Embedding… | 73.08 |
| 09 | F2LLM-v2-8B | | Mar 2026 | F2LLM-v2: Inclusive, Performant, and Efficient Embedding… | 72.86 |
| 10 | F2LLM-v2-4B | | Mar 2026 | F2LLM-v2: Inclusive, Performant, and Efficient Embedding… | 72.41 |
| 11 | F2LLM-1.7B | | Oct 2025 | F2LLM Technical Report: Matching SOTA Embedding Performa… · code | 72.01 |
| 12 | jina-embeddings-v5-omni-small | | May 2026 | jina-embeddings-v5-omni: Text-Geometry-Preserving Multim… | 71.78 |
| 13 | jina-embeddings-v5-text-small | | Feb 2026 | jina-embeddings-v5-text: Task-Targeted Embedding Distill… | 71.78 |
| 14 | F2LLM-v2-1.7B | | Mar 2026 | F2LLM-v2: Inclusive, Performant, and Efficient Embedding… | 71.63 |
| 15 | jasper_en_vision_language_v1 | | Dec 2024 | Jasper and Stella: distillation of SOTA embedding models · code | 71.41 |
| 16 | KaLM-embedding-multilingual-mini-instruct-v2.5 | | Jun 2025 | KaLM-Embedding-V2: Superior Training Techniques and Data… · code | 71.29 |
| 17 | jina-embeddings-v5-text-nano | | Feb 2026 | jina-embeddings-v5-text: Task-Targeted Embedding Distill… | 71.11 |
| 18 | jina-embeddings-v5-omni-nano | | May 2026 | jina-embeddings-v5-omni: Text-Geometry-Preserving Multim… | 71.11 |
| 19 | GTE-Qwen2-7B-instruct (Open) | Alibaba | Aug 2023 | Towards General Text Embeddings with Multi-stage Contras… | 70.72 |
| 20 | Qwen3-Embedding-0.6B | | Jun 2025 | Qwen3 Embedding: Advancing Text Embedding and Reranking … · code | 70.47 |
| 21 | ICT-TIME-and-Querit-embedding-v1 | | Feb 2026 | Bagging-Based Model Merging for Robust General Text Embe… | 70.12 |
| 22 | F2LLM-0.6B | | Oct 2025 | F2LLM Technical Report: Matching SOTA Embedding Performa… · code | 70.03 |
| 23 | F2LLM-v2-0.6B | | Mar 2026 | F2LLM-v2: Inclusive, Performant, and Efficient Embedding… | 69.97 |
| 24 | NV-Embed-v2 (Open) | NVIDIA | May 2024 | NV-Embed: Improved Techniques for Training LLMs as Gener… | 69.81 |
| 25 | Linq-Embed-Mistral | | May 2024 | pwc-dump | 69.80 |
| 26 | embeddinggemma-300m | | Sep 2025 | EmbeddingGemma: Powerful and Lightweight Text Representa… | 69.67 |
| 27 | stella_en_1.5B_v5 | | Dec 2024 | Jasper and Stella: distillation of SOTA embedding models · code | 69.43 |
| 28 | stella_en_400M_v5 | | Dec 2024 | Jasper and Stella: distillation of SOTA embedding models · code | 69.39 |
| 29 | SFR-Embedding-Mistral | | Jan 2024 | pwc-dump | 69.31 |
| 30 | F2LLM-v2-330M | | Mar 2026 | F2LLM-v2: Inclusive, Performant, and Efficient Embedding… | 68.86 |
| 31 | NV-Embed-v1 | | May 2024 | NV-Embed: Improved Techniques for Training LLMs as Gener… | 68.32 |
| 32 | E5-Mistral-7B-instruct (Open) | Microsoft | Dec 2023 | Improving Text Embeddings with Large Language Models · code | 67.97 |
| 33 | gte-Qwen2-1.5B-instruct | | Aug 2023 | Towards General Text Embeddings with Multi-stage Contras… | 67.20 |
| 34 | GritLM-7B | | Feb 2024 | Generative Representational Instruction Tuning · code | 67.07 |
| 35 | UAE-Large-V1 | | Sep 2023 | AnglE-optimized Text Embeddings · code | 66.40 |
| 36 | mxbai-embed-large-v1 | | Sep 2023 | AnglE-optimized Text Embeddings · code | 66.26 |
| 37 | GIST-large-Embedding-v0 | | Feb 2024 | GISTEmbed: Guided In-sample Selection of Training Negati… · code | 66.25 |
| 38 | GritLM-8x7B | | Feb 2024 | Generative Representational Instruction Tuning · code | 66.16 |
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
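The ordering rule described for these tables (sort by score within a metric, break ties by submission date) can be sketched as follows. This is a minimal illustration, not the site's implementation: the `Row` type and `rank` function are hypothetical names, the page does not say whether the earlier or the later submission wins a tie (earlier-first is assumed here), and each month-level date is pinned to the first of the month purely for comparison.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Row:
    """Hypothetical leaderboard row; field names are illustrative."""
    model: str
    submitted: date  # first day of the listed month (assumed precision)
    score: float


def rank(rows):
    """Sort by score, highest first; on equal scores, earlier date first.

    The tie-break direction is an assumption: the page only says that
    ties are broken by submission date.
    """
    return sorted(rows, key=lambda r: (-r.score, r.submitted))


# Three avg-score rows copied from the table above, deliberately shuffled.
rows = [
    Row("voyage-3-large", date(2025, 1, 1), 70.32),
    Row("NV-Embed-v2", date(2024, 9, 1), 72.31),
    Row("GTE-Qwen2-7B-instruct", date(2024, 6, 1), 72.05),
]

ranked = rank(rows)
for position, row in enumerate(ranked, start=1):
    print(f"{position:02d} {row.model} {row.score:.2f}")
```

Negating the score inside the sort key keeps the whole ordering in a single ascending sort, so the date tie-break needs no second pass.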
§ 04 · Literature

20 papers tied to this benchmark.

Every paper below corresponds to at least one row in the leaderboard above. Click through for the arXiv preprint and, when available, the reference implementation.

§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 · A public checkpoint or API endpoint
  • 02 · A reproduction script with frozen commit + seed
  • 03 · Declared evaluation environment (Python, deps)
  • 04 · One row per metric declared by this dataset
  • 05 · A contact so we can follow up on discrepancies
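One way to self-check a submission against the list above before sending it is sketched below. The page does not publish a submission schema, so the `Submission` type, its field names, and the `missing_items` helper are all hypothetical; only the two metric names come from this page. The example values are placeholders, not real models or results.

```python
from dataclasses import dataclass, field

# Metric names as listed on this page for the dataset.
DATASET_METRICS = frozenset({"avg-score", "mteb-score"})


@dataclass
class Submission:
    """Hypothetical manifest; every field name here is illustrative."""
    checkpoint: str    # public checkpoint or API endpoint
    commit: str        # frozen commit of the reproduction script
    seed: int          # seed used by the reproduction script
    environment: dict  # declared Python version and dependencies
    contact: str       # where to follow up on discrepancies
    scores: dict = field(default_factory=dict)  # metric name -> score


def missing_items(sub):
    """Return a sorted list of checklist items the manifest lacks."""
    problems = []
    if not sub.checkpoint:
        problems.append("checkpoint")
    if not sub.commit:
        problems.append("commit")
    for key in ("python", "deps"):
        if key not in sub.environment:
            problems.append(f"environment:{key}")
    if not sub.contact:
        problems.append("contact")
    # One row per metric declared by the dataset.
    for metric in DATASET_METRICS - set(sub.scores):
        problems.append(f"metric:{metric}")
    return sorted(problems)


example = Submission(
    checkpoint="example-org/example-model",  # placeholder, not a real model
    commit="0000000",                        # placeholder commit hash
    seed=0,
    environment={"python": "3.11", "deps": ["example-dep==1.0"]},
    contact="maintainer@example.com",
    scores={"avg-score": 0.0},               # placeholder, not a result
)
print(missing_items(example))
```

Here the example omits an mteb-score row on purpose, so the check reports that one metric as missing.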