Feature extraction, the generation of dense vector embeddings from text, is the unsung infrastructure layer powering semantic search, RAG pipelines, clustering, and recommendation systems. Sentence-BERT (2019) made it practical, but the field exploded in 2023-2024 with instruction-tuned embedding models: E5-Mistral and GTE-Qwen2 turned decoder-only LLMs into embedding engines, while compact encoder models such as Nomic Embed stayed competitive at a fraction of the size. Together they pushed average MTEB scores past 70 across 50+ tasks. The key insight was that pre-training scale transfers to embedding quality: a 7B-parameter embedding model typically outperforms a 110M-parameter one on zero-shot retrieval by a wide margin. Matryoshka representation learning (Kusupati et al., 2022) added the ability to truncate an embedding to a shorter prefix of its dimensions without retraining, provided the model was trained with that objective, making deployment flexible across latency and storage budgets.
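To make the Matryoshka idea concrete, here is a minimal sketch of what truncation looks like on the consumer side: keep the first `dim` components of a vector, rescale to unit length, and compare with cosine similarity. The vectors are made-up toy values rather than output from any real model, and the helper names `truncate_and_normalize` and `cosine` are hypothetical. Truncation preserves ranking quality only for models actually trained with a Matryoshka-style objective.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional vectors standing in for model output (values are invented).
query = [0.9, 0.1, 0.3, 0.0, 0.05, -0.02, 0.01, 0.03]
doc_a = [0.8, 0.2, 0.25, 0.1, -0.04, 0.02, 0.00, 0.01]
doc_b = [-0.1, 0.9, -0.3, 0.2, 0.03, 0.01, -0.02, 0.04]

# Compare the query against both documents at full and reduced dimensionality.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    a = truncate_and_normalize(doc_a, dim)
    b = truncate_and_normalize(doc_b, dim)
    print(dim, round(cosine(q, a), 3), round(cosine(q, b), 3))
```

In this toy setup the leading components carry most of the signal, so `doc_a` stays the closer match at every truncation level. That is the property Matryoshka training aims to induce in real embeddings, and halving the dimension halves the storage per vector.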
Massive Text Embedding Benchmark across 8 task categories
Leading models on the MTEB leaderboard.
| Rank | Model | MTEB score | Year | Source |
|---|---|---|---|---|
| 1 | QZhou-Embedding | 76.0 | 2025 | paper |
| 2 | Qwen3-Embedding-8B | 75.2 | 2025 | paper |
| 3 | Jasper-Token-Compression-600M | 74.8 | 2025 | paper |
| 4 | Qwen3-Embedding-4B | 74.6 | 2025 | paper |
| 5 | LGAI-Embedding-Preview | 74.1 | 2025 | paper |
| 6 | F2LLM-4B | 73.7 | 2025 | paper |
| 7 | gemini-embedding-001 | 73.3 | 2025 | paper |
| 8 | F2LLM-v2-14B | 73.1 | 2026 | paper |
| 9 | F2LLM-v2-8B | 72.9 | 2026 | paper |
| 10 | F2LLM-v2-4B | 72.4 | 2026 | paper |