Text Embeddings Deep Dive
Not all embedding models are equal. Learn to choose the right one for your use case.
Quick Start: Real Working Code
Let's start with code you can copy, paste, and run immediately. Install the dependencies first:
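For the examples in this guide, something like the following should work (faiss-cpu is the CPU build of FAISS; the openai and cohere packages are only needed for the API examples later):

```bash
pip install sentence-transformers faiss-cpu numpy openai cohere
```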
Text Embedding with Sentence Transformers
This is the most common way to create embeddings. The BGE model from BAAI is currently one of the best open-source options.
```python
from sentence_transformers import SentenceTransformer
import numpy as np
model = SentenceTransformer('BAAI/bge-large-en-v1.5')
documents = ['The cat sat on the mat', 'A dog played in the park', 'Machine learning is fascinating']
embeddings = model.encode(documents, normalize_embeddings=True)
query = 'pets resting at home'
query_embedding = model.encode(query, normalize_embeddings=True)
similarities = np.dot(embeddings, query_embedding)
for doc, sim in zip(documents, similarities):
    print(f'{sim:.3f}: {doc}')
```

Output:

```text
0.621: The cat sat on the mat
0.487: A dog played in the park
0.312: Machine learning is fascinating
```
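Because normalize_embeddings=True scales every vector to unit length, the np.dot call above computes exactly cosine similarity. A quick sanity check, continuing from the snippet above (the manual computation is just for illustration):

```python
import numpy as np

# On unit-length vectors, the dot product equals cosine similarity
manual = np.dot(embeddings[0], query_embedding) / (
    np.linalg.norm(embeddings[0]) * np.linalg.norm(query_embedding)
)
assert np.isclose(manual, similarities[0])
```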
Build a Semantic Search Index with FAISS
For production use with thousands or millions of documents, use FAISS for efficient similarity search.
```python
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
model = SentenceTransformer('BAAI/bge-large-en-v1.5')
documents = [
    'Python is a programming language',
    'JavaScript runs in the browser',
    'SQL is used to query databases',
    'Redis is an in-memory data store',
    'PostgreSQL is a relational database'
]
embeddings = model.encode(documents, normalize_embeddings=True).astype('float32')
# Exact inner-product index; on normalized vectors this equals cosine similarity
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)
# Search for similar documents
query = 'how to store data'
query_vec = model.encode([query], normalize_embeddings=True).astype('float32')
D, I = index.search(query_vec, k=3)
print('Top 3 results:')
for score, idx in zip(D[0], I[0]):
    print(f'{score:.3f}: {documents[idx]}')
```
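IndexFlatIP performs an exact, brute-force scan, which stays fast up to roughly hundreds of thousands of vectors. At larger scale, an approximate index such as IVF trades a small amount of recall for much faster queries. A minimal sketch, with illustrative (untuned) nlist and nprobe values and random stand-in vectors:

```python
import faiss
import numpy as np

dim = 1024                   # embedding dimension (e.g. BGE-large)
nlist = 100                  # number of IVF clusters (illustrative)

# Random vectors stand in for real embeddings in this sketch
vectors = np.random.rand(10_000, dim).astype('float32')
faiss.normalize_L2(vectors)  # normalize so inner product == cosine

quantizer = faiss.IndexFlatIP(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_INNER_PRODUCT)
index.train(vectors)         # IVF indexes must be trained before adding vectors
index.add(vectors)
index.nprobe = 10            # clusters visited per query (recall/speed trade-off)

D, I = index.search(vectors[:1], 3)
```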
SOTA Performance: MTEB Benchmark
The Massive Text Embedding Benchmark (MTEB) is the industry standard for comparing embedding models. The Model Comparison table below lists current scores for the models covered in this guide.
Key Insight
A model with an MTEB score of 65 is not just "better" than one with a score of 55: the 65-score model will find more relevant documents, make fewer classification mistakes, and cluster your data more accurately. These differences compound in production.
Embedding Model Categories
There are three main categories of text embedding models:
Sentence Transformers (Open Source)
Free, run locally, no API costs. Best options: bge-large-en-v1.5 (MTEB 64.23), all-mpnet-base-v2.
```python
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('BAAI/bge-large-en-v1.5')
embedding = model.encode('Hello world', normalize_embeddings=True)
```

OpenAI Embeddings (API)
High quality (MTEB ~64.6), pay-per-use pricing. Easy integration but adds latency and cost.
```python
from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(
model="text-embedding-3-large",
input="Hello world"
)
embedding = response.data[0].embedding
```

Cohere Embeddings (API)
Strong multilingual support. Purpose-built for search and retrieval tasks.
```python
import cohere
co = cohere.Client('YOUR_API_KEY')  # your Cohere API key
response = co.embed(
texts=["Hello world"],
model="embed-english-v3.0",
input_type="search_document"
)
```
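One detail worth knowing: Cohere's v3 embedding models distinguish document-side and query-side inputs via input_type. A minimal sketch of embedding both sides (the query text is illustrative):

```python
import cohere

co = cohere.Client('YOUR_API_KEY')

# Index documents with input_type="search_document"
docs = co.embed(
    texts=["Hello world"],
    model="embed-english-v3.0",
    input_type="search_document",
)

# Embed queries with input_type="search_query" for best retrieval quality
query = co.embed(
    texts=["greeting"],
    model="embed-english-v3.0",
    input_type="search_query",
)
```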
Model Comparison
The table below summarizes the models covered in this guide.
| Model | Dimensions | MTEB Score | Cost | Best For |
|---|---|---|---|---|
| BAAI/bge-large-en-v1.5 | 1024 | 64.23 | Free (local) | Best open-source model |
| all-MiniLM-L6-v2 | 384 | 56.3 | Free (local) | Fast, lightweight |
| text-embedding-3-large | 3072 | 64.6 | Pay per use | Highest quality OpenAI embedding |
| text-embedding-3-small | 1536 | 62.3 | Pay per use | Cost-effective API option |
| embed-english-v3.0 | 1024 | 64.5 | Pay per use | Search and retrieval (Cohere) |
Understanding Embedding Dimensions
Embedding models output vectors of different sizes. This is called the embedding dimension. Common sizes are:
Small (e.g. 384 dims)
Fast, low memory. Good for prototypes.
Medium (e.g. 1024 dims)
The best balance. BGE-large uses this.
Large (e.g. 3072 dims)
Highest quality. OpenAI's text-embedding-3-large uses this.
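Dimension is sometimes adjustable per request. OpenAI's text-embedding-3 models, for example, accept a dimensions parameter that returns a shortened vector. A minimal sketch (the 512 value is illustrative; check the provider's docs for supported sizes):

```python
from openai import OpenAI

client = OpenAI()

# Ask a natively 3072-dim model for a shortened 512-dim vector
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Hello world",
    dimensions=512,
)
print(len(response.data[0].embedding))  # 512
```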
The Tradeoffs
| Dimension | Storage (per vector, float32) | Search Speed | Quality |
|---|---|---|---|
| 384 | 1.5 KB | Fastest | Good |
| 1024 | 4 KB | Fast | Better |
| 1536 | 6 KB | Medium | Better |
| 3072 | 12 KB | Slower | Best |
At 1 million documents, 384-dim embeddings take about 1.5 GB of storage while 3072-dim embeddings take about 12 GB (in float32). This matters for vector databases and real-time search.
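The arithmetic behind those numbers is simple: dimensions × 4 bytes (float32) × document count. A quick check:

```python
# float32 storage: dims * 4 bytes per vector
def storage_gb(dims: int, num_docs: int) -> float:
    return dims * 4 * num_docs / 1e9

for dims in (384, 1024, 1536, 3072):
    print(f'{dims} dims x 1M docs: {storage_gb(dims, 1_000_000):.1f} GB')
```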
When to Use Which Model
Choosing the right embedding model depends on your constraints:
Prototyping / Learning
Use all-MiniLM-L6-v2. Free, fast, runs anywhere.
384 dims | 5ms latency | MTEB 56.3 | Good enough to validate ideas.
Production (Best Open Source)
Use BAAI/bge-large-en-v1.5. SOTA quality without API costs.
1024 dims | MTEB 64.23 | Run locally or self-host. Best open-source option.
Production (Quality-Critical API)
Use text-embedding-3-large or embed-english-v3.0.
1024-3072 dims | MTEB 64+ | Best retrieval quality. Worth the cost for high-stakes RAG.
Multilingual
Use embed-multilingual-v3.0 (Cohere) or multilingual sentence transformers.
Trained on 100+ languages. Essential if your content is not English-only; see the sketch below.
Latency-Critical (Real-time)
Use locally-hosted small models. API calls add 50-200ms overhead.
For search-as-you-type or real-time features, local inference is essential.
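As a concrete multilingual example, here is a minimal sketch using paraphrase-multilingual-MiniLM-L12-v2, one of the standard multilingual sentence-transformers (any multilingual model from the library works the same way):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Multilingual models map all languages into one shared vector space
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')

texts = [
    'The cat sat on the mat',           # English
    'Le chat est assis sur le tapis',   # French
]
emb = model.encode(texts, normalize_embeddings=True)

# Semantically equivalent sentences score high across languages
print(f'{np.dot(emb[0], emb[1]):.3f}')
```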
Key Takeaways
1. BGE-large is the best open-source option: MTEB 64.23, free to run locally, and competitive with commercial APIs.
2. MTEB is the benchmark: use it to compare models objectively. Higher scores mean better retrieval and classification.
3. Use FAISS for production: efficient similarity search at scale, with support for millions of vectors.
4. Match the model to the use case: prototyping wants speed, production wants quality, and real-time features need local inference.
Practice Exercise
Copy the code above and try these exercises:
1. Change the query to find documents about "data storage" and see which documents are most similar.
2. Add more documents to the list and rebuild the FAISS index. Test with k=5 results.
3. Switch from bge-large-en-v1.5 to all-MiniLM-L6-v2 and compare the results.