
Text Embeddings in Practice

From Word2Vec to the MTEB leaderboard: how text embedding models evolved, how to choose one, and how to use them in production.

Prerequisite: This lesson assumes you understand what embeddings are and how they work mechanically. If terms like "cosine similarity," "contrastive learning," or "transformer encoder" are unfamiliar, start with Lesson 0.1: What is an Embedding?

The Evolution of Text Embedding Models

Lesson 0.1 traced the theoretical foundations from Harris (1954) through BERT (2018). This lesson picks up the practical thread: how did we get from "word vectors are interesting" to "here is a leaderboard with 200+ models ranked across 56 datasets"?

The answer is four architectural shifts, each unlocking a new class of applications. Understanding these shifts tells you why certain models are good at certain tasks and saves you from cargo-culting someone else's model choice.

Shift I: Static Word Vectors
2013

Word2Vec: One Vector Per Word

Mikolov et al. at Google showed that a stripped-down neural network trained to predict context words (Skip-gram) or center words (CBOW) produced word vectors where arithmetic encoded semantic relationships. Training on 100 billion words took a day on a single machine.

The limitation was fundamental: every word got exactly one vector. "Bank" in "river bank" and "bank robbery" shared the same embedding. And there was no principled way to combine word vectors into a sentence embedding — averaging worked surprisingly well for short texts, but lost word order and emphasis entirely.

# Word2Vec: one fixed vector per word
from gensim.models import Word2Vec

# `sentences` is an iterable of tokenized sentences: [["the", "cat", ...], ...]
model = Word2Vec(sentences, vector_size=300, window=5, sg=1)  # Skip-gram
vec_king = model.wv["king"]      # Shape: (300,) — always the same vector
vec_bank = model.wv["bank"]      # Same vector for river bank and bank account

# "Sentence embedding" = crude average
import numpy as np
sentence = "the cat sat on the mat"
sent_vec = np.mean([model.wv[w] for w in sentence.split()], axis=0)

Mikolov, T. et al. (2013). Efficient Estimation of Word Representations in Vector Space. ICLR Workshop.

2014

GloVe: Count Meets Predict

Pennington, Socher, and Manning at Stanford bridged count-based methods (LSA) with prediction-based methods (Word2Vec) by training embeddings to reconstruct log co-occurrence ratios from an explicit word-word co-occurrence matrix. GloVe matched Word2Vec quality while making the connection to corpus statistics transparent. Pre-trained GloVe vectors (6B and 840B token variants) became the default initialization for NLP models for years.

Pennington, J. et al. (2014). GloVe: Global Vectors for Word Representation. EMNLP. 35,000+ citations.

2016

fastText: Subword Embeddings

Bojanowski et al. at Facebook AI solved the out-of-vocabulary problem by learning embeddings for character n-grams instead of whole words. A word's vector was the sum of its subword vectors, meaning even misspellings and neologisms got meaningful representations. This was critical for morphologically rich languages (Turkish, Finnish, Arabic) and foreshadowed the BPE tokenization that all modern transformers use.

Bojanowski, P. et al. (2017). Enriching Word Vectors with Subword Information. TACL, 5, 135-146.

The static-word-vector ceiling

Word2Vec, GloVe, and fastText all share the same fundamental limit: they produce one vector per word (or subword), not per sentence. To get a sentence embedding, you average the word vectors — destroying word order, negation, and emphasis. "The dog bit the man" and "The man bit the dog" get identical embeddings. This ceiling drove the next shift.
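The order-blindness is easy to demonstrate: under averaging, any permutation of the same words produces the same sentence vector. A numpy sketch with random stand-in word vectors (the argument holds for any trained embeddings):

```python
# Averaging word vectors is order-invariant: permuting the words of a
# sentence yields an identical mean. Random vectors stand in for trained
# word embeddings here; only the averaging step matters.
import numpy as np

rng = np.random.default_rng(42)
vocab = {w: rng.normal(size=300) for w in ["the", "dog", "bit", "man"]}

def avg_embed(sentence):
    return np.mean([vocab[w] for w in sentence.split()], axis=0)

a = avg_embed("the dog bit the man")
b = avg_embed("the man bit the dog")
print(np.allclose(a, b))  # True — the two sentences are indistinguishable
```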

Shift II: Contextual Representations
October 2018

BERT: Context-Dependent Embeddings

Devlin, Chang, Lee, and Toutanova at Google released BERT, which produced different representations for the same word depending on surrounding context. "Bank" in "river bank" now got a genuinely different vector from "bank robbery."

But BERT had a critical problem for embeddings: comparing two sentences required feeding both through the model together as a cross-encoder. For N documents, finding the most similar pair required N(N-1)/2 forward passes — about 65 hours for a collection of 10,000 sentences.

# BERT cross-encoder: accurate but O(n²)
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Comparing a pair means encoding BOTH sentences in one forward pass:
inputs = tokenizer("the river bank", "the bank robbery", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)  # one full forward pass per PAIR

# To compare sentence A with 10,000 candidates:
# must run the model once for EACH pair (A, candidate_i).
# At ~10ms per pair = 100 seconds for one query.
# All-pairs on 10,000 docs: 10000·9999/2 ≈ 50M pairs ≈ 65 hours.

Devlin, J. et al. (2019). BERT: Pre-training of Deep Bidirectional Transformers. NAACL. 90,000+ citations.

Shift III: Bi-Encoder Sentence Embeddings
August 2019

Sentence-BERT: The Bi-Encoder Breakthrough

Nils Reimers and Iryna Gurevych at TU Darmstadt solved BERT's quadratic problem with an elegant trick: fine-tune BERT with a siamese network so each sentence could be independently encoded into a fixed-length vector. Compare by dot product, not by cross-attention.

The result: searching 10,000 sentences went from 65 hours to 5 seconds. Encode your corpus once, store the vectors in a database, find similar items with a single matrix multiply. This is the architecture behind every modern semantic search system, RAG pipeline, and recommendation engine.

# Sentence-BERT bi-encoder: encode once, search fast
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Encode corpus ONCE — O(n)
corpus_embeddings = model.encode(corpus_of_10000_docs)  # (10000, 384)

# Search ANY query in milliseconds — just a dot product
query_vec = model.encode("machine learning frameworks")  # (384,)
scores = corpus_embeddings @ query_vec  # (10000,) — instant
top_results = scores.argsort()[-5:][::-1]  # Top 5

"SBERT reduces the effort for finding the most similar pair from 65 hours with BERT to about 5 seconds with SBERT, while maintaining the accuracy from BERT."

Reimers, N. & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. EMNLP.

Reimers went on to create the sentence-transformers Python library, which remains the most widely used tool for generating text embeddings. It now supports 5,000+ pre-trained models via Hugging Face.

Shift IV: The Embedding Arms Race (2022-present)
2022-2023

Contrastive Training at Scale

Three innovations supercharged embedding quality. First, hard negative mining: instead of training on random negatives, models learned from documents that were close but not quite right — the "almost correct" distractors that are hardest to distinguish. Second, multi-stage training: pre-train on billions of weakly-supervised text pairs (title-body, query-passage), then fine-tune on high-quality labeled data. Third, instruction-tuning: prepend task descriptions to inputs so the same model produces different embeddings optimized for retrieval vs. classification vs. clustering.
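The contrastive objective behind these models fits in a few lines of numpy. This is an illustrative InfoNCE loss with in-batch negatives, not any particular model's training code:

```python
# Sketch of the InfoNCE contrastive objective with in-batch negatives.
import numpy as np

def info_nce_loss(queries, positives, temperature=0.05):
    """queries, positives: (batch, dim), L2-normalized.
    Row i of `positives` is the match for row i of `queries`;
    every other row in the batch serves as a negative."""
    sims = queries @ positives.T / temperature          # (batch, batch)
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))                 # cross-entropy on the diagonal

rng = np.random.default_rng(0)
q = rng.normal(size=(8, 64))
q /= np.linalg.norm(q, axis=1, keepdims=True)
p = q + 0.1 * rng.normal(size=(8, 64))                  # noisy "paraphrases" of q
p /= np.linalg.norm(p, axis=1, keepdims=True)

print(info_nce_loss(q, p))                       # low: each positive is nearest its query
print(info_nce_loss(q, np.roll(p, 1, axis=0)))   # high: positives misaligned
```

Hard negative mining changes only which rows end up in the batch: instead of random documents, you pick near-misses so the off-diagonal similarities are large and the loss is informative.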

The result was a Cambrian explosion of models. BAAI released BGE. Microsoft released E5. Alibaba released GTE. Jina AI released jina-embeddings. OpenAI shipped text-embedding-3. Cohere shipped embed-v3. Each claimed state-of-the-art on different benchmarks.

2024-present

Matryoshka Representations and Decoder-Based Models

Matryoshka Representation Learning (Kusupati et al., 2022) trained models so that truncating embeddings to fewer dimensions preserved most of the quality — like Russian nesting dolls, the first 256 dimensions of a 3072-dimensional embedding capture most of the information. OpenAI's text-embedding-3 and Cohere's embed-v3 both support this, letting you trade storage for quality at query time.

Meanwhile, decoder-based models like GTE-Qwen2 and E5-Mistral proved that large language models (LLMs) could be adapted into powerful embedding models. By pooling the last-token representation from a 7B-parameter decoder, these models achieved new MTEB records — at the cost of 10-50x more compute per embedding.

Kusupati, A. et al. (2022). Matryoshka Representation Learning. NeurIPS.
Li, Z. et al. (2024). GTE-Qwen2: Towards General Text Embeddings with Multi-stage Contrastive Learning.
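The last-token pooling these decoder-based models rely on is simple to sketch (numpy, with a random tensor standing in for the decoder's hidden states):

```python
# Last-token pooling: take the hidden state of the final non-padding token
# in each sequence as the embedding. Random data stands in for real
# decoder output; the indexing is the point.
import numpy as np

rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 6, 8))               # (batch, seq_len, hidden)
attention_mask = np.array([[1, 1, 1, 1, 0, 0],    # 4 real tokens, 2 padding
                           [1, 1, 1, 1, 1, 1]])   # full-length sequence

last_idx = attention_mask.sum(axis=1) - 1          # [3, 5]
embeddings = hidden[np.arange(hidden.shape[0]), last_idx]   # (2, 8)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
print(embeddings.shape)  # (2, 8)
```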

The throughline: 2013 to 2026

2013-2016: Static word vectors. One embedding per word. Average for sentences. (Word2Vec, GloVe, fastText)
2018: Contextual but slow. Different embeddings per context, but O(n²) for search. (BERT, ELMo)
2019: Bi-encoder revolution. Encode once, search by dot product. Semantic search goes practical. (Sentence-BERT)
2022-now: Arms race. Hard negatives, multi-stage training, instruction-tuning, Matryoshka dims, decoder-based models.

MTEB: The Standard Benchmark

The Massive Text Embedding Benchmark (MTEB) is the industry standard for comparing embedding models. Created by Muennighoff et al. at Hugging Face in 2022, it evaluates models across 8 task types and 56+ datasets spanning retrieval, classification, clustering, semantic textual similarity (STS), reranking, pair classification, summarization, and bitext mining.

"No single text embedding method dominates across all tasks. Current models specialize on certain types of tasks, revealing significant room for improvement."

Muennighoff, N. et al. (2023). MTEB: Massive Text Embedding Benchmark. EACL.

MTEB matters because a model that excels at retrieval may be mediocre at clustering, and vice versa. The benchmark forces you to think about which task you care about, not just the average score. When evaluating models below, pay attention to the Retrieval column if you're building search/RAG, STS if you need fine-grained similarity, and Classification if you're labeling text.

Model Comparison: 2026 Landscape

The embedding landscape is crowded. Here are the models that matter, with real MTEB scores and the trade-offs that determine which one you should use.

| Model | Source | Dims | MTEB Avg | Retrieval | Max Tokens | Open? |
|---|---|---|---|---|---|---|
| GTE-Qwen2-7B | Alibaba | 3584 | 70.2 | 60.3 | 8192 | Yes |
| text-embedding-3-large | OpenAI | 3072 | 64.6 | 55.4 | 8191 | No |
| embed-english-v3.0 | Cohere | 1024 | 64.5 | 55.5 | 512 | No |
| BGE-large-en-v1.5 | BAAI | 1024 | 64.2 | 54.3 | 512 | Yes |
| E5-Mistral-7B | Microsoft | 4096 | 66.6 | 56.9 | 32768 | Yes |
| nomic-embed-text-v1.5 | Nomic AI | 768 | 62.3 | 53.0 | 8192 | Yes |
| jina-embeddings-v3 | Jina AI | 1024 | 65.5 | 56.0 | 8192 | Yes |
| all-MiniLM-L6-v2 | Sentence-Transformers | 384 | 56.3 | 41.9 | 256 | Yes |

Key Insight: MTEB Average Is Not the Full Story

A model with an MTEB average of 70 does, on average, outperform one at 56. But the average hides crucial task-specific variation. BGE-large scores 54.3 on retrieval — adequate for most RAG systems. GTE-Qwen2-7B scores 60.3 — meaningfully better, but it requires a GPU with 14GB+ VRAM and 50x more compute per embedding. The right model depends on your constraints, not just the leaderboard.

Production Code: Three Approaches

Every text embedding pipeline has the same three steps: choose a model, encode text into vectors, compare vectors by similarity. The code below is copy-paste ready. Run it.

Approach 1: sentence-transformers (Open Source, Local)

The most popular library. Free, runs locally, supports 5,000+ models. No API key required.

Best for: prototyping, self-hosted production, privacy-sensitive data, cost control.

# Install
pip install sentence-transformers faiss-cpu numpy
from sentence_transformers import SentenceTransformer
import numpy as np

# Load model (~1.2GB download first time for bge-large)
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Encode documents — normalize for cosine similarity via dot product
documents = [
    "Python is a programming language",
    "JavaScript runs in the browser",
    "SQL is used to query databases",
    "Redis is an in-memory data store",
    "PostgreSQL is a relational database",
]
embeddings = model.encode(documents, normalize_embeddings=True)
# Shape: (5, 1024) — 5 documents, 1024 dimensions each

# Encode a query
query = "how to store data persistently"
query_vec = model.encode(query, normalize_embeddings=True)

# Similarity = dot product (vectors are normalized)
scores = embeddings @ query_vec
ranked = np.argsort(scores)[::-1]

print("Results:")
for idx in ranked:
    print(f"  {scores[idx]:.3f}: {documents[idx]}")
Expected output:
Results:
  0.701: PostgreSQL is a relational database
  0.654: Redis is an in-memory data store
  0.598: SQL is used to query databases
  0.412: Python is a programming language
  0.389: JavaScript runs in the browser

Approach 2: OpenAI Embeddings (API)

High quality, zero infrastructure. Pay per token. Supports Matryoshka dimensions (256-3072).

Best for: teams without GPU infrastructure, quality-critical production RAG, quick integration. Pricing: ~$0.13 per 1M tokens for text-embedding-3-large.

# Install
pip install openai numpy
from openai import OpenAI
import numpy as np

client = OpenAI()  # Uses OPENAI_API_KEY env variable

def embed_texts(texts: list[str], model="text-embedding-3-large", dims=1024):
    """Embed texts with OpenAI. Supports Matryoshka truncation via dims."""
    response = client.embeddings.create(
        model=model,
        input=texts,
        dimensions=dims,  # Matryoshka: 256, 512, 1024, or 3072
    )
    return np.array([d.embedding for d in response.data])

# Embed documents and query
doc_embeddings = embed_texts([
    "Python is a programming language",
    "Redis is an in-memory data store",
    "PostgreSQL is a relational database",
])
query_embedding = embed_texts(["how to store data persistently"])

# Compare
scores = doc_embeddings @ query_embedding.T
print(scores.squeeze())  # [0.41, 0.65, 0.71]

Matryoshka dimensions

OpenAI's text-embedding-3 models support flexible dimensions. Requesting dimensions=256 returns the first 256 components of the full 3072-dim vector, losing only ~2-5% quality while cutting storage by 12x. This is Matryoshka Representation Learning in action.
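For MRL-trained open models you can apply the same idea locally: truncate and re-normalize. A sketch (with OpenAI you pass dimensions to the API instead of truncating yourself):

```python
# Matryoshka truncation: keep the first k components, then re-normalize
# so cosine similarity still works. Only valid for MRL-trained models.
import numpy as np

def truncate(vec: np.ndarray, k: int) -> np.ndarray:
    v = vec[:k]
    return v / np.linalg.norm(v)

full = np.random.default_rng(0).normal(size=3072)   # stand-in embedding
full /= np.linalg.norm(full)
short = truncate(full, 256)
print(short.shape)  # (256,)
```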

Approach 3: Cohere Embeddings (API)

Strong multilingual support. Built-in input types for retrieval vs. classification. Compression-aware training.

Best for: multilingual applications, search-focused use cases, teams already on Cohere's platform.

# Install
pip install cohere numpy
import cohere
import numpy as np

co = cohere.ClientV2("your-api-key")

# Cohere distinguishes document vs query embeddings
# This matters: query embeddings are optimized for retrieval
doc_response = co.embed(
    texts=["Python is a programming language",
           "Redis is an in-memory data store",
           "PostgreSQL is a relational database"],
    model="embed-english-v3.0",
    input_type="search_document",   # For corpus documents
    embedding_types=["float"],
)
doc_embeddings = np.array(doc_response.embeddings.float_)

query_response = co.embed(
    texts=["how to store data persistently"],
    model="embed-english-v3.0",
    input_type="search_query",      # For queries — different optimization
    embedding_types=["float"],
)
query_embedding = np.array(query_response.embeddings.float_)

scores = doc_embeddings @ query_embedding.T
print(scores.squeeze())

Why separate input types? Cohere's embed-v3 uses asymmetric training: queries and documents are embedded differently because a short query like "data storage" and a long document about PostgreSQL occupy different semantic spaces. This typically improves retrieval by 2-5% nDCG over symmetric embedding.

Scaling Up: FAISS for Production Search

The dot-product approach above works for thousands of documents. For millions or billions, you need an approximate nearest neighbor (ANN) index. FAISS (Facebook AI Similarity Search) is the most widely used library.

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

model = SentenceTransformer("BAAI/bge-large-en-v1.5")

# Simulate a larger corpus
documents = [
    "Python is a high-level programming language",
    "JavaScript enables interactive web pages",
    "SQL queries relational databases",
    "Redis provides in-memory caching",
    "PostgreSQL is an advanced relational database",
    "Docker containerizes applications for deployment",
    "Kubernetes orchestrates container workloads",
    "TensorFlow is a machine learning framework",
    "React is a JavaScript UI library",
    "FastAPI is a modern Python web framework",
]

# Encode and normalize
embeddings = model.encode(documents, normalize_embeddings=True).astype("float32")
dim = embeddings.shape[1]  # 1024

# Build FAISS index — IndexFlatIP = exact inner product (cosine for normalized vecs)
index = faiss.IndexFlatIP(dim)
index.add(embeddings)
print(f"Index contains {index.ntotal} vectors of dim {dim}")

# Search
query = "tools for deploying web applications"
query_vec = model.encode([query], normalize_embeddings=True).astype("float32")

# D = distances (similarities), I = indices
D, I = index.search(query_vec, k=3)

print("\nTop 3 results:")
for score, idx in zip(D[0], I[0]):
    print(f"  {score:.3f}: {documents[idx]}")
Expected output:
Index contains 10 vectors of dim 1024

Top 3 results:
  0.712: Docker containerizes applications for deployment
  0.653: Kubernetes orchestrates container workloads
  0.584: FastAPI is a modern Python web framework

Scaling beyond exact search

IndexFlatIP is exact but O(n) per query. For millions of vectors, use approximate indexes:

IndexIVFFlat: partitions vectors into clusters; searches only nearby clusters. 10-100x faster.
IndexIVFPQ: product quantization compresses vectors to ~32 bytes each. 100x less memory.
IndexHNSWFlat: graph-based index. Best recall/speed trade-off for <10M vectors.

Douze, M. et al. (2024). The FAISS Library. arXiv:2401.08281.
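The inverted-file (IVF) idea is worth seeing in miniature. A pure-numpy sketch, not a substitute for FAISS's implementation: assign vectors to centroids up front, then scan only the few clusters nearest the query:

```python
# Toy IVF index: cluster assignment at build time, cluster pruning at
# query time. Real IVF trains centroids with k-means; here we just
# sample them from the data.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 32)).astype("float32")
data /= np.linalg.norm(data, axis=1, keepdims=True)

centroids = data[rng.choice(len(data), 16, replace=False)]
assignments = np.argmax(data @ centroids.T, axis=1)   # inverted lists

def ivf_search(query, k=5, nprobe=4):
    # Visit only the nprobe clusters whose centroids are closest to the query
    probe = np.argsort(query @ centroids.T)[-nprobe:]
    candidates = np.where(np.isin(assignments, probe))[0]
    scores = data[candidates] @ query
    return candidates[np.argsort(scores)[-k:][::-1]]

query = data[0]               # search for a vector that is in the index
print(ivf_search(query)[0])   # 0 — the query finds itself first
```

Instead of scoring all 1,000 vectors, each query scores only the candidates in its 4 probed clusters; nprobe is the recall/speed knob.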

Embedding Dimensions: The Storage-Quality Trade-off

Embedding dimension is the single most impactful architecture decision for production systems. It determines storage cost, search latency, and (to a point) embedding quality.

384 (Small): 1.5 KB/vec. MiniLM. Prototypes and edge devices.
768 (Medium): 3 KB/vec. Nomic, BGE-base. Good balance.
1024 (Large): 4 KB/vec. BGE-large, Cohere. Production sweet spot.
3072 (XL): 12 KB/vec. OpenAI large. Max quality, max cost.

What This Means at Scale

| Corpus Size | 384d (1.5 KB) | 1024d (4 KB) | 3072d (12 KB) |
|---|---|---|---|
| 100K docs | 150 MB | 400 MB | 1.2 GB |
| 1M docs | 1.5 GB | 4 GB | 12 GB |
| 10M docs | 15 GB | 40 GB | 120 GB |
| 100M docs | 150 GB | 400 GB | 1.2 TB |

float32 storage. Product quantization (PQ) can reduce this 4-8x with ~2-5% quality loss.

Practical rule of thumb

If your embeddings need to fit in RAM for fast search, work backward from your memory budget. For most RAG systems with under 1M documents, 1024 dimensions is the sweet spot: it fits comfortably in 4GB, gives you near-SOTA quality, and models like BGE-large or Cohere embed-v3 produce excellent results at this dimension. Go to 384 only if you're on edge devices or need to search 10M+ docs without quantization.
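Working backward from a memory budget is plain arithmetic: a float32 vector costs dims × 4 bytes, so an index can be sized in one line:

```python
# Back-of-envelope index sizing: float32 vectors cost dims * 4 bytes each.
def index_size_gb(n_docs: int, dims: int, bytes_per_dim: int = 4) -> float:
    """Raw vector storage in decimal GB (excludes index overhead)."""
    return n_docs * dims * bytes_per_dim / 1e9

for dims in (384, 1024, 3072):
    print(f"{dims}d, 1M docs: {index_size_gb(1_000_000, dims):.1f} GB")
```

For 1M documents this gives roughly 1.5, 4.1, and 12.3 GB, matching the table above.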

Model Comparison

| Model | Dimensions | MTEB Score | Cost | Best For |
|---|---|---|---|---|
| BAAI/bge-large-en-v1.5 | 1024 | 64.2 | Free (local) | Best open-source model |
| all-MiniLM-L6-v2 | 384 | 56.3 | Free (local) | Fast, lightweight |
| text-embedding-3-large | 3072 | 64.6 | Pay per use | Highest-quality OpenAI embedding |
| text-embedding-3-small | 1536 | 62.3 | Pay per use | Cost-effective API option |
| embed-english-v3.0 | 1024 | 64.5 | Pay per use | Strong performance |

Decision Guide: Choosing Your Model

Model choice is a function of four constraints: quality requirements, latency budget, infrastructure, and cost tolerance. Here is the decision tree.

Prototyping / Learning / Hackathons

Use all-MiniLM-L6-v2. Free, fast, a 22M-parameter model that runs on a laptop CPU.

384 dims | 5ms/sentence on CPU | MTEB 56.3 | Good enough to validate any idea before investing in infrastructure.

Production RAG / Semantic Search (Self-Hosted)

Use BAAI/bge-large-en-v1.5 or jina-embeddings-v3. SOTA quality without API costs or data leaving your infrastructure.

1024 dims | MTEB 64-65 | Requires a GPU for fast batch encoding (~500 docs/sec on A100). Once encoded, search is pure CPU.

Production RAG (Managed API)

Use text-embedding-3-large (OpenAI) or embed-english-v3.0 (Cohere). Zero infrastructure. Scale instantly.

1024-3072 dims | MTEB 64-65 | $0.13/1M tokens (OpenAI). Worth it when engineering time costs more than API fees. Cohere is best when you need asymmetric query/document embeddings.

Multilingual

Use BGE-M3 (open source, 100+ languages) or embed-multilingual-v3.0 (Cohere).

BGE-M3 supports hybrid dense+sparse+ColBERT retrieval in a single model. Cohere's multilingual model covers 100+ languages with strong cross-lingual transfer.

Chen, J. et al. (2024). BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity. arXiv.

Maximum Quality (Research / High-Stakes)

Use GTE-Qwen2-7B or E5-Mistral-7B. Decoder-based models with the highest MTEB scores.

3584-4096 dims | MTEB 66-70 | Requires 14-28GB VRAM. 50x slower to encode than BGE-large. Use only when a 2-4 point MTEB improvement justifies the infrastructure cost.

Latency-Critical (Real-Time, Edge)

Use all-MiniLM-L6-v2 locally or nomic-embed-text-v1.5 with ONNX runtime.

API calls add 50-200ms network overhead. For search-as-you-type or real-time features, local inference with an optimized runtime (ONNX, TensorRT) is essential. MiniLM runs at 5ms/sentence on CPU.

Pitfalls

Five Mistakes That Break Embedding Systems

1. Mixing Models Between Index and Query

If you encode your corpus with bge-large-en-v1.5 and your queries with text-embedding-3-large, every similarity score will be meaningless. Different models produce vectors in different spaces with different dimensions. Always use the same model for indexing and querying.

2. Forgetting to Normalize

Cosine similarity requires normalized vectors (length = 1). If you use raw model output with faiss.IndexFlatIP, your dot products will be scaled by vector magnitude, not just direction. Either set normalize_embeddings=True in sentence-transformers or L2-normalize manually: vec / np.linalg.norm(vec).
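A two-dimensional example shows what goes wrong: without normalization, a large but misaligned vector out-scores a nearly parallel small one.

```python
# Raw dot product conflates direction and magnitude; cosine isolates direction.
import numpy as np

a = np.array([1.0, 0.0])
b_long = np.array([3.0, 4.0])     # 53° away from a, but magnitude 5
b_short = np.array([0.9, 0.1])    # nearly parallel to a, magnitude ~0.9

# Raw dot products: magnitude dominates and the misaligned vector "wins"
print(a @ b_long, a @ b_short)    # 3.0  0.9

# Normalized dot products = true cosine similarity: direction wins
unit = lambda v: v / np.linalg.norm(v)
print(unit(a) @ unit(b_long))     # 0.6
print(unit(a) @ unit(b_short))    # ~0.994
```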

3. Exceeding the Model's Max Token Length

Most models silently truncate input beyond their maximum context (512 tokens for BGE, 8192 for Nomic). A 2,000-word document through a 512-token model only embeds the first ~380 words. For long documents, chunk first, embed each chunk, then decide how to aggregate (max pooling, hierarchical search, or store chunks separately).
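A minimal word-window chunker illustrates the fix (a sketch; production systems should count tokens with the embedding model's own tokenizer rather than words):

```python
# Overlapping word-window chunking before embedding. The overlap keeps
# sentences that straddle a boundary visible in at least one chunk.
def chunk_words(text, max_words=300, overlap=50):
    words = text.split()
    step = max_words - overlap
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), step)]

doc = "word " * 1000                  # a 1,000-word document
chunks = chunk_words(doc)
print(len(chunks), [len(c.split()) for c in chunks])  # 4 [300, 300, 300, 250]
```

Each chunk is then embedded separately; at query time you search over chunks and map hits back to their parent documents.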

4. Evaluating on the Wrong MTEB Task

A model with the highest MTEB average may rank 15th on the specific task you care about. If you're building search, look at the Retrieval subtask scores. If you're building a classifier, look at Classification. The aggregate score is marketing; the task-specific score is engineering.

5. Not Benchmarking on Your Own Data

MTEB uses academic datasets. Your data has its own vocabulary, document lengths, and query patterns. A model that wins on MS MARCO retrieval may underperform on your domain. Always create a small evaluation set (~100-500 query-document pairs) from your actual data and test 2-3 candidate models before committing to one for production.
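Such an eval set needs only a few lines of harness. A sketch computing recall@k and MRR over hypothetical ranked results:

```python
# Tiny retrieval evaluation: recall@k and mean reciprocal rank (MRR)
# over (query, relevant-doc) pairs. Input data here is hypothetical.
import numpy as np

def evaluate(ranked_lists, relevant, k=5):
    """ranked_lists[i]: doc indices returned for query i, best first.
    relevant[i]: index of the known-correct document for query i."""
    hits, rr = [], []
    for ranking, rel in zip(ranked_lists, relevant):
        hits.append(rel in ranking[:k])
        rank = list(ranking).index(rel) + 1 if rel in ranking else None
        rr.append(1.0 / rank if rank else 0.0)
    return float(np.mean(hits)), float(np.mean(rr))

# Example: 3 queries over a 3-document corpus
ranked = [[2, 0, 1], [1, 2, 0], [0, 1, 2]]
relevant = [2, 0, 1]               # the doc each query should retrieve
recall_at_5, mrr = evaluate(ranked, relevant)
print(recall_at_5, mrr)            # 1.0  0.611...
```

Run the same harness with each candidate model's rankings and compare the numbers on your data, not the leaderboard's.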

Key Takeaways

1. Four shifts define the field: static word vectors (2013) to contextual encoders (2018) to bi-encoder sentence embeddings (2019) to the modern arms race (2022+). Each solved the previous generation's key limitation.

2. MTEB is the benchmark — but read the subtasks: the average score is useful for rough ranking. The task-specific score (Retrieval, STS, Classification) is what determines real-world performance for your use case.

3. For most production use cases, BGE-large or an API model is the answer: BGE-large-en-v1.5 gives you MTEB 64+ quality at zero API cost. OpenAI and Cohere give you the same quality with zero infrastructure. GTE-Qwen2 wins benchmarks but demands serious GPU resources.

4. Dimension choice is a storage decision: 1024 dimensions is the production sweet spot. Go smaller for edge/latency. Go bigger only when benchmark gains justify the 3-8x storage increase.

5. Always benchmark on your own data: MTEB scores predict general quality but not domain-specific performance. Build a 100-query eval set from your real data before committing to a model.

Practice Exercises

Copy the code examples above and work through these exercises. Each builds on the previous.

  1. Compare two models. Encode the same 10 sentences with both all-MiniLM-L6-v2 (384d) and bge-large-en-v1.5 (1024d). Do the similarity rankings change? By how much?
  2. Test Matryoshka truncation. Use OpenAI's text-embedding-3-large with dimensions=3072, 1024, and 256. Compare retrieval quality on the same queries. How much quality do you lose?
  3. Build a semantic search engine. Load a dataset (Wikipedia paragraphs, your own documents, or a HuggingFace dataset). Index 10,000+ documents with FAISS and query interactively. Measure query latency with time.perf_counter().
  4. Evaluate on your domain. Create 50 query-document pairs from data relevant to your use case. Test 2-3 models and compute recall@5 and MRR. Does the MTEB winner also win on your data?
