Level 1: Single Blocks (~15 min)

Text Embeddings Deep Dive

Not all embedding models are equal. Learn to choose the right one for your use case.

Quick Start: Real Working Code

Let's start with code you can copy, paste, and run immediately. Install the dependencies first:

# Install dependencies
pip install sentence-transformers faiss-cpu numpy

Text Embedding with Sentence Transformers

This is the most common way to create embeddings. The BGE model from BAAI is currently one of the best open-source options.

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer('BAAI/bge-large-en-v1.5')
documents = ['The cat sat on the mat', 'A dog played in the park', 'Machine learning is fascinating']
embeddings = model.encode(documents, normalize_embeddings=True)

query = 'pets resting at home'
query_embedding = model.encode(query, normalize_embeddings=True)
# with normalized embeddings, the dot product equals cosine similarity
similarities = np.dot(embeddings, query_embedding)
for doc, sim in zip(documents, similarities):
    print(f'{sim:.3f}: {doc}')
Expected output:
0.621: The cat sat on the mat
0.487: A dog played in the park
0.312: Machine learning is fascinating

Build a Semantic Search Index with FAISS

For production use with thousands or millions of documents, use FAISS for efficient similarity search.

from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

model = SentenceTransformer('BAAI/bge-large-en-v1.5')
documents = [
    'Python is a programming language',
    'JavaScript runs in the browser',
    'SQL is used to query databases',
    'Redis is an in-memory data store',
    'PostgreSQL is a relational database'
]
embeddings = model.encode(documents, normalize_embeddings=True).astype('float32')

# Create FAISS index for fast similarity search
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

# Search for similar documents
query = 'how to store data'
query_vec = model.encode([query], normalize_embeddings=True).astype('float32')
D, I = index.search(query_vec, k=3)

print('Top 3 results:')
for score, idx in zip(D[0], I[0]):
    print(f'{score:.3f}: {documents[idx]}')
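One BGE-specific detail: for short-query-to-passage retrieval, the bge-large-en-v1.5 model card recommends prefixing queries (not documents) with an instruction string. With the v1.5 models it is optional, but it can improve retrieval quality. A minimal sketch against the index built above:

# Per the bge-large-en-v1.5 model card, retrieval queries can be prefixed
# with this instruction; documents are encoded as-is.
instruction = 'Represent this sentence for searching relevant passages: '
query_vec = model.encode([instruction + query], normalize_embeddings=True).astype('float32')
D, I = index.search(query_vec, k=3)
for score, idx in zip(D[0], I[0]):
    print(f'{score:.3f}: {documents[idx]}')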

SOTA Performance: MTEB Benchmark

The Massive Text Embedding Benchmark (MTEB) is the industry standard for comparing embedding models. Here are current SOTA scores:

Current SOTA Embedding Models

- bge-large-en-v1.5 (BAAI, open source): 64.23 MTEB avg
- text-embedding-3-large (OpenAI, API): ~64.6 MTEB avg
- all-MiniLM-L6-v2 (Sentence Transformers): 56.3 MTEB avg
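These averages come from running models across the full MTEB task suite. If you want to spot-check a model yourself, the mteb package (pip install mteb) can run individual tasks. A minimal sketch, with the task chosen purely for illustration and the exact API depending on your mteb version (a full benchmark run takes many hours):

from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('BAAI/bge-large-en-v1.5')

# Run a single task rather than the full benchmark
evaluation = MTEB(tasks=['Banking77Classification'])
results = evaluation.run(model, output_folder='mteb_results')
print(results)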

Key Insight

A model with MTEB score of 65 is not just "better" than one with 55. It means the 65-score model will find more relevant documents, make fewer mistakes in classification, and cluster your data more accurately. These differences compound in production.

Embedding Model Categories

There are three main categories of text embedding models:

Sentence Transformers (Open Source)

Free, run locally, no API costs. Best options: bge-large-en-v1.5 (MTEB 64.23), all-mpnet-base-v2.

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('BAAI/bge-large-en-v1.5')
embedding = model.encode('Hello world', normalize_embeddings=True)

OpenAI Embeddings (API)

High quality (MTEB ~64.6), pay-per-use pricing. Easy integration but adds latency and cost.

from openai import OpenAI
client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-large",
    input="Hello world"
)
embedding = response.data[0].embedding

Cohere Embeddings (API)

Strong multilingual support. Purpose-built for search and retrieval tasks.

import cohere
co = cohere.Client('api-key')  # replace with your Cohere API key
response = co.embed(
    texts=["Hello world"],
    model="embed-english-v3.0",
    input_type="search_document"
)
embedding = response.embeddings[0]
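Cohere's v3 models require an input_type: use "search_document" when embedding your corpus and "search_query" when embedding user queries, so both sides are encoded consistently for retrieval.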


Model Comparison

Model                     Dimensions   MTEB Score   Cost           Best For
BAAI/bge-large-en-v1.5    1024         64.23        Free (local)   Best open-source model
all-MiniLM-L6-v2          384          56.3         Free (local)   Fast, lightweight
text-embedding-3-large    3072         64.6         Pay per use    Highest quality OpenAI embedding
text-embedding-3-small    1536         62.3         Pay per use    Cost-effective API option
embed-english-v3.0        1024         64.5         Pay per use    Strong performance

Understanding Embedding Dimensions

Embedding models output vectors of different sizes. This is called the embedding dimension. Common sizes are:

- 384 (small): fast, low memory. Good for prototypes.
- 1024 (medium): best balance. BGE-large uses this size.
- 3072 (large): highest quality. Used by OpenAI's large model.

The Tradeoffs

Dimension   Storage (per vector)   Search Speed   Quality
384         1.5 KB                 Fastest        Good
1024        4 KB                   Fast           Better
1536        6 KB                   Medium         Best
3072        12 KB                  Slower         Best

At 1 million documents, 384-dim embeddings take about 1.5 GB while 3072-dim embeddings take about 12 GB. This matters for vector databases and real-time search.
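These figures follow directly from the vector size: float32 embeddings cost 4 bytes per dimension. A quick back-of-the-envelope check:

# float32 embeddings use 4 bytes per dimension
def index_size_gb(num_vectors: int, dims: int, bytes_per_dim: int = 4) -> float:
    return num_vectors * dims * bytes_per_dim / 1e9

for dims in (384, 1024, 1536, 3072):
    print(f'{dims} dims x 1M docs: {index_size_gb(1_000_000, dims):.1f} GB')

This counts only the raw vectors; a flat FAISS index stores them as-is, while other index types add overhead or compress them.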

When to Use Which Model

Choosing the right embedding model depends on your constraints:

Prototyping / Learning

Use all-MiniLM-L6-v2. Free, fast, runs anywhere.

384 dims | 5ms latency | MTEB 56.3 | Good enough to validate ideas.

Production (Best Open Source)

Use BAAI/bge-large-en-v1.5. SOTA quality without API costs.

1024 dims | MTEB 64.23 | Run locally or self-host. Best open-source option.

Production (Quality-Critical API)

Use text-embedding-3-large or embed-english-v3.0.

1024-3072 dims | MTEB 64+ | Best retrieval quality. Worth the cost for high-stakes RAG.

Multilingual

Use embed-multilingual-v3.0 (Cohere) or multilingual sentence transformers.

Trained on 100+ languages. Essential if your content is not English-only.
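If you stay in the sentence-transformers ecosystem, here is a minimal sketch with one commonly used multilingual model (paraphrase-multilingual-MiniLM-L12-v2 is picked for illustration; choose whichever model covers your languages):

from sentence_transformers import SentenceTransformer
import numpy as np

# A widely used multilingual sentence-transformers model (illustrative choice)
model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
texts = ['Where is the train station?', 'Où est la gare ?', 'El gato duerme en el sofá']
embeddings = model.encode(texts, normalize_embeddings=True)

# Cross-lingual check: the English and French sentences should score higher
# with each other than with the unrelated Spanish sentence
print(f'en-fr: {np.dot(embeddings[0], embeddings[1]):.3f}')
print(f'en-es: {np.dot(embeddings[0], embeddings[2]):.3f}')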

Latency-Critical (Real-time)

Use locally-hosted small models. API calls add 50-200ms overhead.

For search-as-you-type or real-time features, local inference is essential.
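A rough way to measure local inference latency on your own hardware (numbers vary with model size, CPU vs GPU, and batch size):

import time
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
model.encode('warm up')  # first call pays a one-time model warm-up cost

start = time.perf_counter()
model.encode('search as you type')
print(f'local encode latency: {(time.perf_counter() - start) * 1000:.1f} ms')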

Key Takeaways

1. BGE-large is the best open-source option: MTEB 64.23, free to run locally, and competitive with commercial APIs.
2. MTEB is the benchmark: use it to compare models objectively. Higher scores mean better retrieval and classification.
3. Use FAISS for production: efficient similarity search at scale with support for millions of vectors.
4. Match model to use case: prototyping wants speed, production wants quality, real-time needs local inference.

Practice Exercise

Copy the code above and try these exercises:

1. Change the query to find documents about "data storage" and see which documents are most similar.
2. Add more documents to the list and rebuild the FAISS index. Test with k=5 results.
3. Try switching from bge-large-en-v1.5 to all-MiniLM-L6-v2 and compare the results.