Hybrid Sparse + Dense Retrieval
Combine lexical (BM25) and dense retrieval with weighted fusion or cascades to improve recall and precision for search and RAG.
Understanding Hybrid Retrieval
Why combining keyword search and semantic search gives you the best of both worlds.
Hybrid Retrieval: The Solution
Combine sparse and dense retrieval, then fuse their rankings. You get exact keyword matching AND semantic understanding. Documents that score well in both methods rise to the top.
Sparse vs Dense: How They Work
Two fundamentally different approaches to representing and matching text.
Sparse (BM25)
Most dimensions are zero (sparse)
- Exact term matching (product codes, names)
- No training required
- Interpretable (you know why it matched)
- Fast and efficient
Dense (Embeddings)
Every dimension has a value (dense)
- Understands synonyms ("car" ~ "automobile")
- Captures meaning, not just keywords
- Handles paraphrasing well
- Works across languages (multilingual models)
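The contrast is easiest to see in code. Below is a toy illustration of the two representations (the numbers are hand-picked, not from a real model): a sparse vector stores only non-zero term weights, while a dense vector assigns a value to every dimension.

```python
# Sparse (BM25-style): a dict keyed by vocabulary terms. Every term not
# listed implicitly has weight zero, so only non-zero entries are stored.
sparse_doc = {"python": 2.1, "creator": 1.7, "language": 0.9}

# Dense (embedding-style): a fixed-length vector where every dimension
# carries some value; individual dimensions are not interpretable.
dense_doc = [0.12, -0.40, 0.33, 0.05, -0.21, 0.18, -0.07, 0.29]

def sparse_dot(a: dict, b: dict) -> float:
    """Overlap only on shared terms: exact matching."""
    return sum(w * b[t] for t, w in a.items() if t in b)

def dense_dot(a: list, b: list) -> float:
    """Every dimension contributes: graded semantic similarity."""
    return sum(x * y for x, y in zip(a, b))

# "author" never matches "creator" lexically, so only "python" scores.
query = {"python": 1.0, "author": 1.0}
print(sparse_dot(query, sparse_doc))
```

This is exactly why sparse retrieval misses paraphrases and dense retrieval misses rare exact tokens: the sparse dot product is zero unless the literal term overlaps, while the dense dot product always produces a graded score.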
Reciprocal Rank Fusion
The algorithm that combines multiple rankings using only rank positions: each document's fused score is the sum of 1 / (k + rank) over every list it appears in, with k ≈ 60 by convention, so raw BM25 and cosine scores never need to be put on a common scale.
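That sum is a few lines of Python. The sketch below uses k = 60 (the value from the original RRF paper) and illustrative document ids:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids.

    Each document scores sum(1 / (k + rank)) over every list it appears
    in, where rank is 1-based. The constant k damps the dominance of
    top-ranked items so agreement across lists matters most.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

bm25_ranking = ["d3", "d1", "d2"]    # lexical ranking (toy)
dense_ranking = ["d1", "d2", "d3"]   # semantic ranking (toy)

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused[0][0])  # d1: ranked 2nd by BM25 and 1st by dense
```

Note that d1 wins despite topping only one list: consistently high ranks in both lists beat a single first place, which is the behavior hybrid retrieval wants.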
Advanced Hybrid Methods
Beyond simple BM25 + dense fusion, there are more sophisticated approaches.
ColBERT
Instead of one embedding per document, ColBERT creates one embedding per token. At query time, each query token finds its best-matching document token (MaxSim).
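The MaxSim interaction can be sketched with hand-made token vectors (real ColBERT embeddings are ~128-dimensional and produced by a trained model; the 4-dim vectors here are illustrative only):

```python
def maxsim(query_vecs, doc_vecs):
    """ColBERT-style late interaction.

    For each query token vector, take the maximum dot product over all
    document token vectors, then sum these maxima across query tokens.
    """
    total = 0.0
    for q in query_vecs:
        best = max(sum(qi * di for qi, di in zip(q, d)) for d in doc_vecs)
        total += best
    return total

# Toy 4-dim token embeddings for a 2-token query and a 3-token document.
query_tokens = [[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0]]
doc_tokens = [[0.9, 0.1, 0.0, 0.0],
              [0.0, 0.2, 0.8, 0.0],
              [0.1, 0.9, 0.0, 0.0]]

print(maxsim(query_tokens, doc_tokens))
```

Because each query token is matched independently, a document can score well even when its relevant tokens are scattered, at the cost of storing one vector per token instead of one per document.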
SPLADE
Uses a neural network to learn sparse representations. Unlike BM25, SPLADE can expand queries with related terms and weight them by learned importance.
Hybrid + Reranking
First stage: Hybrid retrieval returns top 50-100 candidates quickly. Second stage: Cross-encoder reranker scores each (query, doc) pair for precision.
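The two-stage shape is simple to sketch. In the snippet below, `hybrid_retrieve` and `cross_encoder_score` are hypothetical stand-ins (toy keyword overlap) for a real hybrid retriever and a trained cross-encoder; only the pipeline structure is the point:

```python
def hybrid_retrieve(query, corpus, n=50):
    """Stage 1 stand-in: cheap, recall-oriented retrieval (toy substring overlap)."""
    scored = [(sum(t in doc.lower() for t in query.lower().split()), doc)
              for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:n]]

def cross_encoder_score(query, doc):
    """Stage 2 stand-in: a real cross-encoder reads (query, doc) jointly."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve_and_rerank(query, corpus, n_candidates=50, top_k=3):
    # Stage 1: fast hybrid retrieval over the whole corpus.
    candidates = hybrid_retrieve(query, corpus, n=n_candidates)
    # Stage 2: expensive per-pair scoring on a few dozen candidates only.
    reranked = sorted(candidates,
                      key=lambda d: cross_encoder_score(query, d),
                      reverse=True)
    return reranked[:top_k]

corpus = [
    "Python was created by Guido van Rossum",
    "Java was designed by James Gosling",
    "The Zen of Python favors explicit code",
]
print(retrieve_and_rerank("python creator", corpus, n_candidates=3, top_k=1)[0])
```

The design point is cost asymmetry: the reranker is far more accurate but scores one (query, doc) pair per forward pass, so it is only applied to the 50-100 candidates the cheap stage surfaces.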
Linear Combination
Instead of RRF, directly combine scores with tunable weights: final = α · norm(BM25) + (1 − α) · norm(dense). Unlike RRF, this requires first normalizing both score sets onto the same scale (e.g., min-max), because raw BM25 scores are unbounded while cosine similarities live in [−1, 1].
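A minimal sketch of that weighted sum, assuming min-max normalization; the α value and example scores are illustrative:

```python
def min_max(scores):
    """Rescale raw scores to [0, 1] so BM25 and cosine become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def linear_fusion(bm25_scores, dense_scores, alpha=0.5):
    """final = alpha * norm(bm25) + (1 - alpha) * norm(dense)."""
    b, v = min_max(bm25_scores), min_max(dense_scores)
    docs = set(b) | set(v)  # a doc may appear in only one retriever's results
    fused = {d: alpha * b.get(d, 0.0) + (1 - alpha) * v.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

bm25 = {"d1": 12.3, "d2": 8.1, "d3": 2.0}     # unbounded BM25 scores
dense = {"d1": 0.71, "d2": 0.90, "d3": 0.40}  # cosine similarities

ranked = linear_fusion(bm25, dense, alpha=0.6)
print(ranked[0])
```

The weight α is a tuning knob: push it toward 1.0 for keyword-heavy workloads (product codes, names) and toward 0.0 for paraphrase-heavy queries.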
Implementation Examples
Ready-to-use code for popular frameworks.
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings  # or any LangChain embedding model

# Create the BM25 retriever (sparse)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 5  # Top 5 results

# Create the dense vector retriever
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Combine with equal weights
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, dense_retriever],
    weights=[0.5, 0.5],  # Weight for each retriever
)

# Retrieve with hybrid search
results = hybrid_retriever.invoke("python creator")

LangChain's EnsembleRetriever fuses the two result lists with weighted Reciprocal Rank Fusion, so no manual score normalization is needed.
Decision Guide: When to Use What
The Complete Hybrid Pipeline
Hybrid retrieval combines the precision of keyword matching with the semantic understanding of embeddings. RRF elegantly fuses rankings without worrying about score normalization. For most RAG applications, hybrid retrieval with optional reranking is the recommended approach.
Use Cases
- ✓ Enterprise search
- ✓ Legal/medical retrieval
- ✓ E-commerce search
- ✓ RAG recall boost
Architectural Patterns
Score Fusion
Normalize and fuse BM25 and dense scores (e.g., RRF, weighted sum).
Cascade + Rerank
Retrieve with BM25, expand with dense, then cross-encode rerank.