Named Entity Recognition
Extract named entities (people, organizations, locations, dates) from text. Key for information extraction and knowledge graphs.
How Named Entity Recognition Works
A technical deep-dive into Named Entity Recognition. From token classification to zero-shot approaches with GLiNER and LLMs.
What is Named Entity Recognition?
NER identifies and classifies named entities in text into predefined categories. Think of it as structured information extraction from unstructured text.
NER in Action
Barack Obama (PERSON) was the 44th President of the United States (GPE) from 2009 (DATE) to 2017 (DATE).
Common Entity Types
Typical schemes cover people (PERSON/PER), organizations (ORG), locations (LOC/GPE), and dates (DATE), plus a catch-all MISC class; richer schemes such as OntoNotes define 18 types.
Why NER Matters
- Extract structured data from documents, emails, and news articles. Build knowledge graphs automatically.
- Identify key entities in contracts, legal documents, and medical records for downstream processing.
- Power entity-based search, auto-linking, and semantic navigation in content management systems.
BIO/IOB Tagging Scheme
NER is formulated as token classification: each token gets a tag. BIO tagging handles multi-word entities by marking the Beginning, Inside, and Outside of entities.
BIO Tags Explained
Example: "Elon Musk founded SpaceX in California"
| Token | Elon | Musk | founded | SpaceX | in | California |
|---|---|---|---|---|---|---|
| BIO Tag | B-PERSON | I-PERSON | O | B-ORG | O | B-LOC |
Why Not Just Label Tokens?
"John Smith met Jane Doe" - without BIO, we cannot tell where one PERSON ends and another begins.
B- marks entity boundaries, I- continues them. Now we know there are two distinct people.
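As a concrete illustration, here is a minimal sketch of decoding BIO tags back into entity spans; the helper name and example data are ours, not from any particular library:

```python
def bio_to_spans(tokens, tags):
    """Convert parallel token/BIO-tag lists into (entity_text, entity_type) pairs."""
    spans, current_tokens, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):                     # B- starts a new entity, closing any open one
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            current_tokens.append(token)             # I- continues the open entity of the same type
        else:                                        # O (or a stray I-) closes the open entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans

tokens = ["John", "Smith", "met", "Jane", "Doe"]
tags   = ["B-PERSON", "I-PERSON", "O", "B-PERSON", "I-PERSON"]
print(bio_to_spans(tokens, tags))  # [('John Smith', 'PERSON'), ('Jane Doe', 'PERSON')]
```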
Token Classification Architecture
Modern NER uses a BERT-style encoder + linear classification head. Each token gets its own prediction based on contextual embeddings.
BERT-NER Architecture
For sequence classification (one label for the whole document), BERT uses only the [CLS] embedding to produce a single class_label. For NER, the model instead uses ALL token embeddings, predicting one label per token (e.g., B-PER, I-PER, O).
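A minimal sketch of this encoder-plus-linear-head setup in PyTorch with Hugging Face Transformers; the checkpoint name, tag set, and class are illustrative assumptions, and an untrained head will predict noise until fine-tuned:

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]  # illustrative tag set

class TokenClassifier(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-cased", num_labels: int = len(LABELS)):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)              # BERT-style encoder
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)  # per-token classifier

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden)  # shape (batch, seq_len, num_labels): one score vector per token

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = TokenClassifier()
batch = tokenizer("Elon Musk founded SpaceX", return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
pred_tags = [LABELS[i] for i in logits.argmax(dim=-1)[0].tolist()]  # untrained: tags are arbitrary
```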
Subword Alignment Challenge
BERT tokenizes words into subwords. "SpaceX" becomes ["Space", "##X"]. But we only have one label for "SpaceX". Common solutions: assign the word's label to its first subword and mask the remaining subwords out of the loss (label -100), or copy the word's label to every subword.
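A sketch of the first-subword strategy using the word_ids() mapping exposed by Hugging Face fast tokenizers; the sentinel string "IGN" stands in for the -100 index that PyTorch's cross-entropy loss ignores:

```python
from transformers import AutoTokenizer  # requires a "fast" (Rust-backed) tokenizer for word_ids()

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

words  = ["Elon", "Musk", "founded", "SpaceX"]
labels = ["B-PER", "I-PER", "O", "B-ORG"]          # one label per word

encoding = tokenizer(words, is_split_into_words=True)
aligned, previous_word = [], None
for word_idx in encoding.word_ids():               # maps each subword back to its source word
    if word_idx is None:
        aligned.append("IGN")                      # special tokens ([CLS], [SEP]) get no label
    elif word_idx != previous_word:
        aligned.append(labels[word_idx])           # first subword of a word keeps its label
    else:
        aligned.append("IGN")                      # later subwords (e.g. "##X") are masked out
    previous_word = word_idx

print(list(zip(tokenizer.convert_ids_to_tokens(encoding["input_ids"]), aligned)))
```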
NER Models Comparison
From production-ready spaCy to zero-shot GLiNER. Choose based on your constraints.
| Model | Type | Speed | Accuracy | Notes |
|---|---|---|---|---|
| spaCy (en_core_web_trf) | Pre-trained | ~10K tokens/sec | F1: 89% | Production-ready, 18 entity types, GPU optional |
| BERT-NER | Fine-tuned | ~5K tokens/sec | F1: 92% | Fine-tune on CoNLL-2003, requires training |
| RoBERTa-NER | Fine-tuned | ~4K tokens/sec | F1: 93% | Better than BERT, larger model |
| GLiNER | Zero-Shot | ~2K tokens/sec | F1: 85-90% | Any entity type without training |
| GPT-4 / Claude | LLM | ~100 tokens/sec | F1: 80-95% | Most flexible, expensive at scale |
Use spaCy when:
- You need a production-ready, tested solution
- Standard entity types (PER, ORG, LOC, etc.) are enough
- CPU inference is acceptable
Fine-tune BERT/RoBERTa when:
- You have labeled training data
- You need domain-specific entity types
- Maximum accuracy is critical
Use GLiNER when:
- No labeled data is available
- Entity types change frequently
- You are prototyping or exploring quickly
Use an LLM when:
- Entity definitions are complex and nuanced
- You need explanations alongside extractions
- Volume is low and the documents are high-value
GLiNER can extract any entity type you define without training. Just describe what you want: "pharmaceutical_company", "medical_condition", "dosage". It uses a compact bidirectional transformer that matches candidate entity spans against embeddings of your label names, rather than generating text. Great for prototyping before investing in fine-tuning.
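A quick sketch assuming the open-source gliner package's predict_entities API and a public checkpoint name; the example text, label strings, and threshold are illustrative:

```python
from gliner import GLiNER  # pip install gliner (assumed package/API)

model = GLiNER.from_pretrained("urchade/gliner_base")  # assumed checkpoint name

text = "The patient was prescribed 500mg of amoxicillin by Pfizer's clinical team."
labels = ["pharmaceutical_company", "medical_condition", "dosage", "medication"]

# Zero-shot: these label strings were never part of any training label set
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f"{ent['text']:15} {ent['label']:25} {ent['score']:.2f}")
```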
Benchmarks and Evaluation
Standard datasets for evaluating NER systems. F1 score is the primary metric.
| Dataset | Language | Entities | Size | SOTA |
|---|---|---|---|---|
| CoNLL-2003 | English | 4 types | 22K sent. | 94.6% (LUKE) |
| OntoNotes 5.0 | English | 18 types | 77K sent. | 92.4% (LUKE) |
| WNUT-17 | English | 6 types | 5K sent. | 60.4% (emerging entities) |
| MultiNERD | 10 languages | 15 types | 164K sent. | varies by lang |
Understanding F1 Score for NER
NER evaluation is strict: entity must match both the span (exact boundaries) AND the type. Partial matches are typically counted as wrong.
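A minimal sketch of this strict matching, treating each entity as an exact (start, end, type) triple; the helper and toy data are illustrative:

```python
def ner_f1(gold, pred):
    """Strict entity-level F1: a prediction counts only if span AND type match exactly."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                               # exact (start, end, type) matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [(0, 12, "PER"), (26, 32, "ORG")]
pred = [(0, 12, "PER"), (26, 35, "ORG")]                # boundary is off: counted as wrong
print(ner_f1(gold, pred))                               # (0.5, 0.5, 0.5)
```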
CoNLL-2003: The Standard Benchmark
Reuters news articles with 4 entity types: PER, ORG, LOC, MISC. Most papers report results here. Current SOTA is ~94.6% F1.
Code Examples
Get started with NER in Python. From simple spaCy to zero-shot GLiNER.
```python
import spacy

# Load transformer-based model (install with: python -m spacy download en_core_web_trf)
nlp = spacy.load("en_core_web_trf")

text = "Apple Inc. was founded by Steve Jobs in Cupertino, California in 1976."

# Process text
doc = nlp(text)

# Extract entities
for ent in doc.ents:
    print(f"{ent.text:20} {ent.label_:10} {ent.start_char}-{ent.end_char}")

# Output:
# Apple Inc.           ORG        0-10
# Steve Jobs           PERSON     26-36
# Cupertino            GPE        40-49
# California           GPE        51-61
# 1976                 DATE       65-69
```
Quick Reference
For standard entity types:
- spaCy (en_core_web_trf)
- Fine-tuned BERT/RoBERTa
- F1 > 90% on standard types
For custom entity types:
- GLiNER (zero-shot)
- LLM with structured output
- Define any entity type
Evaluation:
- F1 score (precision + recall)
- Exact span matching
- Per-entity-type breakdown
Use Cases
- ✓ Information extraction
- ✓ Knowledge graph building
- ✓ Resume parsing
- ✓ News analysis
- ✓ Legal document processing
Architectural Patterns
Sequence Labeling
Tag each token with entity type (BIO scheme).
Pros:
- Fast
- Well-understood
- Good for standard entities
Cons:
- Fixed entity types
- Needs labeled data
Span Extraction
Predict start/end positions of entity spans and classify each candidate span directly (see the sketch after this list).
Pros:
- Handles nested entities
- More flexible
Cons:
- Slower
- More complex training
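Below is a toy sketch of span enumeration and classification: every span up to a maximum width gets its own representation and score, which is what lets overlapping (nested) entities coexist. The class, sizes, and random embeddings are illustrative assumptions, not a specific published model:

```python
import torch
from torch import nn

class SpanClassifier(nn.Module):
    """Score every candidate span up to max_width tokens."""
    def __init__(self, hidden_size: int = 768, num_types: int = 5, max_width: int = 8):
        super().__init__()
        self.max_width = max_width
        # Represent a span by concatenating its start and end token embeddings
        self.scorer = nn.Linear(2 * hidden_size, num_types + 1)  # +1 for "not an entity"

    def forward(self, token_embeddings):            # (seq_len, hidden_size) from any encoder
        seq_len = token_embeddings.size(0)
        spans, reps = [], []
        for start in range(seq_len):
            for end in range(start, min(start + self.max_width, seq_len)):
                spans.append((start, end))
                reps.append(torch.cat([token_embeddings[start], token_embeddings[end]]))
        logits = self.scorer(torch.stack(reps))     # one score vector per candidate span
        return spans, logits

# Nested entities ("Bank of America" and "America") are fine: candidate spans may overlap freely
spans, logits = SpanClassifier()(torch.randn(6, 768))
```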
LLM-Based NER
Use LLMs to extract entities via prompting (see the sketch after this list).
Pros:
- Zero-shot for new entity types
- Context-aware
Cons:
- Expensive
- Inconsistent output formats
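A sketch of the prompting pattern, assuming the official openai Python client (v1+); the model name, prompt wording, and JSON schema are illustrative, and any chat-style LLM API works the same way:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """Extract named entities from the text below.
Return only JSON: a list of objects with "text", "type" (PERSON, ORG, LOC, or DATE), and "reason".

Text: {text}"""

def llm_ner(text: str, model: str = "gpt-4o-mini"):  # model name is an illustrative assumption
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,                                # reduces (but does not eliminate) format drift
    )
    return json.loads(response.choices[0].message.content)  # may raise if the model strays from JSON

print(llm_ner("Barack Obama was the 44th President of the United States from 2009 to 2017."))
```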
Implementations
API Services
AWS Comprehend
AWS. Managed NER service; custom entity training available.
Open Source
spaCy, BERT-NER, RoBERTa-NER, and GLiNER (see the comparison table above).
Benchmarks
CoNLL-2003, OntoNotes 5.0, WNUT-17, and MultiNERD (see the benchmark table above).
Quick Facts
- Input: Text
- Output: Structured data
- Implementations: 4 open source, 1 API
- Patterns: 3 approaches