Named Entity Recognition
Extract named entities (people, organizations, locations, dates) from text. Key for information extraction and knowledge graphs.
How Named Entity Recognition Works
A technical deep-dive into Named Entity Recognition. From token classification to zero-shot approaches with GLiNER and LLMs.
What is Named Entity Recognition?
NER identifies and classifies named entities in text into predefined categories. Think of it as structured information extraction from unstructured text.
NER in Action
Barack Obama (PERSON) was the 44th President of the United States (GPE) from 2009 (DATE) to 2017 (DATE).
Common Entity Types
Typical schemes cover people (PERSON/PER), organizations (ORG), locations (LOC/GPE), and dates (DATE), plus a catch-all MISC class; richer schemes such as OntoNotes define 18 types.
Why NER Matters
- Extract structured data from documents, emails, and news articles. Build knowledge graphs automatically.
- Identify key entities in contracts, legal documents, and medical records for downstream processing.
- Power entity-based search, auto-linking, and semantic navigation in content management systems.
BIO/IOB Tagging Scheme
NER is formulated as token classification: each token gets a tag. BIO tagging handles multi-word entities by marking the Beginning, Inside, and Outside of entities.
BIO Tags Explained
Example: "Elon Musk founded SpaceX in California"
| Token | Elon | Musk | founded | SpaceX | in | California |
|---|---|---|---|---|---|---|
| BIO Tag | B-PERSON | I-PERSON | O | B-ORG | O | B-LOC |
Why Not Just Label Tokens?
"John Smith met Jane Doe" - without BIO, we cannot tell where one PERSON ends and another begins.
B- marks entity boundaries, I- continues them. Now we know there are two distinct people.
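As a concrete illustration, here is a minimal sketch of decoding BIO tags back into entity spans; the helper name and example data are ours, not from any particular library:

```python
def bio_to_spans(tokens, tags):
    """Convert parallel token/BIO-tag lists into (entity_text, entity_type) pairs."""
    spans, current_tokens, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):                     # B- starts a new entity, closing any open one
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            current_tokens.append(token)             # I- continues the open entity of the same type
        else:                                        # O (or a stray I-) closes the open entity
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans

tokens = ["John", "Smith", "met", "Jane", "Doe"]
tags   = ["B-PERSON", "I-PERSON", "O", "B-PERSON", "I-PERSON"]
print(bio_to_spans(tokens, tags))  # [('John Smith', 'PERSON'), ('Jane Doe', 'PERSON')]
```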
Token Classification Architecture
Modern NER uses a BERT-style encoder + linear classification head. Each token gets its own prediction based on contextual embeddings.
BERT-NER Architecture
For sequence classification (one label for the whole document), BERT uses only the [CLS] embedding to produce a single class_label. For NER, the model instead uses ALL token embeddings, predicting one label per token (e.g., B-PER, I-PER, O).
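A minimal sketch of this encoder-plus-linear-head setup in PyTorch with Hugging Face Transformers; the checkpoint name, tag set, and class are illustrative assumptions, and an untrained head will predict noise until fine-tuned:

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]  # illustrative tag set

class TokenClassifier(nn.Module):
    def __init__(self, encoder_name: str = "bert-base-cased", num_labels: int = len(LABELS)):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)              # BERT-style encoder
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)  # per-token classifier

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden)  # shape (batch, seq_len, num_labels): one score vector per token

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = TokenClassifier()
batch = tokenizer("Elon Musk founded SpaceX", return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])
pred_tags = [LABELS[i] for i in logits.argmax(dim=-1)[0].tolist()]  # untrained: tags are arbitrary
```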
Subword Alignment Challenge
BERT tokenizes words into subwords. "SpaceX" becomes ["Space", "##X"]. But we only have one label for "SpaceX". Common solutions: assign the word's label to its first subword and mask the remaining subwords out of the loss (label -100), or copy the word's label to every subword.
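A sketch of the first-subword strategy using the word_ids() mapping exposed by Hugging Face fast tokenizers; the sentinel string "IGN" stands in for the -100 index that PyTorch's cross-entropy loss ignores:

```python
from transformers import AutoTokenizer  # requires a "fast" (Rust-backed) tokenizer for word_ids()

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

words  = ["Elon", "Musk", "founded", "SpaceX"]
labels = ["B-PER", "I-PER", "O", "B-ORG"]          # one label per word

encoding = tokenizer(words, is_split_into_words=True)
aligned, previous_word = [], None
for word_idx in encoding.word_ids():               # maps each subword back to its source word
    if word_idx is None:
        aligned.append("IGN")                      # special tokens ([CLS], [SEP]) get no label
    elif word_idx != previous_word:
        aligned.append(labels[word_idx])           # first subword of a word keeps its label
    else:
        aligned.append("IGN")                      # later subwords (e.g. "##X") are masked out
    previous_word = word_idx

print(list(zip(tokenizer.convert_ids_to_tokens(encoding["input_ids"]), aligned)))
```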
NER Models Comparison
From production-ready spaCy to zero-shot GLiNER. Choose based on your constraints.
| Model | Type | Speed | Accuracy | Notes |
|---|---|---|---|---|
| spaCy (en_core_web_trf) | Pre-trained | ~10K tokens/sec | F1: 89% | Production-ready, 18 entity types, GPU optional |
| BERT-NER | Fine-tuned | ~5K tokens/sec | F1: 92% | Fine-tune on CoNLL-2003, requires training |
| RoBERTa-NER | Fine-tuned | ~4K tokens/sec | F1: 93% | Better than BERT, larger model |
| GLiNER | Zero-Shot | ~2K tokens/sec | F1: 85-90% | Any entity type without training |
| GPT-4 / Claude | LLM | ~100 tokens/sec | F1: 80-95% | Most flexible, expensive at scale |
Use spaCy when:
- You need a production-ready, tested solution
- Standard entity types (PER, ORG, LOC, etc.) are enough
- CPU inference is acceptable
Fine-tune BERT/RoBERTa when:
- You have labeled training data
- You need domain-specific entity types
- Maximum accuracy is critical
Use GLiNER when:
- No labeled data is available
- Entity types change frequently
- You are prototyping or exploring quickly
Use an LLM when:
- Entity definitions are complex and nuanced
- You need explanations alongside extractions
- Volume is low and the documents are high-value
GLiNER can extract any entity type you define without training. Just describe what you want: "pharmaceutical_company", "medical_condition", "dosage". It uses a compact bidirectional transformer that matches candidate entity spans against embeddings of your label names, rather than generating text. Great for prototyping before investing in fine-tuning.
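A quick sketch assuming the open-source gliner package's predict_entities API and a public checkpoint name; the example text, label strings, and threshold are illustrative:

```python
from gliner import GLiNER  # pip install gliner (assumed package/API)

model = GLiNER.from_pretrained("urchade/gliner_base")  # assumed checkpoint name

text = "The patient was prescribed 500mg of amoxicillin by Pfizer's clinical team."
labels = ["pharmaceutical_company", "medical_condition", "dosage", "medication"]

# Zero-shot: these label strings were never part of any training label set
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f"{ent['text']:15} {ent['label']:25} {ent['score']:.2f}")
```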
Benchmarks and Evaluation
Standard datasets for evaluating NER systems. F1 score is the primary metric.
| Dataset | Language | Entities | Size | SOTA |
|---|---|---|---|---|
| CoNLL-2003 | English | 4 types | 22K sent. | 94.6% (LUKE) |
| OntoNotes 5.0 | English | 18 types | 77K sent. | 92.4% (LUKE) |
| WNUT-17 | English | 6 types | 5K sent. | 60.4% (emerging entities) |
| MultiNERD | 10 languages | 15 types | 164K sent. | varies by lang |
Understanding F1 Score for NER
NER evaluation is strict: entity must match both the span (exact boundaries) AND the type. Partial matches are typically counted as wrong.
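A minimal sketch of this strict matching, treating each entity as an exact (start, end, type) triple; the helper and toy data are illustrative:

```python
def ner_f1(gold, pred):
    """Strict entity-level F1: a prediction counts only if span AND type match exactly."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)                               # exact (start, end, type) matches
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [(0, 12, "PER"), (26, 32, "ORG")]
pred = [(0, 12, "PER"), (26, 35, "ORG")]                # boundary is off: counted as wrong
print(ner_f1(gold, pred))                               # (0.5, 0.5, 0.5)
```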
CoNLL-2003: The Standard Benchmark
Reuters news articles with 4 entity types: PER, ORG, LOC, MISC. Most papers report results here. Current SOTA is ~94.6% F1.
Code Examples
Get started with NER in Python. From simple spaCy to zero-shot GLiNER.
```python
import spacy

# Load transformer-based model (install with: python -m spacy download en_core_web_trf)
nlp = spacy.load("en_core_web_trf")

text = "Apple Inc. was founded by Steve Jobs in Cupertino, California in 1976."

# Process text
doc = nlp(text)

# Extract entities
for ent in doc.ents:
    print(f"{ent.text:20} {ent.label_:10} {ent.start_char}-{ent.end_char}")

# Output:
# Apple Inc.           ORG        0-10
# Steve Jobs           PERSON     26-36
# Cupertino            GPE        40-49
# California           GPE        51-61
# 1976                 DATE       65-69
```
Quick Reference
For standard entity types:
- spaCy (en_core_web_trf)
- Fine-tuned BERT/RoBERTa
- F1 > 90% on standard types
For custom entity types:
- GLiNER (zero-shot)
- LLM with structured output
- Define any entity type
Evaluation:
- F1 score (precision + recall)
- Exact span matching
- Per-entity-type breakdown
Use Cases
- ✓ Information extraction
- ✓ Knowledge graph building
- ✓ Resume parsing
- ✓ News analysis
- ✓ Legal document processing
Architectural Patterns
Sequence Labeling
Tag each token with entity type (BIO scheme).
Pros:
- Fast
- Well-understood
- Good for standard entities
Cons:
- Fixed entity types
- Needs labeled data
Span Extraction
Predict start/end positions of entity spans and classify each candidate span directly (see the sketch after this list).
Pros:
- Handles nested entities
- More flexible
Cons:
- Slower
- More complex training
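Below is a toy sketch of span enumeration and classification: every span up to a maximum width gets its own representation and score, which is what lets overlapping (nested) entities coexist. The class, sizes, and random embeddings are illustrative assumptions, not a specific published model:

```python
import torch
from torch import nn

class SpanClassifier(nn.Module):
    """Score every candidate span up to max_width tokens."""
    def __init__(self, hidden_size: int = 768, num_types: int = 5, max_width: int = 8):
        super().__init__()
        self.max_width = max_width
        # Represent a span by concatenating its start and end token embeddings
        self.scorer = nn.Linear(2 * hidden_size, num_types + 1)  # +1 for "not an entity"

    def forward(self, token_embeddings):            # (seq_len, hidden_size) from any encoder
        seq_len = token_embeddings.size(0)
        spans, reps = [], []
        for start in range(seq_len):
            for end in range(start, min(start + self.max_width, seq_len)):
                spans.append((start, end))
                reps.append(torch.cat([token_embeddings[start], token_embeddings[end]]))
        logits = self.scorer(torch.stack(reps))     # one score vector per candidate span
        return spans, logits

# Nested entities ("Bank of America" and "America") are fine: candidate spans may overlap freely
spans, logits = SpanClassifier()(torch.randn(6, 768))
```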
LLM-Based NER
Use LLMs to extract entities via prompting (see the sketch after this list).
Pros:
- Zero-shot for new entity types
- Context-aware
Cons:
- Expensive
- Inconsistent output formats
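A sketch of the prompting pattern, assuming the official openai Python client (v1+); the model name, prompt wording, and JSON schema are illustrative, and any chat-style LLM API works the same way:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """Extract named entities from the text below.
Return only JSON: a list of objects with "text", "type" (PERSON, ORG, LOC, or DATE), and "reason".

Text: {text}"""

def llm_ner(text: str, model: str = "gpt-4o-mini"):  # model name is an illustrative assumption
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
        temperature=0,                                # reduces (but does not eliminate) format drift
    )
    return json.loads(response.choices[0].message.content)  # may raise if the model strays from JSON

print(llm_ner("Barack Obama was the 44th President of the United States from 2009 to 2017."))
```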
Implementations
API Services
AWS Comprehend
AWS. Managed NER service; custom entity training available.
Open Source
spaCy, BERT-NER, RoBERTa-NER, and GLiNER (see the comparison table above).
Benchmarks
CoNLL-2003, OntoNotes 5.0, WNUT-17, and MultiNERD (see the benchmark table above).
Quick Facts
- Input: Text
- Output: Structured data
- Implementations: 4 open source, 1 API
- Patterns: 3 approaches