00 - Named Entity Recognition

Named entity recognition task router

NER extracts entity spans and types from text. It pays off when you need exact spans, character offsets, and a stable schema. Use fine-tuned token classifiers for scale, GLiNER-style models for custom labels, and LLMs when the schema changes often.

Benchmarks: CoNLL-2003, OntoNotes, MultiNERD

Current pick: Fine-tuned DeBERTa v3

01 - Explainer

What this task measures.

Named entity recognition finds text spans and assigns schema labels such as person, organization, location, product, statute, disease, or ticker. Spotting names is only part of the problem: the model must also return stable offsets, resolve boundary ambiguity, and match the entity taxonomy your downstream system expects.
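To make "stable offsets" and "boundary ambiguity" concrete, here is a minimal sketch of one way to represent an entity. The `Entity` class, the `surface` helper, and the example sentence are illustrative assumptions, not part of any benchmark or library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical span record; field names are assumptions for illustration."""
    start: int  # inclusive character offset into the source text
    end: int    # exclusive character offset
    label: str  # schema label, e.g. "PERSON" or "ORG"


def surface(text: str, ent: Entity) -> str:
    """Return the substring that an entity's offsets point at."""
    return text[ent.start:ent.end]


text = "Tim Cook leads Apple Inc. in Cupertino."
entities = [
    Entity(0, 8, "PERSON"),
    Entity(15, 25, "ORG"),
    Entity(29, 38, "LOC"),
]

# Boundary ambiguity: both spans below are plausible ORG mentions,
# but they are different spans and will not match under exact scoring.
with_suffix = Entity(15, 25, "ORG")     # "Apple Inc."
without_suffix = Entity(15, 20, "ORG")  # "Apple"
```

Storing offsets rather than strings lets a downstream system check that `surface(text, ent)` still returns the expected mention after any text preprocessing.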

02 - Benchmarks

Use a benchmark ladder.

One leaderboard rarely captures the task. Use the canonical benchmark for lineage, then add harder or more domain-specific checks before choosing a model.

Benchmark	Role	Metric	Caveat
CoNLL-2003	Classic English NER	Entity-level F1	Narrow four-label newswire schema; useful for lineage, weak for modern entity extraction.
OntoNotes 5.0	Broader English schema	Span F1	More entity types, but still not enough for legal, medical, financial, or product-catalog text.
MultiNERD	Multilingual NER	Macro / micro F1	Better language coverage; still requires local checks for aliases and domain terms.
Local gold set	Production gate	Span F1 + schema error rate	The only reliable way to measure custom labels and costly false positives.
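The metrics in the table can be sketched for a local gold set as follows. This is a minimal exact-match implementation, assuming spans are `(start, end, label)` tuples for a single document; the function names are hypothetical, and for a corpus you would add a document id to each tuple.

```python
from collections import Counter

Span = tuple[int, int, str]  # (start, end, label); assumed representation


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from raw counts, with 0.0 on empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def span_f1(gold: set[Span], pred: set[Span]) -> dict:
    """Exact-match span F1: a prediction counts only if start, end, and label all match."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for span in pred:
        (tp if span in gold else fp)[span[2]] += 1
    for span in gold - pred:
        fn[span[2]] += 1
    labels = sorted({s[2] for s in gold | pred})
    per_label = {l: prf(tp[l], fp[l], fn[l])[2] for l in labels}
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))[2]
    macro = sum(per_label.values()) / len(per_label) if per_label else 0.0
    return {"micro": micro, "macro": macro, "per_label": per_label}


gold = {(0, 8, "PERSON"), (15, 25, "ORG"), (29, 38, "LOC")}
pred = {(0, 8, "PERSON"), (15, 20, "ORG"), (29, 38, "LOC")}  # ORG boundary is off
scores = span_f1(gold, pred)
```

Micro F1 pools counts across labels, so frequent labels dominate; macro F1 averages per-label scores, so a rare label that the model misses pulls the number down as much as a common one.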

03 - Evaluation

What to compare.

The public benchmark is a shortlist signal. Production choice still depends on latency, cost, domain drift, and how expensive mistakes are.

Axis	Value	Why it matters
Canonical benchmark	CoNLL-2003	Classic English PER, ORG, LOC, MISC benchmark; useful but narrow.
Broader schema	OntoNotes / MultiNERD	More entity types and multilingual coverage for modern extraction.
Production metric	Span F1 + schema error rate	Wrong boundaries and wrong labels break downstream knowledge graphs.
Failure mode	Domain entities missed	Company tickers, statutes, products, drugs, and aliases need local examples.
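The page does not define "schema error rate", so the sketch below adopts one possible definition as an assumption: the share of predictions whose label is outside the allowed schema or disagrees with the gold label on an exactly matching span. The bucket names and both functions are hypothetical.

```python
def error_breakdown(gold: set, pred: list, schema: set) -> dict:
    """Sort each predicted (start, end, label) span into one bucket."""
    gold_bounds = {(s, e) for s, e, _ in gold}
    counts = {
        "correct": 0,
        "out_of_schema": 0,   # label is not in the allowed schema
        "label_error": 0,     # boundaries match a gold span, label differs
        "boundary_error": 0,  # overlaps a gold span, boundaries differ
        "spurious": 0,        # no overlap with any gold span
    }
    for s, e, label in pred:
        if label not in schema:
            counts["out_of_schema"] += 1
        elif (s, e, label) in gold:
            counts["correct"] += 1
        elif (s, e) in gold_bounds:
            counts["label_error"] += 1
        elif any(s < ge and gs < e for gs, ge, _ in gold):
            counts["boundary_error"] += 1
        else:
            counts["spurious"] += 1
    return counts


def schema_error_rate(counts: dict) -> float:
    """Assumed definition: (out-of-schema + label errors) / all predictions."""
    total = sum(counts.values())
    return (counts["out_of_schema"] + counts["label_error"]) / total if total else 0.0


schema = {"PERSON", "ORG", "LOC"}
gold = {(0, 8, "PERSON"), (15, 25, "ORG"), (29, 38, "LOC")}
pred = [(0, 8, "PERSON"), (15, 20, "ORG"), (29, 38, "ORG"), (9, 14, "COMPANY")]
counts = error_breakdown(gold, pred, schema)
rate = schema_error_rate(counts)
```

Separating label errors from boundary errors shows whether a model needs more schema examples or better span boundaries, which a single F1 number hides.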

04 - Routing

Pick by task shape.

Known schema at scale

Fine-tuned encoder

Fast, cheap, stable offsets, and strong F1 when labels are fixed.
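A token classifier emits one tag per token, so the stable offsets come from a decoding step. The sketch below assumes BIO tags and per-token character offsets are already available (with Hugging Face fast tokenizers these can be requested via `return_offsets_mapping=True`); the `bio_to_spans` function and its lenient handling of stray `I-` tags are illustrative choices.

```python
def bio_to_spans(offsets, tags):
    """Merge BIO-tagged tokens into (start, end, label) character spans.

    offsets: list of (start, end) character offsets, one per token.
    tags: list of strings such as "B-PER", "I-PER", or "O".
    """
    spans = []
    start = end = label = None
    for (tok_start, tok_end), tag in zip(offsets, tags):
        prefix, _, tag_label = tag.partition("-")
        continues = prefix == "I" and tag_label == label
        if not continues:
            if label is not None:  # close the span that was open
                spans.append((start, end, label))
                label = None
            if prefix in ("B", "I"):  # a stray I- tag opens a new span (lenient)
                start, label = tok_start, tag_label
        if label is not None:
            end = tok_end  # extend the open span to cover this token
    if label is not None:
        spans.append((start, end, label))
    return spans


# Whitespace tokens of "Tim Cook leads Apple Inc. in Cupertino."
offsets = [(0, 3), (4, 8), (9, 14), (15, 20), (21, 25), (26, 28), (29, 38), (38, 39)]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
spans = bio_to_spans(offsets, tags)
```

Because the decoder is a pure function of tags and offsets, the same model output always yields the same spans, which is what makes this route easy to test.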

Custom labels quickly

GLiNER / span model

Extract new entity types without a full annotation project.

Messy document extraction

LLM structured output

Useful when labels are semantic and context-heavy, but validate spans.
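One way to validate LLM spans is to ignore any offsets the model reports and re-locate each quoted mention in the source text. The sketch below assumes the model returns a JSON array of objects with `text` and `label` keys; that output shape and the `validate_llm_entities` function are assumptions, not a specific vendor's API.

```python
import json


def validate_llm_entities(text: str, raw_json: str, schema: set):
    """Keep entities whose label is in the schema and whose text occurs in the source."""
    accepted, rejected = [], []
    cursor = {}  # per-mention search position, so repeats map to successive occurrences
    for item in json.loads(raw_json):
        mention, label = item.get("text", ""), item.get("label")
        start = text.find(mention, cursor.get(mention, 0)) if mention else -1
        if label not in schema or start == -1:
            rejected.append(item)  # hallucinated text or out-of-schema label
            continue
        end = start + len(mention)
        cursor[mention] = end
        accepted.append((start, end, label))
    return accepted, rejected


text = "Tim Cook leads Apple Inc. in Cupertino."
raw = json.dumps([
    {"text": "Tim Cook", "label": "PERSON"},
    {"text": "Apple Incorporated", "label": "ORG"},  # not a substring of the source
    {"text": "Cupertino", "label": "CITY"},          # label outside the schema
])
accepted, rejected = validate_llm_entities(text, raw, {"PERSON", "ORG", "LOC"})
```

Rejected items can be logged or sent for review; malformed JSON will raise from `json.loads`, which a real pipeline would need to catch.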

Regulated pipeline

Audited token classifier

Offsets, deterministic behavior, and testable schemas matter.

05 - Related

Need implementation details?

Open the lower-level explainer for architecture, code examples, and implementation options.

Open NER explainer ->