NER extracts entity spans and types from text. It is useful only when spans, offsets, and stable schemas matter. Use fine-tuned token classifiers for scale, GLiNER-style models for custom labels, and LLMs when the schema changes often.
Named entity recognition finds text spans and assigns schema labels such as PERSON, ORG, and LOC, or domain-specific types such as product, statute, disease, or ticker. The hard part is not only spotting names; it is returning stable offsets, resolving boundary ambiguity, and matching the entity taxonomy your downstream system expects.
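As a rough illustration of what "stable offsets" and "matching the taxonomy" mean in practice, the sketch below represents an entity as a character span and checks it against the source text. The `EntitySpan` class, the `is_consistent` helper, the three-label schema, and the sample sentence are all hypothetical names chosen for this example, not part of any particular library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntitySpan:
    start: int   # inclusive character offset into the source text
    end: int     # exclusive character offset
    label: str   # schema label, e.g. "PERSON" or "ORG"
    text: str    # surface form the extractor claims to have found


def is_consistent(span: EntitySpan, source: str, schema: set[str]) -> bool:
    """True when the offsets are in range, slice back to the claimed
    surface form, and the label belongs to the expected schema."""
    in_range = 0 <= span.start < span.end <= len(source)
    return (
        in_range
        and source[span.start:span.end] == span.text
        and span.label in schema
    )


# Hypothetical schema and sentence, for illustration only.
SCHEMA = {"PERSON", "ORG", "LOC"}
source = "Ada Lovelace joined Acme Corp in London."
spans = [
    EntitySpan(0, 12, "PERSON", "Ada Lovelace"),
    EntitySpan(20, 29, "ORG", "Acme Corp"),
    EntitySpan(20, 24, "ORG", "Acme Corp"),   # boundary error: slice is "Acme"
    EntitySpan(33, 39, "CITY", "London"),     # label outside the schema
]
checks = [is_consistent(s, source, SCHEMA) for s in spans]
print(checks)
```

A check like this is cheap enough to run on every extractor output, whichever model produced it, before the spans reach a downstream system.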
One leaderboard rarely captures the task. Use the canonical benchmark for lineage, then add harder or more domain-specific checks before choosing a model.
| Benchmark | Role | Metric | Caveat |
|---|---|---|---|
| CoNLL-2003 | Classic English NER | Entity-level F1 | Narrow four-label newswire schema; useful for lineage, weak for modern entity extraction. |
| OntoNotes 5.0 | Broader English schema | Span F1 | More entity types, but still not enough for legal, medical, finance, or product catalogs. |
| MultiNERD | Multilingual NER | Macro / micro F1 | Better language coverage; still requires local checks for aliases and domain terms. |
| Local gold set | Production gate | Span F1 + schema error rate | The only reliable way to measure custom labels and costly false positives. |
The public benchmark is a shortlist signal. Production choice still depends on latency, cost, domain drift, and how expensive mistakes are.
| Axis | Value | Why it matters |
|---|---|---|
| Canonical benchmark | CoNLL-2003 | Classic English PERSON, ORG, LOC, MISC benchmark; useful but narrow. |
| Broader schema | OntoNotes / MultiNERD | More entity types and multilingual coverage for modern extraction. |
| Production metric | Span F1 + schema error rate | Wrong boundaries and wrong labels break downstream knowledge graphs. |
| Failure mode | Domain entities missed | Company tickers, statutes, products, drugs, and aliases need local examples. |
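The production metric in the table can be sketched in a few lines. Exact-match span F1 is standard: a prediction counts only if start, end, and label all match a gold span. "Schema error rate" is not defined above, so the version here is an assumption: the share of predicted spans whose label is outside the allowed schema, or whose boundaries match a gold span but whose label differs. The function name `span_metrics` and the toy data are hypothetical.

```python
def span_metrics(gold, pred, schema):
    """Compute exact-match span scores.

    gold, pred: sets of (start, end, label) tuples for one evaluation set.
    schema: set of labels the downstream system accepts.
    Assumes each (start, end) pair carries at most one gold label.
    """
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Assumed definition of a schema error: label not in the schema, or
    # boundaries agree with a gold span but the label does not.
    gold_label_at = {(s, e): lab for s, e, lab in gold}
    schema_errors = sum(
        1
        for s, e, lab in pred
        if lab not in schema
        or ((s, e) in gold_label_at and gold_label_at[(s, e)] != lab)
    )
    schema_error_rate = schema_errors / len(pred) if pred else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "schema_error_rate": schema_error_rate,
    }


# Toy data: two exact matches, one wrong label, one wrong boundary.
gold = {(0, 12, "PERSON"), (20, 29, "ORG"), (33, 39, "LOC"), (45, 49, "ORG")}
pred = {(0, 12, "PERSON"), (20, 29, "ORG"), (33, 39, "ORG"), (45, 47, "ORG")}
scores = span_metrics(gold, pred, schema={"PERSON", "ORG", "LOC"})
print(scores)
```

Reporting the two numbers separately keeps boundary mistakes (which lower F1 only) apart from label mistakes (which also raise the schema error rate), which is useful when the two failure types have different downstream costs.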
Fine-tuned encoder: Fast, cheap, stable offsets, and strong F1 when labels are fixed.
GLiNER-style model: Extracts new entity types without a full annotation project.
LLM: Useful when labels are semantic and context-heavy, but validate the returned spans.
NER is the right tool when offsets, deterministic behavior, and testable schemas matter.
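One way to validate spans from a generative model is to re-anchor every returned entity in the source text and reject anything that cannot be found verbatim or that carries a label outside the schema. The sketch below assumes the model output has already been parsed into a list of dicts with `text` and `label` keys; that format, the `anchor_entities` helper, and the sample data are hypothetical.

```python
def anchor_entities(source, llm_entities, schema):
    """Map model-returned entities to character offsets in `source`.

    Returns (accepted, rejected): accepted is a list of
    (start, end, label) tuples, rejected holds the raw dicts that failed.
    Repeated surface forms are matched left to right, one occurrence each.
    """
    accepted, rejected = [], []
    resume_from = {}  # surface form -> offset where the next search starts
    for ent in llm_entities:
        surface = ent.get("text", "")
        label = ent.get("label", "")
        if not surface or label not in schema:
            rejected.append(ent)  # empty surface or label outside the schema
            continue
        start = source.find(surface, resume_from.get(surface, 0))
        if start == -1:
            rejected.append(ent)  # not present verbatim: paraphrased or invented
            continue
        end = start + len(surface)
        resume_from[surface] = end
        accepted.append((start, end, label))
    return accepted, rejected


# Hypothetical parsed model output, for illustration only.
source = "Acme Corp sued Acme Corp's former CFO, Ada Lovelace."
llm_entities = [
    {"text": "Acme Corp", "label": "ORG"},
    {"text": "Acme Corp", "label": "ORG"},
    {"text": "Ada Lovelace", "label": "PERSON"},
    {"text": "ACME Corporation", "label": "ORG"},  # not in the text verbatim
    {"text": "CFO", "label": "TITLE"},             # label outside the schema
]
accepted, rejected = anchor_entities(source, llm_entities, {"PERSON", "ORG", "LOC"})
print(accepted)
print(rejected)
```

Left-to-right matching is a simplification: it guesses which occurrence the model meant when a surface form repeats, so rejected items and ambiguous repeats are worth logging and reviewing against a local gold set.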