Codesota · Benchmark · CoNLL-2003

CoNLL-2003.

Reuters news stories annotated with four entity types: PER, ORG, LOC, and MISC. A standard benchmark for named entity recognition (NER).

§ 01 · SOTA history

Year over year.

§ 02 · Leaderboard

Results by metric.


f1

F1 is the reported evaluation metric for CoNLL-2003. Codesota tracks published model scores on this metric so readers can compare state-of-the-art results across sources and model families.

Higher is better
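
As a rough illustration of the metric, the sketch below computes an entity-level, exact-match F1 over BIO-tagged sentences, which is the conventional way CoNLL-2003 is scored. This page does not state how each listed score was computed, so treat that as an assumption; the helper names `extract_spans` and `entity_f1` are hypothetical and not part of any library. The function returns a value in [0, 1], while the leaderboard reports scores multiplied by 100.

```python
from typing import List, Set, Tuple

# (entity type, start index, end index exclusive)
Span = Tuple[str, int, int]


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one sentence of BIO tags (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at sentence end.
    for i, tag in enumerate(tags + ["O"]):
        # The open span ends at "O", at any "B-", or at an "I-" of another type.
        if tag == "O" or tag.startswith("B-") or tag[2:] != etype:
            if etype is not None:
                spans.add((etype, start, i))
            start, etype = None, None
        # Open a new span; a stray "I-" is treated leniently as a span start.
        if tag != "O" and etype is None:
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1: a predicted span counts only if type and
    boundaries both match a gold span exactly."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = extract_spans(g_tags), extract_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: the PER span matches, the LOC span is mislabeled as ORG.
gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
score = entity_f1(gold, pred)  # precision 0.5, recall 0.5
```

Because a span with the right boundaries but the wrong type counts as both a false positive and a false negative, this metric is stricter than token-level accuracy.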

Trust tiers for f1: verified, paper, vendor, community, unverified
| Rank | Model | Trust | Score | Year | Notes |
|------|-------|-------|-------|------|-------|
| 01 | GLiNER-multitask | verified | 93.8 | 2024 | GLiNER-multitask fine-tuned. Source: arxiv:2406.18568 Table 1. |
| 02 | DeBERTa-v3-large | verified | 93.4 | 2021 | DeBERTa-v3-Large fine-tuned. |
| 03 | GPT-4o | verified | 91.7 | 2023 | GPT-4o few-shot NER. |
| 04 | Llama 3.1 405B | verified | 90.6 | 2024 | Llama 3.1 405B Instruct few-shot NER. Source: Llama 3 paper. |
| 05 | Qwen2 72B | verified | 90.2 | 2024 | Qwen2 72B Instruct. Source: Qwen2 technical report. |
| 06 | Llama 3 70B | verified | 89.3 | 2024 | Llama 3 70B Instruct. Source: Llama 3 paper. |
| 07 | Mistral 7B | verified | 83.5 | 2023 | Mistral 7B few-shot NER. Source: Mistral 7B paper. |
§ 04 · Submit a result

Add to the leaderboard.
