Entity Linking
Linking mentions to knowledge base entities.
Entity linking maps mentions of entities in text to their corresponding entries in a knowledge base such as Wikipedia or Wikidata. BLINK and GENRE established neural entity linking, and LLMs now enable zero-shot entity resolution that handles ambiguous and out-of-knowledge-base mentions using conversational context.
History
TAC-KBP shared tasks establish entity linking evaluation methodology
Neural entity linking with entity embeddings begins to outperform feature-based methods
BLINK (Facebook) uses bi-encoder architecture for scalable entity retrieval and linking
GENRE (Facebook) generates entity names autoregressively — constrained decoding over entity trie
De Cao et al. show autoregressive entity linking handles ambiguous and out-of-KB entities
EntQA frames entity linking as question answering for better context understanding
ReFinED (Amazon) provides an efficient, production-grade entity linking system
LLM-based entity linking — GPT-4/Claude resolve entity mentions with conversational context
Multimodal entity linking handles entities mentioned in images, tables, and video alongside text
How Entity Linking Works
Mention Detection
Entity mentions are identified in the text — proper nouns, acronyms, and referential expressions that could correspond to KB entities.
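A minimal stand-in for this step: treat contiguous capitalized spans as candidate mentions. A production system would use a trained NER or mention-detection model instead; this regex is a deliberately crude sketch (it also fires on sentence-initial words).

```python
import re

# Toy mention detector: contiguous capitalized word spans stand in for
# a trained mention-detection model. Returns (text, start, end) triples.
def detect_mentions(text):
    pattern = r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*"
    return [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]

mentions = detect_mentions("Barack Obama visited New York yesterday.")
# [('Barack Obama', 0, 12), ('New York', 21, 29)]
```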
Candidate Generation
For each mention, a set of candidate KB entities is retrieved — using alias tables, TF-IDF, or dense retrieval (bi-encoder).
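The simplest candidate generator is an alias table: a map from normalized surface forms to the KB entities they may refer to. The table below is hand-built for illustration; real systems mine aliases from KB redirects, hyperlink anchor text, or replace the table with dense retrieval.

```python
# Toy alias table mapping lowercased surface forms to KB entity ids.
# Entries here are illustrative, not mined from a real KB.
ALIAS_TABLE = {
    "washington": ["George_Washington", "Washington_(state)",
                   "Washington,_D.C.", "University_of_Washington"],
    "ms": ["Microsoft", "Multiple_sclerosis", "Mississippi"],
}

def candidates(mention):
    """Return candidate KB entities for a mention (empty list if unknown)."""
    return ALIAS_TABLE.get(mention.lower(), [])
```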
Context Encoding
The mention and its surrounding context are encoded into a dense vector that captures the meaning in this specific usage.
Entity Ranking
Candidates are ranked by similarity between the context encoding and entity representations (descriptions, type information).
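The encode-then-rank steps above can be sketched with a toy embedding: bag-of-words vectors and cosine similarity stand in for the dense mention-context and entity-description encoders of a BLINK-style bi-encoder. The entity descriptions are made up for the example.

```python
import math
from collections import Counter

# Toy bi-encoder: bag-of-words counts stand in for learned dense
# encoders; cosine similarity scores each candidate's description
# against the mention's context.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(context, entity_descriptions):
    """Return (entity, score) pairs sorted by similarity to the context."""
    ctx = embed(context)
    scored = [(name, cosine(ctx, embed(desc)))
              for name, desc in entity_descriptions.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)

descs = {
    "Washington_(state)": "state in the pacific northwest of the united states",
    "George_Washington": "first president of the united states",
}
ranked = rank("the state of Washington is in the pacific northwest", descs)
# ranked[0][0] == 'Washington_(state)'
```

In practice the bi-encoder retrieves a short list cheaply, and a cross-encoder that jointly attends over context and description rescores it for the final decision.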
NIL Detection
If no candidate exceeds a confidence threshold, the mention is classified as NIL — referring to an entity not in the knowledge base.
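Given a ranked candidate list, NIL detection reduces to a threshold check. The 0.5 cutoff below is illustrative; a real system tunes it on development data.

```python
# NIL detection: accept the top-ranked candidate only if its score
# clears a confidence threshold (the value here is illustrative).
NIL_THRESHOLD = 0.5

def link_or_nil(ranked_candidates, threshold=NIL_THRESHOLD):
    """ranked_candidates: (entity, score) pairs sorted best-first."""
    if ranked_candidates and ranked_candidates[0][1] >= threshold:
        return ranked_candidates[0][0]
    return "NIL"
```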
Current Landscape
Entity linking in 2025 is a mature NLP component used in search engines, knowledge graphs, and information extraction pipelines. The bi-encoder + cross-encoder paradigm (BLINK) provides a strong speed-accuracy tradeoff for production, while autoregressive methods (GENRE) handle harder cases. LLMs are increasingly used for entity linking in applications where accuracy on ambiguous mentions matters more than throughput. The field is shifting toward more challenging settings: low-resource languages, domain-specific KBs (biomedical, legal), and multimodal entity resolution.
Key Challenges
Ambiguity — 'Washington' could be a person, state, city, university, or sports team
Long-tail entities — rare entities have few mentions in training data and sparse KB descriptions
Cross-lingual linking — mentions in one language must link to entities in a multilingual KB
Knowledge base evolution — new entities appear constantly; the KB is never complete
NIL clustering — grouping mentions of the same novel entity that isn't in the KB
Quick Recommendations
Production entity linking
ReFinED (Amazon)
Fast, accurate, and maintained — best production-ready system
Research baseline
BLINK (bi-encoder + cross-encoder)
Well-documented, reproducible, and widely compared against
Zero-shot / novel entities
GPT-4 / Claude with KB context
LLMs handle ambiguity and out-of-KB entities through reasoning
Multilingual linking
mGENRE
Multilingual autoregressive entity linking across 100+ languages
What's Next
The frontier is dynamic entity linking — handling knowledge bases that change in real-time (news events, new products, emerging entities). Expect integration with retrieval-augmented generation (RAG) systems, where entity linking grounds LLM outputs in verified knowledge, and multimodal entity linking that resolves entities across text, images, and video.