Relation Extraction
Extracting relationships between entities from text.
Relation extraction identifies semantic relationships between entities in text, for example extracting the triple (Barack Obama, born_in, Honolulu) from 'Obama was born in Honolulu.' Pretrained language models (BERT, RoBERTa) established strong baselines, while LLMs now enable zero-shot relation extraction that generalizes to unseen relation types.
History
Distant supervision (Mintz et al.) automatically labels relation extraction training data by aligning knowledge base facts with text
Piecewise CNN (PCNN) applies convolutional networks with piecewise max pooling to distantly supervised relation extraction
SpanBERT and BERT-based models achieve SOTA on TACRED and SemEval-2010 Task 8
DocRED introduces document-level relation extraction requiring cross-sentence reasoning
LUKE (Language Understanding with Knowledge-based Embeddings) achieves SOTA on multiple RE benchmarks
Prompt-based relation extraction shows competitive performance with few labeled examples
UniRE unifies named entity recognition and relation extraction in one model
GPT-4 demonstrates strong zero-shot relation extraction from unstructured text
LLM-based RE pipelines deployed in biomedical and financial information extraction
Multimodal relation extraction — identifying relationships from text, tables, and images
How Relation Extraction Works
Entity Recognition
Named entities are identified in the text — people, organizations, locations, dates, etc.
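The output of this step can be sketched as follows. This is a minimal illustration, not a real tagger: a hypothetical hand-written gazetteer (`GAZETTEER`) stands in for a trained NER model, and the `Entity` dataclass (surface text, type, character offsets) is an assumed output format.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Entity:
    text: str
    label: str  # entity type, e.g. PERSON or LOCATION
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive

# Hypothetical gazetteer standing in for a trained NER model.
GAZETTEER = {
    "Barack Obama": "PERSON",
    "Obama": "PERSON",
    "Honolulu": "LOCATION",
}

def recognize_entities(text: str) -> list[Entity]:
    """Return non-overlapping whole-word gazetteer matches, sorted by position."""
    taken = [False] * len(text)  # marks characters already claimed by a match
    found: list[Entity] = []
    # Longer names first, so "Barack Obama" is preferred over its substring "Obama".
    for name in sorted(GAZETTEER, key=len, reverse=True):
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            start, end = match.span()
            if not any(taken[start:end]):
                found.append(Entity(name, GAZETTEER[name], start, end))
                taken[start:end] = [True] * (end - start)
    return sorted(found, key=lambda e: e.start)

print(recognize_entities("Obama was born in Honolulu."))
```

The overlap check keeps "Obama" from being reported a second time inside "Barack Obama", and the word-boundary pattern keeps it from matching inside a longer word such as "Obamacare".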
Candidate Pair Generation
All ordered (head, tail) entity pairs within a sentence or document are enumerated as candidate relation instances; order matters because most relations, such as born_in, are directional.
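A minimal sketch of pair enumeration in plain Python. The `(text, type)` tuple format and the optional type filter are illustrative assumptions, not part of any particular system. Because n entities yield n * (n - 1) ordered pairs, the candidate set grows quadratically, which matters most at document level.

```python
from itertools import permutations

# An entity as (surface text, type), e.g. the output of the NER step.
Entity = tuple[str, str]

def candidate_pairs(entities: list[Entity], allowed_types=None) -> list[tuple[Entity, Entity]]:
    """Enumerate ordered (head, tail) pairs.

    (Obama, Honolulu) and (Honolulu, Obama) are separate candidates.
    allowed_types is an optional, hypothetical set of (head_type, tail_type)
    pairs used to prune combinations that no relation in the schema connects.
    """
    pairs = []
    for head, tail in permutations(entities, 2):
        if allowed_types is None or (head[1], tail[1]) in allowed_types:
            pairs.append((head, tail))
    return pairs

ents = [("Barack Obama", "PERSON"), ("Honolulu", "LOCATION"), ("Hawaii", "LOCATION")]
print(len(candidate_pairs(ents)))                       # 6 = 3 * 2 ordered pairs
print(candidate_pairs(ents, {("PERSON", "LOCATION")}))  # only pairs headed by Barack Obama
```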
Context Encoding
The sentence or passage containing the entity pair is encoded, with entity position markers ([E1], [E2]) highlighting the target entities.
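Marker insertion can be sketched on a pre-tokenized sentence. The closing markers `[/E1]` and `[/E2]` are an assumed convention (only `[E1]` and `[E2]` are named above), and whitespace tokenization is a simplification.

```python
def add_entity_markers(tokens: list[str], head: tuple[int, int], tail: tuple[int, int]) -> list[str]:
    """Wrap the head span in [E1] ... [/E1] and the tail span in [E2] ... [/E2].

    head and tail are (start, end) token indices with end exclusive;
    the two spans are assumed not to overlap. Either may come first.
    """
    out = []
    for i, tok in enumerate(tokens):
        if i == head[0]:
            out.append("[E1]")
        if i == tail[0]:
            out.append("[E2]")
        out.append(tok)
        if i == head[1] - 1:
            out.append("[/E1]")
        if i == tail[1] - 1:
            out.append("[/E2]")
    return out

tokens = "Obama was born in Honolulu .".split()
print(" ".join(add_entity_markers(tokens, (0, 1), (4, 5))))
# [E1] Obama [/E1] was born in [E2] Honolulu [/E2] .
```

In a transformer encoder, the marker strings would typically be registered as special tokens so the tokenizer keeps them intact, and the hidden states at the marker positions then serve as the representation of the pair.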
Relation Classification
A classifier predicts the relation type (or 'no relation') for each entity pair based on the contextual encoding.
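One common design treats "no relation" as an ordinary class and applies a linear layer plus softmax to the concatenated encodings at the two entity markers. In the sketch below, the label set, the 2-dimensional encodings, and the hand-set weights are all hypothetical; in practice the encodings come from a fine-tuned encoder and the weights are learned.

```python
import math

# Hypothetical label set; real schemas (e.g. TACRED) are much larger.
LABELS = ["no_relation", "born_in", "works_for"]

def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(h_e1: list[float], h_e2: list[float],
             weights: list[list[float]], bias: list[float]) -> tuple[str, float]:
    """Linear layer + softmax over the concatenated [E1] and [E2] encodings.

    weights has one row per label, each of length len(h_e1) + len(h_e2).
    Returns the most probable label and its probability.
    """
    features = h_e1 + h_e2  # list concatenation acts as vector concatenation
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]

# Toy parameters, chosen by hand for illustration (not trained).
TOY_WEIGHTS = [
    [0.0, 0.0, 0.0, 0.0],  # no_relation
    [1.0, 0.0, 0.0, 1.0],  # born_in
    [0.0, 1.0, 1.0, 0.0],  # works_for
]
TOY_BIAS = [0.5, 0.0, 0.0]

print(classify([1.0, 0.0], [0.0, 1.0], TOY_WEIGHTS, TOY_BIAS))  # born_in, p ~ 0.74
```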
Post-Processing
Extracted relations are deduplicated, confidence-filtered, and optionally linked to a knowledge base schema.
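The three post-processing steps can be sketched together. The 0.5 threshold is an arbitrary placeholder, and mapping `born_in` to the Wikidata property P19 (place of birth) is just one illustrative target schema.

```python
def postprocess(predictions, threshold=0.5, schema_map=None):
    """Filter, deduplicate, and optionally map predicted triples to a KB schema.

    predictions: iterable of (head, relation, tail, confidence) tuples.
    schema_map: optional dict from relation name to KB property id;
    relations missing from it are dropped.
    Returns [((head, relation, tail), confidence), ...], most confident first.
    """
    best = {}
    for head, rel, tail, conf in predictions:
        if rel == "no_relation" or conf < threshold:
            continue
        if schema_map is not None:
            if rel not in schema_map:
                continue  # no counterpart in the target schema
            rel = schema_map[rel]
        key = (head, rel, tail)
        best[key] = max(conf, best.get(key, 0.0))  # keep the highest-confidence copy
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

preds = [
    ("Barack Obama", "born_in", "Honolulu", 0.92),
    ("Barack Obama", "born_in", "Honolulu", 0.81),    # same fact found in another sentence
    ("Barack Obama", "works_for", "Honolulu", 0.30),  # below threshold
    ("Honolulu", "no_relation", "Barack Obama", 0.88),
]
print(postprocess(preds))
# [(('Barack Obama', 'born_in', 'Honolulu'), 0.92)]
print(postprocess(preds, schema_map={"born_in": "P19"}))
# [(('Barack Obama', 'P19', 'Honolulu'), 0.92)]
```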
Current Landscape
Relation extraction in 2025 operates in two modes: (1) supervised extraction with fine-tuned BERT-family models for domains with labeled data (news, biomedical), achieving roughly 70-80% F1 on sentence-level benchmarks such as TACRED, and (2) zero-shot extraction with LLMs for open-domain and novel relation types. The latter is increasingly preferred for production because it requires no labeled training data and, in open extraction, no predefined schema. Document-level RE remains challenging, as relations spanning multiple sentences require coreference resolution and multi-hop reasoning. The practical impact is in biomedical knowledge extraction (drug-gene interactions), financial analysis (corporate relationships), and knowledge base construction.
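F1 figures like these depend on how "no relation" is scored. In the style of TACRED's official scorer, the sketch below micro-averages over all pairs and treats `no_relation` as the negative class, so correctly predicting "no relation" earns no credit. The labels are made up for illustration.

```python
def micro_f1(gold: list[str], pred: list[str], negative: str = "no_relation"):
    """Micro-averaged precision, recall, and F1 over non-negative labels."""
    correct = predicted = actual = 0
    for g, p in zip(gold, pred):
        if p != negative:
            predicted += 1  # the model claimed some relation
        if g != negative:
            actual += 1     # a relation really holds
        if g == p and g != negative:
            correct += 1    # right relation for this pair
    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = ["born_in", "works_for", "born_in", "no_relation"]
pred = ["born_in", "no_relation", "no_relation", "no_relation"]
print(micro_f1(gold, pred))  # approximately (1.0, 0.333, 0.5)
```

The last pair, where gold and prediction are both `no_relation`, affects neither precision nor recall, so a model that always answers `no_relation` scores 0 F1 however imbalanced the data is.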
Key Challenges
Long-tail relations — rare relation types have few training examples but are often the most valuable to extract
Document-level RE — relations expressed across multiple sentences require coreference resolution and multi-hop reasoning
Noise in distant supervision — automatic labeling from KBs introduces significant label noise
Overlapping relations — multiple relations between the same entity pair or overlapping entity spans complicate extraction
Domain transfer — models trained on news text struggle with biomedical, legal, or financial relation extraction
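The distant-supervision noise problem is easy to reproduce. The sketch applies the Mintz-style labeling heuristic (if a KB fact links two entities, any sentence mentioning both is labeled with that relation) to two hypothetical sentences and a one-fact toy KB. Both sentences receive `born_in`, although only the first expresses it.

```python
from itertools import permutations

# Hypothetical knowledge base of (head, relation, tail) facts.
KB = {("Barack Obama", "born_in", "Honolulu")}

def distant_label(sentence: str, entities: list[str], kb) -> list[tuple[str, str, str]]:
    """Distant-supervision heuristic: every ordered entity pair linked by a
    KB fact gets that relation; all other pairs get no_relation.

    The sentence text is never inspected, which is exactly where the noise comes from.
    """
    labels = []
    for head, tail in permutations(entities, 2):
        rels = sorted(r for h, r, t in kb if h == head and t == tail)
        for rel in rels or ["no_relation"]:
            labels.append((head, rel, tail))
    return labels

correct = distant_label("Barack Obama was born in Honolulu.", ["Barack Obama", "Honolulu"], KB)
noisy = distant_label("Barack Obama held a press conference in Honolulu.", ["Barack Obama", "Honolulu"], KB)
print(correct == noisy)  # True: the second sentence is mislabeled as born_in
```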
Quick Recommendations
Standard relation extraction
RoBERTa/DeBERTa fine-tuned on TACRED/DocRED
Best supervised performance on established benchmarks
Zero-shot / open relation extraction
GPT-4 / Claude 3.5 with structured prompting
Extracts relations without predefined relation schemas
Biomedical RE
PubMedBERT fine-tuned on ChemProt/DDI
Domain-specific pretraining captures biomedical relation patterns
Document-level RE
ATLOP / DocuNet on DocRED
State-of-the-art methods designed for cross-sentence relation extraction
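A sketch of structured prompting for zero-shot RE, assuming the model is asked to reply in JSON. `build_prompt` and `parse_response` are hypothetical helpers, the prompt wording is illustrative rather than tuned, and the model call itself (GPT-4, Claude, or any other API) is deliberately left out. Passing no relation list switches to open extraction. Validating the reply matters because LLM output can be malformed or use relation names outside the requested schema.

```python
import json

def build_prompt(text: str, relation_types=None) -> str:
    """Build an extraction prompt; relation_types=None requests open extraction."""
    if relation_types:
        schema = f"Allowed relation types: {', '.join(relation_types)}."
    else:
        schema = "Use short snake_case names for relation types."
    return (
        "Extract all relations between entities in the text below.\n"
        f"{schema}\n"
        'Answer only with a JSON list of objects with keys "head", "relation", "tail"; '
        "return [] if there are none.\n\n"
        f"Text: {text}"
    )

def parse_response(raw: str, relation_types=None) -> list[tuple[str, str, str]]:
    """Parse the model's JSON reply, dropping malformed items and, in closed
    mode, relation types outside the allowed list."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    triples = []
    for item in items:
        if not isinstance(item, dict):
            continue
        head, rel, tail = item.get("head"), item.get("relation"), item.get("tail")
        if not all(isinstance(v, str) for v in (head, rel, tail)):
            continue
        if relation_types and rel not in relation_types:
            continue  # the model invented a relation outside the schema
        triples.append((head, rel, tail))
    return triples

relations = ["born_in", "works_for"]
prompt = build_prompt("Obama was born in Honolulu.", relations)
# Hand-written stand-in for a model reply; no LLM is called here.
reply = ('[{"head": "Obama", "relation": "born_in", "tail": "Honolulu"},'
         ' {"head": "Obama", "relation": "lives_in", "tail": "Honolulu"}]')
print(parse_response(reply, relations))  # [('Obama', 'born_in', 'Honolulu')]
print(parse_response(reply))             # open mode keeps lives_in as well
```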
What's Next
The frontier is end-to-end knowledge graph construction from unstructured text — combining entity recognition, relation extraction, and entity linking in a unified system. Expect LLM-based extraction pipelines that handle multi-hop relations across documents, and active learning approaches that efficiently acquire labels for the most valuable relation types.
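As a toy illustration of how linking and multi-hop reasoning fit together (not an actual end-to-end system): triples extracted from two separate documents are merged into one graph after a hypothetical alias table resolves "Obama" to "Barack Obama". The merged graph then contains a two-hop path that neither document states on its own.

```python
from collections import defaultdict

# Hypothetical alias table standing in for an entity linker:
# maps surface strings to one canonical node name.
ALIASES = {"Obama": "Barack Obama", "Barack Obama": "Barack Obama"}

def link(name: str) -> str:
    return ALIASES.get(name, name)  # unknown names become their own node

def build_graph(triples):
    """Adjacency map: canonical head -> set of (relation, canonical tail)."""
    graph = defaultdict(set)
    for head, rel, tail in triples:
        graph[link(head)].add((rel, link(tail)))
    return graph

def two_hop(graph, start: str):
    """All (rel1, middle, rel2, end) paths of length two starting at start."""
    paths = []
    for rel1, mid in sorted(graph.get(start, ())):
        for rel2, end in sorted(graph.get(mid, ())):
            paths.append((rel1, mid, rel2, end))
    return paths

# Triples as they might be extracted from two different documents.
triples = [
    ("Obama", "born_in", "Honolulu"),      # document A
    ("Honolulu", "located_in", "Hawaii"),  # document B
]
graph = build_graph(triples)
print(two_hop(graph, "Barack Obama"))
# [('born_in', 'Honolulu', 'located_in', 'Hawaii')]
```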