
Relation Extraction

Extracting relationships between entities from text.

1 dataset, 3 results

Relation extraction identifies semantic relationships between entities in text — (Barack Obama, born_in, Honolulu) from 'Obama was born in Honolulu.' Pretrained language models (BERT, RoBERTa) established strong baselines, while LLMs now enable zero-shot relation extraction that generalizes to unseen relation types.

History

2009

Distant supervision (Mintz et al.) automatically labels relation extraction training data by aligning text with knowledge base facts

2015

Piecewise CNN (PCNN) applies convolutional networks to distantly supervised relation extraction

2019

SpanBERT and BERT-based models achieve SOTA on TACRED and SemEval-2010 Task 8 relation extraction

2019

DocRED introduces document-level relation extraction requiring cross-sentence reasoning

2020

LUKE (Language Understanding with Knowledge-based Embeddings) achieves SOTA on multiple RE benchmarks

2021

Prompt-based relation extraction shows competitive performance with few labeled examples

2021

UniRE unifies named entity recognition and relation extraction in a single label space within one model

2023

GPT-4 demonstrates strong zero-shot relation extraction from unstructured text

2024

LLM-based RE pipelines deployed in biomedical and financial information extraction

2025

Multimodal relation extraction — identifying relationships from text, tables, and images

How Relation Extraction Works

Entity Recognition

Named entities are identified in the text — people, organizations, locations, dates, etc.
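The entity recognition step can be illustrated with a toy dictionary (gazetteer) matcher. This is a minimal sketch, not how production systems work: real pipelines use a trained NER model, and the `GAZETTEER` contents and the `find_entities` helper are hypothetical. What it does show is the span format that later steps consume: token offsets plus an entity type.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: surface form (tuple of tokens) -> entity type.
# A real system would use a trained NER model instead of a fixed dictionary.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Barack", "Obama"): "PERSON",
    ("Obama",): "PERSON",
    ("Honolulu",): "LOCATION",
}


def find_entities(
    tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]
) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) spans using greedy longest match."""
    max_len = max(len(key) for key in gazetteer)
    spans: List[Tuple[int, int, str]] = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "Barack Obama" beats "Obama".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in gazetteer:
                match = (i, i + n, gazetteer[key])
                break
        if match is not None:
            spans.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return spans


tokens = "Obama was born in Honolulu .".split()
print(find_entities(tokens, GAZETTEER))
# [(0, 1, 'PERSON'), (4, 5, 'LOCATION')]
```

Longest-match-first keeps "Barack Obama" from being recognized as the shorter "Obama" when both forms are in the dictionary.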

Candidate Pair Generation

All entity pairs within a sentence or document are enumerated as candidate relation instances.
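Candidate pair generation amounts to enumerating ordered (head, tail) pairs, since relations such as born_in are directional. The sketch below assumes entities arrive as `(start, end, type)` spans; the optional `allowed_types` filter is an addition not described above, included because pruning type-incompatible pairs is a common way to limit the quadratic growth in candidates.

```python
from itertools import permutations
from typing import List, Optional, Set, Tuple

Entity = Tuple[int, int, str]  # (start token, end token exclusive, entity type)


def candidate_pairs(
    entities: List[Entity],
    allowed_types: Optional[Set[Tuple[str, str]]] = None,
) -> List[Tuple[Entity, Entity]]:
    """Enumerate ordered (head, tail) pairs; n entities give n * (n - 1) pairs."""
    pairs = []
    for head, tail in permutations(entities, 2):
        # Optional (assumed) type filter, e.g. born_in only fits (PERSON, LOCATION).
        if allowed_types is not None and (head[2], tail[2]) not in allowed_types:
            continue
        pairs.append((head, tail))
    return pairs


ents = [(0, 1, "PERSON"), (4, 5, "LOCATION"), (6, 7, "DATE")]
print(len(candidate_pairs(ents)))  # 6
print(candidate_pairs(ents, {("PERSON", "LOCATION"), ("PERSON", "DATE")}))
# [((0, 1, 'PERSON'), (4, 5, 'LOCATION')), ((0, 1, 'PERSON'), (6, 7, 'DATE'))]
```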

Context Encoding

The sentence or passage containing the entity pair is encoded, with entity position markers ([E1], [E2]) highlighting the target entities.
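Marker insertion happens before tokenization by the encoder. The sketch below wraps the head span in `[E1] ... [/E1]` and the tail span in `[E2] ... [/E2]`. The closing markers are an assumed convention (the text above names only `[E1]` and `[E2]`), and with a real pretrained model the marker strings would also need to be registered as special tokens in its tokenizer.

```python
from typing import List, Tuple


def add_entity_markers(
    tokens: List[str], head: Tuple[int, int], tail: Tuple[int, int]
) -> str:
    """Wrap the head span in [E1]...[/E1] and the tail span in [E2]...[/E2].

    Spans are (start, end_exclusive) token offsets and must not overlap.
    """
    (hs, he), (ts, te) = head, tail
    if not (he <= ts or te <= hs):
        raise ValueError("head and tail spans overlap")
    out = []
    for i, tok in enumerate(tokens):
        if i == hs:
            out.append("[E1]")
        if i == ts:
            out.append("[E2]")
        out.append(tok)
        if i == he - 1:
            out.append("[/E1]")
        if i == te - 1:
            out.append("[/E2]")
    return " ".join(out)


tokens = "Barack Obama was born in Honolulu .".split()
print(add_entity_markers(tokens, head=(0, 2), tail=(5, 6)))
# [E1] Barack Obama [/E1] was born in [E2] Honolulu [/E2] .
```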

Relation Classification

A classifier predicts the relation type (or 'no relation') for each entity pair based on the contextual encoding.
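In its simplest form, the classification step is a linear layer followed by a softmax over relation labels, with `no_relation` treated as an ordinary class. The sketch assumes the pair encoding is already a vector (a common choice is concatenating the encoder's hidden states at the `[E1]` and `[E2]` positions). The label subset, weights, and input vector are hypothetical hand-written values, not a trained model.

```python
import math
from typing import List, Tuple

# Illustrative subset of TACRED-style labels; "no_relation" is just another class.
LABELS = ["no_relation", "per:city_of_birth", "org:founded_by"]


def classify(
    pair_encoding: List[float], weights: List[List[float]], bias: List[float]
) -> Tuple[str, float]:
    """Linear layer + softmax over LABELS; returns (best label, its probability)."""
    logits = [
        sum(w * x for w, x in zip(row, pair_encoding)) + b
        for row, b in zip(weights, bias)
    ]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best], probs[best]


# Hypothetical 2-d encoding and weights: one row of weights per label.
label, prob = classify([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 0.0])
print(label, round(prob, 3))
```

Returning the probability alongside the label is what lets the post-processing step apply a confidence threshold.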

Post-Processing

Extracted relations are deduplicated, confidence-filtered, and optionally linked to a knowledge base schema.
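The three post-processing operations can be sketched in one pass over scored predictions. The threshold value, the whitespace normalization, and the `SCHEMA_MAP` from classifier labels to knowledge base property names are all hypothetical; a real system would use its own normalization and its target schema's identifiers.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)

# Hypothetical mapping from classifier labels to a target KB schema.
SCHEMA_MAP: Dict[str, str] = {"per:city_of_birth": "birthPlace"}


def postprocess(
    preds: Iterable[Tuple[Triple, float]], threshold: float = 0.5
) -> List[Tuple[Triple, float]]:
    """Drop negatives and low-confidence triples, map labels, and deduplicate."""
    best: Dict[Triple, float] = {}
    for (head, rel, tail), score in preds:
        if rel == "no_relation" or score < threshold:
            continue  # confidence filter
        # Minimal normalization plus optional schema linking (unmapped labels kept).
        key = (head.strip(), SCHEMA_MAP.get(rel, rel), tail.strip())
        # Deduplicate: keep the highest score seen for each normalized triple.
        if score > best.get(key, float("-inf")):
            best[key] = score
    return sorted(best.items(), key=lambda kv: -kv[1])


preds = [
    (("Obama", "per:city_of_birth", "Honolulu"), 0.9),
    (("Obama ", "per:city_of_birth", "Honolulu"), 0.7),  # duplicate after strip()
    (("Obama", "no_relation", "Hawaii"), 0.95),
    (("Obama", "org:founded_by", "X"), 0.3),             # below threshold
]
print(postprocess(preds))
# [(('Obama', 'birthPlace', 'Honolulu'), 0.9)]
```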

Current Landscape

Relation extraction in 2025 operates in two modes: (1) supervised extraction with fine-tuned BERT-family models for domains with labeled data (news, biomedical), achieving 70-80% F1 on standard benchmarks, and (2) zero-shot extraction with LLMs for open-domain and novel relation types. The latter is increasingly preferred for production because it doesn't require schema design or labeled training data. Document-level RE remains challenging, as relations spanning multiple sentences require coreference and reasoning. The practical impact is in biomedical knowledge extraction (drug-gene interactions), financial analysis (corporate relationships), and knowledge base construction.
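For the zero-shot mode, much of the engineering work is in asking for a structured output and validating what comes back. The sketch below builds a prompt and parses a JSON list of triples. The prompt wording and the `head`/`relation`/`tail` keys are assumptions, and the call to an actual LLM is deliberately left out so that no particular client API is implied. Malformed or incomplete output is skipped rather than allowed to crash the pipeline.

```python
import json
from typing import List, Optional, Set, Tuple

# Hypothetical prompt; real prompts are usually tuned per model and domain.
PROMPT_TEMPLATE = (
    "Extract all relations between entities in the text below. "
    'Return only a JSON list of objects with keys "head", "relation", "tail".\n\n'
    "Text: {text}"
)


def build_prompt(text: str) -> str:
    return PROMPT_TEMPLATE.format(text=text)


def parse_triples(
    response: str, allowed: Optional[Set[str]] = None
) -> List[Tuple[str, str, str]]:
    """Parse model output into (head, relation, tail) triples, skipping bad items."""
    try:
        items = json.loads(response)
    except json.JSONDecodeError:
        return []  # malformed output: treat as no extractions
    if not isinstance(items, list):
        return []
    triples = []
    for item in items:
        if not isinstance(item, dict):
            continue
        head, rel, tail = (item.get(k) for k in ("head", "relation", "tail"))
        if not all(isinstance(v, str) and v.strip() for v in (head, rel, tail)):
            continue  # skip incomplete objects
        head, rel, tail = head.strip(), rel.strip(), tail.strip()
        if allowed is not None and rel not in allowed:
            continue  # optional closed-schema filter
        triples.append((head, rel, tail))
    return triples


response = '[{"head": "Barack Obama", "relation": "born_in", "tail": "Honolulu"}, {"head": "Obama"}]'
print(parse_triples(response))  # [('Barack Obama', 'born_in', 'Honolulu')]
```

Leaving `allowed` unset keeps the open-domain behavior; passing a set of relation names turns the same parser into a closed-schema extractor.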

Key Challenges

Long-tail relations — rare relation types have few training examples but are often the most valuable to extract

Document-level RE — relations expressed across multiple sentences require coreference resolution and multi-hop reasoning

Noise in distant supervision — automatic labeling from KBs introduces significant label noise

Overlapping relations — multiple relations between the same entity pair or overlapping entity spans complicate extraction

Domain transfer — models trained on news text struggle with biomedical, legal, or financial relation extraction
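The distant supervision noise problem is easy to reproduce. In its simplest form, the heuristic labels every sentence that mentions both entities of a knowledge base fact with that fact's relation, whether or not the sentence expresses it. The toy `KB` and the substring-based mention check below are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# Toy KB: (head, tail) -> relations known to hold between them.
KB: Dict[Tuple[str, str], Set[str]] = {
    ("Barack Obama", "Honolulu"): {"born_in"},
}


def distant_label(
    sentence: str, entities: List[str], kb: Dict[Tuple[str, str], Set[str]]
) -> List[Tuple[str, str, str]]:
    """Label each ordered entity pair in the sentence with its KB relations."""
    labels = []
    for head in entities:
        for tail in entities:
            if head == tail or head not in sentence or tail not in sentence:
                continue
            for rel in sorted(kb.get((head, tail), {"no_relation"})):
                labels.append((head, rel, tail))
    return labels


sentences = [
    "Barack Obama was born in Honolulu.",
    "Barack Obama gave a speech in Honolulu.",  # does not express born_in
]
for s in sentences:
    print(distant_label(s, ["Barack Obama", "Honolulu"], KB))
```

Both sentences receive the same `born_in` label, so the second becomes a false-positive training example. The same output also shows overlapping relations are possible: if the KB stored two relations for one pair, each sentence would receive both labels.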

Quick Recommendations

Standard relation extraction

RoBERTa/DeBERTa fine-tuned on TACRED/DocRED

Best supervised performance on established benchmarks

Zero-shot / open relation extraction

GPT-4 / Claude 3.5 with structured prompting

Extracts relations without predefined relation schemas

Biomedical RE

PubMedBERT fine-tuned on ChemProt/DDI

Domain-specific pretraining captures biomedical relation patterns

Document-level RE

ATLOP / DocuNet on DocRED

State-of-the-art methods designed for cross-sentence relation extraction
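One idea from ATLOP that carries over to other document-level models is adaptive thresholding. Instead of using a single global probability cutoff, the model learns an extra threshold class (TH) and predicts every relation whose logit exceeds the TH logit for that entity pair, or no relation if none does. The sketch below shows only this inference-time decision rule, applied to hand-written logits; the training loss and the encoder are omitted. The Wikidata-style labels are used for illustration because DocRED relations are Wikidata properties.

```python
from typing import Dict, List


def adaptive_threshold(
    logits: Dict[str, float], th_key: str = "TH", top_k: int = 0
) -> List[str]:
    """Return relations whose logit beats the pair-specific TH logit.

    An empty list means no relation (NA). top_k > 0 caps the number of outputs.
    """
    th = logits[th_key]
    positives = sorted(
        (r for r, z in logits.items() if r != th_key and z > th),
        key=lambda r: logits[r],
        reverse=True,
    )
    if top_k > 0:
        positives = positives[:top_k]
    return positives


pair_logits = {"TH": 0.5, "P17": 1.2, "P131": 0.8, "P27": -0.3}  # illustrative values
print(adaptive_threshold(pair_logits))           # ['P17', 'P131']
print(adaptive_threshold(pair_logits, top_k=1))  # ['P17']
```

Because several relations can clear the threshold at once, this rule also naturally handles multiple relations between the same entity pair.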

What's Next

The frontier is end-to-end knowledge graph construction from unstructured text — combining entity recognition, relation extraction, and entity linking in a unified system. Expect LLM-based extraction pipelines that handle multi-hop relations across documents, and active learning approaches that efficiently acquire labels for the most valuable relation types.

Benchmarks & SOTA

TACRED

TAC Relation Extraction Dataset

2017, 3 results

Large supervised relation extraction dataset from TAC KBP, 106K examples covering 41 relation types. Standard sentence-level RE benchmark evaluated by Micro-F1 (excluding no_relation).
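Excluding `no_relation` from Micro-F1 has a concrete meaning. A prediction counts as correct only if it matches a gold label that is not `no_relation`. Precision is computed over all non-`no_relation` predictions and recall over all non-`no_relation` gold labels, so correctly predicting `no_relation` earns nothing. The sketch below follows that definition. It is a simplified re-implementation for illustration, not the official TACRED scorer, and the example labels are hypothetical.

```python
from typing import Sequence, Tuple

NO_REL = "no_relation"


def micro_f1(gold: Sequence[str], pred: Sequence[str]) -> Tuple[float, float, float]:
    """Micro precision, recall, and F1 with no_relation excluded."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    correct = sum(1 for g, p in zip(gold, pred) if g == p and g != NO_REL)
    guessed = sum(1 for p in pred if p != NO_REL)  # predicted positives
    actual = sum(1 for g in gold if g != NO_REL)   # gold positives
    precision = correct / guessed if guessed else 0.0
    recall = correct / actual if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = ["per:city_of_birth", "no_relation", "org:founded_by", "no_relation"]
pred = ["per:city_of_birth", "org:founded_by", "no_relation", "no_relation"]
print(micro_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

The fourth example, where both labels are `no_relation`, changes none of the counts. This is why a model that predicts `no_relation` everywhere scores 0 F1 on this metric even though most TACRED examples carry that label.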

State of the Art

LUKE

Studio Ousia

72.7 Micro-F1

Related Tasks

Entity Linking

Linking mentions to knowledge base entities.

Knowledge Graph Completion

Predicting missing links in knowledge graphs.
