Knowledge Graph Completion
Predicting missing links in knowledge graphs.
Knowledge graph completion predicts missing links in knowledge graphs — inferring, for example, the triple (Einstein, field_of_work, Physics) from known facts. Embedding methods (TransE, RotatE, ComplEx) and GNN-based approaches dominate, with LLMs increasingly used for text-enhanced KG completion on open-domain knowledge graphs.
History
TransE models relations as translations in embedding space: h + r ≈ t
ComplEx uses complex-valued embeddings to handle symmetric and antisymmetric relations
ConvE applies 2D convolutions over reshaped entity-relation embeddings
RotatE models relations as rotations in complex space, with strong theoretical properties
CompGCN applies GNNs to knowledge graphs, jointly embedding entities and relations
NodePiece reduces entity embedding memory by composing from relation anchors
KG-BERT and StAR use pretrained language models for text-enhanced KG completion
ChatGPT-based KG completion shows LLMs have implicit knowledge graph capabilities
SimKGC and other contrastive methods bridge embedding and text-based approaches
LLM-KG hybrid systems combine parametric knowledge with structured graph reasoning
How Knowledge Graph Completion Works
Entity and Relation Embedding
Each entity and relation in the KG is assigned a learned vector in a continuous embedding space.
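As a minimal sketch of this setup (all sizes and names here are hypothetical, using NumPy in place of a trainable embedding layer), each entity and relation gets one row in a lookup table:

```python
import numpy as np

rng = np.random.default_rng(0)
num_entities, num_relations, dim = 1000, 20, 50  # hypothetical KG sizes

# One learned vector per entity and per relation; Xavier-style scale.
entity_emb = rng.normal(scale=1.0 / np.sqrt(dim), size=(num_entities, dim))
relation_emb = rng.normal(scale=1.0 / np.sqrt(dim), size=(num_relations, dim))

# Looking up a triple (h, r, t) is just row indexing.
h, r, t = entity_emb[5], relation_emb[2], entity_emb[42]
```

In a real system these tables would be trainable parameters (e.g. `torch.nn.Embedding`), updated by the loss described below.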
Scoring Function
A scoring function evaluates the plausibility of a triple (h, r, t): TransE uses ‖h+r−t‖, RotatE uses ‖h∘r−t‖ in complex space.
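The two scoring functions named above can be sketched directly (a simplified NumPy version; function names are ours, and plausibility is returned as the negative distance so that higher means more plausible):

```python
import numpy as np

def transe_score(h, r, t):
    # TransE: relation as translation; plausibility = -||h + r - t||.
    return -np.linalg.norm(h + r - t)

def rotate_score(h, r, t):
    # RotatE: relation as rotation in complex space; r holds phase angles,
    # so exp(i*r) is an element-wise unit-modulus rotation of h.
    rotation = np.exp(1j * r)
    return -np.linalg.norm(h * rotation - t)
```

A perfect fact scores 0 (no translation/rotation error); implausible triples score increasingly negative.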
Training
The model is trained to score true triples higher than corrupted (negative-sampled) triples using margin-based or cross-entropy losses.
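The margin-based variant of this objective can be sketched as follows (a toy illustration with assumed names; real training batches many triples and backpropagates through the embeddings):

```python
import numpy as np

def margin_loss(pos_score, neg_score, margin=1.0):
    # Hinge loss: a true triple should score at least `margin` above
    # its corrupted (negative-sampled) counterpart.
    return max(0.0, margin - pos_score + neg_score)

def corrupt_tail(triple, num_entities, rng):
    # Negative sampling: replace the tail with a random entity id.
    h, r, _ = triple
    return (h, r, int(rng.integers(num_entities)))
```

When the positive already outscores the negative by the margin, the loss is zero and the pair contributes no gradient.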
Prediction
For a query (Einstein, field_of_work, ?), all candidate tail entities are scored, and top-K predictions are returned.
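Tail prediction reduces to scoring all entities at once and sorting (shown here with the TransE distance; names and sizes are illustrative):

```python
import numpy as np

def predict_tails(h, r, entity_emb, k=3):
    # Score every candidate tail in one vectorized pass, then return
    # the ids of the k highest-scoring (closest) entities.
    scores = -np.linalg.norm(h + r - entity_emb, axis=1)
    return np.argsort(scores)[::-1][:k]
```

On a KG with millions of entities this full scan is the scalability bottleneck noted under Key Challenges, which motivates approximate nearest-neighbor search or candidate filtering.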
Ranking Evaluation
Performance is measured by Mean Reciprocal Rank (MRR) and Hits@K — how well the model ranks correct completions.
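Both metrics follow directly from the rank of the correct entity in each query's sorted candidate list (a minimal sketch; `ranks` is assumed to be 1-based):

```python
import numpy as np

def mrr_and_hits(ranks, k=10):
    # ranks: 1-based rank of the correct completion for each test query.
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))        # Mean Reciprocal Rank
    hits_at_k = float(np.mean(ranks <= k))   # fraction ranked in top k
    return mrr, hits_at_k
```

Published numbers typically use "filtered" ranks, where other known true triples are removed from the candidate list before ranking.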
Current Landscape
Knowledge graph completion in 2025 has two active paradigms: (1) geometric embedding methods (TransE, RotatE, ComplEx) that scale well but ignore entity text, and (2) text-enhanced methods (KG-BERT, SimKGC) that leverage pretrained language models for richer entity representations. LLMs are disrupting the field — they contain implicit knowledge graphs in their parameters and can predict missing links zero-shot. But structured KG reasoning still has advantages for multi-hop inference and verifiable reasoning chains. The practical impact is in search, recommendations, and biomedical knowledge bases (drug-gene interactions, disease-symptom relations).
Key Challenges
Scalability — real-world KGs have millions of entities, making full scoring expensive
Relation types — different relation patterns (symmetric, antisymmetric, transitive, compositional) require different modeling
Long-tail entities — entities with few connections have poor embeddings due to data sparsity
Temporal dynamics — KGs change over time; facts have validity periods that most methods ignore
Open-world assumption — real KGs are incomplete, and the absence of a link doesn't mean it's false
Quick Recommendations
Standard KG completion: RotatE / ComplEx — best balance of expressiveness, scalability, and reproducibility.
Text-enhanced completion: SimKGC / KG-BERT — leverages entity descriptions and relation text for better embeddings.
Large-scale KGs: NodePiece + RotatE — memory-efficient entity representation for million-entity KGs.
Open-domain KG completion: LLM + structured KG query — LLMs can reason about missing links using world knowledge.
What's Next
The frontier is merging knowledge graphs with LLMs — using KGs to ground LLM reasoning in verified facts while using LLMs to fill KG gaps. Temporal knowledge graph completion (handling time-varying facts) and few-shot relation learning (completing new relation types with few examples) are key research directions.