Molecular Property Prediction
Molecular property prediction — estimating toxicity, solubility, binding affinity, or other properties from molecular structure — is the workhorse task of AI-driven drug discovery. GNNs operate on molecular graphs, 3D equivariant networks (GemNet, EquiformerV2) achieve state-of-the-art results on geometry-dependent targets, and transformer approaches (ChemBERTa, Uni-Mol) use SMILES strings or 3D coordinates. MoleculeNet (2018) and the Therapeutic Data Commons (TDC) provide standardized benchmarks, but the real bottleneck is distribution shift: models trained on known chemical space struggle with novel scaffolds, and the gap between leaderboard accuracy and actual wet-lab utility remains the field's central challenge.
History
SchNet introduces continuous-filter convolutions for learning on 3D molecular geometries
MPNN (Message Passing Neural Network) framework unifies GNN approaches for molecules
DimeNet incorporates bond angles (directional information) into molecular graph learning
OGB-LSC molecular benchmarks (PCQM4M) provide million-molecule evaluation at scale
SphereNet and PaiNN add full 3D geometric information (distances, angles, dihedrals)
Equivariant neural networks (EGNN, SE(3)-Transformers) respect physical symmetries by design
GemNet achieves state-of-the-art on OC20 catalyst property prediction
EquiformerV2 combines equivariant transformers with higher-order tensor features
Molecular foundation models (Uni-Mol, MoleculeSTM) pretrained on millions of molecules
AlphaFold3 and molecular generation models push toward end-to-end drug design
How Molecular Property Prediction Works
Molecular Representation
A molecule is represented as a graph (atoms = nodes, bonds = edges) with 3D coordinates. Features include atom type, charge, bond order, and geometric distances/angles.
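A minimal sketch of this representation, using plain Python and numpy rather than a graph library. The water molecule, its coordinates, and the feature choices are illustrative toy inputs, not from any specific dataset.

```python
import numpy as np

# One dict per node: atom type and formal charge, as described above.
atoms = [
    {"element": "O", "charge": 0},
    {"element": "H", "charge": 0},
    {"element": "H", "charge": 0},
]
bonds = [(0, 1, 1.0), (0, 2, 1.0)]  # (src, dst, bond order) edges
coords = np.array([                 # toy 3D positions in angstroms
    [0.000,  0.000,  0.117],
    [0.000,  0.757, -0.469],
    [0.000, -0.757, -0.469],
])

# Dense adjacency matrix with bond orders as edge weights.
n = len(atoms)
adj = np.zeros((n, n))
for i, j, order in bonds:
    adj[i, j] = adj[j, i] = order  # molecular graphs are undirected
```

In practice a toolkit such as RDKit builds this graph from a SMILES string, and the node features are one-hot encoded before entering the network.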
Geometric Feature Extraction
3D-aware layers compute pairwise distances, bond angles, and dihedral angles as input features — crucial for energy and property prediction.
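The three geometric quantities above are standard vector algebra; a minimal numpy sketch (function names are my own, not from any library):

```python
import numpy as np

def pairwise_distances(x):
    """All pairwise atom distances from an (n, 3) coordinate array."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def bond_angle(x, i, j, k):
    """Angle at atom j (radians) formed by bonds j-i and j-k."""
    u, v = x[i] - x[j], x[k] - x[j]
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def dihedral(x, i, j, k, l):
    """Torsion angle (radians) around the j-k axis, signed via atan2."""
    b1, b2, b3 = x[j] - x[i], x[k] - x[j], x[l] - x[k]
    n1, n2 = np.cross(b1, b2), np.cross(b2, b3)
    m1 = np.cross(n1, b2 / np.linalg.norm(b2))
    return np.arctan2(np.dot(m1, n2), np.dot(n1, n2))
```

Networks like DimeNet and GemNet expand these scalars in radial and spherical basis functions rather than using the raw values, but the underlying features are the same.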
Equivariant Message Passing
Messages between atoms respect physical symmetries (rotation, translation, reflection invariance) through equivariant operations on spherical harmonics or vector features.
Pooling and Prediction
Atom-level features are aggregated into a molecule-level representation via sum/attention pooling, then mapped to the target property through a regression head.
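The readout step is a one-liner in practice; this toy sketch shows why the choice of pooling matters (function name is illustrative):

```python
import numpy as np

def predict_property(h, w, b, pooling="sum"):
    """Map per-atom features h (n, d) to one scalar property.

    Sum pooling makes the prediction extensive (it scales with molecule
    size, natural for energies); mean pooling makes it intensive.
    w (d,) and b are the parameters of a linear regression head.
    """
    z = h.sum(axis=0) if pooling == "sum" else h.mean(axis=0)
    return float(z @ w + b)
```

Attention pooling replaces the fixed sum with learned per-atom weights, letting the readout focus on, e.g., a reactive functional group.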
Multi-Task Learning
Models are often trained to predict multiple properties simultaneously, improving generalization through shared representations.
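Structurally, multi-task prediction is a shared trunk feeding one head per property. A minimal sketch with hypothetical property names and random stand-in weights:

```python
import numpy as np

def multi_task_forward(z, W_shared, heads):
    """Shared trunk + one linear head per property.

    z: pooled molecule embedding, W_shared: trunk weights,
    heads: dict mapping property name -> (w, b) head parameters.
    The trunk representation t is shared, so gradients from every
    task shape the same features — the source of the generalization
    benefit described above.
    """
    t = np.tanh(z @ W_shared)  # shared representation
    return {name: float(t @ w + b) for name, (w, b) in heads.items()}
```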
Current Landscape
Molecular property prediction in 2025 is dominated by 3D-aware, equivariant architectures that respect the physical symmetries of molecules. The field has advanced from 2D graph methods (MPNN, GIN) to full 3D geometric networks (SchNet → DimeNet → GemNet → EquiformerV2). Foundation models pretrained on large molecular databases are emerging as the new paradigm, analogous to BERT for text. The practical impact is in drug discovery, where these models accelerate virtual screening by orders of magnitude compared to physics-based simulations.
Key Challenges
Data scarcity — experimental property measurements are expensive; many targets have <1K labeled molecules
Distribution shift — training molecules differ significantly from novel drug candidates being screened
Conformer sensitivity — many properties depend on 3D shape, but molecules exist as ensembles of conformations
Activity cliffs — structurally similar molecules can have vastly different properties, challenging smooth learned functions
Evaluation reliability — random splits overestimate real-world performance; scaffold splits are more realistic but much harder
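The last point is worth making concrete. A scaffold split assigns whole scaffold groups to either train or test, so no test molecule shares a core with a training molecule. In practice scaffolds are Bemis-Murcko SMILES computed with RDKit; this sketch assumes they are already available as strings:

```python
from collections import defaultdict

def scaffold_split(scaffolds, frac_train=0.8):
    """Group-aware split: each scaffold group goes entirely to train or test.

    scaffolds: one scaffold identifier per molecule (in practice
    Bemis-Murcko scaffold SMILES from RDKit; precomputed strings are
    assumed here). Largest groups are assigned to train first, so the
    test set is dominated by rare scaffolds — deliberately harder than
    a random split.
    """
    groups = defaultdict(list)
    for idx, s in enumerate(scaffolds):
        groups[s].append(idx)
    train, test = [], []
    cutoff = frac_train * len(scaffolds)
    for members in sorted(groups.values(), key=len, reverse=True):
        (train if len(train) + len(members) <= cutoff else test).extend(members)
    return train, test
```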
Quick Recommendations
3D property prediction
EquiformerV2 / GemNet-OC
State-of-the-art on OC20 and molecular energy prediction tasks
2D molecular properties
GIN + virtual node / Graphormer
Strong baseline when 3D coordinates are unavailable
Drug discovery screening
Uni-Mol (pretrained) + fine-tuning
Molecular foundation model with broad applicability
Rapid prototyping
RDKit fingerprints + XGBoost
Surprisingly competitive baseline that requires no GPU training
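The fingerprint baseline fits in a few lines. This sketch uses RDKit Morgan fingerprints with scikit-learn's `GradientBoostingRegressor` standing in for XGBoost to keep dependencies short; the SMILES strings and property values are made-up toy data.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import GradientBoostingRegressor

def morgan_features(smiles_list, radius=2, n_bits=2048):
    """Morgan (ECFP-like) bit-vector fingerprints for a list of SMILES."""
    feats = []
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
        feats.append(np.array(fp))
    return np.array(feats)

# Toy data: a few alcohols with made-up property values, for illustration only.
smiles = ["CO", "CCO", "CCCO", "CCCCO", "CCCCCO", "CCCCCCO"]
y = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])

X = morgan_features(smiles)
model = GradientBoostingRegressor(n_estimators=50).fit(X, y)
preds = model.predict(X)
```

With real data, swap in XGBoost or LightGBM and evaluate under a scaffold split; the whole pipeline runs on a laptop CPU.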
What's Next
The frontier is generative molecular design — not just predicting properties but designing molecules with desired properties. Diffusion models for 3D molecular generation, combined with property predictors for guidance, are creating end-to-end drug design pipelines. Expect tighter integration with protein structure prediction (AlphaFold) for target-aware molecular design.
Benchmarks & SOTA
No datasets indexed for this task yet.
Related Tasks
Node Classification
Node classification — assigning labels to vertices in a graph using both node features and neighborhood structure — is the flagship task for Graph Neural Networks. GCN (Kipf & Welling, 2017) established the Cora/Citeseer/PubMed benchmark trinity, but these datasets are tiny by modern standards and results have saturated well above 85% accuracy. The field has moved toward large-scale heterogeneous graphs (ogbn-arxiv, ogbn-products from OGB) and the unsettled debate over whether simple MLPs with neighborhood features can match GNNs, as shown by SIGN and SGC ablations.
Graph Classification
Graph classification — predicting a label for an entire graph, not individual nodes — matters for molecular screening, social network analysis, and program verification. GIN (Xu et al., 2019) formalized the connection between GNN expressiveness and the Weisfeiler-Leman graph isomorphism test, and the TU datasets became standard benchmarks. Recent work on graph transformers (GPS, Exphormer) and higher-order GNNs pushes beyond WL limits, while OGB's ogbg-molhiv and ogbg-molpcba provide more rigorous large-scale evaluation than the classic small-graph benchmarks.
Link Prediction
Link prediction — inferring missing or future edges in a graph — underpins knowledge graph completion, drug-target discovery, and social network recommendation. TransE (2013) launched the knowledge graph embedding era, and the field matured through DistMult, RotatE, and CompGCN, benchmarked on FB15k-237 and WN18RR. The current frontier is inductive link prediction (generalizing to unseen entities), where GNN-based methods like NBFNet and foundation models like ULTRA (2024) show that a single model can transfer across entirely different knowledge graphs without retraining.