Graph Classification
Graph classification — predicting a label for an entire graph, not individual nodes — matters for molecular screening, social network analysis, and program verification. GIN (Xu et al., 2019) formalized the connection between GNN expressiveness and the Weisfeiler-Leman graph isomorphism test, and the TU datasets became standard benchmarks. Recent work on graph transformers (GPS, Exphormer) and higher-order GNNs pushes beyond WL limits, while OGB's ogbg-molhiv and ogbg-molpcba provide more rigorous large-scale evaluation than the classic small-graph benchmarks.
Graph classification assigns a label to an entire graph — predicting molecular properties, protein functions, or social network types. GIN, hierarchical pooling methods, and graph transformers are the key architectures, with molecular property prediction being the most impactful application domain.
History
Patchy-SAN applies CNNs to graphs via node ordering and fixed-size receptive fields
Graph-level readout functions (mean/max pooling over node features) become standard
GIN (Graph Isomorphism Network) provably maximizes expressiveness among message-passing GNNs
DiffPool learns hierarchical graph coarsening for graph classification
OGB graph classification benchmarks (ogbg-molhiv, ogbg-molpcba) provide realistic evaluation
Graph Multiset Transformer applies attention-based pooling for graph-level readout
GPS (General Powerful Scalable) combines message passing with global attention
Exphormer introduces sparse attention patterns for efficient graph transformers
3D-aware graph networks (SchNet, DimeNet) dominate molecular graph classification
Graph foundation models pretrained on large molecular datasets show strong transfer
How Graph Classification Works
Node Feature Extraction
Message-passing layers (GCN, GIN, GAT) compute learned representations for each node based on local graph structure.
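This step can be sketched in a few lines of NumPy. The update rule below follows the GIN form, h_v ← MLP((1+ε)·h_v + Σ_{u∈N(v)} h_u); the function name, weight shapes, and toy graph are illustrative assumptions, not a library API.

```python
import numpy as np

def gin_layer(adj, h, w1, b1, w2, b2, eps=0.0):
    """One GIN-style message-passing layer (illustrative sketch).

    adj: (n, n) adjacency matrix; h: (n, d) node features.
    Neighbors are aggregated by sum, then a 2-layer ReLU MLP is applied.
    Weight shapes (w1: d x hidden, w2: hidden x out) are assumptions.
    """
    agg = adj @ h                      # sum over each node's neighbors
    z = (1.0 + eps) * h + agg          # GIN update: (1+eps)*h_v + neighbor sum
    z = np.maximum(z @ w1 + b1, 0.0)   # first MLP layer with ReLU
    return z @ w2 + b2                 # second MLP layer

# Toy triangle graph with 2-dimensional node features.
adj = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
h = np.eye(3, 2)
rng = np.random.default_rng(0)
out = gin_layer(adj, h, rng.normal(size=(2, 4)), np.zeros(4),
                rng.normal(size=(4, 4)), np.zeros(4))
print(out.shape)  # (3, 4): one 4-dim embedding per node
```

Stacking k such layers lets each node see its k-hop neighborhood.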
Hierarchical Pooling (optional)
DiffPool or SAGPool progressively coarsens the graph, creating a hierarchy from individual nodes to clusters to the full graph.
Graph-Level Readout
Node representations are aggregated into a single graph-level vector — via mean/sum pooling, attention-based pooling, or virtual node approaches.
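A minimal sketch of the readout step, using the node-to-graph index vector common in GNN libraries; the function name and signature here are illustrative, not a real library API.

```python
import numpy as np

def graph_readout(h, batch, num_graphs, mode="mean"):
    """Pool node embeddings h (n, d) into per-graph vectors (num_graphs, d).

    batch[i] is the index of the graph that node i belongs to.
    Sketch of a scatter-style sum/mean readout.
    """
    out = np.zeros((num_graphs, h.shape[1]))
    np.add.at(out, batch, h)                        # sum-pool per graph
    if mode == "mean":
        counts = np.bincount(batch, minlength=num_graphs)
        out = out / np.maximum(counts, 1)[:, None]  # divide by node counts
    return out

h = np.array([[1., 0.], [3., 0.], [0., 2.]])
batch = np.array([0, 0, 1])  # first two nodes form graph 0, last node graph 1
pooled = graph_readout(h, batch, 2)
print(pooled)  # row 0: mean of graph-0 nodes; row 1: graph-1's single node
```

Sum pooling preserves information about graph size (and is what GIN's expressiveness analysis assumes), while mean pooling is size-invariant; attention-based readouts learn the aggregation weights instead.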
Classification Head
The graph-level representation is fed through an MLP classifier to predict the graph label or property.
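The final step is ordinary supervised classification on the pooled vector. A toy sketch with a single linear layer plus softmax (a real head would typically be a small MLP; the vector and weights below are made-up values for illustration):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Graph-level vector from the readout step, plus an assumed linear head.
g_vec = np.array([0.5, -1.0, 2.0])              # (d=3,) pooled graph embedding
w = np.array([[1., 0.], [0., 1.], [1., 1.]])    # (d, num_classes=2) toy weights
b = np.zeros(2)

probs = softmax(g_vec @ w + b)   # class probabilities for this graph
pred = int(np.argmax(probs))
print(pred)  # predicted class index: 0
```

For multi-task property prediction (as in ogbg-molpcba), the softmax head is replaced by one sigmoid output per property, trained with binary cross-entropy.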
Current Landscape
Graph classification in 2025 is driven by molecular applications — predicting toxicity, solubility, and binding affinity from molecular graphs. GIN with virtual nodes remains a strong baseline, while 3D-aware networks (SchNet, GemNet) dominate when atomic coordinates are available. Graph transformers are gaining ground by overcoming the expressiveness limitations of pure message passing. The field is small-data by nature (many datasets have <5K graphs), making data augmentation and pretraining strategies crucial.
Key Challenges
Expressiveness ceiling — message-passing GNNs are bounded by the Weisfeiler-Leman graph isomorphism test
Graph size variability — graphs in a batch have different node counts; dense models need careful padding, while the standard sparse approach merges graphs into one block-diagonal adjacency with a node-to-graph index
Pooling information loss — aggregating all node features into one vector inevitably discards structural information
Small dataset regimes — many graph classification datasets have only hundreds of graphs, causing high variance
3D structure — molecular properties depend on 3D geometry, not just 2D connectivity, requiring specialized architectures
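The batching challenge above has a standard solution: concatenate the graphs into one big disconnected graph (block-diagonal adjacency) and track which node belongs to which graph. This is the trick behind PyG-style data loaders; the helper below is an illustrative sketch, not a library API.

```python
import numpy as np

def batch_graphs(adjs, feats):
    """Merge variable-size graphs into one block-diagonal batch.

    adjs: list of (n_i, n_i) adjacency matrices; feats: list of (n_i, d) features.
    Returns (A, X, batch) where batch[i] is the graph index of node i.
    """
    sizes = [a.shape[0] for a in adjs]
    n = sum(sizes)
    A = np.zeros((n, n))
    batch = np.zeros(n, dtype=int)
    off = 0
    for g, (a, s) in enumerate(zip(adjs, sizes)):
        A[off:off + s, off:off + s] = a   # place each graph on the diagonal
        batch[off:off + s] = g            # record node -> graph assignment
        off += s
    X = np.concatenate(feats, axis=0)
    return A, X, batch

a1 = np.array([[0., 1.], [1., 0.]])                        # 2-node graph
a2 = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])  # 3-node path
A, X, batch = batch_graphs([a1, a2], [np.ones((2, 1)), np.ones((3, 1))])
print(A.shape, batch.tolist())  # (5, 5) [0, 0, 1, 1, 1]
```

Because the off-diagonal blocks are zero, message passing never leaks information between graphs, and the `batch` vector is exactly what the readout step needs for per-graph pooling.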
Quick Recommendations
General graph classification
GIN + virtual node
Maximally expressive message-passing GNN with global information flow
Molecular property prediction
SchNet / DimeNet++ / GemNet
3D-aware architectures that capture atomic geometry
Large-scale evaluation
OGB benchmarks (ogbg-molhiv, ogbg-molpcba)
Realistic molecular datasets with proper train/test splits
Research frontier
Graph transformers (GPS, Exphormer)
Combine local message passing with global attention for higher expressiveness
What's Next
The frontier is pretrained molecular foundation models that transfer across tasks — analogous to BERT for text. Expect 3D geometry-aware pretraining on large molecular databases (PCQM4M, GEOM), with fine-tuning for specific property prediction tasks. Higher-order GNNs that go beyond pairwise message passing will improve expressiveness.
Benchmarks & SOTA
Related Tasks
Node Classification
Node classification — assigning labels to vertices in a graph using both node features and neighborhood structure — is the flagship task for Graph Neural Networks. GCN (Kipf & Welling, 2017) established the Cora/Citeseer/PubMed benchmark trinity, but these datasets are tiny by modern standards and results have saturated well above 85% accuracy. The field has moved toward large-scale heterogeneous graphs (ogbn-arxiv, ogbn-products from OGB) and the unsettled debate over whether simple MLPs with neighborhood features can match GNNs, as shown by SIGN and SGC ablations.
Link Prediction
Link prediction — inferring missing or future edges in a graph — underpins knowledge graph completion, drug-target discovery, and social network recommendation. TransE (2013) launched the knowledge graph embedding era, and the field matured through DistMult, RotatE, and CompGCN, benchmarked on FB15k-237 and WN18RR. The current frontier is inductive link prediction (generalizing to unseen entities), where GNN-based methods like NBFNet and foundation models like ULTRA (2024) show that a single model can transfer across entirely different knowledge graphs without retraining.
Molecular Property Prediction
Molecular property prediction — estimating toxicity, solubility, binding affinity, or other properties from molecular structure — is the workhorse task of AI-driven drug discovery. GNNs operate on molecular graphs while transformer approaches (ChemBERTa, Uni-Mol) use SMILES strings or 3D coordinates. MoleculeNet (2018) and the Therapeutic Data Commons (TDC) provide standardized benchmarks, but the real bottleneck is distribution shift: models trained on known chemical space struggle with novel scaffolds, and the gap between leaderboard accuracy and actual wet-lab utility remains the field's central challenge.