Graph Classification
Graph classification — predicting a label for an entire graph, not individual nodes — matters for molecular screening, social network analysis, and program verification. GIN (Xu et al., 2019) formalized the connection between GNN expressiveness and the Weisfeiler-Leman graph isomorphism test, and the TU datasets became standard benchmarks. Recent work on graph transformers (GPS, Exphormer) and higher-order GNNs pushes beyond WL limits, while OGB's ogbg-molhiv and ogbg-molpcba provide more rigorous large-scale evaluation than the classic small-graph benchmarks.
Graph classification assigns a label to an entire graph — predicting molecular properties, protein functions, or social network types. GIN, hierarchical pooling methods, and graph transformers are the key architectures, with molecular property prediction being the most impactful application domain.
History
Patchy-SAN applies CNNs to graphs via node ordering and fixed-size receptive fields
Graph-level readout functions (mean/max pooling over node features) become standard
GIN (Graph Isomorphism Network) provably maximizes expressiveness among message-passing GNNs
DiffPool learns hierarchical graph coarsening for graph classification
OGB graph classification benchmarks (ogbg-molhiv, ogbg-molpcba) provide realistic evaluation
Graph Multiset Transformer applies attention-based pooling for graph-level readout
GPS (General Powerful Scalable) combines message passing with global attention
Exphormer introduces sparse attention patterns for efficient graph transformers
3D-aware graph networks (SchNet, DimeNet) dominate molecular graph classification
Graph foundation models pretrained on large molecular datasets show strong transfer
How Graph Classification Works
Node Feature Extraction
Message-passing layers (GCN, GIN, GAT) compute learned representations for each node based on local graph structure.
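This step can be sketched in a few lines of NumPy. The update rule below follows the GIN form, h_v ← MLP((1+ε)·h_v + Σ_{u∈N(v)} h_u); the function name, weight shapes, and toy graph are illustrative assumptions, not a library API.

```python
import numpy as np

def gin_layer(adj, h, w1, b1, w2, b2, eps=0.0):
    """One GIN-style message-passing layer (illustrative sketch).

    adj: (n, n) adjacency matrix; h: (n, d) node features.
    Neighbors are aggregated by sum, then a 2-layer ReLU MLP is applied.
    Weight shapes (w1: d x hidden, w2: hidden x out) are assumptions.
    """
    agg = adj @ h                      # sum over each node's neighbors
    z = (1.0 + eps) * h + agg          # GIN update: (1+eps)*h_v + neighbor sum
    z = np.maximum(z @ w1 + b1, 0.0)   # first MLP layer with ReLU
    return z @ w2 + b2                 # second MLP layer

# Toy triangle graph with 2-dimensional node features.
adj = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
h = np.eye(3, 2)
rng = np.random.default_rng(0)
out = gin_layer(adj, h, rng.normal(size=(2, 4)), np.zeros(4),
                rng.normal(size=(4, 4)), np.zeros(4))
print(out.shape)  # (3, 4): one 4-dim embedding per node
```

Stacking k such layers lets each node see its k-hop neighborhood.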
Hierarchical Pooling (optional)
DiffPool or SAGPool progressively coarsens the graph, creating a hierarchy from individual nodes to clusters to the full graph.
Graph-Level Readout
Node representations are aggregated into a single graph-level vector — via mean/sum pooling, attention-based pooling, or virtual node approaches.
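A minimal sketch of the readout step, using the node-to-graph index vector common in GNN libraries; the function name and signature here are illustrative, not a real library API.

```python
import numpy as np

def graph_readout(h, batch, num_graphs, mode="mean"):
    """Pool node embeddings h (n, d) into per-graph vectors (num_graphs, d).

    batch[i] is the index of the graph that node i belongs to.
    Sketch of a scatter-style sum/mean readout.
    """
    out = np.zeros((num_graphs, h.shape[1]))
    np.add.at(out, batch, h)                        # sum-pool per graph
    if mode == "mean":
        counts = np.bincount(batch, minlength=num_graphs)
        out = out / np.maximum(counts, 1)[:, None]  # divide by node counts
    return out

h = np.array([[1., 0.], [3., 0.], [0., 2.]])
batch = np.array([0, 0, 1])  # first two nodes form graph 0, last node graph 1
pooled = graph_readout(h, batch, 2)
print(pooled)  # row 0: mean of graph-0 nodes; row 1: graph-1's single node
```

Sum pooling preserves information about graph size (and is what GIN's expressiveness analysis assumes), while mean pooling is size-invariant; attention-based readouts learn the aggregation weights instead.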
Classification Head
The graph-level representation is fed through an MLP classifier to predict the graph label or property.
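The final step is ordinary supervised classification on the pooled vector. A toy sketch with a single linear layer plus softmax (a real head would typically be a small MLP; the vector and weights below are made-up values for illustration):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a logit vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Graph-level vector from the readout step, plus an assumed linear head.
g_vec = np.array([0.5, -1.0, 2.0])              # (d=3,) pooled graph embedding
w = np.array([[1., 0.], [0., 1.], [1., 1.]])    # (d, num_classes=2) toy weights
b = np.zeros(2)

probs = softmax(g_vec @ w + b)   # class probabilities for this graph
pred = int(np.argmax(probs))
print(pred)  # predicted class index: 0
```

For multi-task property prediction (as in ogbg-molpcba), the softmax head is replaced by one sigmoid output per property, trained with binary cross-entropy.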
Current Landscape
Graph classification in 2025 is driven by molecular applications — predicting toxicity, solubility, and binding affinity from molecular graphs. GIN with virtual nodes remains a strong baseline, while 3D-aware networks (SchNet, GemNet) dominate when atomic coordinates are available. Graph transformers are gaining ground by overcoming the expressiveness limitations of pure message passing. The field is small-data by nature (many datasets have <5K graphs), making data augmentation and pretraining strategies crucial.
Key Challenges
Expressiveness ceiling — message-passing GNNs are bounded by the Weisfeiler-Leman graph isomorphism test
Graph size variability — graphs in a batch have different node counts; dense models need careful padding, while the standard sparse approach merges graphs into one block-diagonal adjacency with a node-to-graph index
Pooling information loss — aggregating all node features into one vector inevitably discards structural information
Small dataset regimes — many graph classification datasets have only hundreds of graphs, causing high variance
3D structure — molecular properties depend on 3D geometry, not just 2D connectivity, requiring specialized architectures
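The batching challenge above has a standard solution: concatenate the graphs into one big disconnected graph (block-diagonal adjacency) and track which node belongs to which graph. This is the trick behind PyG-style data loaders; the helper below is an illustrative sketch, not a library API.

```python
import numpy as np

def batch_graphs(adjs, feats):
    """Merge variable-size graphs into one block-diagonal batch.

    adjs: list of (n_i, n_i) adjacency matrices; feats: list of (n_i, d) features.
    Returns (A, X, batch) where batch[i] is the graph index of node i.
    """
    sizes = [a.shape[0] for a in adjs]
    n = sum(sizes)
    A = np.zeros((n, n))
    batch = np.zeros(n, dtype=int)
    off = 0
    for g, (a, s) in enumerate(zip(adjs, sizes)):
        A[off:off + s, off:off + s] = a   # place each graph on the diagonal
        batch[off:off + s] = g            # record node -> graph assignment
        off += s
    X = np.concatenate(feats, axis=0)
    return A, X, batch

a1 = np.array([[0., 1.], [1., 0.]])                        # 2-node graph
a2 = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])  # 3-node path
A, X, batch = batch_graphs([a1, a2], [np.ones((2, 1)), np.ones((3, 1))])
print(A.shape, batch.tolist())  # (5, 5) [0, 0, 1, 1, 1]
```

Because the off-diagonal blocks are zero, message passing never leaks information between graphs, and the `batch` vector is exactly what the readout step needs for per-graph pooling.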
Quick Recommendations
General graph classification
GIN + virtual node
Maximally expressive message-passing GNN with global information flow
Molecular property prediction
SchNet / DimeNet++ / GemNet
3D-aware architectures that capture atomic geometry
Large-scale evaluation
OGB benchmarks (ogbg-molhiv, ogbg-molpcba)
Realistic molecular datasets with proper train/test splits
Research frontier
Graph transformers (GPS, Exphormer)
Combine local message passing with global attention for higher expressiveness
What's Next
The frontier is pretrained molecular foundation models that transfer across tasks — analogous to BERT for text. Expect 3D geometry-aware pretraining on large molecular databases (PCQM4M, GEOM), with fine-tuning for specific property prediction tasks. Higher-order GNNs that go beyond pairwise message passing will improve expressiveness.
Benchmarks & SOTA
Related Tasks
Node Classification
Node classification — assigning labels to vertices in a graph using both node features and neighborhood structure — is the flagship task for Graph Neural Networks. GCN (Kipf & Welling, 2017) established the Cora/Citeseer/PubMed benchmark trinity, but these datasets are tiny by modern standards and results have saturated well above 85% accuracy. The field has moved toward large-scale heterogeneous graphs (ogbn-arxiv, ogbn-products from OGB) and the unsettled debate over whether simple MLPs with neighborhood features can match GNNs, as shown by SIGN and SGC ablations.
Link Prediction
Link prediction — inferring missing or future edges in a graph — underpins knowledge graph completion, drug-target discovery, and social network recommendation. TransE (2013) launched the knowledge graph embedding era, and the field matured through DistMult, RotatE, and CompGCN, benchmarked on FB15k-237 and WN18RR. The current frontier is inductive link prediction (generalizing to unseen entities), where GNN-based methods like NBFNet and foundation models like ULTRA (2024) show that a single model can transfer across entirely different knowledge graphs without retraining.
Molecular Property Prediction
Molecular property prediction — estimating toxicity, solubility, binding affinity, or other properties from molecular structure — is the workhorse task of AI-driven drug discovery. GNNs operate on molecular graphs while transformer approaches (ChemBERTa, Uni-Mol) use SMILES strings or 3D coordinates. MoleculeNet (2018) and the Therapeutic Data Commons (TDC) provide standardized benchmarks, but the real bottleneck is distribution shift: models trained on known chemical space struggle with novel scaffolds, and the gap between leaderboard accuracy and actual wet-lab utility remains the field's central challenge.