Node Classification
Node classification — assigning labels to vertices in a graph using both node features and neighborhood structure — is the flagship task for Graph Neural Networks. GCN (Kipf & Welling, 2017) established the Cora/Citeseer/PubMed benchmark trinity, but these datasets are tiny by modern standards and results on them have saturated well above 85% accuracy. The field has since moved toward large-scale heterogeneous graphs (ogbn-arxiv and ogbn-products from OGB) and toward an unsettled debate over whether simplified models that precompute neighborhood features (SGC's linear propagation, SIGN's multi-hop feature concatenation) can match full message-passing GNNs.
Node classification assigns labels to nodes in a graph using their features and network structure. Graph Neural Networks (GNNs) — particularly GCN, GAT, and GraphSAGE — dominate the field, but recent work shows that simple MLPs with graph-augmented features can be surprisingly competitive on standard benchmarks.
History
DeepWalk (2014) learns node embeddings via random walks plus skip-gram, enabling downstream classification
GCN (Kipf & Welling, 2017) introduces simplified spectral graph convolutions for semi-supervised node classification
GraphSAGE (2017) enables inductive node classification on unseen nodes and graphs via neighborhood sampling
GAT (Graph Attention Networks, 2018) applies attention to neighbor aggregation
GIN (Graph Isomorphism Network, 2019) shows message-passing GNNs are at most as powerful as the 1-WL isomorphism test
OGB (Open Graph Benchmark, 2020) provides large-scale, realistic node classification datasets
Cluster-GCN (2019) and GraphSAINT (2020) enable training on million-node graphs via subgraph sampling
Graph transformers (GPS, Exphormer) bring global attention to graph learning
LLM-based node classification (using language-model text embeddings as node features) achieves strong results on citation networks
Foundation models for graphs emerge — pretrained on diverse graph data, fine-tuned for node tasks
How Node Classification Works
Graph Construction
Nodes and edges are defined from the data — citation links, social connections, molecular bonds — with feature vectors attached to each node.
Neighborhood Aggregation
Each node collects information from its neighbors via message passing — GCN uses weighted averaging, GAT uses attention, GraphSAGE uses sampling.
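The GCN-style weighted averaging above can be sketched in a few lines: with self-loops added and a symmetrically normalized adjacency matrix, one round of aggregation is a single matrix product. This is a toy NumPy illustration, not any particular library's API; the graph and features are made up.

```python
import numpy as np

# Toy undirected graph: 4 nodes, edges 0-1, 0-2, 1-2, 2-3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],   # one 2-d feature vector per node
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.5]])

# GCN aggregation: add self-loops, then normalize symmetrically,
# A_hat = D^{-1/2} (A + I) D^{-1/2}.
A_tilde = A + np.eye(A.shape[0])
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

# One message-passing round: each node's new feature is a
# degree-weighted average over itself and its neighbors.
H = A_hat @ X
print(H.shape)  # (4, 2)
```

GAT would replace the fixed degree-based weights in `A_hat` with learned attention scores, and GraphSAGE would average over a sampled subset of neighbors instead of all of them.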
Feature Transformation
Aggregated neighborhood information is combined with the node's own features through learned transformations (linear layers + nonlinearities).
Multi-Hop Propagation
Multiple layers of aggregation capture information from a node's k-hop neighborhood, with k typically 2-4 (deeper stacks risk oversmoothing).
Classification
The final node representation is passed through a classifier (MLP head) to predict the label, trained with cross-entropy on labeled nodes.
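Putting the five steps together, a minimal two-layer GCN forward pass with a masked cross-entropy loss might look like the sketch below. It uses pure NumPy with random toy data; all shapes, weights, and the label mask are illustrative assumptions, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Normalized adjacency for a toy 5-node path graph 0-1-2-3-4.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
A_tilde = A + np.eye(5)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

X = rng.normal(size=(5, 3))            # 3 input features per node
W1 = rng.normal(size=(3, 8)) * 0.1     # layer-1 weights (learned)
W2 = rng.normal(size=(8, 2)) * 0.1     # layer-2 weights (2 classes)

# Two rounds of aggregate-then-transform, softmax head on top.
H = np.maximum(A_hat @ X @ W1, 0.0)    # layer 1 + ReLU
Z = A_hat @ H @ W2                     # layer 2 logits
Z = Z - Z.max(axis=1, keepdims=True)   # numerically stable softmax
P = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)

# Semi-supervised loss: cross-entropy over labeled nodes only.
labels = np.array([0, -1, -1, -1, 1])  # -1 marks unlabeled nodes
mask = labels >= 0
loss = -np.mean(np.log(P[mask, labels[mask]]))
print(loss)
```

Training would backpropagate `loss` through `W1` and `W2`; the unlabeled nodes still shape the prediction via message passing, which is what makes the setting semi-supervised.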
Current Landscape
Node classification in 2025 is a mature field with strong baselines. Standard GNNs (GCN, GAT, GraphSAGE) work well on homophilic graphs, while specialized architectures handle heterophilic settings. OGB has replaced Cora/Citeseer as the serious evaluation platform. The interesting frontier is integrating LLMs — using language model embeddings as node features dramatically improves performance on text-attributed graphs (citation networks, social media). Graph transformers show promise but haven't consistently outperformed well-tuned GNNs.
Key Challenges
Oversmoothing — stacking many GNN layers makes all node representations converge to the same vector
Scalability — message passing on billion-node graphs requires careful sampling and distributed training
Heterophily — standard GNNs assume connected nodes are similar, which fails on heterophilic graphs
Limited expressiveness — standard message-passing GNNs cannot distinguish certain non-isomorphic graph structures
Benchmark limitations — Cora/Citeseer/PubMed are tiny and nearly saturated; OGB provides more realistic evaluation
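The oversmoothing problem listed above is easy to demonstrate numerically: repeatedly applying a (transformation-free) propagation operator pulls every node's features toward a shared average, so the spread across nodes collapses. A toy NumPy check, with made-up numbers, using simple mean-over-neighbors propagation:

```python
import numpy as np

# Path graph 0-1-2-3-4 with self-loops; row-normalize so each
# propagation step replaces a node's features with the mean of
# itself and its neighbors.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
A_tilde = A + np.eye(5)
P = A_tilde / A_tilde.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 4))  # random initial node features

def spread(M):
    # Total standard deviation across nodes: 0 would mean every
    # node has the identical representation.
    return float(M.std(axis=0).sum())

history = {}
for k in range(1, 33):
    H = P @ H
    if k in (1, 4, 32):
        history[k] = spread(H)
print(history)  # spread shrinks as more propagation steps stack
```

Real GNN layers interleave learned transformations and nonlinearities with this propagation, which slows but does not eliminate the effect; that is why depths beyond roughly 4 layers usually need countermeasures such as residual connections or jumping knowledge.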
Quick Recommendations
Standard node classification: GAT / GraphSAGE. Reliable, well-understood, and scalable to large graphs.
Large-scale graphs (millions of nodes): GraphSAINT / ShaDow-GNN. Efficient subgraph sampling enables training on massive graphs.
Heterophilic graphs: LINKX / GloGNN. Designed for graphs where connected nodes have different labels.
Text-attributed graphs: LLM embeddings + GNN. Leverages rich text features from language models combined with graph structure.
What's Next
The frontier is graph foundation models — pretrained on diverse graph structures and fine-tuned for specific node classification tasks. Expect LLM+GNN hybrid architectures to dominate text-attributed graphs, and scalable graph transformers to challenge message-passing GNNs on heterophilic and long-range dependency tasks.
Benchmarks & SOTA
Cora Citation Network
Citation network of scientific papers: 2,708 nodes, 5,429 edges, 7 classes. Classic GNN benchmark.
State of the Art: ACNet, 83.5% accuracy
Open Graph Benchmark
Collection of challenging large-scale graph datasets (OGB-Arxiv, OGB-Products, etc.).
No results tracked yet
Related Tasks
Graph Classification
Graph classification — predicting a label for an entire graph, not individual nodes — matters for molecular screening, social network analysis, and program verification. GIN (Xu et al., 2019) formalized the connection between GNN expressiveness and the Weisfeiler-Leman graph isomorphism test, and the TU datasets became standard benchmarks. Recent work on graph transformers (GPS, Exphormer) and higher-order GNNs pushes beyond WL limits, while OGB's ogbg-molhiv and ogbg-molpcba provide more rigorous large-scale evaluation than the classic small-graph benchmarks.
Link Prediction
Link prediction — inferring missing or future edges in a graph — underpins knowledge graph completion, drug-target discovery, and social network recommendation. TransE (2013) launched the knowledge graph embedding era, and the field matured through DistMult, RotatE, and CompGCN, benchmarked on FB15k-237 and WN18RR. The current frontier is inductive link prediction (generalizing to unseen entities), where GNN-based methods like NBFNet and foundation models like ULTRA (2024) show that a single model can transfer across entirely different knowledge graphs without retraining.
Molecular Property Prediction
Molecular property prediction — estimating toxicity, solubility, binding affinity, or other properties from molecular structure — is the workhorse task of AI-driven drug discovery. GNNs operate on molecular graphs while transformer approaches (ChemBERTa, Uni-Mol) use SMILES strings or 3D coordinates. MoleculeNet (2018) and the Therapeutic Data Commons (TDC) provide standardized benchmarks, but the real bottleneck is distribution shift: models trained on known chemical space struggle with novel scaffolds, and the gap between leaderboard accuracy and actual wet-lab utility remains the field's central challenge.