Node Classification
Node classification — assigning labels to vertices in a graph using both node features and neighborhood structure — is the flagship task for Graph Neural Networks. GCN (Kipf & Welling, 2017) established the Cora/Citeseer/PubMed benchmark trinity, but these datasets are tiny by modern standards and results on them have saturated well above 85% accuracy. The field has since moved toward large-scale heterogeneous graphs (ogbn-arxiv and ogbn-products from OGB) and toward an unsettled debate over whether simplified models that precompute neighborhood features (SGC's linear propagation, SIGN's multi-hop feature concatenation) can match full message-passing GNNs.
Node classification assigns labels to nodes in a graph using their features and network structure. Graph Neural Networks (GNNs) — particularly GCN, GAT, and GraphSAGE — dominate the field, but recent work shows that simple MLPs with graph-augmented features can be surprisingly competitive on standard benchmarks.
History
DeepWalk (2014) learns node embeddings via random walks plus skip-gram, enabling downstream classification
GCN (Kipf & Welling, 2017) introduces simplified spectral graph convolutions for semi-supervised node classification
GraphSAGE (2017) enables inductive node classification on unseen nodes and graphs via neighborhood sampling
GAT (Graph Attention Networks, 2018) applies attention to neighbor aggregation
GIN (Graph Isomorphism Network, 2019) shows message-passing GNNs are at most as powerful as the 1-WL isomorphism test
OGB (Open Graph Benchmark, 2020) provides large-scale, realistic node classification datasets
Cluster-GCN (2019) and GraphSAINT (2020) enable training on million-node graphs via subgraph sampling
Graph transformers (GPS, Exphormer) bring global attention to graph learning
LLM-based node classification (using language-model text embeddings as node features) achieves strong results on citation networks
Foundation models for graphs emerge — pretrained on diverse graph data, fine-tuned for node tasks
How Node Classification Works
Graph Construction
Nodes and edges are defined from the data — citation links, social connections, molecular bonds — with feature vectors attached to each node.
Neighborhood Aggregation
Each node collects information from its neighbors via message passing — GCN uses weighted averaging, GAT uses attention, GraphSAGE uses sampling.
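The GCN-style weighted averaging above can be sketched in a few lines: with self-loops added and a symmetrically normalized adjacency matrix, one round of aggregation is a single matrix product. This is a toy NumPy illustration, not any particular library's API; the graph and features are made up.

```python
import numpy as np

# Toy undirected graph: 4 nodes, edges 0-1, 0-2, 1-2, 2-3.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0, 0.0],   # one 2-d feature vector per node
              [0.0, 1.0],
              [1.0, 1.0],
              [0.5, 0.5]])

# GCN aggregation: add self-loops, then normalize symmetrically,
# A_hat = D^{-1/2} (A + I) D^{-1/2}.
A_tilde = A + np.eye(A.shape[0])
d = A_tilde.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

# One message-passing round: each node's new feature is a
# degree-weighted average over itself and its neighbors.
H = A_hat @ X
print(H.shape)  # (4, 2)
```

GAT would replace the fixed degree-based weights in `A_hat` with learned attention scores, and GraphSAGE would average over a sampled subset of neighbors instead of all of them.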
Feature Transformation
Aggregated neighborhood information is combined with the node's own features through learned transformations (linear layers + nonlinearities).
Multi-Hop Propagation
Multiple layers of aggregation capture information from a node's k-hop neighborhood, with k typically 2-4 (deeper stacks risk oversmoothing).
Classification
The final node representation is passed through a classifier (MLP head) to predict the label, trained with cross-entropy on labeled nodes.
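Putting the five steps together, a minimal two-layer GCN forward pass with a masked cross-entropy loss might look like the sketch below. It uses pure NumPy with random toy data; all shapes, weights, and the label mask are illustrative assumptions, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Normalized adjacency for a toy 5-node path graph 0-1-2-3-4.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
A_tilde = A + np.eye(5)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

X = rng.normal(size=(5, 3))            # 3 input features per node
W1 = rng.normal(size=(3, 8)) * 0.1     # layer-1 weights (learned)
W2 = rng.normal(size=(8, 2)) * 0.1     # layer-2 weights (2 classes)

# Two rounds of aggregate-then-transform, softmax head on top.
H = np.maximum(A_hat @ X @ W1, 0.0)    # layer 1 + ReLU
Z = A_hat @ H @ W2                     # layer 2 logits
Z = Z - Z.max(axis=1, keepdims=True)   # numerically stable softmax
P = np.exp(Z) / np.exp(Z).sum(axis=1, keepdims=True)

# Semi-supervised loss: cross-entropy over labeled nodes only.
labels = np.array([0, -1, -1, -1, 1])  # -1 marks unlabeled nodes
mask = labels >= 0
loss = -np.mean(np.log(P[mask, labels[mask]]))
print(loss)
```

Training would backpropagate `loss` through `W1` and `W2`; the unlabeled nodes still shape the prediction via message passing, which is what makes the setting semi-supervised.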
Current Landscape
Node classification in 2025 is a mature field with strong baselines. Standard GNNs (GCN, GAT, GraphSAGE) work well on homophilic graphs, while specialized architectures handle heterophilic settings. OGB has replaced Cora/Citeseer as the serious evaluation platform. The interesting frontier is integrating LLMs — using language model embeddings as node features dramatically improves performance on text-attributed graphs (citation networks, social media). Graph transformers show promise but haven't consistently outperformed well-tuned GNNs.
Key Challenges
Oversmoothing — stacking many GNN layers makes all node representations converge to the same vector
Scalability — message passing on billion-node graphs requires careful sampling and distributed training
Heterophily — standard GNNs assume connected nodes are similar, which fails on heterophilic graphs
Limited expressiveness — standard message-passing GNNs cannot distinguish certain non-isomorphic graph structures
Benchmark limitations — Cora/Citeseer/PubMed are tiny and nearly saturated; OGB provides more realistic evaluation
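The oversmoothing problem listed above is easy to demonstrate numerically: repeatedly applying a (transformation-free) propagation operator pulls every node's features toward a shared average, so the spread across nodes collapses. A toy NumPy check, with made-up numbers, using simple mean-over-neighbors propagation:

```python
import numpy as np

# Path graph 0-1-2-3-4 with self-loops; row-normalize so each
# propagation step replaces a node's features with the mean of
# itself and its neighbors.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    A[i, j] = A[j, i] = 1.0
A_tilde = A + np.eye(5)
P = A_tilde / A_tilde.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
H = rng.normal(size=(5, 4))  # random initial node features

def spread(M):
    # Total standard deviation across nodes: 0 would mean every
    # node has the identical representation.
    return float(M.std(axis=0).sum())

history = {}
for k in range(1, 33):
    H = P @ H
    if k in (1, 4, 32):
        history[k] = spread(H)
print(history)  # spread shrinks as more propagation steps stack
```

Real GNN layers interleave learned transformations and nonlinearities with this propagation, which slows but does not eliminate the effect; that is why depths beyond roughly 4 layers usually need countermeasures such as residual connections or jumping knowledge.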
Quick Recommendations
Standard node classification: GAT / GraphSAGE. Reliable, well-understood, and scalable to large graphs.
Large-scale graphs (millions of nodes): GraphSAINT / ShaDow-GNN. Efficient subgraph sampling enables training on massive graphs.
Heterophilic graphs: LINKX / GloGNN. Designed for graphs where connected nodes have different labels.
Text-attributed graphs: LLM embeddings + GNN. Leverages rich text features from language models combined with graph structure.
What's Next
The frontier is graph foundation models — pretrained on diverse graph structures and fine-tuned for specific node classification tasks. Expect LLM+GNN hybrid architectures to dominate text-attributed graphs, and scalable graph transformers to challenge message-passing GNNs on heterophilic and long-range dependency tasks.
Benchmarks & SOTA
Cora Citation Network
Citation network of scientific papers: 2,708 nodes, 5,429 edges, 7 classes. Classic GNN benchmark.
State of the Art: ACNet, 83.5% accuracy
Open Graph Benchmark
Collection of challenging large-scale graph datasets (OGB-Arxiv, OGB-Products, etc.).
No results tracked yet
Related Tasks
Graph Classification
Graph classification — predicting a label for an entire graph, not individual nodes — matters for molecular screening, social network analysis, and program verification. GIN (Xu et al., 2019) formalized the connection between GNN expressiveness and the Weisfeiler-Leman graph isomorphism test, and the TU datasets became standard benchmarks. Recent work on graph transformers (GPS, Exphormer) and higher-order GNNs pushes beyond WL limits, while OGB's ogbg-molhiv and ogbg-molpcba provide more rigorous large-scale evaluation than the classic small-graph benchmarks.
Link Prediction
Link prediction — inferring missing or future edges in a graph — underpins knowledge graph completion, drug-target discovery, and social network recommendation. TransE (2013) launched the knowledge graph embedding era, and the field matured through DistMult, RotatE, and CompGCN, benchmarked on FB15k-237 and WN18RR. The current frontier is inductive link prediction (generalizing to unseen entities), where GNN-based methods like NBFNet and foundation models like ULTRA (2024) show that a single model can transfer across entirely different knowledge graphs without retraining.
Molecular Property Prediction
Molecular property prediction — estimating toxicity, solubility, binding affinity, or other properties from molecular structure — is the workhorse task of AI-driven drug discovery. GNNs operate on molecular graphs while transformer approaches (ChemBERTa, Uni-Mol) use SMILES strings or 3D coordinates. MoleculeNet (2018) and the Therapeutic Data Commons (TDC) provide standardized benchmarks, but the real bottleneck is distribution shift: models trained on known chemical space struggle with novel scaffolds, and the gap between leaderboard accuracy and actual wet-lab utility remains the field's central challenge.