Tabular Classification
Tabular classification — predicting discrete labels from structured rows and columns — is the bread and butter of real-world ML, and the one domain where gradient-boosted trees (XGBoost, LightGBM, CatBoost) still stubbornly rival deep learning. Despite years of effort, neural approaches such as TabNet (2019) and FT-Transformer (2021) only match tree methods on certain splits, and a 2022 NeurIPS study by Grinsztajn et al. confirmed that trees still dominate on medium-sized datasets. The real frontier is AutoML systems (AutoGluon, FLAML) that ensemble both paradigms, and the emerging question of whether foundation models pretrained on millions of tables can finally tip the balance.
History
XGBoost released — becomes the dominant Kaggle competition tool
LightGBM (Microsoft) introduces gradient-based one-side sampling for faster training
CatBoost (Yandex) handles categorical features natively with ordered target encoding
TabNet (Google) applies attention mechanisms to tabular data
FT-Transformer shows transformers can be competitive on tabular tasks with proper feature tokenization
Grinsztajn et al. show tree-based methods still outperform deep learning on most tabular benchmarks
TabPFN uses in-context learning to predict on small tabular datasets without training
TabR and ModernNCA show retrieval-augmented approaches improve deep tabular performance
Large-scale tabular benchmarks (TabZilla, OpenML-CC18) enable rigorous comparison
LLM-based tabular prediction emerges but still lags behind well-tuned GBDT
How Tabular Classification Works
Data Preprocessing
Handle missing values, encode categorical variables (one-hot, target encoding, CatBoost native), and optionally scale numerical features.
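A minimal preprocessing sketch using scikit-learn's `ColumnTransformer`; the column names and toy values are illustrative, not from any real dataset:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with a missing numeric value and a categorical column
# (hypothetical columns, for illustration only).
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 33.0],
    "city": ["NY", "SF", "NY", "LA"],
})

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill NaNs with the column median
    ("scale", StandardScaler()),                   # optional for GBDT, helps neural nets
])
categorical = OneHotEncoder(handle_unknown="ignore")  # unseen categories -> all zeros

pre = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", categorical, ["city"]),
])

X = pre.fit_transform(df)
print(X.shape)  # (4, 4): 1 scaled numeric column + 3 one-hot city columns
```

Fitting the transformer on training data and reusing it on test data (via `transform`) keeps the encoding consistent and avoids leakage.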
Feature Engineering
Create interaction features, polynomial features, and domain-specific transformations that capture known relationships.
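A small pandas sketch of the three feature types mentioned above; the raw columns and bin edges are hypothetical examples:

```python
import pandas as pd

# Hypothetical raw columns for illustration.
df = pd.DataFrame({
    "price": [100.0, 250.0, 80.0],
    "quantity": [3, 1, 5],
    "signup_days": [10, 400, 30],
})

# Interaction feature: total spend = price * quantity
df["total_spend"] = df["price"] * df["quantity"]

# Polynomial feature: a squared term lets linear models capture curvature
df["price_sq"] = df["price"] ** 2

# Domain-specific transformation: bucket account tenure into coarse bins
df["tenure_bucket"] = pd.cut(
    df["signup_days"],
    bins=[0, 30, 365, 10_000],
    labels=["new", "active", "veteran"],
)
print(df[["total_spend", "price_sq", "tenure_bucket"]])
```

Trees can discover some interactions on their own, but explicit features like these often help more than switching model families.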
Model Training
For GBDT: sequentially fit shallow trees where each tree corrects residual errors of the ensemble. For neural: tokenize features and process through transformer or MLP layers.
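The GBDT mechanism — each shallow tree fit to the residuals of the current ensemble — can be sketched from scratch with regression trees on the negative gradient of the log loss (a simplified illustration, not a production implementation):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeRegressor

# Synthetic binary classification data for the demo.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

lr, n_trees, trees = 0.1, 50, []
F = np.zeros(len(y))  # raw scores (log-odds), start at zero

for _ in range(n_trees):
    p = 1.0 / (1.0 + np.exp(-F))   # current predicted probabilities
    residual = y - p               # negative gradient of the log loss
    t = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residual)
    trees.append(t)
    F += lr * t.predict(X)         # each shallow tree nudges the ensemble

acc = ((F > 0).astype(int) == y).mean()
print(f"train accuracy: {acc:.3f}")
```

Production libraries (XGBoost, LightGBM) add second-order gradients, regularization, histogram-based splits, and sampling, but the sequential residual-fitting loop is the same idea.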
Hyperparameter Tuning
Critical for both paradigms — learning rate, tree depth, regularization (GBDT) or architecture, dropout, learning rate schedule (neural).
Ensemble and Calibration
Combine multiple models via averaging or stacking, and calibrate probabilities if needed for classification tasks.
Current Landscape
Tabular classification in 2025 remains dominated by gradient-boosted trees. Despite significant research into neural approaches (TabNet, FT-Transformer, TabR, TabPFN), well-tuned XGBoost/LightGBM still wins most head-to-head comparisons on standard benchmarks. The gap narrows on large datasets with many features, where transformers can leverage their capacity. In practice, AutoML tools (AutoGluon) that ensemble both paradigms provide the best results. The field's dirty secret: careful feature engineering and hyperparameter tuning matter more than model architecture.
Key Challenges
Trees vs. neural gap — deep learning still hasn't consistently beaten GBDT on heterogeneous tabular data
Feature engineering dependency — performance on tabular data is highly sensitive to manual feature engineering
Small datasets — many real-world tabular tasks have <10K rows, where deep learning overfits
Heterogeneous features — mixing numerical, categorical, ordinal, and text features in one model is non-trivial
Distribution shift — tabular data in production drifts over time, requiring monitoring and retraining
Quick Recommendations
Default choice for tabular
XGBoost / LightGBM / CatBoost
Consistently the best or near-best on tabular benchmarks; fast, interpretable, robust
Deep learning on tabular
FT-Transformer / TabR
Best neural approaches, competitive with GBDT when well-tuned
Very small datasets (<1K rows)
TabPFN
In-context learning approach that requires no training, works well on tiny datasets
AutoML
AutoGluon / H2O AutoML
Automatic model selection and ensembling across GBDT and neural methods
What's Next
The frontier is foundation models for tabular data — pretrained on millions of tables and fine-tuned for specific tasks. Early results (TabPFN, CARTE) show promise for small-data regimes. Expect LLM-based approaches that treat table rows as serialized text to improve, especially when tables contain text-heavy columns that benefit from language understanding.