Tabular Classification
Tabular classification — predicting discrete labels from structured rows and columns — is the bread and butter of real-world ML, and it remains the one domain where gradient-boosted trees (XGBoost, LightGBM, CatBoost) stubbornly rival deep learning. Despite years of effort, neural approaches such as TabNet (2019) and FT-Transformer (2021) only match tree methods on certain splits, and a 2022 NeurIPS study by Grinsztajn et al. confirmed that trees still dominate on medium-sized datasets. The real frontier is AutoML systems (AutoGluon, FLAML) that ensemble both paradigms, and the emerging question of whether foundation models pretrained on millions of tables can finally tip the balance.
History
2014: XGBoost released — becomes the dominant Kaggle competition tool
2017: LightGBM (Microsoft) introduces gradient-based one-side sampling for faster training
2017: CatBoost (Yandex) handles categorical features natively with ordered target encoding
2019: TabNet (Google) applies attention mechanisms to tabular data
2021: FT-Transformer shows transformers can be competitive on tabular tasks with proper feature tokenization
2022: Grinsztajn et al. show tree-based methods still outperform deep learning on most tabular benchmarks
2022: TabPFN uses in-context learning to predict on small tabular datasets without training
2023–2024: TabR and ModernNCA show retrieval-augmented approaches improve deep tabular performance
2023: Large-scale tabular benchmarks (TabZilla, OpenML-CC18) enable rigorous comparison
2024: LLM-based tabular prediction emerges but still lags behind well-tuned GBDT
How Tabular Classification Works
Data Preprocessing
Handle missing values, encode categorical variables (one-hot, target encoding, CatBoost native), and optionally scale numerical features.
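These steps can be sketched with scikit-learn's `ColumnTransformer` (the column names and toy data below are hypothetical, and library choice is an assumption — CatBoost, for instance, would consume the categoricals directly):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with a missing value; "city" is categorical, the rest numeric.
df = pd.DataFrame({
    "age": [25, 32, None, 51],
    "income": [40_000, 65_000, 52_000, 80_000],
    "city": ["NYC", "LA", "NYC", "SF"],
})

preprocess = ColumnTransformer([
    # Impute missing numerics with the median, then optionally scale.
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age", "income"]),
    # One-hot encode categoricals; ignore categories unseen at fit time.
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

X = preprocess.fit_transform(df)
print(X.shape)  # 2 numeric columns + 3 one-hot city columns -> (4, 5)
```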
Feature Engineering
Create interaction features, polynomial features, and domain-specific transformations that capture known relationships.
Model Training
For GBDT: sequentially fit shallow trees where each tree corrects residual errors of the ensemble. For neural: tokenize features and process through transformer or MLP layers.
Hyperparameter Tuning
Critical for both paradigms — learning rate, tree depth, regularization (GBDT) or architecture, dropout, learning rate schedule (neural).
Ensemble and Calibration
Combine multiple models via averaging or stacking, and calibrate probabilities if needed for classification tasks.
Current Landscape
Tabular classification in 2025 remains dominated by gradient-boosted trees. Despite significant research into neural approaches (TabNet, FT-Transformer, TabR, TabPFN), well-tuned XGBoost/LightGBM still wins most head-to-head comparisons on standard benchmarks. The gap narrows on large datasets with many features, where transformers can leverage their capacity. In practice, AutoML tools (AutoGluon) that ensemble both paradigms provide the best results. The field's dirty secret: careful feature engineering and hyperparameter tuning matter more than model architecture.
Key Challenges
Trees vs. neural gap — deep learning still hasn't consistently beaten GBDT on heterogeneous tabular data
Feature engineering dependency — performance on tabular data is highly sensitive to manual feature engineering
Small datasets — many real-world tabular tasks have <10K rows, where deep learning overfits
Heterogeneous features — mixing numerical, categorical, ordinal, and text features in one model is non-trivial
Distribution shift — tabular data in production drifts over time, requiring monitoring and retraining
Quick Recommendations
Default choice for tabular
XGBoost / LightGBM / CatBoost
Consistently the best or near-best on tabular benchmarks; fast, interpretable, robust
Deep learning on tabular
FT-Transformer / TabR
Best neural approaches, competitive with GBDT when well-tuned
Very small datasets (<1K rows)
TabPFN
In-context learning approach that requires no training, works well on tiny datasets
AutoML
AutoGluon / H2O AutoML
Automatic model selection and ensembling across GBDT and neural methods
What's Next
The frontier is foundation models for tabular data — pretrained on millions of tables and fine-tuned for specific tasks. Early results (TabPFN, CARTE) show promise for small-data regimes. Expect LLM-based approaches that treat table rows as serialized text to improve, especially when tables contain text-heavy columns that benefit from language understanding.
Benchmarks & SOTA
Related Tasks
Time Series Forecasting
Time-series forecasting exploded in 2023-2025 when foundation models crossed over from NLP. Nixtla's TimeGPT (2023), Google's TimesFM (2024), and Amazon's Chronos showed that a single pretrained model can zero-shot forecast diverse series, rivaling task-specific statistical models like ETS and ARIMA. Yet the Monash benchmark and M-competition lineage (M4, M5) reveal an uncomfortable truth: simple ensembles of statistical methods still win on many univariate tasks. The real battle now is multivariate long-horizon forecasting, where PatchTST and iTransformer compete with state-space models like Mamba.
Time Series Classification
Assigning discrete labels to whole time series — the sequential analogue of tabular classification.
Tabular Regression
Tabular regression — predicting continuous values from structured data — powers everything from house-price estimation to demand forecasting and shares the same tree-vs-neural tension as classification. XGBoost and LightGBM remain brutally effective defaults, but recent work on differentiable trees and table-aware transformers (TabPFN, 2022) showed that meta-learned priors can beat tuned GBDTs on small datasets in seconds. The challenge is distribution shift: real-world regression targets drift over time, and most benchmarks (UCI, Kaggle) are static snapshots that hide this problem entirely.