Tabular Regression
Tabular regression — predicting continuous values from structured data — powers everything from house-price estimation to demand forecasting and shares the same tree-vs-neural tension as classification. XGBoost and LightGBM remain brutally effective defaults, but recent work on differentiable trees and table-aware transformers (TabPFN, 2022) showed that meta-learned priors can beat tuned GBDTs on small datasets in seconds. The challenge is distribution shift: real-world regression targets drift over time, and most benchmarks (UCI, Kaggle) are static snapshots that hide the problem entirely.
Compared with classification, regression brings its own complications — heteroscedastic noise, outlier handling, and quantile or distributional prediction for uncertainty estimation — across targets such as house prices, risk scores, and time-to-event.
History
Gradient Boosting Machines (Friedman) establish the algorithmic foundation
XGBoost makes GBMs fast and regularized enough for production use
LightGBM and CatBoost introduce histogram-based splitting and native categorical handling
Neural Oblivious Decision Ensembles (NODE) bridge trees and neural networks
FT-Transformer achieves competitive tabular regression via feature tokenization
Multiple meta-analyses confirm GBDT superiority for tabular regression across diverse datasets
Quantile regression forests and NGBoost enable distributional predictions
TabR achieves new neural SOTA on regression tasks via retrieval augmentation
Conformal prediction methods gain traction for calibrated uncertainty in tabular regression
Foundation models for regression emerge, showing promise on small datasets
How Tabular Regression Works
Target Analysis
Analyze the target distribution — check for skewness, outliers, and heteroscedasticity that may require log-transforms or robust losses.
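A quick diagnostic for this step can be sketched in a few lines — a moment-based skewness check before and after a log1p transform, the standard remedy for right-skewed targets like prices. The data here is a made-up toy sample, not from any benchmark.

```python
import math

def skewness(values):
    """Fisher-Pearson moment coefficient of skewness."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

y = [1, 2, 3, 4, 5, 1000]            # heavy right tail (toy data)
y_log = [math.log1p(v) for v in y]   # log1p stays defined at zero

print(skewness(y), skewness(y_log))  # skew shrinks after the transform
```

If the transform is applied, remember to invert it (`math.expm1`) on predictions before computing error metrics in the original units.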
Feature Preparation
Handle missing values, encode categoricals, engineer interactions, and optionally select features via importance scores.
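A minimal sketch of two of these steps — median imputation for a missing numeric column and ordinal encoding for a categorical one. Column names and rows are invented for illustration; real pipelines would use pandas or scikit-learn transformers.

```python
# Toy rows with a missing numeric value and a categorical column.
rows = [
    {"sqft": 1200, "city": "oslo"},
    {"sqft": None, "city": "bergen"},
    {"sqft": 900,  "city": "oslo"},
]

# Median imputation for missing numerics (robust to outliers).
observed = sorted(r["sqft"] for r in rows if r["sqft"] is not None)
median = observed[len(observed) // 2]
for r in rows:
    if r["sqft"] is None:
        r["sqft"] = median

# Ordinal-encode categories in first-seen order. Tree models can split
# on integer codes directly; linear or neural models would instead need
# one-hot encoding or learned embeddings.
codes = {}
for r in rows:
    r["city"] = codes.setdefault(r["city"], len(codes))
```

Fitting the imputation statistic and the category codes on training data only, then reusing them at inference time, avoids leakage.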
Model Training
GBDT fits sequential trees minimizing MSE/MAE/Huber loss. Neural approaches tokenize features and regress through transformer or MLP layers.
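The GBDT fitting loop for squared error can be illustrated with a pure-Python toy: each round fits a one-split "stump" to the current residuals and adds it with a small learning rate. This is a sketch of the idea, not of any library's implementation — production GBDTs add histogram binning, regularization, and deeper trees.

```python
def fit_stump(x, r):
    """Best single threshold on x minimizing squared error of residuals r."""
    best_sse, best = float("inf"), None
    for t in sorted(set(x))[:-1]:          # last value can't split
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((ri - lm) ** 2 for ri in left)
               + sum((ri - rm) ** 2 for ri in right))
        if sse < best_sse:
            best_sse, best = sse, (t, lm, rm)
    return best

def fit_boosted(x, y, rounds=100, lr=0.1):
    f0 = sum(y) / len(y)                   # F_0: constant prediction
    pred = [f0] * len(y)
    stumps = []
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        t, lm, rm = fit_stump(x, resid)    # fit the negative gradient
        stumps.append((t, lm, rm))
        pred = [p + lr * (lm if xi <= t else rm) for xi, p in zip(x, pred)]
    return f0, lr, stumps

def predict(model, xi):
    f0, lr, stumps = model
    return f0 + sum(lr * (lm if xi <= t else rm) for t, lm, rm in stumps)
```

Swapping the residual for the gradient of MAE or Huber loss gives the robust variants mentioned above.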
Uncertainty Estimation
Quantile regression, NGBoost, or conformal prediction provide prediction intervals alongside point estimates.
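The building block of quantile regression is the pinball loss: minimizing it over a constant recovers the empirical τ-quantile, which is why optimizing it per-leaf yields quantile predictions. A grid-search toy, with made-up data:

```python
def pinball(y_true, pred, tau):
    """Average pinball (quantile) loss of a constant prediction at level tau."""
    return sum(
        tau * (yi - pred) if yi >= pred else (1 - tau) * (pred - yi)
        for yi in y_true
    ) / len(y_true)

y = list(range(1, 101))
# The constant minimizing pinball loss at tau = 0.9 is (up to ties)
# the empirical 90th-percentile of y.
best = min(range(1, 101), key=lambda c: pinball(y, c, 0.9))
```

Training one model each at τ = 0.05 and τ = 0.95 gives a 90% prediction band directly from the losses.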
Evaluation
Assess using RMSE, MAE, and R² on held-out data, with attention to performance across different target ranges (tail behavior matters).
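The three headline metrics are simple to compute from paired predictions; a self-contained sketch:

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired true/predicted values."""
    n = len(y_true)
    errs = [yt - yp for yt, yp in zip(y_true, y_pred)]
    mse = sum(e * e for e in errs) / n
    mae = sum(abs(e) for e in errs) / n
    mean = sum(y_true) / n
    ss_tot = sum((yt - mean) ** 2 for yt in y_true)
    r2 = 1 - (mse * n) / ss_tot            # 1 - SS_res / SS_tot
    return math.sqrt(mse), mae, r2
```

For the tail behavior noted above, computing these per target-quantile bucket (e.g. bottom/middle/top deciles) rather than only in aggregate is a cheap and revealing extension.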
Current Landscape
Tabular regression in 2025 mirrors classification: GBDT dominates, with neural methods closing the gap on larger datasets. The distinctive challenge is uncertainty quantification — most real-world regression applications need prediction intervals, not just point estimates. Conformal prediction has emerged as the practical standard for calibrated uncertainty, wrapping any base model. AutoML tools handle the model selection problem well, with AutoGluon ensembles being hard to beat in practice.
Key Challenges
Outlier sensitivity — MSE loss is sensitive to extreme values, requiring robust loss functions or target transforms
Heteroscedasticity — prediction uncertainty varies across input space, requiring distributional or quantile regression
Feature interactions — tabular regression performance depends heavily on capturing the right feature interactions
Extrapolation — all methods struggle when test inputs fall outside the training distribution
Interpretability requirements — many regression applications (credit scoring, insurance) require explainable predictions
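The outlier-sensitivity point above is exactly what the Huber loss addresses: it is quadratic for small errors but linear beyond a threshold `delta`, so a single extreme target cannot dominate the fit the way it does under MSE. A minimal definition:

```python
def huber(error, delta=1.0):
    """Huber loss: MSE-like near zero, MAE-like in the tails."""
    a = abs(error)
    if a <= delta:
        return 0.5 * a * a                 # quadratic region
    return delta * (a - 0.5 * delta)       # linear region

# An error of 10 is penalized far less than under squared error (50.0).
print(huber(10.0))
```

Most GBDT libraries expose this directly (e.g. a `huber` objective), making it a one-line change when the target analysis reveals heavy tails.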
Quick Recommendations
Default tabular regression
LightGBM / XGBoost
Best performance, fastest training, built-in feature importance
Uncertainty quantification
NGBoost / Quantile Regression Forests
Principled distributional prediction for risk-sensitive applications
Calibrated intervals
Any model + conformal prediction
Distribution-free prediction intervals with guaranteed coverage
Neural approach
TabR / FT-Transformer
Best neural methods when deep learning is required (e.g., end-to-end training with other modalities)
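The "any model + conformal prediction" recipe above is simplest in its split-conformal form: hold out a calibration set, compute the fitted model's absolute residuals on it, and use their conservative (1 − α)-quantile as a symmetric interval half-width. A sketch with a deliberately trivial base model (the training mean) and invented toy data:

```python
import math

def conformal_halfwidth(calib_residuals, alpha=0.1):
    """Finite-sample (1 - alpha) quantile of calibration residuals."""
    n = len(calib_residuals)
    k = math.ceil((n + 1) * (1 - alpha))      # conservative rank
    return sorted(calib_residuals)[min(k, n) - 1]

# Toy base model: predict the training mean; calibrate on held-out data.
train_y = [10, 11, 9, 10, 12]
calib_y = [8, 10, 11, 13, 9, 10, 12, 11, 10]
pred = sum(train_y) / len(train_y)
resid = [abs(y - pred) for y in calib_y]

q = conformal_halfwidth(resid, alpha=0.2)
interval = (pred - q, pred + q)               # covers >= 80% of new points
```

The coverage guarantee holds for any base model because only exchangeability of calibration and test points is assumed — which is precisely why it wraps GBDTs and neural models alike.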
What's Next
The frontier is distributional regression at scale — predicting full output distributions rather than point estimates, and doing so efficiently in production. Expect advances in conformal prediction methods, neural distributional regression, and foundation models that transfer regression knowledge across domains.
Benchmarks & SOTA
Related Tasks
Time Series Forecasting
Time-series forecasting exploded in 2023-2025 when foundation models crossed over from NLP. Nixtla's TimeGPT (2023), Google's TimesFM (2024), and Amazon's Chronos showed that a single pretrained model can zero-shot forecast diverse series, rivaling task-specific statistical models like ETS and ARIMA. Yet the Monash benchmark and M-competition lineage (M4, M5) reveal an uncomfortable truth: simple ensembles of statistical methods still win on many univariate tasks. The real battle now is multivariate long-horizon forecasting, where PatchTST and iTransformer compete with state-space models like Mamba.
Time Series Classification
Assigning discrete class labels to whole time series, such as sensor traces or ECG recordings.
Tabular Classification
Tabular classification — predicting discrete labels from structured rows and columns — remains the one domain where gradient-boosted trees (XGBoost, LightGBM, CatBoost) stubbornly rival deep learning. Despite years of effort, neural approaches like TabNet (2019) and FT-Transformer (2021) only match tree methods on certain splits, and a 2022 NeurIPS study by Grinsztajn et al. confirmed that trees still dominate on medium-sized datasets. The real frontier is AutoML systems (AutoGluon, FLAML) that ensemble both paradigms, and the emerging question of whether foundation models pretrained on millions of tables can finally tip the balance.
Something wrong or missing?
Help keep Tabular Regression benchmarks accurate. Report outdated results, missing benchmarks, or errors.