Tabular Regression

Tabular regression — predicting continuous values from structured data — powers everything from house-price estimation to demand forecasting and shares the same tree-vs-neural tension as classification. XGBoost and LightGBM remain brutally effective defaults, but recent work on differentiable trees and table-aware transformers (TabPFN, 2022) showed that meta-learned priors can beat tuned GBDTs on small datasets in seconds. The challenge is distribution shift: real-world regression targets drift over time, and most benchmarks (UCI, Kaggle) are static snapshots that hide this problem entirely.

Tabular regression predicts continuous numerical targets from structured data — house prices, risk scores, time-to-event. As in classification, gradient-boosted trees dominate, with the added complexities of heteroscedastic noise, outlier handling, and quantile/distributional prediction for uncertainty estimation.

History

2001

Gradient Boosting Machines (Friedman) establish the algorithmic foundation

2014

XGBoost makes GBMs fast and regularized enough for production use

2017

LightGBM and CatBoost introduce histogram-based splitting and native categorical handling

2019

Neural Oblivious Decision Ensembles (NODE) bridge trees and neural networks

2021

FT-Transformer achieves competitive tabular regression via feature tokenization

2022

Multiple meta-analyses confirm GBDT superiority for tabular regression across diverse datasets

2023

Distributional methods — quantile regression forests and NGBoost — gain broad adoption for prediction intervals

2024

TabR achieves new neural SOTA on regression tasks via retrieval augmentation

2024

Conformal prediction methods gain traction for calibrated uncertainty in tabular regression

2025

Foundation models for regression emerge, showing promise on small datasets

How Tabular Regression Works

Tabular Regression Pipeline
1

Target Analysis

Analyze the target distribution — check for skewness, outliers, and heteroscedasticity that may require log-transforms or robust losses.
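A minimal sketch of this diagnostic step, using only numpy; the threshold of 1.0 for "heavy skew" is an illustrative convention, not a fixed rule:

```python
import numpy as np

def target_diagnostics(y, skew_threshold=1.0):
    """Report sample skewness and suggest a log1p transform for heavy right skew."""
    y = np.asarray(y, dtype=float)
    centered = y - y.mean()
    skew = float((centered**3).mean() / (centered.std() ** 3))
    return {
        "skewness": skew,
        # log1p only makes sense for non-negative targets
        "suggest_log1p": skew > skew_threshold and y.min() >= 0,
    }

# A right-skewed target (e.g. prices) typically triggers the suggestion:
rng = np.random.default_rng(0)
prices = rng.lognormal(mean=12, sigma=0.8, size=5000)
diag = target_diagnostics(prices)
```

Training on `np.log1p(y)` and inverting with `np.expm1` at prediction time is the usual companion move when the flag fires.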

2

Feature Preparation

Handle missing values, encode categoricals, engineer interactions, and optionally select features via importance scores.
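A pandas sketch of these preparation steps on a toy frame; the column names (`sqft`, `beds`, `city`) are illustrative, not from any real dataset:

```python
import pandas as pd

# Toy frame standing in for a real dataset (columns are hypothetical).
df = pd.DataFrame({
    "sqft": [1400.0, None, 2100.0, 1650.0],
    "beds": [3, 2, 4, 3],
    "city": ["austin", "denver", "austin", "boise"],
})

# Flag missingness before imputing: the flag itself is often predictive.
df["sqft_missing"] = df["sqft"].isna().astype(int)
df["sqft"] = df["sqft"].fillna(df["sqft"].median())

# One-hot encode the categorical column.
df = pd.get_dummies(df, columns=["city"], prefix="city")

# Engineer a simple ratio interaction.
df["sqft_per_bed"] = df["sqft"] / df["beds"]
```

Note that modern GBDTs soften these requirements — LightGBM handles missing values natively and CatBoost handles raw categoricals — so explicit imputation and encoding matter most for neural and linear baselines.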

3

Model Training

GBDT fits sequential trees minimizing MSE/MAE/Huber loss. Neural approaches tokenize features and regress through transformer or MLP layers.

4

Uncertainty Estimation

Quantile regression, NGBoost, or conformal prediction provide prediction intervals alongside point estimates.
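A sketch of the quantile-regression route: fitting one GBDT per quantile yields a prediction interval whose width tracks the local noise level. The heteroscedastic toy data is illustrative:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Heteroscedastic toy data: noise grows with x.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(3000, 1))
y = X[:, 0] + rng.normal(scale=0.1 + 0.3 * X[:, 0])

# One model per quantile gives a ~90% prediction interval.
lo = GradientBoostingRegressor(loss="quantile", alpha=0.05, random_state=0).fit(X, y)
hi = GradientBoostingRegressor(loss="quantile", alpha=0.95, random_state=0).fit(X, y)

X_new = np.array([[2.0], [8.0]])
interval = np.column_stack([lo.predict(X_new), hi.predict(X_new)])
width = interval[:, 1] - interval[:, 0]  # widens where the noise is larger
```

LightGBM supports the same pattern via `objective="quantile"`; NGBoost instead fits a full parametric distribution, and conformal methods (below) calibrate any of these intervals.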

5

Evaluation

Assess using RMSE, MAE, and R² on held-out data, with attention to performance across different target ranges (tail behavior matters).
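The three headline metrics are simple enough to compute directly; a numpy sketch:

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, MAE, and R² from raw arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    rmse = float(np.sqrt(np.mean(err**2)))
    mae = float(np.mean(np.abs(err)))
    ss_res = float(np.sum(err**2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    return {"rmse": rmse, "mae": mae, "r2": 1.0 - ss_res / ss_tot}

m = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
```

For the tail-behavior check, the same function applied per target-quantile bucket (e.g. bottom and top deciles separately) exposes models that are accurate on average but poor at the extremes.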

Current Landscape

Tabular regression in 2025 mirrors classification: GBDT dominates, with neural methods closing the gap on larger datasets. The distinctive challenge is uncertainty quantification — most real-world regression applications need prediction intervals, not just point estimates. Conformal prediction has emerged as the practical standard for calibrated uncertainty, wrapping any base model. AutoML tools handle the model selection problem well, with AutoGluon ensembles being hard to beat in practice.

Key Challenges

Outlier sensitivity — MSE loss is sensitive to extreme values, requiring robust loss functions or target transforms

Heteroscedasticity — prediction uncertainty varies across input space, requiring distributional or quantile regression

Feature interactions — tabular regression performance depends heavily on capturing the right feature interactions

Extrapolation — all methods struggle when test inputs fall outside the training distribution

Interpretability requirements — many regression applications (credit scoring, insurance) require explainable predictions
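The first challenge — outlier sensitivity of squared-error loss — is easy to demonstrate. A sketch on synthetic data, using linear models for clarity (scikit-learn's `HuberRegressor` stands in for the Huber loss option available in GBDTs like LightGBM):

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

# Clean linear relation y ≈ 2x, plus one extreme corrupted label.
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.2, size=200)
X = np.vstack([X, [[10.0]]])
y = np.append(y, 500.0)  # single outlier

ols = LinearRegression().fit(X, y)  # squared loss: dragged toward the outlier
huber = HuberRegressor().fit(X, y)  # robust loss: large residuals penalized linearly

ols_err = abs(ols.coef_[0] - 2.0)
huber_err = abs(huber.coef_[0] - 2.0)
```

The squared-loss slope is pulled far from the true value of 2 by a single bad label, while the Huber fit barely moves — the same effect, at tree-ensemble scale, motivates `loss="huber"` or a log-transformed target.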

Quick Recommendations

Default tabular regression

LightGBM / XGBoost

Best performance, fastest training, built-in feature importance

Uncertainty quantification

NGBoost / Quantile Regression Forests

Principled distributional prediction for risk-sensitive applications

Calibrated intervals

Any model + conformal prediction

Distribution-free prediction intervals with guaranteed coverage

Neural approach

TabR / FT-Transformer

Best neural methods when deep learning is required (e.g., end-to-end training with other modalities)

What's Next

The frontier is distributional regression at scale — predicting full output distributions rather than point estimates, and doing so efficiently in production. Expect advances in conformal prediction methods, neural distributional regression, and foundation models that transfer regression knowledge across domains.

