Time Series

Predicting future trends or detecting anomalies? Benchmark forecasting accuracy and pattern recognition in sequential data.

4 tasks · 8 datasets · 82 results

Time series analysis spans forecasting, anomaly detection, and classification across finance, weather, energy, and healthcare. Foundation models like TimesFM, Chronos, and Moirai now offer zero-shot forecasting that rivals task-specific models, fundamentally changing how practitioners approach temporal data.

State of the Field (2025)

  • Foundation models dominate: Google's TimesFM (200M params, 100B time points), Amazon's Chronos (710M params), and Moirai provide zero-shot forecasting competitive with supervised baselines across diverse domains
  • PatchTST and iTransformer established transformer architectures for time series by treating temporal patches as tokens, outperforming prior deep learning methods on long-horizon forecasting benchmarks
  • Surprising baseline results: simple linear models (DLinear, RLinear) match or beat complex transformers on many forecasting benchmarks, questioning whether architectural complexity adds value for standard tasks
  • Anomaly detection shifted to foundation model embeddings: pre-trained temporal representations enable few-shot anomaly detection in manufacturing, IT operations, and financial fraud without domain-specific training

Quick Recommendations

Zero-shot forecasting (no training data available)

TimesFM or Chronos

TimesFM delivers strong zero-shot performance across energy, retail, weather, and finance domains. Chronos offers probabilistic forecasts with uncertainty quantification. Both eliminate the cold-start problem.

Long-horizon forecasting (weather, energy)

PatchTST or iTransformer

Patch-based tokenization captures local temporal patterns efficiently. iTransformer's inverted attention over variables handles multivariate dependencies. Both scale to 720+ step horizons.
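Patch-based tokenization is simple to picture: the lookback window is cut into fixed-length, overlapping patches, and each patch becomes one token. A minimal sketch in plain Python (the patch length of 16 and stride of 8 mirror common PatchTST settings, but the values here are illustrative):

```python
def patchify(series, patch_len=16, stride=8):
    """Split a 1-D series into overlapping patches (PatchTST-style tokens)."""
    patches = []
    for start in range(0, len(series) - patch_len + 1, stride):
        patches.append(series[start:start + patch_len])
    return patches

# A 336-step lookback window yields (336 - 16) // 8 + 1 = 41 patch tokens.
tokens = patchify(list(range(336)))
print(len(tokens))  # 41
```

With a 336-step lookback this yields 41 tokens instead of 336, which is why attention over patches is far cheaper than attention over raw timesteps.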

Time series classification (ECG, industrial sensors)

InceptionTime or ROCKET

InceptionTime provides a strong deep learning baseline with multi-scale convolutions. ROCKET achieves competitive accuracy with random convolutional kernels at 100x lower training cost.
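ROCKET's trick is that the convolutional kernels are never trained: each kernel's weights are drawn at random, and only two pooled statistics per kernel (the maximum and the proportion of positive values, PPV) feed a linear classifier. A toy sketch that omits ROCKET's dilation and padding for brevity; kernel count and series are illustrative:

```python
import random

def random_kernel(rng, length=9):
    # ROCKET draws weights from N(0, 1), mean-centers them, and draws a
    # bias from U(-1, 1).
    w = [rng.gauss(0, 1) for _ in range(length)]
    mean = sum(w) / length
    w = [x - mean for x in w]
    return w, rng.uniform(-1, 1)

def kernel_features(series, weights, bias):
    # Convolve, then summarize with max and PPV (proportion of positive
    # values) -- the two pooling statistics ROCKET uses.
    outs = []
    k = len(weights)
    for i in range(len(series) - k + 1):
        outs.append(sum(w * x for w, x in zip(weights, series[i:i + k])) + bias)
    ppv = sum(o > 0 for o in outs) / len(outs)
    return max(outs), ppv

rng = random.Random(0)
series = [float(i % 7) for i in range(100)]
feats = []
for _ in range(100):            # real ROCKET uses ~10,000 kernels
    w, b = random_kernel(rng)
    feats.extend(kernel_features(series, w, b))
print(len(feats))  # 200 features to feed a ridge classifier
```

Because no kernel is ever backpropagated through, "training" reduces to fitting the final ridge classifier, which is where the large cost saving comes from.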

Anomaly detection in production systems

Foundation model embeddings + isolation forest, or USAD

Pre-trained temporal embeddings from TimesFM or Chronos provide rich representations. Combine with lightweight detectors for few-shot anomaly detection. USAD offers fast unsupervised detection for streaming data.
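A minimal version of the embed-then-detect pattern, with a hand-crafted stand-in for the foundation-model embedding (in practice you would take window embeddings from TimesFM or Chronos) and centroid distance in place of an isolation forest:

```python
import math

def embed(window):
    # Stand-in for a foundation-model embedding: mean, std, and net change.
    m = sum(window) / len(window)
    var = sum((x - m) ** 2 for x in window) / len(window)
    return (m, math.sqrt(var), window[-1] - window[0])

def anomaly_scores(windows):
    # Score each window by its distance from the centroid of all embeddings;
    # a real pipeline would fit an isolation forest on the embeddings instead.
    embs = [embed(w) for w in windows]
    centroid = tuple(sum(e[i] for e in embs) / len(embs) for i in range(3))
    return [math.dist(e, centroid) for e in embs]

normal = [[float(i % 5) for i in range(10)] for _ in range(20)]
spike = [[0.0] * 9 + [50.0]]
scores = anomaly_scores(normal + spike)
print(scores.index(max(scores)))  # 20 -- the spiked window stands out
```

The few-shot appeal is that only the cheap detector on top needs (re)fitting when the data distribution changes; the embedding model stays frozen.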

Tasks & Benchmarks

Time Series Forecasting

Time-series forecasting exploded in 2023-2025 when foundation models crossed over from NLP. Nixtla's TimeGPT (2023), Google's TimesFM (2024), and Amazon's Chronos showed that a single pretrained model can zero-shot forecast diverse series, rivaling task-specific statistical models like ETS and ARIMA. Yet the Monash benchmark and M-competition lineage (M4, M5) reveal an uncomfortable truth: simple ensembles of statistical methods still win on many univariate tasks. The real battle now is multivariate long-horizon forecasting, where PatchTST and iTransformer compete with state-space models like Mamba.
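Before benchmarking a deep model, it is worth having one of those statistical baselines on hand. Simple exponential smoothing, the core of the ETS family, fits in a few lines (the alpha value below is illustrative):

```python
def ses_forecast(series, alpha=0.3, horizon=1):
    """Simple exponential smoothing: level_t = alpha*y_t + (1-alpha)*level_{t-1}.
    The flat forecast (the final level) is the h-step prediction for all h."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

print(ses_forecast([10.0, 12.0, 11.0, 13.0], alpha=0.5, horizon=3))
# [12.0, 12.0, 12.0]
```

If a proposed model cannot clearly beat this on your data, the M-competition lesson applies: the extra machinery is probably not paying for itself.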

6 datasets · 75 results · SOTA tracked

Tabular Classification

Tabular classification — predicting discrete labels from structured rows and columns — remains the one domain where gradient-boosted trees (XGBoost, LightGBM, CatBoost) stubbornly rival deep learning. Despite years of effort, neural approaches like TabNet (2019) and FT-Transformer (2021) only match tree methods on certain splits, and a 2022 NeurIPS study by Grinsztajn et al. confirmed that trees still dominate on medium-sized datasets. The real frontier is AutoML systems (AutoGluon, FLAML) that ensemble both paradigms, and the emerging question of whether foundation models pretrained on millions of tables can finally tip the balance.

1 dataset · 5 results · SOTA tracked

Tabular Regression

Tabular regression — predicting continuous values from structured data — powers everything from house-price estimation to demand forecasting and shares the same tree-vs-neural tension as classification. XGBoost and LightGBM remain brutally effective defaults, but recent work on differentiable trees and table-aware transformers (TabPFN, 2022) showed that meta-learned priors can beat tuned GBDTs on small datasets in seconds. The challenge is distribution shift: real-world regression targets drift over time, and most benchmarks (UCI, Kaggle) are static snapshots that hide this problem entirely.

1 dataset · 2 results · SOTA tracked

Time Series Classification

Assigning class labels to whole time series — for example ECG rhythms, human activity traces, or industrial sensor patterns.

0 datasets · 0 results

Time Series Forecasting

ETTh1 (2021)            0.59  (MSE)    Chronos-Large
ETTh2 (2021)            0.46  (MSE)    Chronos-Large
ETTm1 (2021)            0.56  (MSE)    Chronos-Large
ETTm2 (2021)            0.35  (MAE)    TimesFM
M4 Competition (2018)  13.95  (sMAPE)  TiDE
Weather (2021)          0.32  (MAE)    DLinear
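The M4 entry above is scored with sMAPE; MSE and MAE are plain averages of squared and absolute errors. A sketch of sMAPE as the M4 competition defines it (scaled 0–200):

```python
def smape(actual, forecast):
    """Symmetric MAPE as reported in the M4 competition (0-200 scale)."""
    terms = [200.0 * abs(a - f) / (abs(a) + abs(f))
             for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)

print(round(smape([100.0, 200.0], [110.0, 180.0]), 2))  # 10.03
```

Note that sMAPE is scale-free while MSE and MAE are not, which is one reason numbers from different leaderboards cannot be compared directly.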

Tabular Classification

88.5  (accuracy)  AutoGluon-Tabular

Tabular Regression

0.45  (RMSE)  XGBoost

Time Series Classification

No datasets indexed yet. Contribute on GitHub

Honest Takes

Simple models still beat transformers on most benchmarks

DLinear - a single linear layer - matches or beats PatchTST and Informer on standard forecasting benchmarks. Before deploying a complex transformer pipeline, test a linear baseline. You may not need the complexity, and you definitely need the baseline comparison.
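The linear baseline really is this small. A one-coefficient stand-in fit by ordinary least squares (DLinear itself adds a trend/seasonal decomposition and maps a full lookback window to the horizon, both omitted here):

```python
def fit_linear_baseline(series):
    """Fit y[t+1] ~ a*y[t] + b by ordinary least squares -- a one-parameter
    stand-in for DLinear's single linear layer."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

a, b = fit_linear_baseline([1.0, 2.0, 3.0, 4.0, 5.0])
print(a * 5.0 + b)  # 6.0 -- recovers the linear trend exactly
```

If your transformer pipeline cannot beat a fit like this by a clear margin, the benchmark comparison has already answered the architecture question.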

Foundation models are the future, but not yet the present

TimesFM and Chronos show impressive zero-shot results, but fine-tuned task-specific models still win on domain-specific benchmarks by 5-15%. Foundation models are ideal for rapid prototyping and cold-start scenarios. For production systems with historical data, fine-tuning remains worth the effort.

The forecasting community has a benchmarking problem

Many papers evaluate on the same 7-8 ETT/Weather/Electricity datasets with inconsistent preprocessing and horizons. Results vary 5-10% based on normalization choices alone. If a paper doesn't release exact preprocessing code, treat the numbers with skepticism.
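The sensitivity is easy to demonstrate: identical predictions produce different MSE depending on whether the series was z-normalized first. The numbers below are toy values, purely illustrative:

```python
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def znorm(xs, mean, std):
    return [(x - mean) / std for x in xs]

actual = [100.0, 102.0, 98.0, 101.0]
pred = [101.0, 101.0, 99.0, 100.0]

# Same errors, two preprocessing conventions: raw scale vs z-normalized
# with (hypothetical) train-split statistics.
mean, std = 100.0, 2.0
raw = mse(actual, pred)
normed = mse(znorm(actual, mean, std), znorm(pred, mean, std))
print(raw, normed)  # 1.0 0.25
```

A 4x gap from scaling alone — which is why a reported "0.35 MSE" is meaningless without the exact normalization and horizon used to compute it.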

Time Series Benchmarks - CodeSOTA