Time Series

Predicting future trends or detecting anomalies? Benchmark forecasting accuracy and pattern recognition in sequential data.

4 tasks · 8 datasets · 82 results

Time series analysis spans forecasting, anomaly detection, and classification across finance, weather, energy, and healthcare. Foundation models like TimesFM, Chronos, and Moirai now offer zero-shot forecasting that rivals task-specific models, fundamentally changing how practitioners approach temporal data.

State of the Field (2025)

  • Foundation models dominate: Google's TimesFM (200M params, 100B time points), Amazon's Chronos (710M params), and Moirai provide zero-shot forecasting competitive with supervised baselines across diverse domains
  • PatchTST and iTransformer established transformer architectures for time series by treating temporal patches as tokens, outperforming prior deep learning methods on long-horizon forecasting benchmarks
  • Surprising baseline results: simple linear models (DLinear, RLinear) match or beat complex transformers on many forecasting benchmarks, questioning whether architectural complexity adds value for standard tasks
  • Anomaly detection shifted to foundation model embeddings: pre-trained temporal representations enable few-shot anomaly detection in manufacturing, IT operations, and financial fraud without domain-specific training

Quick Recommendations

Zero-shot forecasting (no training data available)

TimesFM or Chronos

TimesFM delivers strong zero-shot performance across energy, retail, weather, and finance domains. Chronos offers probabilistic forecasts with uncertainty quantification. Both eliminate the cold-start problem.

Long-horizon forecasting (weather, energy)

PatchTST or iTransformer

Patch-based tokenization captures local temporal patterns efficiently. iTransformer's inverted attention over variables handles multivariate dependencies. Both scale to 720+ step horizons.
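Patch-based tokenization is simple to picture: the lookback window is cut into fixed-length, overlapping patches, and each patch becomes one token. A minimal sketch in plain Python (the patch length of 16 and stride of 8 mirror common PatchTST settings, but the values here are illustrative):

```python
def patchify(series, patch_len=16, stride=8):
    """Split a 1-D series into overlapping patches (PatchTST-style tokens)."""
    patches = []
    for start in range(0, len(series) - patch_len + 1, stride):
        patches.append(series[start:start + patch_len])
    return patches

# A 336-step lookback window yields (336 - 16) // 8 + 1 = 41 patch tokens.
tokens = patchify(list(range(336)))
print(len(tokens))  # 41
```

With a 336-step lookback this yields 41 tokens instead of 336, which is why attention over patches is far cheaper than attention over raw timesteps.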

Time series classification (ECG, industrial sensors)

InceptionTime or ROCKET

InceptionTime provides a strong deep learning baseline with multi-scale convolutions. ROCKET achieves competitive accuracy with random convolutional kernels at 100x lower training cost.
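ROCKET's trick is that the convolutional kernels are never trained: each kernel's weights are drawn at random, and only two pooled statistics per kernel (the maximum and the proportion of positive values, PPV) feed a linear classifier. A toy sketch that omits ROCKET's dilation and padding for brevity; kernel count and series are illustrative:

```python
import random

def random_kernel(rng, length=9):
    # ROCKET draws weights from N(0, 1), mean-centers them, and draws a
    # bias from U(-1, 1).
    w = [rng.gauss(0, 1) for _ in range(length)]
    mean = sum(w) / length
    w = [x - mean for x in w]
    return w, rng.uniform(-1, 1)

def kernel_features(series, weights, bias):
    # Convolve, then summarize with max and PPV (proportion of positive
    # values) -- the two pooling statistics ROCKET uses.
    outs = []
    k = len(weights)
    for i in range(len(series) - k + 1):
        outs.append(sum(w * x for w, x in zip(weights, series[i:i + k])) + bias)
    ppv = sum(o > 0 for o in outs) / len(outs)
    return max(outs), ppv

rng = random.Random(0)
series = [float(i % 7) for i in range(100)]
feats = []
for _ in range(100):            # real ROCKET uses ~10,000 kernels
    w, b = random_kernel(rng)
    feats.extend(kernel_features(series, w, b))
print(len(feats))  # 200 features to feed a ridge classifier
```

Because no kernel is ever backpropagated through, "training" reduces to fitting the final ridge classifier, which is where the large cost saving comes from.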

Anomaly detection in production systems

Foundation model embeddings + isolation forest, or USAD

Pre-trained temporal embeddings from TimesFM or Chronos provide rich representations. Combine with lightweight detectors for few-shot anomaly detection. USAD offers fast unsupervised detection for streaming data.
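A minimal version of the embed-then-detect pattern, with a hand-crafted stand-in for the foundation-model embedding (in practice you would take window embeddings from TimesFM or Chronos) and centroid distance in place of an isolation forest:

```python
import math

def embed(window):
    # Stand-in for a foundation-model embedding: mean, std, and net change.
    m = sum(window) / len(window)
    var = sum((x - m) ** 2 for x in window) / len(window)
    return (m, math.sqrt(var), window[-1] - window[0])

def anomaly_scores(windows):
    # Score each window by its distance from the centroid of all embeddings;
    # a real pipeline would fit an isolation forest on the embeddings instead.
    embs = [embed(w) for w in windows]
    centroid = tuple(sum(e[i] for e in embs) / len(embs) for i in range(3))
    return [math.dist(e, centroid) for e in embs]

normal = [[float(i % 5) for i in range(10)] for _ in range(20)]
spike = [[0.0] * 9 + [50.0]]
scores = anomaly_scores(normal + spike)
print(scores.index(max(scores)))  # 20 -- the spiked window stands out
```

The few-shot appeal is that only the cheap detector on top needs (re)fitting when the data distribution changes; the embedding model stays frozen.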

Tasks & Benchmarks

Time Series Forecasting

Time-series forecasting exploded in 2023-2025 when foundation models crossed over from NLP. Nixtla's TimeGPT (2023), Google's TimesFM (2024), and Amazon's Chronos showed that a single pretrained model can zero-shot forecast diverse series, rivaling task-specific statistical models like ETS and ARIMA. Yet the Monash benchmark and M-competition lineage (M4, M5) reveal an uncomfortable truth: simple ensembles of statistical methods still win on many univariate tasks. The real battle now is multivariate long-horizon forecasting, where PatchTST and iTransformer compete with state-space models like Mamba.
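Before benchmarking a deep model, it is worth having one of those statistical baselines on hand. Simple exponential smoothing, the core of the ETS family, fits in a few lines (the alpha value below is illustrative):

```python
def ses_forecast(series, alpha=0.3, horizon=1):
    """Simple exponential smoothing: level_t = alpha*y_t + (1-alpha)*level_{t-1}.
    The flat forecast (the final level) is the h-step prediction for all h."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon

print(ses_forecast([10.0, 12.0, 11.0, 13.0], alpha=0.5, horizon=3))
# [12.0, 12.0, 12.0]
```

If a proposed model cannot clearly beat this on your data, the M-competition lesson applies: the extra machinery is probably not paying for itself.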

6 datasets · 75 results · SOTA tracked

Tabular Classification

Tabular classification — predicting discrete labels from structured rows and columns — remains the one domain where gradient-boosted trees (XGBoost, LightGBM, CatBoost) stubbornly rival deep learning. Despite years of effort, neural approaches like TabNet (2019) and FT-Transformer (2021) only match tree methods on certain splits, and a 2022 NeurIPS study by Grinsztajn et al. confirmed that trees still dominate on medium-sized datasets. The real frontier is AutoML systems (AutoGluon, FLAML) that ensemble both paradigms, and the emerging question of whether foundation models pretrained on millions of tables can finally tip the balance.

1 dataset · 5 results · SOTA tracked

Tabular Regression

Tabular regression — predicting continuous values from structured data — powers everything from house-price estimation to demand forecasting and shares the same tree-vs-neural tension as classification. XGBoost and LightGBM remain brutally effective defaults, but recent work on differentiable trees and table-aware transformers (TabPFN, 2022) showed that meta-learned priors can beat tuned GBDTs on small datasets in seconds. The challenge is distribution shift: real-world regression targets drift over time, and most benchmarks (UCI, Kaggle) are static snapshots that hide this problem entirely.

1 dataset · 2 results · SOTA tracked

Time Series Classification

Assigning class labels to whole time series — for example ECG rhythms, human activity traces, or industrial sensor patterns.

0 datasets · 0 results

Time Series Forecasting

ETTh1 (2021)            0.59  (MSE)    Chronos-Large
ETTh2 (2021)            0.46  (MSE)    Chronos-Large
ETTm1 (2021)            0.56  (MSE)    Chronos-Large
ETTm2 (2021)            0.35  (MAE)    TimesFM
M4 Competition (2018)  13.95  (sMAPE)  TiDE
Weather (2021)          0.32  (MAE)    DLinear
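The M4 entry above is scored with sMAPE; MSE and MAE are plain averages of squared and absolute errors. A sketch of sMAPE as the M4 competition defines it (scaled 0–200):

```python
def smape(actual, forecast):
    """Symmetric MAPE as reported in the M4 competition (0-200 scale)."""
    terms = [200.0 * abs(a - f) / (abs(a) + abs(f))
             for a, f in zip(actual, forecast)]
    return sum(terms) / len(terms)

print(round(smape([100.0, 200.0], [110.0, 180.0]), 2))  # 10.03
```

Note that sMAPE is scale-free while MSE and MAE are not, which is one reason numbers from different leaderboards cannot be compared directly.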

Tabular Classification

88.5  (accuracy)  AutoGluon-Tabular

Tabular Regression

0.45  (RMSE)  XGBoost

Time Series Classification

No datasets indexed yet. Contribute on GitHub

Honest Takes

Simple models still beat transformers on most benchmarks

DLinear - a single linear layer - matches or beats PatchTST and Informer on standard forecasting benchmarks. Before deploying a complex transformer pipeline, test a linear baseline. You may not need the complexity, and you definitely need the baseline comparison.
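The linear baseline really is this small. A one-coefficient stand-in fit by ordinary least squares (DLinear itself adds a trend/seasonal decomposition and maps a full lookback window to the horizon, both omitted here):

```python
def fit_linear_baseline(series):
    """Fit y[t+1] ~ a*y[t] + b by ordinary least squares -- a one-parameter
    stand-in for DLinear's single linear layer."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    return a, b

a, b = fit_linear_baseline([1.0, 2.0, 3.0, 4.0, 5.0])
print(a * 5.0 + b)  # 6.0 -- recovers the linear trend exactly
```

If your transformer pipeline cannot beat a fit like this by a clear margin, the benchmark comparison has already answered the architecture question.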

Foundation models are the future, but not yet the present

TimesFM and Chronos show impressive zero-shot results, but fine-tuned task-specific models still win on domain-specific benchmarks by 5-15%. Foundation models are ideal for rapid prototyping and cold-start scenarios. For production systems with historical data, fine-tuning remains worth the effort.

The forecasting community has a benchmarking problem

Many papers evaluate on the same 7-8 ETT/Weather/Electricity datasets with inconsistent preprocessing and horizons. Results vary 5-10% based on normalization choices alone. If a paper doesn't release exact preprocessing code, treat the numbers with skepticism.
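The sensitivity is easy to demonstrate: identical predictions produce different MSE depending on whether the series was z-normalized first. The numbers below are toy values, purely illustrative:

```python
def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def znorm(xs, mean, std):
    return [(x - mean) / std for x in xs]

actual = [100.0, 102.0, 98.0, 101.0]
pred = [101.0, 101.0, 99.0, 100.0]

# Same errors, two preprocessing conventions: raw scale vs z-normalized
# with (hypothetical) train-split statistics.
mean, std = 100.0, 2.0
raw = mse(actual, pred)
normed = mse(znorm(actual, mean, std), znorm(pred, mean, std))
print(raw, normed)  # 1.0 0.25
```

A 4x gap from scaling alone — which is why a reported "0.35 MSE" is meaningless without the exact normalization and horizon used to compute it.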

Time Series Benchmarks - CodeSOTA