Time Series
Predicting future trends or detecting anomalies? Benchmark forecasting accuracy and pattern recognition in sequential data.
Time series analysis spans forecasting, anomaly detection, and classification across finance, weather, energy, and healthcare. Foundation models like TimesFM, Chronos, and Moirai now offer zero-shot forecasting that rivals task-specific models, but the right baseline still depends on horizon, channel structure, and how much historical data you actually have.
Tasks & Benchmarks
State of the Field (2025)
- Foundation models are now credible baselines: Google's TimesFM is a 200M-parameter decoder-only model trained on 100B real-world time points and released on GitHub/Hugging Face after ICML 2024 acceptance; Chronos and Moirai add probabilistic and multivariate alternatives
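- PatchTST and iTransformer established transformer architectures for time series: PatchTST treats temporal patches as tokens, while iTransformer treats each variable's full series as a token and attends across variables. Both outperformed prior deep learning methods on long-horizon forecasting benchmarks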
- Surprising baseline results: simple linear models (DLinear, RLinear) match or beat complex transformers on many forecasting benchmarks, raising the question of whether architectural complexity adds value for standard tasks
- Anomaly detection is shifting toward foundation model embeddings: pre-trained temporal representations enable few-shot anomaly detection in manufacturing, IT operations, and financial fraud without domain-specific training
Quick Recommendations
Zero-shot forecasting (no training data available)
TimesFM or Chronos
TimesFM delivers strong zero-shot performance across public Monash and ETT-style forecasting tests and is available as an open model. Chronos offers probabilistic forecasts with uncertainty quantification. Both give a strong first baseline before domain-specific training.
Long-horizon forecasting (weather, energy)
PatchTST or iTransformer
Patch-based tokenization captures local temporal patterns efficiently. iTransformer's inverted attention over variables handles multivariate dependencies. Both are commonly evaluated at horizons of up to 720 steps.
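To make the two tokenization ideas concrete, here is a minimal NumPy sketch. It is an illustration only, not the PatchTST or iTransformer implementation: the function name `make_patches` and the values (512-step lookback, patch length 16, stride 8, 7 variables) are assumptions chosen for the example, and the real models add padding, embeddings, and normalization on top.

```python
import numpy as np

def make_patches(series: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """Split a 1-D series into (possibly overlapping) patches.

    Returns an array of shape (num_patches, patch_len). Hypothetical helper,
    not part of any library.
    """
    if series.ndim != 1:
        raise ValueError("expected a 1-D series")
    if len(series) < patch_len:
        raise ValueError("series is shorter than one patch")
    # sliding_window_view yields every window; keep every `stride`-th one.
    windows = np.lib.stride_tricks.sliding_window_view(series, patch_len)
    return windows[::stride]

# Patch view: each patch of consecutive time steps becomes one token.
lookback = np.arange(512, dtype=np.float32)
patches = make_patches(lookback, patch_len=16, stride=8)
print(patches.shape)  # (num_patches, patch_len)

# Inverted view: for a (time, variables) array, each variable's full
# history becomes one token, so attention runs across variables.
multivariate = np.random.default_rng(0).normal(size=(512, 7))
variate_tokens = multivariate.T
print(variate_tokens.shape)  # (num_variables, time)
```

The patch view shortens the token sequence relative to one token per time step, while the inverted view makes the token count equal to the number of variables regardless of lookback length.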
Time series classification (ECG, industrial sensors)
InceptionTime or ROCKET
InceptionTime provides a strong deep learning baseline with multi-scale convolutions. ROCKET achieves competitive accuracy with random convolutional kernels at 100x lower training cost.
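The random-kernel idea can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the reference ROCKET code: it skips padding, uses only 100 kernels, and the helper names (`random_kernels`, `apply_kernel`, `rocket_features`) are hypothetical. Each untrained kernel contributes two features, the maximum response and the proportion of positive values, which would then be fed to a simple linear classifier.

```python
import numpy as np

def random_kernels(num_kernels, series_len, rng):
    """Sample untrained convolution kernels: (weights, bias, dilation)."""
    if series_len < 12:
        raise ValueError("this sketch assumes series of at least 12 steps")
    kernels = []
    for _ in range(num_kernels):
        length = int(rng.choice([7, 9, 11]))
        weights = rng.normal(size=length)
        weights -= weights.mean()  # mean-centre the weights
        bias = rng.uniform(-1.0, 1.0)
        # Sample dilation on a log scale so the kernel still fits the series.
        max_exp = np.log2((series_len - 1) / (length - 1))
        dilation = int(2 ** rng.uniform(0.0, max_exp))
        kernels.append((weights, bias, dilation))
    return kernels

def apply_kernel(x, weights, bias, dilation):
    """Dilated convolution without padding, pooled to two features."""
    length = len(weights)
    out_len = len(x) - (length - 1) * dilation
    # out[i] = sum_j weights[j] * x[i + j * dilation] + bias
    idx = np.arange(out_len)[:, None] + dilation * np.arange(length)[None, :]
    out = x[idx] @ weights + bias
    return out.max(), float((out > 0).mean())

def rocket_features(X, kernels):
    """Transform (n_series, series_len) into (n_series, 2 * n_kernels)."""
    feats = np.empty((X.shape[0], 2 * len(kernels)))
    for i, x in enumerate(X):
        for k, (w, b, d) in enumerate(kernels):
            feats[i, 2 * k], feats[i, 2 * k + 1] = apply_kernel(x, w, b, d)
    return feats

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 128))  # four synthetic series of length 128
kernels = random_kernels(100, series_len=X.shape[1], rng=rng)
features = rocket_features(X, kernels)
print(features.shape)  # (n_series, 2 * n_kernels)
```

Because the kernels are never trained, the only fitting cost is the linear classifier on top of the features, which is where the training savings come from.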
Anomaly detection in production systems
Foundation model embeddings + isolation forest, or USAD
Pre-trained temporal embeddings from TimesFM or Chronos provide rich representations. Combine with lightweight detectors for few-shot anomaly detection. USAD offers fast unsupervised detection for streaming data.
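A rough sketch of the "embeddings plus lightweight detector" pattern, using scikit-learn's `IsolationForest`. The `embed_windows` function is a hypothetical stand-in: it returns per-window summary statistics so the example runs without downloading a model. In practice you would swap in embeddings from a pre-trained encoder. The synthetic series, window size, and injected spike are all assumptions made for the demonstration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def embed_windows(series: np.ndarray, window: int) -> np.ndarray:
    """Hypothetical stand-in for a foundation-model encoder.

    Splits the series into non-overlapping windows and returns simple
    summary statistics per window, shape (num_windows, 4).
    """
    n = len(series) // window
    w = series[: n * window].reshape(n, window)
    return np.column_stack([w.mean(axis=1), w.std(axis=1),
                            w.min(axis=1), w.max(axis=1)])

# Synthetic signal: noisy sine wave with one injected spike.
rng = np.random.default_rng(0)
t = np.arange(1000)
series = np.sin(2 * np.pi * t / 20) + 0.1 * rng.normal(size=t.size)
series[615] += 10.0  # index 615 falls in window 615 // 20 = 30

emb = embed_windows(series, window=20)

# Fit an unsupervised detector on the window representations.
detector = IsolationForest(n_estimators=100, random_state=0).fit(emb)
scores = detector.score_samples(emb)  # lower score = more anomalous
suspect = int(np.argmin(scores))
print("most anomalous window:", suspect)
```

The detector itself needs no labels; the quality of the result depends mostly on how well the representation separates normal from abnormal windows.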
Honest Takes
Simple models still match or beat transformers on many benchmarks
DLinear, which applies a linear layer to the decomposed trend and seasonal components of the input, matches or beats transformers such as Informer and stays competitive with PatchTST on standard forecasting benchmarks. Before deploying a complex transformer pipeline, test a linear baseline. You may not need the complexity, and you definitely need the baseline comparison.
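A linear baseline is cheap enough to run before anything else. The sketch below fits a plain least-squares map from a lookback window to the whole forecast horizon and compares it with a repeat-last-value forecast. It is deliberately simpler than DLinear (no trend/seasonal decomposition), and the synthetic series, the 48-step lookback, the 24-step horizon, and the helper names are assumptions for illustration.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a series into (input window, target window) pairs."""
    n = len(series) - lookback - horizon + 1
    X = np.stack([series[i:i + lookback] for i in range(n)])
    Y = np.stack([series[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, Y

def fit_linear(X, Y):
    """Least-squares weights mapping [window, 1] to the full horizon."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict_linear(W, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ W

# Synthetic seasonal series (period 24) with Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(1000)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.normal(size=t.size)

lookback, horizon, split = 48, 24, 800
# Chronological split: every test target lies at or after `split`.
X_tr, Y_tr = make_windows(series[:split], lookback, horizon)
X_te, Y_te = make_windows(series[split - lookback:], lookback, horizon)

W = fit_linear(X_tr, Y_tr)
mse_linear = float(np.mean((predict_linear(W, X_te) - Y_te) ** 2))

# Naive reference: repeat the last observed value across the horizon.
naive = np.repeat(X_te[:, -1:], horizon, axis=1)
mse_naive = float(np.mean((naive - Y_te) ** 2))

print(f"linear MSE: {mse_linear:.4f}  naive MSE: {mse_naive:.4f}")
```

Any more complex model should be reported next to numbers like these, computed on the same split and the same preprocessing.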
Foundation models are the future, but not yet the present
TimesFM and Chronos show impressive zero-shot results, but fine-tuned task-specific models still win on some domain-specific benchmarks, especially in multivariate settings with strong cross-channel structure. Foundation models are ideal for rapid prototyping and cold-start scenarios. For production systems with historical data, fine-tuning remains worth the effort.
The forecasting community has a benchmarking problem
Many papers evaluate on the same 7-8 ETT/Weather/Electricity datasets with inconsistent preprocessing and horizons. Results can vary by 5-10% based on normalization choices alone. If a paper doesn't release exact preprocessing code, treat the numbers with skepticism.
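One mechanism behind this is easy to reproduce: the same forecasts give different reported MSE depending on which statistics are used to standardize the data. The sketch below is an illustration on synthetic data, not a reproduction of any paper's setup. The series, the 1400/600 split, and the seasonal-naive forecast are assumptions, and the size of the gap here says nothing about the gap on real benchmarks.

```python
import numpy as np

# Synthetic series with level, trend, daily-style seasonality, and noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
series = 50 + 0.01 * t + 5 * np.sin(2 * np.pi * t / 24) + rng.normal(size=t.size)

split = 1400
train, test = series[:split], series[split:]

# One fixed set of forecasts: repeat the value from 24 steps earlier.
pred = series[split - 24:-24]
truth = test

def scaled_mse(pred, truth, mean, std):
    """MSE after standardizing both arrays with the given statistics."""
    return float(np.mean(((pred - mean) / std - (truth - mean) / std) ** 2))

mse_raw = float(np.mean((pred - truth) ** 2))
# Choice A: statistics from the training split only.
mse_train_stats = scaled_mse(pred, truth, train.mean(), train.std())
# Choice B: statistics from the full series (leaks test information).
mse_full_stats = scaled_mse(pred, truth, series.mean(), series.std())

print(f"raw MSE:                 {mse_raw:.4f}")
print(f"scaled, train-only stats: {mse_train_stats:.4f}")
print(f"scaled, full-series stats: {mse_full_stats:.4f}")
```

The forecasts are identical in all three cases; only the reporting convention changes, which is why exact preprocessing code matters when comparing published numbers.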