
Atari 2600

Farama Foundation / DeepMind

Suite of 57 Atari 2600 games. Standard benchmark for deep reinforcement learning agents.

Benchmark Stats

Models: 16 · Papers: 16 · Metrics: 1

SOTA History

Mean Human-Normalized Score

Mean HNS across games. Human baseline = 100. Scores >100 exceed average human performance.

Higher is better
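The per-game Human-Normalized Score can be sketched as follows. This assumes the leaderboard uses the standard convention from the deep RL literature (agent score rescaled between a random-policy baseline and a human baseline, with human = 100); the game names and baseline numbers below are purely illustrative, not the actual published baselines.

```python
def hns(agent_score, random_score, human_score):
    """Human-Normalized Score: 0 = random play, 100 = human baseline."""
    return 100.0 * (agent_score - random_score) / (human_score - random_score)

def mean_hns(results):
    """Mean HNS across games; results maps game -> (agent, random, human)."""
    scores = [hns(a, r, h) for a, r, h in results.values()]
    return sum(scores) / len(scores)

# Illustrative numbers only (hypothetical, not real per-game baselines):
example = {
    "breakout": (400.0, 1.7, 30.5),
    "pong": (20.0, -20.7, 14.6),
}
```

A score above 100 on a game means the agent beat the human baseline on that game; averaging over games is what lets single hard titles (e.g. Montezuma's Revenge for go-explore) skew the Mean upward, which is why several entries also report the Median.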

Rank · Model · Source · Score · Year · Paper

1. go-explore — Editorial · 40000 · 2025 · Source
   Exploration-focused agent. Score is Mean HNS (skewed by Montezuma's Revenge), not Median.

2. agent57 — Editorial · 4731.3 · 2025 · Source
   Median HNS across 57 games. First agent to beat the human baseline on all 57 games.

3. MEME — Community · 4087 · 2026 · Source
   Mean HNS at 1B frames on Atari 57 (human = 100). 95% CI: 3723–4445. Reaches human level on all 57 games within 390M frames.

4. bbos-1 — Editorial · 1100 · 2025
   Model-based optimization.

5. gdi-h3 — Editorial · 950 · 2025
   High sample efficiency.

6. dreamerv3 — Editorial · 840 · 2025 · Source
   Mastered Atari with fixed hyperparameters using world models.

7. muzero — Editorial · 731 · 2025 · Source
   Model-based agent that plans with a learned model.

8. EfficientZero V2 — Community · 242.8 · 2026 · Source
   Mean HNS 242.8%, Median 128.6% on Atari 100k (26 games, 100k steps). Surpasses BBF. Model-based RL with Gumbel search. arXiv Mar 2024.

9. rainbow-dqn — Editorial · 231 · 2025 · Source
   Median HNS. Combines six improvements to DQN.

10. BBF (Bigger, Better, Faster) — Community · 224.7 · 2026 · Source
    Mean HNS 224.7% on Atari 100k (26 games, 100k steps). IQM 104.5%, Median 91.7%. Value-based RL with scaled networks. ICML 2023.

11. DIAMOND — Community · 145.9 · 2026 · Source
    Diffusion world model. Mean HNS 145.9%, IQM 64.1% on Atari 100k (26 games, 100k steps). Best agent trained entirely within a world model. NeurIPS 2024 Spotlight.

12. STORM — Community · 126.7 · 2026 · Source
    Stochastic Transformer World Models. Mean HNS 126.7% on Atari 100k (26 games, 100k steps). Transformer-based stochastic world model. arXiv Oct 2023.

13. Simulus — Community · 110 · 2026 · Source
    First planning-free world model to reach human-level IQM and Median HNS on Atari 100k (26 games, 100k steps). Superhuman on 13/26 games. Combines intrinsic motivation, prioritized replay, and regression-as-classification. arXiv Feb 2025.

14. DART — Community · 102.2 · 2026 · Source
    Discrete Abstract Representations for Transformer-based learning. Mean HNS 102.2%, Median 79.0%, IQM 57.5% on Atari 100k (26 games, 100k steps). ICML 2024.

15. human-gamer — Editorial · 100 · 2025 · Source
    Professional human tester baseline.

16. dqn — Editorial · 79 · 2025 · Source
    Historical baseline (2015). Median HNS.
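The entries above report three different aggregates over per-game scores: Mean, Median, and IQM (interquartile mean, i.e. the mean of the middle 50% of games, which discounts outliers on both ends). A minimal sketch of the three, with made-up scores; in practice the rliable library is commonly used for IQM and related robust aggregates:

```python
import statistics

def iqm(scores):
    """Interquartile mean: average of the middle 50% of sorted scores."""
    s = sorted(scores)
    n = len(s)
    middle = s[n // 4 : n - n // 4]  # drop bottom and top quartiles
    return sum(middle) / len(middle)

# Hypothetical per-game HNS values with one extreme outlier:
scores = [10, 50, 90, 100, 120, 400, 800, 4000]
print(statistics.mean(scores))    # 696.25 -- pulled up by the outlier
print(statistics.median(scores))  # 110
print(iqm(scores))                # 177.5 -- between the two
```

This is why a Mean-HNS leader like go-explore can sit far above a Median-HNS entry like agent57 without the comparison being apples to apples.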
