Farama Foundation / DeepMind
Suite of 57 Atari 2600 games. Standard benchmark for deep reinforcement learning agents.
Human Normalized Score (HNS). Human baseline = 100; scores >100 exceed average human performance. Mean across games unless a row notes otherwise (some entries report median HNS, and some report Atari 100k results over a 26-game subset).
Higher is better
| Rank | Model | Source | Score | Year | Paper |
|---|---|---|---|---|---|
| 1 | go-explore Exploration-focused agent. Score is mean HNS, heavily skewed by Montezuma's Revenge (not median). | Editorial | 40000 | 2025 | Source |
| 2 | agent57 Median HNS across 57 games. First to beat human baseline on ALL games. | Editorial | 4731.3 | 2025 | Source |
| 3 | MEME Mean HNS at 1B frames on Atari 57 (human=100). 95% CI: 3723–4445. Reaches human-level on all 57 games within 390M frames. | Community | 4087 | 2026 | Source |
| 4 | bbos-1 Model-based optimization. | Editorial | 1100 | 2025 | — |
| 5 | gdi-h3 High sample efficiency. | Editorial | 950 | 2025 | — |
| 6 | dreamerv3 Mastered Atari with fixed hyperparameters using world models. | Editorial | 840 | 2025 | Source |
| 7 | muzero Model-based agent planning with learned model. | Editorial | 731 | 2025 | Source |
| 8 | EfficientZero V2 Mean HNS 242.8%, median 128.6% on Atari 100k (26 games, 100k steps); surpasses BBF. Model-based RL with Gumbel search. arXiv, Mar 2024. | Community | 242.8 | 2026 | Source |
| 9 | rainbow-dqn Median HNS. Combines 7 improvements to DQN. | Editorial | 231 | 2025 | Source |
| 10 | BBF (Bigger, Better, Faster) Mean HNS 224.7%, IQM 104.5%, median 91.7% on Atari 100k (26 games, 100k steps). Value-based RL with scaled networks. ICML 2023. | Community | 224.7 | 2026 | Source |
| 11 | DIAMOND (Diffusion World Model) Mean HNS 145.9%, IQM 64.1% on Atari 100k (26 games, 100k steps). Best agent trained entirely within a world model. NeurIPS 2024 Spotlight. | Community | 145.9 | 2026 | Source |
| 12 | STORM (Stochastic Transformer World Models) Mean HNS 126.7% on Atari 100k (26 games, 100k steps). Transformer-based stochastic world model. arXiv, Oct 2023. | Community | 126.7 | 2026 | Source |
| 13 | Simulus First planning-free world model to reach human-level IQM and median HNS on Atari 100k (26 games, 100k steps); superhuman on 13/26 games. Combines intrinsic motivation, prioritized replay, and regression-as-classification. arXiv, Feb 2025. | Community | 110 | 2026 | Source |
| 14 | DART (Discrete Abstract Representations for Transformer-based learning) Mean HNS 102.2%, median 79.0%, IQM 57.5% on Atari 100k (26 games, 100k steps). ICML 2024. | Community | 102.2 | 2026 | Source |
| 15 | human-gamer Professional human tester baseline. | Editorial | 100 | 2025 | Source |
| 16 | dqn Historical baseline (2015). Median HNS. | Editorial | 79 | 2025 | Source |
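The scores above are human-normalized: each game's raw score is rescaled so that a random policy sits at 0 and the human baseline at 100, then aggregated across games. A minimal sketch of that computation, and of why mean and median HNS can diverge so sharply (as with go-explore's outlier-driven mean); the game names and raw scores below are hypothetical, not taken from any entry in the table:

```python
# Sketch of the Human Normalized Score (HNS) aggregation used on this
# leaderboard. All per-game numbers here are made up for illustration.
from statistics import mean, median

def hns(agent_score: float, random_score: float, human_score: float) -> float:
    """Human-normalized score in percent; 100 = human baseline, 0 = random."""
    return 100.0 * (agent_score - random_score) / (human_score - random_score)

# Hypothetical per-game raw scores: (agent, random, human).
games = {
    "GameA": (8000.0, 200.0, 7000.0),
    "GameB": (500.0, 50.0, 1000.0),
    "GameC": (900000.0, 25.0, 4700.0),  # one runaway game inflates the mean
}

scores = [hns(a, r, h) for a, r, h in games.values()]
print(f"mean HNS:   {mean(scores):.1f}")    # dominated by the outlier game
print(f"median HNS: {median(scores):.1f}")  # robust to the outlier
```

This is why the table flags whether a row reports mean or median: an agent that vastly exceeds human scores on a few games (e.g. hard-exploration titles) can post an enormous mean HNS while its median stays far lower.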