| 01 | go-explore Exploration-focused agent. Score is Mean HNS (skewed by Montezuma's Revenge), not Median. | paper | 40000 | 2025 | Source ↗ | Looks wrong? |
| 02 | LBC Mean HNS 10077.52% at 1B frames on Atari 57. Median HNS 1934% (Agent57-style max-over-training). Breaks 24 human world records. ICLR 2023 Oral. | paper | 10078 | N/A | Source ↗ | Looks wrong? |
| 03 | agent57 Median HNS across 57 games. First to beat human baseline on ALL games. | paper | 4731.3 | 2025 | Source ↗ | Looks wrong? |
| 04 | MEME Mean HNS at 1B frames on Atari 57 (human=100). 95% CI: 3723–4445. Reaches human-level on all 57 games within 390M frames. | verified | 4087 | 2026 | Source ↗ | Looks wrong? |
| 05 | Disco57 IQM (interquartile mean) = 13.86 at 200M frames on Atari 57. Metric differs from mean/median HNS in other entries — stored as IQM×100 for scale. Automated RL rule discovery. Nature, Oct 2025. | paper | 1386 | N/A | Source ↗ | Looks wrong? |
| 06 | bbos-1 Model-based optimization. | paper | 1100 | 2025 | N/A | Looks wrong? |
| 07 | gdi-h3 High sample efficiency. | paper | 950 | 2025 | N/A | Looks wrong? |
| 08 | dreamerv3 Mastered Atari with fixed hyperparameters using world models. | paper | 840 | 2025 | Source ↗ | Looks wrong? |
| 09 | muzero Model-based agent planning with learned model. | paper | 731 | 2025 | Source ↗ | Looks wrong? |
| 10 | EfficientZero V2 Mean HNS 242.8%, Median HNS 128.6% on Atari 100k (26 games, 100k steps). Surpasses BBF. Model-based RL with Gumbel search. arXiv Mar 2024. | paper | 242.8 | 2026 | Source ↗ | Looks wrong? |
| 11 | rainbow-dqn Median HNS. Combines six extensions to DQN. | paper | 231 | 2025 | Source ↗ | Looks wrong? |
| 12 | Rainbow DQN Median HNS. Combines six extensions to DQN. | paper | 231 | 2025 | Source ↗ | Looks wrong? |
| 13 | BBF (Bigger, Better, Faster) Mean HNS 224.7%, IQM 104.5%, Median HNS 91.7% on Atari 100k (26 games, 100k steps). Value-based RL with scaled networks. ICML 2023. | unverified | 224.7 | 2026 | Source ↗ | Looks wrong? |
| 14 | DIAMOND (Diffusion World Model) Mean HNS 145.9%, IQM 64.1% on Atari 100k (26 games, 100k steps). Best agent trained entirely within a world model. NeurIPS 2024 Spotlight. | unverified | 145.9 | 2026 | Source ↗ | Looks wrong? |
| 15 | STORM (Stochastic Transformer World Models) Mean HNS 126.7% on Atari 100k (26 games, 100k steps). Transformer-based stochastic world model. arXiv Oct 2023. | paper | 126.7 | 2026 | Source ↗ | Looks wrong? |
| 16 | Simulus First planning-free world model to reach human-level IQM and median HNS on Atari 100k (26 games, 100k steps). Superhuman on 13/26 games. Combines intrinsic motivation, prioritized replay, and regression-as-classification. arXiv Feb 2025. | paper | 110 | 2026 | Source ↗ | Looks wrong? |
| 17 | DART (Discrete Abstract Representations for Transformer-based learning) Mean HNS 102.2%, Median HNS 79.0%, IQM 57.5% on Atari 100k (26 games, 100k steps). ICML 2024. | unverified | 102.2 | 2026 | Source ↗ | Looks wrong? |
| 18 | Human Professional Professional human tester baseline. | unverified | 100 | 2025 | Source ↗ | Looks wrong? |
| 19 | human-gamer Professional human tester baseline. | paper | 100 | 2025 | Source ↗ | Looks wrong? |
| 20 | DQN (Human-level) Historical baseline (2015). Median HNS. | paper | 79 | 2025 | Source ↗ | Looks wrong? |
| 21 | dqn Historical baseline (2015). Median HNS. | paper | 79 | 2025 | Source ↗ | Looks wrong? |
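
The entries above mix three aggregates (mean HNS, median HNS, and IQM), so scores in different rows are not always directly comparable. The following is a minimal sketch of how these aggregates are commonly computed, assuming the standard definition of human-normalized score (0 = random policy, 100 = human baseline). The game names and raw scores are hypothetical and are not taken from any entry in the table. The IQM here drops floor(n/4) values from each end, which is a simplification of what evaluation libraries do.

```python
from statistics import mean, median


def human_normalized_score(agent, random, human):
    """Return HNS in percent: 0 = random policy, 100 = human baseline."""
    return 100.0 * (agent - random) / (human - random)


def interquartile_mean(values):
    """Mean of the middle ~50% of values (drops floor(n/4) from each end)."""
    ordered = sorted(values)
    k = len(ordered) // 4
    return mean(ordered[k:len(ordered) - k])


# Hypothetical raw scores per game: (agent, random, human). Not real results.
raw_scores = {
    "game_a": (150.0, 50.0, 250.0),
    "game_b": (300.0, 100.0, 200.0),
    "game_c": (90.0, 10.0, 90.0),
    "game_d": (4100.0, 100.0, 200.0),  # one outlier game
}

hns = [human_normalized_score(*scores) for scores in raw_scores.values()]

mean_hns = mean(hns)                # 1087.5, pulled up by the outlier
median_hns = median(hns)            # 150.0
iqm_hns = interquartile_mean(hns)   # 150.0

print(f"mean={mean_hns} median={median_hns} iqm={iqm_hns}")
```

In this toy data a single outlier game lifts the mean to more than seven times the median, which is the same effect the go-explore entry flags for Montezuma's Revenge.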