
Atari 2600

Name: Atari 2600 Benchmark Results
Creator: Farama Foundation / DeepMind
License: https://creativecommons.org/licenses/by/4.0/


Suite of 57 Atari 2600 games. Standard benchmark for deep reinforcement learning agents.

Paper Leaderboard

Benchmark Stats

Models: 16

Papers: 16

Metrics: 1

SOTA History

Mean Human-Normalized Score

Mean human-normalized score (HNS) across games, with the human baseline set to 100. Scores above 100 exceed the human baseline.

Higher is better
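The page does not spell out how the metric is computed. The sketch below assumes the commonly used definition, in which random play maps to 0 and the human baseline maps to 100, and shows how a mean and a median over the same per-game values can diverge. The game names and raw scores are illustrative placeholders, not benchmark data, and the function name is hypothetical.

```python
from statistics import mean, median


def human_normalized_score(agent, random, human):
    """Assumed definition: 0 = random play, 100 = human baseline."""
    if human == random:
        raise ValueError("human and random baselines must differ")
    return 100.0 * (agent - random) / (human - random)


# Illustrative placeholder numbers only (NOT real benchmark data):
# game -> (agent score, random score, human score)
raw_scores = {
    "game_a": (300.0, 0.0, 100.0),
    "game_b": (50.0, 10.0, 110.0),
    "game_c": (20.0, -20.0, 20.0),
}

# Normalize each game separately, then aggregate across games.
per_game_hns = {
    game: human_normalized_score(*scores) for game, scores in raw_scores.items()
}

mean_hns = mean(list(per_game_hns.values()))
median_hns = median(list(per_game_hns.values()))

print(per_game_hns)
print(f"mean HNS = {mean_hns:.1f}, median HNS = {median_hns:.1f}")
```

With these placeholder values, a single very high game pulls the mean well above the median, which is why leaderboard entries usually state which of the two aggregates they report.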

Rank	Model	Notes	Source	Score	Year	Paper
1	go-explore	Exploration-focused agent. Score is Mean HNS (skewed by Montezuma's Revenge), not Median.	Editorial	40000	2025	Source
2	agent57	Median HNS across 57 games. First to beat the human baseline on all games.	Editorial	4731.3	2025	Source
3	MEME	Mean HNS at 1B frames on Atari 57 (human=100). 95% CI: 3723–4445. Reaches human-level on all 57 games within 390M frames.	Community	4087	2026	Source
4	bbos-1	Model-based optimization.	Editorial	1100	2025	-
5	gdi-h3	High sample efficiency.	Editorial	950	2025	-
6	dreamerv3	Mastered Atari with fixed hyperparameters using world models.	Editorial	840	2025	Source
7	muzero	Model-based agent that plans with a learned model.	Editorial	731	2025	Source
8	EfficientZero V2	Mean HNS 242.8%, Median 128.6% on Atari 100k (26 games, 100k steps). Surpasses BBF. Model-based RL with Gumbel search. arXiv Mar 2024.	Community	242.8	2026	Source
9	rainbow-dqn	Median HNS. Combines six extensions to DQN.	Editorial	231	2025	Source
10	BBF (Bigger, Better, Faster)	Mean HNS 224.7% on Atari 100k (26 games, 100k steps). IQM: 104.5%, Median: 91.7%. Value-based RL with scaled networks. ICML 2023.	Community	224.7	2026	Source
11	DIAMOND (Diffusion World Model)	Mean HNS 145.9%, IQM 64.1% on Atari 100k (26 games, 100k steps). Best agent trained entirely within a world model. NeurIPS 2024 Spotlight.	Community	145.9	2026	Source
12	STORM (Stochastic Transformer World Models)	Mean HNS 126.7% on Atari 100k (26 games, 100k steps). Transformer-based stochastic world model. arXiv Oct 2023.	Community	126.7	2026	Source
13	Simulus	First planning-free world model to reach human-level IQM and median HNS on Atari 100k (26 games, 100k steps). Superhuman on 13/26 games. Combines intrinsic motivation, prioritized replay, regression-as-classification. arXiv Feb 2025.	Community	110	2026	Source
14	DART (Discrete Abstract Representations for Transformer-based learning)	Mean HNS 102.2%, Median 79.0%, IQM 57.5% on Atari 100k (26 games, 100k steps). ICML 2024.	Community	102.2	2026	Source
15	human-gamer	Professional human tester baseline.	Editorial	100	2025	Source
16	dqn	Historical baseline (2015). Median HNS.	Editorial	79	2025	Source
