Reinforcement Learningreinforcement-learning

Atari Games

Atari games became the canonical RL benchmark when DeepMind's DQN (2013) learned to play Breakout from raw pixels, but the goalposts keep moving. Agent57 (2020) was the first to achieve superhuman scores on all 57 games, and recent work like BBF and MEME shows that sample efficiency — not just final performance — is the new frontier. The benchmark's age is both its strength (decades of comparable results) and weakness (it doesn't capture the open-ended reasoning modern RL needs).

Datasets

Results

human-normalized-score

Canonical metric

Canonical Benchmark

Atari 2600

Suite of 57 Atari 2600 games. Standard benchmark for deep reinforcement learning agents.

Primary metric: human-normalized-score

View full leaderboard

Top 10

Leading models on Atari 2600.

Rank	Model	human-normalized-score	Year	Source
1	go-explore	40000	2025	paper
2	agent57	4731	2025	paper
3	MEME	4087	2026	paper
4	bbos-1	1100	2025	—
5	gdi-h3	950	2025	—
6	dreamerv3	840	2025	paper
7	muzero	731	2025	paper
8	EfficientZero V2	243	2026	paper
9	rainbow-dqn	231	2025	paper
10	BBF (Bigger, Better, Faster)	225	2026	paper

What were you looking for on Atari Games?

Didn't find the model, metric, or dataset you needed? Tell us in one line. We read every message and reply within 48 hours.

All datasets

1 dataset tracked for this task.

Atari 2600

CANONICAL

16results·human-normalized-score

Top: go-explore — 40000

Related tasks

Other tasks in Reinforcement Learning.

Continuous Control Offline RL

Reply within 48 hours · No newsletter

Didn't find what you came for?

Still looking for something on Atari Games? A missing model, a stale score, a benchmark we should cover — drop it here and we'll handle it.

Real humans read every message. We track what people are asking for and prioritize accordingly.