Reinforcement Learning

Atari Games.

Atari games became the canonical RL benchmark when DeepMind's DQN (2013) learned to play Breakout from raw pixels, but the goalposts keep moving. Agent57 (2020) was the first agent to score above the human baseline on all 57 games, and recent work such as BBF and MEME shows that sample efficiency, not just final performance, is the new frontier. The benchmark's age is both its strength (more than a decade of comparable results) and its weakness (it doesn't capture the open-ended reasoning modern RL needs).

Datasets: 1
Results: 12
Canonical metric: human-normalized-score
§ 02 · Canonical benchmark

The reference dataset.

Atari 2600

Suite of 57 Atari 2600 games. Standard benchmark for deep reinforcement learning agents.

Primary metric: human-normalized-score
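
For readers unfamiliar with the metric, a human-normalized score is conventionally computed per game as the agent's score rescaled so that a random policy maps to 0 and the human reference maps to 100. The sketch below illustrates that convention only. The reference values in `REFERENCE`, the game names, and the `summarize` helper are hypothetical, and this page does not state whether its leaderboard figures are percentages, or means or medians across the 57 games, so both aggregates are shown.

```python
from statistics import mean, median

# Hypothetical (random_score, human_score) pairs per game.
# These are placeholders for illustration, not official benchmark baselines.
REFERENCE = {
    "game_a": (0.0, 100.0),
    "game_b": (10.0, 60.0),
    "game_c": (-20.0, 20.0),
}


def human_normalized_score(agent_score, random_score, human_score):
    """Rescale a raw score so random play is 0 and the human reference is 100."""
    gap = human_score - random_score
    if gap == 0:
        raise ValueError("human and random reference scores must differ")
    return 100.0 * (agent_score - random_score) / gap


def summarize(agent_scores, reference=REFERENCE):
    """Return per-game normalized scores plus their mean and median."""
    per_game = {
        game: human_normalized_score(score, *reference[game])
        for game, score in agent_scores.items()
    }
    values = list(per_game.values())
    return per_game, mean(values), median(values)


if __name__ == "__main__":
    scores = {"game_a": 250.0, "game_b": 35.0, "game_c": 20.0}
    per_game, mean_hns, median_hns = summarize(scores)
    print(per_game)
    print(f"mean={mean_hns:.1f} median={median_hns:.1f}")
```

The mean is dominated by the few games where an agent scores far above the human reference, while the median is not, so the choice of aggregate matters when comparing entries.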
§ 03 · Top 10

Leading models.

Leading models on Atari 2600.

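| # | Model | human-normalized-score | Year | Source |
|---|---|---|---|---|
| 1 | Go-Explore | 40000 | 2025 | paper ↗ |
| 2 | LBC | 10078 | | paper ↗ |
| 3 | Agent57 | 4731 | 2025 | paper ↗ |
| 4 | MEME | 4087 | | paper ↗ |
| 5 | Disco57 | 1386 | | paper ↗ |
| 6 | BBOS-1 | 1100 | 2025 | |
| 7 | GDI-H3 | 950 | 2025 | |
| 8 | DreamerV3 | 840 | 2025 | paper ↗ |
| 9 | MuZero | 731 | 2025 | paper ↗ |
| 10 | Rainbow DQN | 231 | 2025 | paper ↗ |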

§ 04 · All datasets

Tracked datasets.

1 dataset tracked for this task.

Atari 2600
CANONICAL
12 results · human-normalized-score
Top: Go-Explore 40000
§ 05 · Related tasks

Other tasks in Reinforcement Learning.

Continuous Control
Offline RL