Atari Games | 2013 | n/a

Arcade Learning Environment (Atari 2600)

A suite of 57 Atari 2600 games, used as a standard benchmark for deep reinforcement learning agents.

Metric: human-normalized score (HNS)
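The page does not spell out how the metric is computed. The sketch below assumes the standard definition from the Atari deep-RL literature, HNS = 100 * (agent - random) / (human - random), where random and human are per-game reference scores. It then aggregates the per-game values with both the mean and the median. The game names and raw scores are hypothetical, chosen only to show how one runaway game inflates the mean while barely moving the median.

```python
from statistics import mean, median


def human_normalized_score(agent_score: float, random_score: float, human_score: float) -> float:
    """Per-game HNS in percent: 0 matches random play, 100 matches the human reference."""
    return 100.0 * (agent_score - random_score) / (human_score - random_score)


# Hypothetical raw scores per game: (agent, random, human). Not real benchmark data.
raw_scores = {
    "game_a": (150.0, 0.0, 100.0),
    "game_b": (90.0, 10.0, 90.0),
    "game_c": (55.0, 5.0, 105.0),
    "game_d": (20_010.0, 10.0, 110.0),  # agent vastly exceeds the human reference here
}

hns = {game: human_normalized_score(*scores) for game, scores in raw_scores.items()}

# Aggregate across games: the mean is pulled up by game_d, the median is not.
mean_hns = mean(hns.values())
median_hns = median(hns.values())
print(f"mean HNS:   {mean_hns:.1f}")
print(f"median HNS: {median_hns:.1f}")
```

With these numbers the mean is 5075.0 while the median is 125.0, which is why a mean HNS and a median HNS should not be compared directly.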
Current State of the Art

Go-Explore (Uber AI): 40000 human-normalized score

[Figure: human-normalized score progress over time, showing 4 breakthroughs from Jul 2012 to Dec 2025]

Key Milestones

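Jul 2012: Human Professional, 100.0
Professional human tester baseline.

Oct 2017: Rainbow DQN, 231.0 (+131.0% over the previous milestone)
Median HNS. Combines six extensions to DQN: double Q-learning, prioritized experience replay, dueling networks, multi-step returns, distributional RL, and noisy networks.

Mar 2020: Agent57, 4731.3 (+1948.2%)
Median HNS across 57 games. First agent to exceed the human baseline on all 57 games.

Dec 2025: Go-Explore (current SOTA), 40000.0 (+745.4%)
Exploration-focused agent. This score is a mean HNS (skewed by Montezuma's Revenge), not a median, so it is not directly comparable with the median scores reported for Rainbow DQN and Agent57.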
Total improvement: 39900.0% (from 100.0 to 40000.0)
Time span: 13y 5m (Jul 2012 to Dec 2025)
Breakthroughs: 4
Current SOTA: 40000.0
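Each milestone percentage is the relative gain over the previous milestone. A minimal sketch reproducing them from the listed scores, assuming each date falls on the first day of its month (the page only gives month precision). The arithmetic for the final step is correct, but it compares Go-Explore's mean HNS with Agent57's median HNS, so that step is not like-for-like.

```python
from datetime import date

# Milestones as listed on this page. Day-of-month is an assumption (only months are given).
milestones = [
    ("Human Professional", date(2012, 7, 1), 100.0),
    ("Rainbow DQN", date(2017, 10, 1), 231.0),
    ("Agent57", date(2020, 3, 1), 4731.3),
    ("Go-Explore", date(2025, 12, 1), 40000.0),
]


def relative_gain(old: float, new: float) -> float:
    """Percentage improvement of `new` over `old`."""
    return 100.0 * (new - old) / old


# Gain of each milestone over the one before it.
for (_, _, prev_score), (name, _, score) in zip(milestones, milestones[1:]):
    print(f"{name}: +{relative_gain(prev_score, score):.1f}%")

first_date, last_date = milestones[0][1], milestones[-1][1]
total_gain = relative_gain(milestones[0][2], milestones[-1][2])
# Whole months between the first and last milestone.
months = (last_date.year - first_date.year) * 12 + (last_date.month - first_date.month)
print(f"Total improvement: {total_gain:.1f}%")
print(f"Time span: {months // 12}y {months % 12}m")
```

Counting whole months from Jul 2012 to Dec 2025 gives 161 months, i.e. 13 years 5 months.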

Top Models Performance Comparison

Top 9 models ranked by human-normalized score (share of the best score in parentheses):

1. Go-Explore: 40000 (100.0%)
2. Agent57: 4731 (11.8%)
3. BBOS-1: 1100 (2.8%)
4. GDI-H3: 950 (2.4%)
5. DreamerV3: 840 (2.1%)
6. MuZero: 731 (1.8%)
7. Rainbow DQN: 231 (0.6%)
8. Human Professional: 100 (0.3%)
9. DQN (Human-level): 79 (0.2%)

Best score: 40000 (Go-Explore)
Models compared: 9
Score range: 39921
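The "share of best" figures divide each score by the top score, and the score range is the best score minus the lowest listed score. A short sketch using the scores from the ranking above:

```python
# Scores from the ranking above (human-normalized score, percent).
scores = {
    "Go-Explore": 40000.0,
    "Agent57": 4731.3,
    "BBOS-1": 1100.0,
    "GDI-H3": 950.0,
    "DreamerV3": 840.0,
    "MuZero": 731.0,
    "Rainbow DQN": 231.0,
    "Human Professional": 100.0,
    "DQN (Human-level)": 79.0,
}

best = max(scores.values())
# Each model's score as a percentage of the best score.
pct_of_best = {name: 100.0 * s / best for name, s in scores.items()}
# Spread between the highest and lowest listed scores.
score_range = best - min(scores.values())

ranked = sorted(pct_of_best.items(), key=lambda kv: kv[1], reverse=True)
for rank, (name, pct) in enumerate(ranked, start=1):
    print(f"{rank}. {name}: {pct:.1f}% of best")
print("score range:", score_range)
```

An exact tie such as Human Professional's 0.25% prints as 0.2 under Python's round-half-to-even formatting, whereas the ranking shows 0.3%, so the page presumably rounds half up.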

Metric: human-normalized score (primary)

Rank  Model               Organization          Score    Code         Date
1     Go-Explore          Uber AI               40000    Open source  Dec 2025
2     Agent57             DeepMind              4731.3   Open source  Dec 2025
3     BBOS-1              (not listed)          1100     Open source  Dec 2025
4     GDI-H3              Research              950      Open source  Dec 2025
5     DreamerV3           DeepMind              840      Open source  Dec 2025
6     MuZero              DeepMind              731      Open source  Dec 2025
7     Rainbow DQN         DeepMind              231      Open source  Dec 2025
8     Human Professional  n/a (human baseline)  100      n/a          Dec 2025
9     DQN (Human-level)   DeepMind              79       Open source  Dec 2025