Atari Games · benchmark dataset · 2013 · N/A

Arcade Learning Environment (Atari 2600).

Suite of 57 Atari 2600 games. Standard benchmark for deep reinforcement learning agents.

§ 01 · Leaderboard

Best published scores.

12 results indexed across 1 metric. Shaded row marks current SOTA; ties broken by submission date.


Primary metric: human-normalized-score · higher is better
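The page does not define human-normalized-score or say how per-game values are aggregated. The conventional definition from the DQN line of work rescales each game's raw score so that a random policy scores 0 and the human reference scores 100, then summarizes across games with a mean or a median. A minimal sketch under that assumption; the function names and the reference scores in the example are hypothetical, not real ALE values:

```python
from statistics import mean, median


def human_normalized_score(agent: float, random_score: float, human_score: float) -> float:
    """Agent score as a percentage of the random-to-human gap (assumed definition)."""
    if human_score == random_score:
        raise ValueError("human and random reference scores must differ")
    return 100.0 * (agent - random_score) / (human_score - random_score)


def aggregate(per_game: dict[str, tuple[float, float, float]]) -> tuple[float, float]:
    """Mean and median HNS over games; each value is (agent, random, human)."""
    scores = [human_normalized_score(a, r, h) for a, r, h in per_game.values()]
    return mean(scores), median(scores)


if __name__ == "__main__":
    # Hypothetical raw scores per game: (agent, random, human)
    games = {
        "game_a": (150.0, 0.0, 100.0),  # 150: beats the human reference
        "game_b": (55.0, 10.0, 100.0),  # 50: halfway between random and human
        "game_c": (10.0, 10.0, 50.0),   # 0: no better than random
    }
    hns_mean, hns_median = aggregate(games)
    print(f"mean HNS = {hns_mean:.1f}, median HNS = {hns_median:.1f}")
```

Under this definition an agent that exactly matches the human reference scores 100, which is consistent with the Human Professional baseline row. A few games with very large normalized scores can pull the mean far above the median, so the two aggregates can rank agents differently.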
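| # | Model | Org | Submitted | Paper / code | human-normalized-score |
|---|---|---|---|---|---|
| 01 | Go-Explore [OSS] | Uber AI | Dec 2025 | nature-paper | 40000 |
| 02 | LBC [OSS] | Tsinghua University / Baidu | - | source | 10078 |
| 03 | Agent57 [OSS] | DeepMind | Dec 2025 | deepmind-research | 4731.30 |
| 04 | MEME [OSS] | Google DeepMind | - | source | 4087 |
| 05 | Disco57 [OSS] | Google DeepMind | - | source | 1386 |
| 06 | BBOS-1 [OSS] | - | Dec 2025 | - | 1100 |
| 07 | GDI-H3 [OSS] | Research | Dec 2025 | - | 950 |
| 08 | DreamerV3 [OSS] | Google DeepMind | Dec 2025 | arxiv-paper | 840 |
| 09 | MuZero [OSS] | DeepMind | Dec 2025 | nature-paper | 731 |
| 10 | Rainbow DQN [OSS] | DeepMind | Dec 2025 | aaai-paper | 231 |
| 11 | Human Professional [Biology] | - | Dec 2025 | baseline | 100 |
| 12 | DQN (Human-level) [OSS] | DeepMind | Dec 2025 | nature-paper | 79 |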
Fig 2 · Rows sorted by score within each metric. Shaded row marks SOTA. Dates reflect model or paper release where available, otherwise the date Codesota accessed the source.
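The ordering rule stated for this table (score descending, ties broken by submission date) can be expressed as a two-part sort key. The page does not say which direction the tie-break goes; this sketch assumes the earlier submission ranks higher. The `Entry` type, `rank` helper, and sample rows are hypothetical, not Codesota code or data:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    model: str
    score: float      # human-normalized-score; higher is better
    submitted: date   # submission or release date


def rank(entries: list[Entry]) -> list[Entry]:
    """Sort by score descending; on equal scores, earlier submission first (assumed)."""
    return sorted(entries, key=lambda e: (-e.score, e.submitted))


if __name__ == "__main__":
    # Hypothetical entries, not rows from the leaderboard
    entries = [
        Entry("model_b", 950.0, date(2025, 3, 1)),
        Entry("model_a", 1100.0, date(2025, 6, 1)),
        Entry("model_c", 1100.0, date(2025, 1, 15)),
    ]
    ranked = rank(entries)
    for pos, e in enumerate(ranked, start=1):
        print(f"{pos:02d} {e.model} {e.score}")
    print("SOTA:", ranked[0].model)  # first row after ranking is the shaded one
```

Negating the score keeps a single ascending sort, so both keys can be compared in one tuple without a custom comparator.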
§ 03 · Progress

1 step of state of the art.

Each row below marks a model that broke the previous record on human-normalized-score. Intermediate submissions are kept in the leaderboard above; only SOTA-setting entries are re-listed here.

Higher scores win. Each subsequent entry improved upon the previous best.
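Selecting only the record-setting entries amounts to a running maximum over submissions in date order. A minimal sketch, assuming each entry carries a comparable date and that merely tying the current record does not count as a new step; `Entry`, `sota_line`, and the sample history are hypothetical:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    model: str
    score: float      # human-normalized-score; higher is better
    submitted: date


def sota_line(entries: list[Entry]) -> list[Entry]:
    """Return, in date order, only the entries that beat every earlier score."""
    line: list[Entry] = []
    best = float("-inf")
    for e in sorted(entries, key=lambda x: x.submitted):
        if e.score > best:  # strictly better; a tie does not add a step
            line.append(e)
            best = e.score
    return line


if __name__ == "__main__":
    # Hypothetical submission history, listed out of date order on purpose
    history = [
        Entry("m5", 90.0, date(2024, 1, 1)),
        Entry("m1", 10.0, date(2020, 1, 1)),
        Entry("m2", 40.0, date(2021, 1, 1)),
        Entry("m3", 25.0, date(2022, 1, 1)),  # below the record: leaderboard only
        Entry("m4", 40.0, date(2023, 1, 1)),  # ties the record: no new step
    ]
    for step, e in enumerate(sota_line(history), start=1):
        print(step, e.submitted.isoformat(), e.model, e.score)
```

Entries filtered out here still appear in the full leaderboard; only the steps survive into the SOTA line.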

SOTA line · human-normalized-score
  1. Dec 18, 2025 · Go-Explore · Uber AI · 40000
Fig 3 · SOTA-setting models only. 1 entry, dated Dec 2025.
§ 06 · Contribute

Have a score that beats this table?

Submit a checkpoint and a reproduction script. We will run it, publish the score, and, if it takes the top spot, annotate the step on the progress chart with your name.

What a submission needs
  • 01 A public checkpoint or API endpoint
  • 02 A reproduction script with frozen commit + seed
  • 03 Declared evaluation environment (Python, deps)
  • 04 One row per metric declared by this dataset
  • 05 A contact so we can follow up on discrepancies
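A submitter could sanity-check a package against these five requirements before sending it. The sketch below assumes a hypothetical dictionary manifest; the field names, the SHA pattern, and the `check_submission` helper are illustrative and are not Codesota's actual submission format:

```python
import re

# Hypothetical manifest fields mirroring the five requirements (illustrative only)
REQUIRED = ("checkpoint", "script", "commit", "seed", "environment", "metrics", "contact")
DATASET_METRICS = {"human-normalized-score"}  # the one metric this dataset declares
EMPTY = (None, "", [], {})  # values treated as "not provided"; a seed of 0 is allowed


def check_submission(manifest: dict) -> list[str]:
    """Return a list of problems; an empty list means the manifest looks complete."""
    problems = [f"missing field: {f}" for f in REQUIRED if manifest.get(f) in EMPTY]

    # A frozen commit should be a hex SHA, not a moving branch name like "main"
    commit = str(manifest.get("commit") or "")
    if commit and not re.fullmatch(r"[0-9a-f]{7,40}", commit):
        problems.append(f"commit {commit!r} is not a frozen SHA")

    # The evaluation environment should at least pin the Python version
    env = manifest.get("environment") or {}
    if isinstance(env, dict) and env and "python" not in env:
        problems.append("environment does not declare a Python version")

    # Exactly one row per metric declared by this dataset
    metrics = manifest.get("metrics") or {}
    for name in sorted(DATASET_METRICS - set(metrics)):
        problems.append(f"no row for declared metric: {name}")
    for name in sorted(set(metrics) - DATASET_METRICS):
        problems.append(f"metric not declared by this dataset: {name}")
    return problems


if __name__ == "__main__":
    manifest = {
        "checkpoint": "https://example.com/ckpt.pt",
        "script": "reproduce.py",
        "commit": "main",  # a branch name, so it gets flagged
        "seed": 0,
        "environment": {"python": "3.11", "deps": "requirements.txt"},
        "metrics": {"human-normalized-score": 123.4},
        "contact": "team@example.com",
    }
    for problem in check_submission(manifest):
        print(problem)
```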