| Rank | Method | Details | Status | Score | Added |
|---|---|---|---|---|---|
| 1 | TD3 | Mean raw episode return over HalfCheetah-v4 (9583), Hopper-v4 (3134), and Walker2d-v4 (4057) at 1M steps; not on the same scale as the DMControl scores below. Verified with CleanRL. | verified | 5592 | 2026 |
| 2 | SAC | Mean raw episode return over HalfCheetah-v4 (9634), Hopper-v4 (2310), and Walker2d-v4 (3591) at 1M steps; not on the same scale as the DMControl scores below. Verified with CleanRL. | verified | 5179 | 2026 |
| 3 | PPO | Mean raw episode return over HalfCheetah-v4 (1442), Hopper-v4 (2382), and Walker2d-v4 (2287) at 1M steps; not on the same scale as the DMControl scores below. Verified with CleanRL. | verified | 2038 | 2026 |
| 4 | TD-MPC2 (317M params) | 317M-parameter shared model. Mean normalized score across 15 DMControl tasks at 1M steps. ICLR 2024. | paper | 960 | 2026 |
| 5 | TD-MPC2 (19M params) | 19M-parameter shared model. Mean normalized score across 15 DMControl tasks at 1M steps. ICLR 2024. | paper | 953 | 2026 |
| 6 | FOWM | FOWM (Finetuning Offline World Models). Mean normalized score across 15 DMControl tasks. CoRL 2023. | paper | 945 | 2026 |
| 7 | BRO | BRO (Bigger, Regularized, Optimistic). Mean normalized score across DMControl tasks. NeurIPS 2024. | paper | 941 | 2026 |
| 8 | TD-MPC2 (5M params) | 5M-parameter model. Mean normalized score across 15 DMControl tasks at 1M steps. ICLR 2024. | paper | 929 | 2026 |
| 9 | DreamerV3 | Mean normalized score across 15 DMControl tasks at 1M steps, as reported in the TD-MPC2 Table 1 comparison. | paper | 897 | 2026 |
| 10 | TD-MPC | Original TD-MPC (ICML 2022). Mean normalized score across DMControl tasks at 1M steps; baseline reported in the TD-MPC2 paper. | paper | 857 | 2026 |
| 11 | DrQ-v2 | Pixel-based. Mean normalized score across 15 DMControl tasks at 1M steps; from TD-MPC2 Table 1. | paper | 799 | 2026 |
| 12 | SAC (state-based) | Classic state-based baseline. Mean normalized score across DMControl tasks; from TD-MPC2 Table 1. | paper | 777 | 2026 |
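The aggregate scores for the three CleanRL-verified rows can be rechecked from the per-environment returns listed in their descriptions. The sketch below assumes the aggregate is an unweighted arithmetic mean over the three environments; `ENTRIES`, `aggregate`, and `check` are hypothetical helper names introduced only for illustration. The recomputed means land within one point of the reported scores, presumably because the per-environment values shown are rounded, so the check uses a tolerance of 1 rather than exact equality.

```python
from statistics import mean

# Per-environment returns and the reported aggregate, copied from rows 1-3.
# Hypothetical structure: method -> (returns per environment, reported score).
ENTRIES = {
    "TD3": ({"HalfCheetah-v4": 9583, "Hopper-v4": 3134, "Walker2d-v4": 4057}, 5592),
    "SAC": ({"HalfCheetah-v4": 9634, "Hopper-v4": 2310, "Walker2d-v4": 3591}, 5179),
    "PPO": ({"HalfCheetah-v4": 1442, "Hopper-v4": 2382, "Walker2d-v4": 2287}, 2038),
}


def aggregate(returns: dict[str, float]) -> float:
    """Assumed aggregation: unweighted arithmetic mean over environments."""
    return mean(returns.values())


def check(entries, tolerance: float = 1.0):
    """Return method -> (recomputed mean, reported score, within tolerance?)."""
    results = {}
    for method, (returns, reported) in entries.items():
        computed = aggregate(returns)
        results[method] = (
            round(computed, 2),
            reported,
            abs(computed - reported) <= tolerance,
        )
    return results


if __name__ == "__main__":
    for method, (computed, reported, ok) in check(ENTRIES).items():
        print(f"{method}: recomputed {computed}, reported {reported}, within tolerance: {ok}")
```

An unweighted mean of raw returns lets HalfCheetah, whose returns are several times larger than Hopper's, dominate the aggregate. This is one more reason rows 1-3 should not be ranked directly against the DMControl normalized scores.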