Codesota · Benchmark · MuJoCo
Google DeepMind

MuJoCo.

Physics engine for continuous control tasks like walking, running, and manipulation.

§ 01 · Leaderboard

Results by metric.


Average Return

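Mean episodic return averaged across HalfCheetah, Hopper, and Walker2d at 1M steps. This definition applies to the verified entries (ranks 01 to 03). The paper-sourced entries (ranks 04 to 12) instead report a mean normalized score across DMControl tasks, a different metric, so the two groups are not directly comparable.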

Higher is better

Trust tiers for Average Return: verified, paper, vendor, community, unverified.
| Rank | Model | Trust | Score | Year | Notes |
|---|---|---|---|---|---|
| 01 | TD3 | verified | 5592 | 2026 | Mean of HalfCheetah-v4 (9583), Hopper-v4 (3134), Walker2d-v4 (4057) at 1M steps. CleanRL verified. |
| 02 | SAC | verified | 5179 | 2026 | Mean of HalfCheetah-v4 (9634), Hopper-v4 (2310), Walker2d-v4 (3591) at 1M steps. CleanRL verified. |
| 03 | PPO | verified | 2038 | 2026 | Mean of HalfCheetah-v4 (1442), Hopper-v4 (2382), Walker2d-v4 (2287) at 1M steps. CleanRL verified. |
| 04 | TD-MPC2 (317M params) | paper | 960 | 2026 | TD-MPC2, 317M-param shared model. Mean normalized score across 15 DMControl tasks, 1M steps. ICLR 2024. |
| 05 | TD-MPC2 (19M params) | paper | 953 | 2026 | TD-MPC2, 19M-param shared model. Mean normalized score across 15 DMControl tasks, 1M steps. ICLR 2024. |
| 06 | FOWM | paper | 945 | 2026 | FOWM (Foundation Online World Models). Mean normalized score, DMControl 15 tasks. NeurIPS 2024. |
| 07 | BRO | paper | 941 | 2026 | BRO (Best-of-N Robustness RL). Mean normalized score across DMControl tasks. ICML 2024. |
| 08 | TD-MPC2 (5M params) | paper | 929 | 2026 | TD-MPC2, 5M-param model. Mean normalized score across 15 DMControl tasks, 1M steps. ICLR 2024. |
| 09 | DreamerV3 | paper | 897 | 2026 | DreamerV3. Mean normalized score across 15 DMControl tasks, 1M steps. From TD-MPC2 Table 1 comparison. |
| 10 | TD-MPC | paper | 857 | 2026 | TD-MPC (original). Mean normalized score across DMControl tasks, 1M steps. ICML 2022 baseline from TD-MPC2 paper. |
| 11 | DrQ-v2 | paper | 799 | 2026 | DrQ-v2, pixel-based. Mean normalized score across 15 DMControl tasks, 1M steps. From TD-MPC2 Table 1. |
| 12 | SAC (state-based) | paper | 777 | 2026 | SAC (state-based). Mean normalized score across DMControl tasks. Classic baseline from TD-MPC2 Table 1. |
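
The Average Return for the three verified entries can be recomputed from the per-environment figures quoted in their rows. The sketch below does that as a sanity check, assuming the score is a plain unweighted mean over the three environments. The names `PER_ENV_RETURNS`, `LISTED_SCORES`, and `average_return` are illustrative only and are not part of any leaderboard tooling.

```python
from statistics import mean

# Per-environment returns at 1M steps, copied from the TD3, SAC, and PPO rows.
PER_ENV_RETURNS = {
    "TD3": {"HalfCheetah-v4": 9583, "Hopper-v4": 3134, "Walker2d-v4": 4057},
    "SAC": {"HalfCheetah-v4": 9634, "Hopper-v4": 2310, "Walker2d-v4": 3591},
    "PPO": {"HalfCheetah-v4": 1442, "Hopper-v4": 2382, "Walker2d-v4": 2287},
}

# Scores as they appear in the Score column.
LISTED_SCORES = {"TD3": 5592, "SAC": 5179, "PPO": 2038}


def average_return(per_env):
    """Unweighted mean of per-environment episodic returns (assumed definition)."""
    return mean(per_env.values())


for model, per_env in PER_ENV_RETURNS.items():
    computed = average_return(per_env)
    gap = LISTED_SCORES[model] - computed
    print(f"{model}: computed {computed:.1f}, listed {LISTED_SCORES[model]}, gap {gap:+.1f}")
```

Each recomputed mean lands within one point of the listed score. The small gap may come from the per-environment figures having been rounded before they were quoted, but that is a guess: the page does not say how the listed scores were rounded.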
