Who leads the MuJoCo benchmark?

TD3 currently leads MuJoCo with a score of 5592 on Average Return.

What is the state-of-the-art score on MuJoCo?

The state-of-the-art result on MuJoCo is 5592 (Average Return), achieved by TD3 as of 2026.

How many models are tracked on MuJoCo?

Codesota tracks 12 models on MuJoCo.

When was the MuJoCo leaderboard last updated?

The MuJoCo leaderboard on Codesota includes results through 2026.

Codesota · Benchmark · MuJoCo

Google DeepMind

MuJoCo.

Name: MuJoCo Benchmark Results
Creator: Google DeepMind
Published: 2026-01-01
License: https://creativecommons.org/licenses/by/4.0/

Physics engine for continuous control tasks like walking, running, and manipulation.

§ 01 · Leaderboard

Results by metric.

Average Return

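Mean episodic return averaged across HalfCheetah, Hopper, and Walker2d at 1M steps. Rows 04 to 12 instead report a mean normalized score across DMControl tasks, so their scores are on a different scale from rows 01 to 03.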

Higher is better

Trust tiers for Average Return: verified, paper, vendor, community, unverified.

Muted rows were not state of the art when published — an earlier or same-year result already scored better.
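
The muting rule can be sketched in a few lines of Python. This is an illustrative reading of the sentence above, not Codesota's actual implementation: the `Row` class and `is_muted` function are hypothetical names, the comparison is made at year granularity, and ties are assumed not to mute a row, since the page does not say how they are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    score: float  # higher is better for Average Return
    year: int


def is_muted(row, rows):
    """True if a row from the same or an earlier year scored strictly higher."""
    return any(
        other.year <= row.year and other.score > row.score
        for other in rows
    )


# The three verified rows from the leaderboard, all dated 2026.
rows = [
    Row("TD3", 5592, 2026),
    Row("SAC", 5179, 2026),
    Row("PPO", 2038, 2026),
]

muted = [r.model for r in rows if is_muted(r, rows)]
print(muted)
```

Under these assumptions only the top-scoring row stays unmuted when every row shares the same year.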

Rank	Model	Notes	Trust	Score	Year
01	TD3	Mean of HalfCheetah-v4 (9583), Hopper-v4 (3134), Walker2d-v4 (4057) at 1M steps. CleanRL verified.	verified	5592	2026
02	SAC	Mean of HalfCheetah-v4 (9634), Hopper-v4 (2310), Walker2d-v4 (3591) at 1M steps. CleanRL verified.	verified	5179	2026
03	PPO	Mean of HalfCheetah-v4 (1442), Hopper-v4 (2382), Walker2d-v4 (2287) at 1M steps. CleanRL verified.	verified	2038	2026
04	TD-MPC2 (317M params)	317M-param shared model. Mean normalized score across 15 DMControl tasks, 1M steps. ICLR 2024.	paper	960	2026
05	TD-MPC2 (19M params)	19M-param shared model. Mean normalized score across 15 DMControl tasks, 1M steps. ICLR 2024.	paper	953	2026
06	FOWM	Foundation Online World Models. Mean normalized score, DMControl 15 tasks. NeurIPS 2024.	paper	945	2026
07	BRO	Best-of-N Robustness RL. Mean normalized score across DMControl tasks. ICML 2024.	paper	941	2026
08	TD-MPC2 (5M params)	5M-param model. Mean normalized score across 15 DMControl tasks, 1M steps. ICLR 2024.	paper	929	2026
09	DreamerV3	Mean normalized score across 15 DMControl tasks, 1M steps. From TD-MPC2 Table 1 comparison.	paper	897	2026
10	TD-MPC	Original TD-MPC. Mean normalized score across DMControl tasks, 1M steps. ICML 2022 baseline from TD-MPC2 paper.	paper	857	2026
11	DrQ-v2	Pixel-based. Mean normalized score across 15 DMControl tasks, 1M steps. From TD-MPC2 Table 1.	paper	799	2026
12	SAC (state-based)	Mean normalized score across DMControl tasks. Classic baseline from TD-MPC2 Table 1.	paper	777	2026
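
As a quick check of how the Average Return metric is formed, the sketch below recomputes the unweighted mean from the per-environment returns listed for rows 01 to 03. The dictionary layout and the `average_return` helper are illustrative choices, not part of any Codesota or CleanRL API.

```python
from statistics import mean

# Per-environment returns at 1M steps, copied from rows 01-03 of the table.
PER_ENV_RETURNS = {
    "TD3": {"HalfCheetah-v4": 9583, "Hopper-v4": 3134, "Walker2d-v4": 4057},
    "SAC": {"HalfCheetah-v4": 9634, "Hopper-v4": 2310, "Walker2d-v4": 3591},
    "PPO": {"HalfCheetah-v4": 1442, "Hopper-v4": 2382, "Walker2d-v4": 2287},
}


def average_return(per_env):
    """Unweighted mean of the per-environment returns."""
    return mean(per_env.values())


averages = {model: average_return(envs) for model, envs in PER_ENV_RETURNS.items()}

# Print models from highest to lowest recomputed mean.
for model, value in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {value:.1f}")
```

The recomputed means come out to roughly 5591.3, 5178.3, and 2037.0, each about one point below the listed scores of 5592, 5179, and 2038. One plausible explanation, offered only as an assumption, is that the per-environment values shown in the table were rounded before display.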

