Robotics
Robotics is an interdisciplinary field spanning computer science and engineering, concerned with the design, construction, operation, and use of machines known as robots. These programmable machines are built to replicate, substitute for, or assist human actions, performing tasks in industries ranging from manufacturing and healthcare to exploration and entertainment.
General-purpose robotics combines perception, planning, and control to build machines that manipulate objects and navigate the physical world. Foundation models (RT-2, π0) are transforming the field by enabling language-conditioned robot behavior learned from internet-scale data combined with robot demonstrations.
History
2016: Levine et al. demonstrate large-scale robotic grasping with deep learning (~800K grasp attempts)
2019: OpenAI's Dactyl solves a Rubik's Cube with a robot hand using sim-to-real transfer
2019: RoboNet provides diverse multi-robot video data for learning visual dynamics
2022: SayCan (Google) grounds language models in robot affordances for task planning
2022: Inner Monologue uses LLM reasoning for closed-loop robotic task execution
2023: RT-2 (Robotic Transformer 2) directly maps vision and language to robot actions using a VLM backbone
2023: Open X-Embodiment dataset enables cross-robot transfer learning at scale
2024: Mobile ALOHA enables low-cost bimanual mobile manipulation with teleoperation learning
2024: Physical Intelligence's π0, a foundation model for robotics trained on diverse manipulation data
2024: Figure 01 and Tesla Optimus demonstrate humanoid robots performing warehouse tasks
How Robotics Works
Perception
Cameras, depth sensors, and proprioception provide the robot's understanding of the scene — object positions, shapes, and spatial relationships.
Task Specification
The robot receives a task via natural language ('pick up the red cup'), goal images, or learned reward functions.
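Of these three specification modes, a reward function is the most programmatic. As an illustration, a goal-reaching reward for a "pick up the cup" style task can be a negative distance; this is a minimal sketch with made-up position names, not any particular framework's API:

```python
import math

def reaching_reward(gripper_pos, object_pos, goal_pos):
    """Reward for a pick-and-place style task, decomposed into
    (1) reach the object and (2) bring it to the goal.
    Positions are (x, y, z) tuples in meters; names are illustrative."""
    reach_dist = math.dist(gripper_pos, object_pos)   # gripper -> object
    place_dist = math.dist(object_pos, goal_pos)      # object -> goal
    # Negative distances: reward is maximal (0) when both are satisfied.
    return -(reach_dist + place_dist)

# Reward increases monotonically as the gripper approaches the object.
far  = reaching_reward((0.5, 0.0, 0.3), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
near = reaching_reward((0.1, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
```

In practice such rewards are often learned rather than hand-written, but the interface (state in, scalar out) is the same.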
Planning
High-level planning decomposes the task into subtasks; low-level planning computes motion trajectories that avoid collisions and respect physical constraints.
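At its simplest, collision-aware planning can be shown on a discretized world. A hedged sketch using breadth-first search on an occupancy grid; real motion planners (RRT variants, trajectory optimization) work in continuous configuration space, but the idea of searching for an obstacle-free path is the same:

```python
from collections import deque

def plan_path(grid, start, goal):
    """Shortest collision-free path on a 2D occupancy grid (1 = obstacle),
    found by breadth-first search over 4-connected cells."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:  # reconstruct path by walking parents backwards
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # no collision-free path exists

grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
path = plan_path(grid, (0, 0), (0, 2))  # must route around the wall
```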
Control Execution
Joint torques or position commands are sent to actuators, with real-time feedback correction for disturbances.
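The "feedback correction" here is classically a PD loop: torque proportional to position error, damped by velocity. A minimal single-joint sketch with illustrative gains and a unit-inertia model (not tuned for any real arm):

```python
def pd_torque(q, qd, q_des, kp=50.0, kd=10.0):
    """PD position controller for one joint: proportional term pulls the
    joint toward the setpoint, derivative term damps the motion."""
    return kp * (q_des - q) + kd * (0.0 - qd)

# Simulate a unit-inertia joint with semi-implicit Euler integration and a
# constant disturbance torque; feedback still drives q near the setpoint,
# leaving only the steady-state error disturbance/kp = 2.0/50 = 0.04.
q, qd, dt = 0.0, 0.0, 0.001
for _ in range(5000):                               # 5 seconds of sim time
    torque = pd_torque(q, qd, q_des=1.0) - 2.0      # -2.0 N*m disturbance
    qd += torque * dt                               # unit inertia: qdd == torque
    q += qd * dt
# q settles near 0.96 rad despite the disturbance
```

Real controllers add an integral term or feedforward dynamics to remove that residual error; the sketch shows only the feedback-correction core.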
Learning and Adaptation
The robot improves through demonstration data, simulation experience, and real-world trial-and-error, building generalizable manipulation skills.
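The five steps above compose into the classic sense-plan-act loop. A minimal structural sketch with stubbed-out components; all class and method names are hypothetical, standing in for real perception, planning, and control stacks:

```python
class Perception:
    """Step 1: turn raw sensor data into object poses (stubbed)."""
    def observe(self, sensors):
        return {"red_cup": sensors["red_cup_pose"]}

class Planner:
    """Step 3: decompose the task into a sequence of subtasks (stubbed)."""
    def plan(self, scene, task):
        target = scene[task["object"]]
        return [("move_to", target), ("close_gripper",), ("lift",)]

class Controller:
    """Step 4: execute one subtask as low-level commands (stubbed)."""
    def execute(self, subtask):
        return f"executed {subtask[0]}"

def run_episode(sensors, task):
    scene = Perception().observe(sensors)           # perceive
    plan = Planner().plan(scene, task)              # plan
    ctrl = Controller()
    return [ctrl.execute(step) for step in plan]    # act

log = run_episode({"red_cup_pose": (0.4, 0.1, 0.0)},
                  {"object": "red_cup", "instruction": "pick up the red cup"})
```

Foundation-model approaches like RT-2 collapse several of these boxes into one network, but the loop structure (observe, decide, act, repeat) is unchanged.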
Current Landscape
Robotics in 2025 is being revolutionized by foundation models — large networks pretrained on internet data and fine-tuned on robot demonstrations. RT-2 showed that VLMs can directly output robot actions, and π0 demonstrated cross-task generalization. The hardware landscape is diversifying from industrial arms to humanoids (Figure, Tesla), low-cost bimanual systems (ALOHA), and mobile manipulators. Data remains the bottleneck: Open X-Embodiment and similar initiatives are trying to create the 'ImageNet moment' for robotics.
Key Challenges
Data scarcity — robot interaction data is orders of magnitude scarcer and costlier to collect than internet text or images
Sim-to-real gap — policies trained in simulation often fail on real hardware due to unmodeled dynamics
Generalization — handling novel objects, lighting, and environments remains extremely difficult
Safety — robots operating near humans must be provably safe, adding hard constraints on learned policies
Hardware cost — research-grade robot arms cost $20K-100K, limiting accessibility
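The safety challenge above is often addressed with a runtime filter that overrides the learned policy near people. A hedged sketch of a speed-vs-distance limit; the thresholds are illustrative, not drawn from any certified safety standard:

```python
def safety_filter(commanded_speed, distance_to_human,
                  max_speed=1.0, stop_distance=0.2, slow_distance=1.0):
    """Clamp the policy's commanded end-effector speed (m/s) based on the
    distance (m) to the nearest detected person. Illustrative policy:
    hard stop inside 0.2 m, linear ramp to full speed by 1.0 m."""
    if distance_to_human <= stop_distance:
        return 0.0  # never move with a person this close
    if distance_to_human >= slow_distance:
        allowed = max_speed
    else:
        frac = (distance_to_human - stop_distance) / (slow_distance - stop_distance)
        allowed = max_speed * frac
    # Clamp the (possibly unsafe) learned command into the allowed band.
    return max(-allowed, min(commanded_speed, allowed))
```

Because the filter wraps the policy rather than modifying it, the safety guarantee holds regardless of what the learned network outputs, which is the appeal of this layered design.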
Quick Recommendations
Research platform
Mobile ALOHA / low-cost bimanual setup
Best cost-performance ratio for manipulation research
Language-conditioned manipulation
RT-2 / Octo
State-of-the-art in mapping language instructions to robot actions
Foundation model approach
π0 (Physical Intelligence)
Most general-purpose robot foundation model as of 2025
Simulation development
Isaac Sim + MuJoCo
Best combination of speed (MuJoCo) and realism (Isaac Sim) for sim-to-real pipelines
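For the simulation stack above, a MuJoCo scene is described in MJCF XML. A minimal single-pendulum model, useful as a smoke test for a new installation; the model name, sizes, and gains are arbitrary:

```xml
<mujoco model="pendulum">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  <worldbody>
    <body name="arm" pos="0 0 1">
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <geom type="capsule" fromto="0 0 0 0 0 -0.5" size="0.02" mass="1"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" gear="1"/>
  </actuator>
</mujoco>
```

In the official Python bindings this loads with `mujoco.MjModel.from_xml_string(xml)` and steps with `mujoco.mj_step(model, data)`; an equivalent asset can then be imported into Isaac Sim for the higher-fidelity end of a sim-to-real pipeline.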
What's Next
The frontier is general-purpose household robots — systems that can handle diverse manipulation tasks in unstructured environments with minimal task-specific training. Key enablers: (1) larger robot foundation models trained on cross-embodiment data, (2) fast sim-to-real transfer for new tasks, (3) natural language interfaces for non-expert users.
Benchmarks & SOTA
LIBERO
LIBERO is a benchmark for knowledge transfer in lifelong robot learning. It provides high-quality human teleoperation demonstrations for four task suites. This benchmark aims to serve as a common ground for the machine learning and robotics communities to develop and evaluate new lifelong learning algorithms.
No results tracked yet
CALVIN ABCD to D
CALVIN (Composing Actions from Language and Vision)
CALVIN is an open-source simulated benchmark for long-horizon, language-conditioned robot manipulation. It comprises four distinct simulated environments (A, B, C, D), 34 base tasks, and roughly 1,000 language instructions, including instruction chains of up to five subtasks that agents must compose from shorter skills. Observations support flexible sensor suites (static RGB, gripper-mounted RGB, proprioceptive/gripper state), and published baselines use delta-action / continuous control interfaces. The standard evaluation protocol runs 500 rollouts and reports the average length of successfully completed subtasks (maximum 5), along with multi-task and long-horizon task-completion variants (MTLC / LH-MTLC). Code and data are released by the authors on GitHub and the project website; the accompanying paper is on arXiv and was published in IEEE Robotics and Automation Letters.
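CALVIN's headline metric, the average number of consecutively completed subtasks per 5-instruction chain, is simple to compute from per-subtask success flags. A sketch assuming a plain list-of-rollouts representation (the data layout is an assumption, not CALVIN's actual API):

```python
def avg_sequence_length(rollouts):
    """Each rollout is a list of 5 booleans: whether subtask i of the
    instruction chain succeeded. Scoring stops at the first failure, so
    [True, True, False, True, True] counts as length 2."""
    total = 0
    for chain in rollouts:
        for success in chain:
            if not success:
                break
            total += 1
    return total / len(rollouts)

score = avg_sequence_length([
    [True, True, True, True, True],    # full chain: counts as 5
    [True, True, False, True, True],   # stops at first failure: 2
    [False, True, True, True, True],   # fails immediately: 0
])                                     # (5 + 2 + 0) / 3
```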
No results tracked yet
SimplerEnv WidowX
SIMPLER / SimplerEnv is an open-source collection of simulated manipulation evaluation environments and a workflow for creating new real-to-sim evaluation suites intended to assess generalist robot manipulation policies. The WidowX environment matches common real-robot setups (WidowX+Bridge) and exposes controlled distribution shifts in lighting, textures, colors, and camera viewpoints.
No results tracked yet
RLBench
Large-scale robot learning benchmark with 100 manipulation tasks
No results tracked yet
SIMPLER
Simulated manipulation benchmark for evaluating robot learning policies
No results tracked yet