Robotics
Robotics is an interdisciplinary field spanning computer science and engineering, concerned with the design, construction, operation, and use of machines known as robots. These programmable machines are built to replicate, substitute for, or assist human actions, performing tasks in industries ranging from manufacturing and healthcare to exploration and entertainment.
General-purpose robotics combines perception, planning, and control to build machines that manipulate objects and navigate the physical world. Foundation models (RT-2, π0) are transforming the field by enabling language-conditioned robot behavior learned from internet-scale data combined with robot demonstrations.
History
2016: Levine et al. demonstrate large-scale robotic grasping with deep learning (~800K grasp attempts)
2019: OpenAI's Dactyl solves a Rubik's Cube with a robot hand using sim-to-real transfer
2019: RoboNet provides diverse multi-robot video data for learning visual dynamics
2022: SayCan (Google) grounds language models in robot affordances for task planning
2022: Inner Monologue uses LLM reasoning for closed-loop robotic task execution
2023: RT-2 (Robotic Transformer 2) directly maps vision and language to robot actions using a VLM backbone
2023: Open X-Embodiment dataset enables cross-robot transfer learning at scale
2024: Mobile ALOHA enables low-cost bimanual mobile manipulation with teleoperation learning
2024: Physical Intelligence's π0, a foundation model for robotics trained on diverse manipulation data
2024: Figure 01 and Tesla Optimus demonstrate humanoid robots performing warehouse tasks
How Robotics Works
Perception
Cameras, depth sensors, and proprioception provide the robot's understanding of the scene — object positions, shapes, and spatial relationships.
Task Specification
The robot receives a task via natural language ('pick up the red cup'), goal images, or learned reward functions.
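Of these three specification modes, a reward function is the most programmatic. As an illustration, a goal-reaching reward for a "pick up the cup" style task can be a negative distance; this is a minimal sketch with made-up position names, not any particular framework's API:

```python
import math

def reaching_reward(gripper_pos, object_pos, goal_pos):
    """Reward for a pick-and-place style task, decomposed into
    (1) reach the object and (2) bring it to the goal.
    Positions are (x, y, z) tuples in meters; names are illustrative."""
    reach_dist = math.dist(gripper_pos, object_pos)   # gripper -> object
    place_dist = math.dist(object_pos, goal_pos)      # object -> goal
    # Negative distances: reward is maximal (0) when both are satisfied.
    return -(reach_dist + place_dist)

# Reward increases monotonically as the gripper approaches the object.
far  = reaching_reward((0.5, 0.0, 0.3), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
near = reaching_reward((0.1, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0))
```

In practice such rewards are often learned rather than hand-written, but the interface (state in, scalar out) is the same.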
Planning
High-level planning decomposes the task into subtasks; low-level planning computes motion trajectories that avoid collisions and respect physical constraints.
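At its simplest, collision-aware planning can be shown on a discretized world. A hedged sketch using breadth-first search on an occupancy grid; real motion planners (RRT variants, trajectory optimization) work in continuous configuration space, but the idea of searching for an obstacle-free path is the same:

```python
from collections import deque

def plan_path(grid, start, goal):
    """Shortest collision-free path on a 2D occupancy grid (1 = obstacle),
    found by breadth-first search over 4-connected cells."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:  # reconstruct path by walking parents backwards
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # no collision-free path exists

grid = [[0, 1, 0],
        [0, 1, 0],
        [0, 0, 0]]
path = plan_path(grid, (0, 0), (0, 2))  # must route around the wall
```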
Control Execution
Joint torques or position commands are sent to actuators, with real-time feedback correction for disturbances.
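The "feedback correction" here is classically a PD loop: torque proportional to position error, damped by velocity. A minimal single-joint sketch with illustrative gains and a unit-inertia model (not tuned for any real arm):

```python
def pd_torque(q, qd, q_des, kp=50.0, kd=10.0):
    """PD position controller for one joint: proportional term pulls the
    joint toward the setpoint, derivative term damps the motion."""
    return kp * (q_des - q) + kd * (0.0 - qd)

# Simulate a unit-inertia joint with semi-implicit Euler integration and a
# constant disturbance torque; feedback still drives q near the setpoint,
# leaving only the steady-state error disturbance/kp = 2.0/50 = 0.04.
q, qd, dt = 0.0, 0.0, 0.001
for _ in range(5000):                               # 5 seconds of sim time
    torque = pd_torque(q, qd, q_des=1.0) - 2.0      # -2.0 N*m disturbance
    qd += torque * dt                               # unit inertia: qdd == torque
    q += qd * dt
# q settles near 0.96 rad despite the disturbance
```

Real controllers add an integral term or feedforward dynamics to remove that residual error; the sketch shows only the feedback-correction core.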
Learning and Adaptation
The robot improves through demonstration data, simulation experience, and real-world trial-and-error, building generalizable manipulation skills.
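The five steps above compose into the classic sense-plan-act loop. A minimal structural sketch with stubbed-out components; all class and method names are hypothetical, standing in for real perception, planning, and control stacks:

```python
class Perception:
    """Step 1: turn raw sensor data into object poses (stubbed)."""
    def observe(self, sensors):
        return {"red_cup": sensors["red_cup_pose"]}

class Planner:
    """Step 3: decompose the task into a sequence of subtasks (stubbed)."""
    def plan(self, scene, task):
        target = scene[task["object"]]
        return [("move_to", target), ("close_gripper",), ("lift",)]

class Controller:
    """Step 4: execute one subtask as low-level commands (stubbed)."""
    def execute(self, subtask):
        return f"executed {subtask[0]}"

def run_episode(sensors, task):
    scene = Perception().observe(sensors)           # perceive
    plan = Planner().plan(scene, task)              # plan
    ctrl = Controller()
    return [ctrl.execute(step) for step in plan]    # act

log = run_episode({"red_cup_pose": (0.4, 0.1, 0.0)},
                  {"object": "red_cup", "instruction": "pick up the red cup"})
```

Foundation-model approaches like RT-2 collapse several of these boxes into one network, but the loop structure (observe, decide, act, repeat) is unchanged.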
Current Landscape
Robotics in 2025 is being revolutionized by foundation models — large networks pretrained on internet data and fine-tuned on robot demonstrations. RT-2 showed that VLMs can directly output robot actions, and π0 demonstrated cross-task generalization. The hardware landscape is diversifying from industrial arms to humanoids (Figure, Tesla), low-cost bimanual systems (ALOHA), and mobile manipulators. Data remains the bottleneck: Open X-Embodiment and similar initiatives are trying to create the 'ImageNet moment' for robotics.
Key Challenges
Data scarcity — robot interaction data is orders of magnitude scarcer and costlier to collect than internet text or images
Sim-to-real gap — policies trained in simulation often fail on real hardware due to unmodeled dynamics
Generalization — handling novel objects, lighting, and environments remains extremely difficult
Safety — robots operating near humans must be provably safe, adding hard constraints on learned policies
Hardware cost — research-grade robot arms cost $20K-100K, limiting accessibility
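The safety challenge above is often addressed with a runtime filter that overrides the learned policy near people. A hedged sketch of a speed-vs-distance limit; the thresholds are illustrative, not drawn from any certified safety standard:

```python
def safety_filter(commanded_speed, distance_to_human,
                  max_speed=1.0, stop_distance=0.2, slow_distance=1.0):
    """Clamp the policy's commanded end-effector speed (m/s) based on the
    distance (m) to the nearest detected person. Illustrative policy:
    hard stop inside 0.2 m, linear ramp to full speed by 1.0 m."""
    if distance_to_human <= stop_distance:
        return 0.0  # never move with a person this close
    if distance_to_human >= slow_distance:
        allowed = max_speed
    else:
        frac = (distance_to_human - stop_distance) / (slow_distance - stop_distance)
        allowed = max_speed * frac
    # Clamp the (possibly unsafe) learned command into the allowed band.
    return max(-allowed, min(commanded_speed, allowed))
```

Because the filter wraps the policy rather than modifying it, the safety guarantee holds regardless of what the learned network outputs, which is the appeal of this layered design.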
Quick Recommendations
Research platform
Mobile ALOHA / low-cost bimanual setup
Best cost-performance ratio for manipulation research
Language-conditioned manipulation
RT-2 / Octo
State-of-the-art in mapping language instructions to robot actions
Foundation model approach
π0 (Physical Intelligence)
Most general-purpose robot foundation model as of 2025
Simulation development
Isaac Sim + MuJoCo
Best combination of speed (MuJoCo) and realism (Isaac Sim) for sim-to-real pipelines
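For the simulation stack above, a MuJoCo scene is described in MJCF XML. A minimal single-pendulum model, useful as a smoke test for a new installation; the model name, sizes, and gains are arbitrary:

```xml
<mujoco model="pendulum">
  <option timestep="0.002" gravity="0 0 -9.81"/>
  <worldbody>
    <body name="arm" pos="0 0 1">
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <geom type="capsule" fromto="0 0 0 0 0 -0.5" size="0.02" mass="1"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" gear="1"/>
  </actuator>
</mujoco>
```

In the official Python bindings this loads with `mujoco.MjModel.from_xml_string(xml)` and steps with `mujoco.mj_step(model, data)`; an equivalent asset can then be imported into Isaac Sim for the higher-fidelity end of a sim-to-real pipeline.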
What's Next
The frontier is general-purpose household robots — systems that can handle diverse manipulation tasks in unstructured environments with minimal task-specific training. Key enablers: (1) larger robot foundation models trained on cross-embodiment data, (2) fast sim-to-real transfer for new tasks, (3) natural language interfaces for non-expert users.
Benchmarks & SOTA
LIBERO
LIBERO is a benchmark for knowledge transfer in lifelong robot learning. It provides high-quality human teleoperation demonstrations for four task suites. This benchmark aims to serve as a common ground for the machine learning and robotics communities to develop and evaluate new lifelong learning algorithms.
No results tracked yet
CALVIN ABCD to D
CALVIN (Composing Actions from Language and Vision)
CALVIN is an open-source simulated benchmark for long-horizon, language-conditioned robot manipulation. It comprises four distinct simulated environments (A, B, C, D), 34 base tasks, and roughly 1,000 language instructions, including instruction chains of up to five subtasks that agents must compose from shorter skills. Observations support flexible sensor suites (static RGB, gripper-mounted RGB, proprioceptive/gripper state), and published baselines use delta-action / continuous control interfaces. The standard evaluation protocol runs 500 rollouts and reports the average length of successfully completed subtasks (maximum 5), along with multi-task and long-horizon task-completion variants (MTLC / LH-MTLC). Code and data are released by the authors on GitHub and the project website; the accompanying paper is on arXiv and was published in IEEE Robotics and Automation Letters.
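CALVIN's headline metric, the average number of consecutively completed subtasks per 5-instruction chain, is simple to compute from per-subtask success flags. A sketch assuming a plain list-of-rollouts representation (the data layout is an assumption, not CALVIN's actual API):

```python
def avg_sequence_length(rollouts):
    """Each rollout is a list of 5 booleans: whether subtask i of the
    instruction chain succeeded. Scoring stops at the first failure, so
    [True, True, False, True, True] counts as length 2."""
    total = 0
    for chain in rollouts:
        for success in chain:
            if not success:
                break
            total += 1
    return total / len(rollouts)

score = avg_sequence_length([
    [True, True, True, True, True],    # full chain: counts as 5
    [True, True, False, True, True],   # stops at first failure: 2
    [False, True, True, True, True],   # fails immediately: 0
])                                     # (5 + 2 + 0) / 3
```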
No results tracked yet
SimplerEnv WidowX
SIMPLER / SimplerEnv is an open-source collection of simulated manipulation evaluation environments and a workflow for creating new real-to-sim evaluation suites intended to assess generalist robot manipulation policies. The WidowX environment matches common real-robot setups (WidowX+Bridge) and exposes controlled distribution shifts in lighting, textures, colors, and camera viewpoints.
No results tracked yet
RLBench
Large-scale robot learning benchmark with 100 manipulation tasks
No results tracked yet
SIMPLER
Simulated manipulation benchmark for evaluating robot learning policies
No results tracked yet