
Robot Manipulation

Robot manipulation (grasping, placing, and using tools) is where sim-to-real transfer and foundation models meet physical dexterity. Dex-Net (2017) pioneered data-driven grasp planning, but the field accelerated when contact-rich manipulation was tackled with RL in simulation (DexterousHands, 2023) and then transferred to real hardware. The current state of the art combines diffusion policies (Chi et al., 2023) with large pretrained vision encoders to achieve robust 6-DOF manipulation from a handful of demonstrations, though deformable objects and multi-step assembly remain unsolved.


Robot manipulation focuses on grasping, moving, and assembling objects with robotic grippers and hands. Diffusion policies and vision-language-action models have dramatically improved generalization, but dexterous multi-finger manipulation and deformable object handling remain open challenges.

History

2016

Levine et al. scale robotic grasping to 800K attempts across 14 robots

2017

Dex-Net 2.0 plans grasps from depth images using a GQ-CNN

2018

QT-Opt learns vision-based grasping from 580K real-world grasps at 96% success

2019

OpenAI Dactyl solves Rubik's Cube with dexterous in-hand manipulation

2020

Transporter Networks learn pick-and-place from few demonstrations

2023

Diffusion Policy models multi-modal action distributions for complex manipulation

2023

RT-2 demonstrates language-conditioned manipulation via VLM backbone

2024

π0 achieves cross-task manipulation transfer from folding to packing to cleaning

2024

DexCap enables learning dexterous hand manipulation from human hand capture data

2025

Multi-finger dexterous manipulation with tactile sensing approaches practical reliability on constrained tasks

How Robot Manipulation Works

Robot Manipulation Pipeline: Scene Perception → Grasp Planning → Motion Planning → Closed-Loop Execution → Skill Composition

1

Scene Perception

RGB-D cameras observe the workspace, and the robot segments objects, estimates poses, and identifies grasp candidates.
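
One small piece of this step is turning a depth pixel into a 3D point in the camera frame. The following is a minimal sketch assuming an ideal pinhole camera with no lens distortion; the function names and the intrinsics tuple (fx, fy, cx, cy) are illustrative and not taken from any particular system.

```python
def deproject_pixel(u, v, depth_m, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth in metres to a camera-frame point.

    Assumes an ideal pinhole model (hypothetical intrinsics, no distortion).
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def object_centroid(pixels, depth, intrinsics):
    """Mean 3D position of the pixels belonging to one segmented object.

    pixels: list of (u, v) from a segmentation mask (assumed given).
    depth: dict mapping (u, v) -> depth in metres; 0 or missing means invalid.
    """
    fx, fy, cx, cy = intrinsics
    points = [
        deproject_pixel(u, v, depth[(u, v)], fx, fy, cx, cy)
        for (u, v) in pixels
        if depth.get((u, v), 0.0) > 0.0
    ]
    if not points:
        return None  # no valid depth readings for this object
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))
```

A centroid like this is only a crude stand-in for pose estimation, but it shows how a mask plus a depth image yields a 3D location that later stages can use.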

2

Grasp Planning

The system selects a grasp configuration — where and how to grip the object — based on geometry, physics, and task requirements.
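
As an illustration of the geometry and physics involved, a common textbook test for a two-finger grasp is the antipodal condition: the line joining the two contacts must lie inside the friction cone at each contact. The sketch below assumes point contacts, Coulomb friction with coefficient mu, and outward unit surface normals; the function names are hypothetical, and learned grasp planners such as those named on this page do not necessarily work this way.

```python
import math


def _unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def _angle(a, b):
    """Angle in radians between two unit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))


def is_antipodal(p1, n1, p2, n2, mu):
    """Check the antipodal condition for a parallel-jaw grasp.

    p1, p2: contact points; n1, n2: outward surface normals at those points.
    mu: assumed Coulomb friction coefficient (friction cone half-angle = atan(mu)).
    """
    half_angle = math.atan(mu)
    axis = _unit(tuple(b - a for a, b in zip(p1, p2)))  # direction p1 -> p2
    inward1 = tuple(-c for c in _unit(n1))
    # Finger 1 pushes along +axis, which must lie in the cone around inward1.
    # Finger 2 pushes along -axis; comparing -axis with -n2 equals axis with n2.
    return (_angle(axis, inward1) <= half_angle
            and _angle(axis, _unit(n2)) <= half_angle)
```

For example, gripping the two opposite faces of a box head-on passes the test for any positive mu, while a skewed contact pair passes only if friction is high enough.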

3

Motion Planning

A collision-free trajectory is computed from the current configuration to the grasp pose and then to the placement target.
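
The simplest building block of such a planner is a straight-line move in joint space that is rejected if any intermediate waypoint collides. The sketch below shows only that check, chained for approach and placement; it is not a complete planner, and the in_collision callback, step size, and function names are assumptions made for illustration.

```python
import math


def interpolate_path(q_start, q_goal, in_collision, step=0.05):
    """Straight-line joint-space waypoints, or None if any waypoint collides.

    in_collision: caller-supplied function q -> bool (hypothetical checker).
    step: maximum per-joint change between consecutive waypoints (radians).
    """
    max_delta = max(abs(g - s) for s, g in zip(q_start, q_goal))
    n = max(1, math.ceil(max_delta / step))
    path = []
    for i in range(n + 1):
        t = i / n
        q = tuple(s + t * (g - s) for s, g in zip(q_start, q_goal))
        if in_collision(q):
            return None
        path.append(q)
    return path


def plan_pick_and_place(q_now, q_grasp, q_place, in_collision, step=0.05):
    """Chain two segments: current -> grasp configuration -> placement."""
    to_grasp = interpolate_path(q_now, q_grasp, in_collision, step)
    to_place = interpolate_path(q_grasp, q_place, in_collision, step)
    if to_grasp is None or to_place is None:
        return None
    return to_grasp + to_place[1:]  # drop the duplicated grasp waypoint
```

When the straight line is blocked, a real system would fall back to a search over alternative intermediate configurations rather than returning None.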

4

Closed-Loop Execution

During execution, force/torque and visual feedback enable real-time adjustments for robust grasping and placement.
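
A toy version of force feedback is a proportional controller that closes the gripper while the measured force is below target and opens it when the force is too high. The sketch below runs that loop against an assumed linear-spring contact model; the gain, stiffness, and widths are made-up values chosen so the loop is stable, not parameters of any real gripper.

```python
def grip_force_step(target_n, measured_n, width_m, gain=0.00025, min_width=0.0):
    """One control tick: adjust jaw width in proportion to the force error."""
    error = target_n - measured_n
    return max(min_width, width_m - gain * error)


def simulate_grip(target_n=5.0, object_width=0.04, stiffness=2000.0, steps=200):
    """Close a gripper on a springy object until the target force is reached.

    Contact model (assumed): zero force until the jaws touch the object,
    then force = stiffness * compression.
    """
    width = 0.08  # jaws start fully open (metres)
    force = 0.0
    for _ in range(steps):
        force = max(0.0, stiffness * (object_width - width))  # "sensor" reading
        width = grip_force_step(target_n, force, width)
    return width, force
```

With these numbers the product of gain and stiffness is 0.5, so the force error halves on every tick once contact is made; a product above 2 would make the same loop oscillate and diverge.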

5

Skill Composition

Complex manipulation tasks chain multiple primitive skills: reach, grasp, lift, transport, orient, insert, release.
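
Chaining can be sketched as running primitives in order, where each one checks its precondition and the chain stops at the first failure. The primitives and state keys below are hypothetical placeholders standing in for real controllers.

```python
def reach(state):
    return True, {**state, "at_object": True}


def grasp(state):
    if not state.get("at_object"):
        return False, state  # precondition: gripper must be at the object
    return True, {**state, "holding": True}


def release(state):
    if not state.get("holding"):
        return False, state  # precondition: something must be in the gripper
    return True, {**state, "holding": False, "placed": True}


def run_skill_chain(skills, state):
    """Run (name, skill) pairs in order.

    Returns (success, name_of_failed_skill_or_None, final_state).
    """
    for name, skill in skills:
        ok, state = skill(state)
        if not ok:
            return False, name, state
    return True, None, state
```

Reporting which primitive failed is what lets a higher-level policy retry or replan from that point instead of restarting the whole task.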

Current Landscape

Robot manipulation in 2025 has advanced dramatically through two paradigm shifts: (1) diffusion policies that model multi-modal action distributions for complex contact-rich tasks, and (2) vision-language-action models that enable language-conditioned manipulation. Simple pick-and-place is commercially deployed (Amazon and the wider logistics sector), while research pushes toward dexterous multi-finger manipulation, deformable object handling, and tool use. The data bottleneck is being addressed through teleoperated data collection (ALOHA) and simulation (Isaac Gym).
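
To see why modeling a multi-modal action distribution matters, consider a toy case in which half the demonstrations steer left around an obstacle and half steer right. The sketch below is not a diffusion policy; it only contrasts a single mean-squared-error prediction with sampling from the demonstrated distribution, using made-up one-dimensional actions.

```python
import random
import statistics

# Hypothetical demonstrations: lateral offset chosen to pass an obstacle at 0.
demo_actions = [-1.0] * 50 + [1.0] * 50

# A single-output regressor trained with MSE converges to the mean action.
mse_action = statistics.fmean(demo_actions)


def sample_action(rng):
    """Stand-in for a generative policy: draw one action from the demos."""
    return rng.choice(demo_actions)


def hits_obstacle(action, half_width=0.5):
    """The obstacle (assumed) occupies offsets in (-half_width, half_width)."""
    return abs(action) < half_width
```

Here the averaged action drives straight into the obstacle, while any sampled action commits to one side and clears it. Generative policies aim to keep that commitment to a single mode when conditioning on real observations.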

Key Challenges

Deformable objects (cloth, rope, food) have infinite-dimensional state spaces that resist standard planning

Dexterous manipulation with multi-finger hands requires controlling 20+ degrees of freedom simultaneously

Tool use: using objects such as spatulas and screwdrivers as tools requires understanding physics beyond contact

Tactile sensing integration: combining vision and touch for reliable manipulation under occlusion

Long-horizon assembly: multi-step assembly tasks with tight tolerances remain extremely challenging

Quick Recommendations

General manipulation research

Diffusion Policy + 6-DOF robot

Best framework for learning multi-modal manipulation from demonstrations

Language-conditioned manipulation

RT-2 / Octo

Map natural language instructions to manipulation actions

Dexterous hand manipulation

DexCap + Isaac Gym sim-to-real

State-of-the-art pipeline for learning hand manipulation skills

Industrial pick-and-place

Dex-Net 4.0 / Contact-GraspNet

Proven in production for bin-picking applications

What's Next

The frontier is reliable dexterous manipulation in unstructured environments — folding laundry, cooking meals, assembling furniture. Key advances needed: (1) better tactile sensing integration, (2) manipulation foundation models trained on diverse cross-task data, (3) real-time adaptation to novel objects through in-context learning.

