Robotics · Grasping & bin-picking · From the bin to the belt · Issue: May 29, 2026
Grasp detection · suction vs jaw · bin-picking · 2026

Every robot grasping benchmark that matters, in one map — from GraspNet-1Billion to the warehouse belt.

GraspNet-1Billion, Dex-Net, Contact-GraspNet, AnyGrasp — the datasets, success rates, suction-versus-jaw trade-offs, and the bin-picking throughput metrics that decide what actually ships. Tabletop detection is near-solved; reliably clearing a heap of unseen objects is where the field — and industrial robotics — is still being won.

Interactive · WebGL · a story in three acts

From toil to autonomy.

Parts ride the belt; you sort them by hand: triangles to green, the rogue circle to blue. It is tedious and you start dropping picks. Then press “Automate this toil” and a multi-joint arm (a 6-axis robot solved with FABRIK IK, not one pivot) clears the line on its own. That hand-off is the whole story of warehouse AI.

Act I · The toil

A human sorts the line by hand. It works — until the parts come faster than two hands can move, and picks start hitting the floor.

Act II · The hand-off

“Automate this toil.” The job is described once, then handed to a policy. No reprogramming per part — it generalises to the rogue circle too.

Act III · Autonomy

GripAI clears the belt tirelessly at a steadier picks-per-hour rate. The human moves up the stack, from doing the toil to defining it.

Fig · Real-time multi-joint IK + parallel-jaw pick-and-sort on a conveyor. Play by hand, then press “Automate this toil” to watch the policy take over. WebGL (three.js), custom shaders.
Opinion · State of robot grasping, 2026

Grasping is solved on the bench and unsolved in the bin.

Kacper Wikiel · CodeSOTA · 29 May 2026

Read the leaderboards and grasping looks finished. Cornell sits at 99%, Jacquard at 95%, GraspNet has a clean leaderboard with a clear winner. Then you put a robot in front of a tote of mixed retail SKUs and it drops one pick in ten. The gap between those two facts is the entire field, and it is worth being honest about where it actually lives.

First: detection accuracy is a vanity metric. A 99% image-wise score on Cornell means a predicted rectangle overlaps a labeled one. It does not mean a robot lifted anything. A decade of papers optimized a number that never touched a gripper. The metrics that predict a working system — grasp success rate on hardware, declutter rate, and mean picks per hour — are far less flattering, and far more honest.

Suction quietly wins the warehouse. The research glamour goes to dexterous hands; the picks that ship run on a vacuum cup.

Second: the gripper debate is already over. Dexterous five-finger hands dominate the papers and almost none of the deployments. The lesson of Dex-Net 4.0 is blunt: the highest reliability comes from an ambidextrous policy that defaults to suction and falls back to a jaw — 95% reliability at 300 picks per hour. Suction is unglamorous, geometry-tolerant, and fast. In a warehouse, that wins.

Third: the bottleneck moved from grasping to perception. Modern grasp planners are good. What fails is the point cloud feeding them — a single viewpoint occludes most of a pile, glass and metal punch holes straight through the depth map, and clutter turns every scene into a guess. The next ten points of reliability will not come from a better grasp sampler. They will come from better perception: multi-view capture, shape completion, NeRF-based depth for transparent objects.

Fourth, and the one that matters commercially: generalization is the product. A system that hits 95% on a known catalogue is worthless if it needs a week of re-tuning for the next warehouse's SKUs. “Reliable from the first day on unfamiliar items” is not a marketing line — it is the actual unsolved research problem, and the survey literature is converging on the same answer: it is a data and generalization bottleneck, not a grasp-geometry one.

So: grasping is solved on the bench and unsolved in the bin. The interesting work for the next few years is not a new grasp representation. It is making a policy that walks up to a pile of objects it has never seen, in a warehouse it has never been deployed in, and clears it reliably from minute one. Everything else on this page is the scaffolding for that one problem.

Original experiment · run on our own GPU

We measured the clutter gap ourselves.

Rather than only cite the literature, we ran a bin-picking grasp simulation in PyBullet on an RTX 3090 and measured grasp success rate as the bin fills up. The degradation is real and reproducible.

MEASURED GRASP SUCCESS RATE · 1,440 SIMULATED GRASPS · 1 object: 86.7% · 2 objects: 75.3% · 4 objects: 64.2% · 8 objects: 53.3% · x-axis: objects in bin (clutter) · y-axis: 0–100% · whiskers = range over 3 seeds

A free-floating parallel-jaw gripper attempts a top-down grasp on a randomly chosen object dropped into a tray. We sweep the number of objects across four clutter levels (1, 2, 4, and 8) and record whether the target is lifted clear: 1,440 grasp attempts in total (3 seeds × 120 trials × 4 clutter levels).

Grasp success rate falls from 86.7% on an isolated object to 53.3% in an eight-object pile — the same packed-versus-pile collapse the literature reports (GIGA, VGN), reproduced here with our own numbers. Clutter, not grasp geometry, is what breaks picking.

setup · PyBullet · floating parallel-jaw · YCB-style + primitive objects
protocol · random target · lift ≥ 10 cm = success · 3 seeds
Honest caveat: this is simulation with a simplified gripper and our own protocol — the absolute numbers are not comparable to any paper. The trend is the result.
Fig B · Measured grasp success rate vs clutter. Mean of 3 seeds; whiskers show the per-seed range. Run on a single RTX 3090 in PyBullet (DIRECT), May 2026.
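The trial count and the sampling uncertainty behind each bar can be checked in a few lines. The sketch below assumes the trials are split evenly across the four clutter levels and back-computes success counts from the rounded percentages; those counts and the Wilson interval (which pools the three seeds and treats attempts as independent) are an illustration, not logged output from the run.

```python
import math

SEEDS, TRIALS_PER_SEED = 3, 120
# Reported mean success rate per clutter level (objects in bin -> rate).
REPORTED = {1: 0.867, 2: 0.753, 4: 0.642, 8: 0.533}

per_level = SEEDS * TRIALS_PER_SEED      # attempts at each clutter level
total = per_level * len(REPORTED)        # attempts across the whole sweep


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


print(f"{per_level} attempts per level, {total} in total")
for objects, rate in REPORTED.items():
    # Assumption: counts are back-computed from the rounded percentage.
    k = round(rate * per_level)
    lo, hi = wilson_interval(k, per_level)
    print(f"{objects} objects: {k}/{per_level} = {k / per_level:.1%} "
          f"(interval {lo:.1%} to {hi:.1%})")
```

Under these assumptions the intervals at one and eight objects do not overlap, which supports reading the trend, rather than any single bar, as the result.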
§ 00 · Primer

What a grasp actually is.

Two representations split the field. Older benchmarks predict a flat grasp rectangle on an image; modern clutter models predict a full 6-DoF gripper pose on a raw point cloud.

PLANAR GRASP · CORNELL / JACQUARD · (x, y, θ, w) · labels: θ; x, y; w = gripper opening
6-DoF GRASP · GRASPNET / CONTACT-GRASPNET · (3T + 3R) · approach vector · predicted on raw point cloud, no fixed table plane
Fig A · The two dominant grasp representations. Left: the 4-parameter planar grasp rectangle (x, y, θ, width) of Cornell and Jacquard. Right: a 6-DoF grasp pose — three translation, three rotation — regressed directly on the point cloud, as in GraspNet-1Billion and Contact-GraspNet.
Build a grasp

A planar grasp is four numbers on an image. Drag θ and the opening — this is exactly what Cornell- and Jacquard-trained networks predict.

Fig A.1 · Live grasp builder. Switch between the planar and 6-DoF representations and drag the sliders to see exactly what a grasp-detection network has to predict.
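To make the planar representation and the "rectangle overlaps a labeled one" test concrete, here is a minimal sketch that turns (x, y, θ, w) into rectangle corners and compares a prediction against a label. The jaw height `h` is an assumed constant (the four-number grasp does not carry it), and the thresholds (IoU above 0.25, angle within 30°) are the values commonly used with the Cornell rectangle metric, stated here as an assumption because the text does not give them.

```python
import math


def grasp_rectangle(x, y, theta_deg, w, h=30.0):
    """Corners (counter-clockwise) of the rectangle for grasp (x, y, theta, w)."""
    t = math.radians(theta_deg)
    ux, uy = math.cos(t), math.sin(t)    # direction of the gripper opening
    vx, vy = -uy, ux                     # perpendicular direction
    half = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(x + a * ux + b * vx, y + a * uy + b * vy) for a, b in half]


def _side(a, b, p):
    """Positive if p is left of the directed line a -> b, negative if right."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip(subject, clipper):
    """Sutherland-Hodgman clipping against a convex counter-clockwise polygon."""
    out = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inp, out = out, []
        for j in range(len(inp)):
            p, q = inp[j - 1], inp[j]
            dp, dq = _side(a, b, p), _side(a, b, q)
            if (dp >= 0) != (dq >= 0):   # edge p -> q crosses the clip line
                t = dp / (dp - dq)
                out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if dq >= 0:
                out.append(q)
    return out


def area(poly):
    """Shoelace area; 0.0 for an empty polygon."""
    return abs(sum(poly[i - 1][0] * poly[i][1] - poly[i][0] * poly[i - 1][1]
                   for i in range(len(poly)))) / 2


def rectangle_match(pred, label, iou_min=0.25, angle_max_deg=30.0):
    """Return (is_match, iou) for two grasps given as (x, y, theta_deg, w)."""
    rp, rl = grasp_rectangle(*pred), grasp_rectangle(*label)
    inter = area(clip(rp, rl))
    iou = inter / (area(rp) + area(rl) - inter)
    d = abs(pred[2] - label[2]) % 180.0  # grasp angles repeat every 180 degrees
    d = min(d, 180.0 - d)
    return iou > iou_min and d < angle_max_deg, iou


print(rectangle_match((100, 100, -22, 150), (110, 95, -10, 150)))
```

Nothing in this check involves a gripper, friction, or mass, which is why a high score here says little about whether an object gets lifted.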
§ 01 · Stability

The physics of a hold.

Before any network, a grasp is a mechanics problem: will the object stay in the gripper under gravity and motion? Two analytic ideas underpin almost every benchmark on this page — force closure for fingers, and seal-plus-wrench-resistance for suction.

Learning-based grasping did not replace this physics — it learned to predict it from pixels and points. Understanding the underlying model is what separates tuning a network from diagnosing why it drops a part.

FORCE CLOSURE · PARALLEL-JAW · grasp line ⊂ both friction cones · cone ±arctan μ · ⇒ no slip under any wrench
Fig 1.1 · Force closure. A parallel-jaw grasp is stable when the line between the two contact points lies inside both friction cones — then no external wrench can cause slip. The cone half-angle is set by the friction coefficient μ.
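The condition in Fig 1.1 reduces to two angle comparisons. The sketch below is a minimal check under stated assumptions: point contacts with Coulomb friction, and surface normals that point into the object; the function and argument names are illustrative, not taken from any benchmark's code.

```python
import math


def _unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _angle(u, v):
    """Angle in radians between two vectors."""
    u, v = _unit(u), _unit(v)
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.acos(dot)


def antipodal_force_closure(p1, n1, p2, n2, mu):
    """True if the line p1 -> p2 lies inside both friction cones.

    n1, n2: inward-pointing surface normals at the contacts (assumed
    convention). mu: friction coefficient; cone half-angle is atan(mu).
    """
    half_angle = math.atan(mu)
    line = tuple(b - a for a, b in zip(p1, p2))
    reverse = tuple(-c for c in line)
    return _angle(line, n1) <= half_angle and _angle(reverse, n2) <= half_angle


# Opposed contacts on parallel faces: the grasp line is along both normals.
print(antipodal_force_closure((0, 0), (1, 0), (1, 0), (-1, 0), mu=0.5))
# Second contact offset by 45 degrees: outside a cone of about 26.6 degrees.
print(antipodal_force_closure((0, 0), (1, 0), (1, 1), (-1, 0), mu=0.5))
```

Raising μ widens the cones, so the same offset contact pair can pass on a rubber-coated jaw and fail on a bare metal one.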
SUCTION · SEAL FORMATION + WRENCH RESISTANCE · forces shown: seal holding force, m·g, τ dynamic · seal quality ∝ surface flatness × (1 − porosity)
Fig 1.2 · Suction as physics. A vacuum grasp must first form a seal on the local surface, then that seal must resist the wrench from gravity and motion. Dex-Net 3.0 modeled both stages analytically; seal quality falls with surface curvature and porosity.
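The second stage, resisting the wrench once a seal exists, can be approximated with textbook physics: holding force is pressure difference times cup area. This is a deliberately simplified sketch, not Dex-Net 3.0's analytic model; it assumes a perfect seal, a straight vertical lift, and no torque, and the safety factor of 2 is an arbitrary illustrative choice.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def seal_holding_force(cup_radius_m, vacuum_pa):
    """Ideal holding force of a sealed cup: pressure difference x area (N)."""
    return vacuum_pa * math.pi * cup_radius_m ** 2


def suction_holds(mass_kg, cup_radius_m, vacuum_pa, accel=0.0, safety=2.0):
    """Check a vertical lift against gravity plus upward arm acceleration."""
    load = mass_kg * (G + accel)
    return seal_holding_force(cup_radius_m, vacuum_pa) >= safety * load


# A 15 mm radius cup at 60 kPa of vacuum, lifting with 2 m/s^2 acceleration.
print(round(seal_holding_force(0.015, 60_000), 1), "N available")
print(suction_holds(1.0, 0.015, 60_000, accel=2.0))
print(suction_holds(3.0, 0.015, 60_000, accel=2.0))
```

Curvature and porosity enter before this step: if the seal does not form, the pressure difference in the formula never develops.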
§ 02 · Datasets

Grasp detection, benchmarked.

The datasets the grasping field actually agrees on, in rough order of how much they drive the current frontier. GraspNet-1Billion and the Dex-Net family anchor clutter and industrial picking; Cornell and Jacquard are the saturated tabletop-detection classics.


SIM = simulation result · HW = physical hardware. Image-wise accuracy is detection quality, not real-robot pick success.

Interactive register · May 2026
| Benchmark | Source | Year | Scale | Gripper | Modality | Best-known result |
|---|---|---|---|---|---|---|
| Grasp-Anything | Vuong et al., ICRA 2024 | 2024 | 1M samples · 3M+ objects · text descriptions · foundation-model-generated | Parallel-jaw | RGB + language | Language-driven grasp synthesis · open-vocabulary scenes |
| SuctionNet-1Billion | Cao et al., RA-L 2021 | 2021 | 190 scenes · 88 objects · 97,280 images · ~1.1B suction annotations | Suction | RGB-D | HW: 80.65% grasp success · 100% object clearance (their method) |
| ACRONYM | Eppner et al., ICRA 2021 | 2021 | 17.7M grasps · 8,872 objects · 262 categories · FleX physics | Parallel-jaw (Franka) | Simulation-only | SIM: 59.21% of generated grasps succeed (label generation) |
| GIGA | Jiang et al., RSS 2021 | 2021 | Built on VGN synthetic setup · affordance + implicit geometry | Parallel-jaw | TSDF + implicit | HW: 83.3% packed · 86.9% pile · SIM: 87.9% / 69.8% |
| GraspNet-1Billion | Fang et al., CVPR 2020 | 2020 | 97,280 RGB-D images · 190 cluttered scenes · 88 objects · ~1.1B grasp poses | Parallel-jaw | RGB-D · point cloud | De-facto clutter benchmark · AnyGrasp current SOTA (AP) |
| VGN | Breyer et al., CoRL 2020 | 2020 | ~2M synthetic grasps · 303 training meshes | Parallel-jaw (Franka) | TSDF (from depth) | HW: 80% grasp success · 92% clutter clearance · ~10 ms plan |
| EGAD! | Morrison et al., RA-L 2020 | 2020 | 2,000+ evolved objects · 49 diverse 3D-printable eval objects | Parallel-jaw | Mesh · depth | Diagnostic set (geometry × difficulty) · no single SOTA number |
| Dex-Net 4.0 | Mahler et al., Science Robotics 2019 | 2019 | 5M+ synthetic grasps · 1,664 objects in simulated heaps | Ambidextrous (jaw + suction) | Depth | HW: 95% reliability · 300 MPPH (ABB YuMi) |
| Dex-Net 3.0 | Mahler et al., ICRA 2018 | 2018 | 2.8M point clouds · 1,500 models · analytic suction-seal labels | Suction | Depth · point cloud | HW: 98% basic · 82% typical · 58% adversarial |
| Jacquard | Depierre et al., IROS 2018 | 2018 | 50,000+ images · ~11,000 objects · ~1.1M successful grasps | Parallel-jaw | RGB-D (synthetic trials) | ~95% image-wise (GR-ConvNet-class) |
| Dex-Net 2.0 | Mahler et al., RSS 2017 | 2017 | 6.7M synthetic point clouds + grasps from thousands of 3D models | Parallel-jaw | Depth | HW: 93% on adversarial · 99% precision on 40 novel objects (YuMi) |
| YCB Object & Model Set | Calli et al., IEEE R&A Magazine 2015 | 2015 | 77 physical objects + RGB-D scans & meshes | Object set | RGB-D meshes | Standard physical object set, not a scored benchmark |
| Cornell Grasp | Lenz et al., IJRR / RSS 2013–15 | 2011–13 | 885 RGB-D images · 240 objects · 8,019 labeled grasp rectangles | Parallel-jaw | RGB-D | ~99% image-wise accuracy (saturated benchmark) |
Fig 2 · Grasp-detection register — filter by gripper, search, sort, and click any benchmark for its detail page. Shaded rows mark the three that define the current frontier: GraspNet-1Billion (clutter), Dex-Net 4.0 (ambidextrous industrial), SuctionNet-1Billion (suction).
§ 03 · Models

The grasp, predicted.

From a single depth image or point cloud to a gripper pose. The lineage runs from grasp-quality CNNs (Dex-Net) to dense 6-DoF generators (Contact-GraspNet, AnyGrasp) that clear bins of unseen objects at close to human throughput.

Read the lane column carefully: HW is a physical-robot success rate; image-wise is detection accuracy on a labeled dataset and does not imply a real pick.

INFERENCE · ONE OBSERVATION → ONE GRASP · RGB-D / depth (sensor) → Point cloud / TSDF (representation) → Grasp sampler (candidates) → Quality net (GQ-CNN score) → Ranked 6-DoF (best grasp) → Pick & place (execute) · synthetic training (ACRONYM · 17.7M grasps) feeds the sampler + quality net
Fig 3.1 · The grasp-detection pipeline. Most modern systems share this inference path; the quality network (e.g. GQ-CNN) is what learns to rank candidates.
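The sample, score, rank portion of that pipeline can be sketched without any learning at all. Everything below is hypothetical scaffolding: the `Grasp` fields, the `toy_quality` heuristic that stands in for a trained quality network, and the `min_quality` cutoff are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Grasp:
    position: Tuple[float, float, float]   # gripper position, metres
    rotation: Tuple[float, float, float]   # roll, pitch, yaw in radians
    width: float                           # jaw opening, metres


def toy_quality(g: Grasp) -> float:
    """Stand-in for a learned quality net: prefers near top-down grasps."""
    tilt = abs(g.rotation[0]) + abs(g.rotation[1])
    return max(0.0, 1.0 - tilt)


def best_grasp(candidates: Iterable[Grasp],
               quality: Callable[[Grasp], float],
               min_quality: float = 0.5) -> Optional[Grasp]:
    """Score every candidate, rank by quality, and return the top one.

    Returns None when nothing clears min_quality, so the caller can
    re-perceive instead of executing a weak grasp.
    """
    scored = sorted(((quality(g), g) for g in candidates),
                    key=lambda pair: pair[0], reverse=True)
    if not scored or scored[0][0] < min_quality:
        return None
    return scored[0][1]


candidates = [
    Grasp((0.40, 0.00, 0.10), (0.0, 0.0, 0.3), 0.05),
    Grasp((0.42, 0.02, 0.10), (0.8, 0.1, 0.0), 0.05),
]
print(best_grasp(candidates, toy_quality))
```

Swapping `toy_quality` for a trained scorer changes the ranking but not the control flow, which is the sense in which most systems share this inference path.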
| Model | Source | Input | Reported result | Lane |
|---|---|---|---|---|
| AnyGrasp | Fang et al. · IEEE T-RO 2023 | Point cloud | 93.3% bin-clearing · >900 MPPH single-arm. Cleared bins of 300+ unseen objects "on par with humans" | HW |
| Contact-GraspNet | Sundermeyer et al. (NVIDIA) · ICRA 2021 | Depth · point cloud | >90% on unseen objects in structured clutter. Trained on ~17M simulated grasps (ACRONYM); ~halves failure rate | HW |
| 6-DoF GraspNet | Mousavian et al. (NVIDIA) · ICCV 2019 | Depth · point cloud | ~88% success across varied objects. VAE grasp sampler + learned evaluator | HW |
| Dex-Net GQ-CNN | Mahler et al. · RSS 2017 | Depth | 93% on known adversarial objects. Grasp-quality CNN trained on 6.7M synthetic grasps | HW |
| GG-CNN | Morrison, Corke, Leitner · RSS 2018 | Depth | 83% adversarial · 81% in dynamic clutter. Lightweight, closed-loop up to 50 Hz | HW |
| GR-ConvNet v2 | Kumra et al. · 2020 / 2022 | RGB-D | 98.8% Cornell · 95.1% Jacquard · 97.4% GraspNet. Image-wise detection accuracy, not real-robot pick success | image-wise |
Physical-hardware grasp success · % (varied settings)
AnyGrasp 93.3% · Contact-GraspNet 90% · 6-DoF GraspNet 88% · Dex-Net GQ-CNN 93% · GG-CNN 83% · VGN 80% · SuctionNet-1B 80.65%
Fig 3 · Headline physical-hardware grasp-success figures. Each was measured under different objects, clutter, and gripper conditions — read as orders of magnitude, not a ranked head-to-head. Dex-Net GQ-CNN's 93% is on known adversarial objects; AnyGrasp's 93.3% is bin-clearing of 300+ unseen objects.
2017 · Dex-Net 2.0 (GQ-CNN · jaw) → 2018 · Dex-Net 3.0 · GG-CNN (suction · real-time) → 2019 · Dex-Net 4.0 · 6-DoF GraspNet (ambidextrous · VAE) → 2020 · GraspNet-1B · VGN (clutter benchmark) → 2021 · Contact-GraspNet · GIGA (dense 6-DoF) → 2023 · AnyGrasp (93.3% · >900 MPPH)
Fig 3.2 · Method lineage. The jaw → suction → ambidextrous arc of Dex-Net runs alongside the dense 6-DoF arc from GG-CNN to AnyGrasp.
§ 04 · Grippers

Suction, jaw, or both.

The end-effector choice is the first decision an industrial picking policy makes. Suction is faster on flat packaging; a parallel jaw generalizes across geometry; an ambidextrous system chooses per object to push reliability higher than either alone.

Dex-Net 4.0 made the ambidextrous case quantitatively: 95% reliability at 300 MPPH by learning when to suck and when to pinch.

SUCTION · vacuum · air-seal on one flat face
PARALLEL-JAW · force closure · antipodal contacts
AMBIDEXTROUS · policy · suck / pinch · choose per object → 95% · 300 MPPH
Fig 4.1 · The three end-effector strategies. Suction seals a flat face; the parallel jaw closes on antipodal contacts; the ambidextrous policy picks whichever is more reliable per object.
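The per-object choice can be sketched as a comparison of two predicted qualities. This is a minimal illustration, assuming each gripper has its own scorer that outputs a comparable success estimate in [0, 1]; the function name, the tie-break toward suction, and the `min_quality` cutoff are assumptions, not Dex-Net 4.0's published implementation.

```python
from typing import Optional, Tuple


def choose_gripper(q_suction: float, q_jaw: float,
                   min_quality: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (gripper, quality) for the better option, or None.

    Ties go to suction, mirroring the suction-first default discussed in
    the opinion section. None means neither option is trustworthy enough
    and the scene should be re-perceived or stirred.
    """
    if q_suction >= q_jaw:
        gripper, q = "suction", q_suction
    else:
        gripper, q = "jaw", q_jaw
    return (gripper, q) if q >= min_quality else None


print(choose_gripper(0.91, 0.74))   # flat box face: suction
print(choose_gripper(0.32, 0.85))   # porous or curved item: jaw
print(choose_gripper(0.20, 0.10))   # nothing reliable
```

The comparison only makes sense if both scorers are calibrated on the same scale, which is one concrete form of the added planning complexity the ambidextrous row of the table refers to.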
| Gripper | Principle | Best for | Fails on | Benchmarks |
|---|---|---|---|---|
| Parallel-jaw | Form / force closure across two opposing contacts | Rigid objects of varied geometry · most grasp benchmarks | Large flat faces · heavy smooth surfaces with no graspable edge | GraspNet-1B · Dex-Net 2.0 · Cornell · Jacquard |
| Suction (vacuum) | Air-seal on a single sufficiently flat, non-porous surface | Boxes · flat packaging · fast top-down picks | Porous · perforated · highly curved or deformable surfaces | Dex-Net 3.0 · SuctionNet-1B |
| Ambidextrous | Policy chooses jaw or suction per object from depth | Mixed warehouse SKUs · maximizing reliability across a bin | Added mechanical + planning complexity | Dex-Net 4.0 (95% · 300 MPPH) |
| Multi-finger / dexterous | High-DoF hand · in-hand reorientation possible | Research · tools · complex in-hand manipulation | Not yet a warehouse-throughput standard · sim-to-real gap | Shadow Hand · DexMV · DexArt |
Fig 4 · The gripper-choice spectrum. The ambidextrous row is where most warehouse SOTA now sits.
§ 05 · Metrics

What a pick is worth.

A grasp success rate and a throughput number are different currencies. The warehouse cares about picks per hour at a reliability it can trust; the paper usually reports a grasp success rate under controlled conditions.

Commercial single-arm systems publicly cluster in roughly the 300–900 MPPH band; the high end is a research figure and real deployments are item-dependent. No single authoritative cross-vendor SOTA number exists — treat it as a range.

MPPH · Mean Picks Per Hour

Successful grasps — object actually transported — per hour. The dominant industrial throughput metric.

GSR · Grasp Success Rate

Successful grasps ÷ grasp attempts. The headline academic number; says nothing about speed.

Declutter · Clearance rate

Fraction of objects in a bin removed before the policy gives up or fails. Exposes the pile-vs-packed gap.

Reliability · Sustained pick reliability

Success held over long autonomous runs, usually paired with a human-intervention rate.
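The first three definitions can be computed from the same pick log, which makes the difference between them easy to see. The `Attempt` record and its fields are hypothetical; a real system would log more (gripper used, failure type, interventions).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    t_seconds: float   # when the attempt finished, measured from run start
    success: bool      # True only if the object was actually transported


def grasp_success_rate(log: List[Attempt]) -> float:
    """Successful grasps divided by grasp attempts."""
    return sum(a.success for a in log) / len(log)


def mean_picks_per_hour(log: List[Attempt], run_seconds: float) -> float:
    """Successful picks scaled to an hour of wall-clock run time."""
    return sum(a.success for a in log) * 3600.0 / run_seconds


def declutter_rate(log: List[Attempt], objects_in_bin: int) -> float:
    """Fraction of the bin's objects removed by the end of the run."""
    return sum(a.success for a in log) / objects_in_bin


# Ten attempts in 100 s on a ten-object bin, with two failures.
log = [Attempt(10.0 * (i + 1), i not in (3, 7)) for i in range(10)]
print(f"GSR {grasp_success_rate(log):.0%}")
print(f"MPPH {mean_picks_per_hour(log, run_seconds=100.0):.0f}")
print(f"Declutter {declutter_rate(log, objects_in_bin=10):.0%}")
```

Doubling the run time for the same log leaves GSR and the declutter rate untouched and halves MPPH, which is the sense in which they are different currencies.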

Perceive bin → Detect grasps → Score → Pick → Place → repeat until bin empty · measured by declutter rate · MPPH

Industrial picking is a closed loop, not a single grasp. Throughput (MPPH) is the loop rate; the declutter rate is how much of the bin it clears before giving up. A system with a high grasp success rate that re-perceives slowly still loses on MPPH.
Fig 5.1 · The perceive → detect → score → pick → place cycle.
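The trade-off between loop rate and success rate is simple arithmetic. The sketch below assumes every attempt, successful or not, costs one full cycle and that failures add no extra recovery time; the two example systems are made-up numbers, not measurements from any row in this register.

```python
def expected_mpph(cycle_seconds: float, gsr: float) -> float:
    """Successful picks per hour when each attempt takes cycle_seconds."""
    attempts_per_hour = 3600.0 / cycle_seconds
    return attempts_per_hour * gsr


# Hypothetical systems: one fast and less reliable, one slow and careful.
fast = expected_mpph(cycle_seconds=4.0, gsr=0.85)
careful = expected_mpph(cycle_seconds=9.0, gsr=0.99)
print(f"fast: {fast:.0f} MPPH, careful: {careful:.0f} MPPH")
```

Under these assumptions the faster loop delivers nearly twice the throughput despite 14 points less grasp success, though it also produces far more failed picks for something downstream to handle.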

Publicly reported throughput
| System | Throughput | Reliability | Source |
|---|---|---|---|
| Dex-Net 4.0 | 300 MPPH | 95% reliability | Science Robotics 2019 · HW |
| AnyGrasp | >900 MPPH | 93.3% bin clearing | IEEE T-RO 2023 · HW, controlled |
| Covariant (vendor) | ~515 picks/hr | <0.1% orders need human | covariant.ai · vendor claim |
Fig 5 · Reported industrial picking figures. Vendor numbers (Covariant) are deployment claims under differing conditions, not peer-reviewed — do not compare directly against the academic rows.
§ 06 · Open problems

Where grasping still breaks.

A near-99% tabletop detection score hides where industrial picking actually fails — clutter, glass, deformables, and the gap between simulation and a real bin.

PACKED · ORDERED: GSR 87.9% · PILE · HEAPED: GSR 69.8% · −18 pts · clutter & occlusion · GIGA · sim · RSS 2021
Fig 6 · The clutter gap, quantified. The same model loses ~18 points of grasp success rate moving from an ordered, packed bin to a heaped pile — occlusion and contact ambiguity are the dominant industrial failure mode.
depth camera · ■ observed surface · ▨ occluded, must be inferred
Perception · occlusion

A single view sees only one side

A depth camera mounted over a bin observes only the surfaces facing it. The back and underside of every object — and anything beneath the top layer — is simply missing from the point cloud. The grasp planner is reasoning about a partial, one-sided reconstruction of the scene.

This is why packed-versus-pile success rates diverge so sharply: in a heap, most graspable surface is occluded, and the model must infer geometry it has never measured. Multi-view capture and shape completion help, but every extra view costs cycle time the warehouse counts against MPPH.

Perception · transparency

Glass and metal break the depth sensor

Structured-light and time-of-flight depth sensors assume light reflects diffusely off a surface. Transparent objects let the infrared pattern pass straight through and read the background; specular metal scatters it. Both produce holes and false readings exactly where an object is.

Because the grasp planner never sees valid geometry there, it cannot propose a grasp at all. The research answer is to infer the missing surface: ClearGrasp learns transparent geometry from synthetic data, and Dex-NeRF reconstructs it with a neural radiance field before handing depth back to a standard grasp model.
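A planner can at least notice when it is looking at a hole rather than an empty bin. The sketch below measures the fraction of invalid depth pixels in a region; it assumes missing depth is encoded as 0 or NaN, which is a common convention but varies by sensor driver, and it uses plain nested lists so it runs without any array library.

```python
import math


def hole_mask(depth, min_valid=0.0):
    """Mark pixels with no usable depth.

    Assumption: the sensor encodes missing depth as 0 or NaN.
    """
    return [[(math.isnan(d) or d <= min_valid) for d in row] for row in depth]


def hole_fraction(depth, roi=None):
    """Fraction of invalid pixels, optionally inside roi = (r0, r1, c0, c1)."""
    mask = hole_mask(depth)
    if roi is not None:
        r0, r1, c0, c1 = roi
        mask = [row[c0:c1] for row in mask[r0:r1]]
    cells = [m for row in mask for m in row]
    return sum(cells) / len(cells)


nan = float("nan")
depth = [
    [0.52, 0.52, 0.52, 0.52],
    [0.52, 0.00, nan, 0.52],   # a glass item reads as missing depth
    [0.52, 0.00, 0.00, 0.52],
    [0.52, 0.52, 0.52, 0.52],
]
print(f"whole image: {hole_fraction(depth):.0%}")
print(f"centre region: {hole_fraction(depth, roi=(1, 3, 1, 3)):.0%}")
```

A high hole fraction inside a region that RGB segmentation says is occupied is a cheap trigger for routing the scene to a completion method instead of the standard planner.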

IR DEPTH · GLASS & SPECULAR · IR → glass: refracts, reads background · IR → metal: specular scatter, noise · depth map: □ holes, no valid depth · fix: ClearGrasp · Dex-NeRF
SIMULATION · RANDOMIZED (lighting · friction μ · textures · object pose) → randomize → robust policy → deploy → REAL BIN · the reality gap: unmodeled friction · sensor noise · deformation · actuator latency
Transfer · sim-to-real

Simulation success is not a hardware number

Almost all grasp data is synthetic, because real labels require real picks. The danger is the reality gap: unmodeled friction, sensor noise, soft deformation, and actuator latency all diverge from the simulator, and a policy that scores perfectly in MuJoCo can fail on the arm.

Domain randomization — training across randomized lighting, textures, friction, and poses — is the standard mitigation, and physics-based labels (ACRONYM) transfer better than analytic ones. But the gap is narrowed, never closed, which is why this page tags every number as SIM or HW.
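In code, domain randomization amounts to drawing a fresh set of simulator parameters for every training episode. The parameter names, ranges, and texture list below are hypothetical values chosen for illustration; a real setup would pass them to whatever simulator it uses.

```python
import random

# Hypothetical ranges, chosen for illustration only.
RANGES = {
    "light_intensity": (0.3, 1.5),
    "friction_mu": (0.2, 1.0),
    "object_yaw_deg": (0.0, 360.0),
}
TEXTURES = ["cardboard", "plastic", "metal", "fabric"]


def sample_scene(rng: random.Random) -> dict:
    """Draw one randomized set of simulator parameters."""
    scene = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    scene["texture"] = rng.choice(TEXTURES)
    return scene


rng = random.Random(0)   # seeded so a training run can be reproduced
scenes = [sample_scene(rng) for _ in range(1000)]
mus = [s["friction_mu"] for s in scenes]
print(f"friction spans {min(mus):.2f} to {max(mus):.2f} over {len(scenes)} scenes")
```

The ranges are the design decision: too narrow and the real bin falls outside them, too wide and the policy spends capacity on conditions it will never meet.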

Mechanics · entanglement

Lifting one item drags another

Real bins contain hangers, cables, and interlocking parts. A grasp can be geometrically perfect and still fail because the target is physically linked to its neighbor — lifting one drags or jams the other, causing a double-pick or a drop mid-transport.

There is no saturated benchmark for this; it is a documented, still-open failure mode. Robust systems detect it after the fact (a weight or vision check on lift) and re-plan — which only works if the perception-to-execution loop is fast enough to retry.
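The weight check mentioned above can be as simple as comparing the measured load after lift with the expected item mass. This is a minimal sketch: it assumes the catalogue provides an expected mass and that a wrist or gripper load reading is available, and the 20% tolerance is an arbitrary illustrative value.

```python
def classify_lift(expected_g: float, measured_g: float, tol: float = 0.2) -> str:
    """Classify a lift from the load measured after pickup.

    tol is the accepted relative error on the expected mass (assumed value).
    """
    if measured_g < expected_g * (1 - tol):
        return "dropped_or_missed"
    if measured_g > expected_g * (1 + tol):
        return "possible_double_pick"
    return "ok"


print(classify_lift(expected_g=100, measured_g=104))   # within tolerance
print(classify_lift(expected_g=100, measured_g=195))   # a neighbor came too
print(classify_lift(expected_g=100, measured_g=8))     # nothing in the gripper
```

A mass check cannot tell an entangled neighbor from a mislabeled item, which is one reason systems pair it with a vision check before deciding how to re-plan.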

ENTANGLEMENT · MULTI-OBJECT · lift one ⇒ the linked item comes too: double-pick or jam
More failure modes
Clutter & occlusion

The core driver of the pile-vs-packed gap. GIGA drops from 87.9% (packed) to 69.8% GSR (pile) in simulation; VGN shows the same collapse. Picking from a heap is a different problem from picking from a surface.

Transparent & reflective

Depth sensors fail on glass and shiny plastic. ClearGrasp (ICRA 2020) infers transparent geometry from 50k+ synthetic frames; Dex-NeRF (CoRL 2021) renders depth via a NeRF density field and feeds Dex-Net; Evo-NeRF (CoRL 2022) grasps transparent objects in sequence.

Suction-seal modeling

Suction success is a physics problem: will the seal hold the wrench? Dex-Net 3.0 introduced an analytic quasi-static seal model; SuctionNet-1Billion evaluates seal formation and wrench resistance at billion-scale.

Deformables & entanglement

Cloth, bags, cables, and interlocked items resist rigid grasp models. No saturated benchmark exists; lifting two entangled objects at once is a well-documented, still-open failure mode.

Sim-to-real gap

Most grasp training is synthetic. Domain randomization (Dex-Net 4.0) and physics-based labels (ACRONYM, which transfer better than analytic labels) narrow the gap — they do not close it. A simulation GSR is not a hardware GSR.

Detection ≠ pick success

The 95–99% figures on Jacquard and Cornell are image-wise detection scores, not real-robot reliability. Confusing the two is the single most common way grasping numbers get overstated.

§ 07 · Competitions

The picking challenges.

The Amazon Picking / Robotics Challenge (2015–17) defined warehouse bin-picking as a field and seeded much of the talent now in industry. There is no single flagship successor — GraspNet-1Billion and SuctionNet-1Billion now serve as the standardized leaderboards.

| Year | Event | Winner | Approach |
|---|---|---|---|
| 2015 | Amazon Picking Challenge | Team RBO · TU Berlin | 148 pts · compliant soft hand + suction · ICRA Seattle |
| 2016 | Amazon Picking Challenge | Team Delft · TU Delft | Won Pick & Stow · 3D cameras + hybrid suction/two-finger gripper |
| 2017 | Amazon Robotics Challenge | Team ACRV · "Cartman" | Low-cost (<$24k) Cartesian gantry · rotating suction + jaw · $80k prize |
| 2025 | Multi-Object Grasping benchmark | Chen et al. · arXiv 2503.20820 | Grasping multiple objects per attempt, in pile and on surface |
Fig 7 · The competitions that built industrial bin-picking. Winners converged early on hybrid suction + two-finger grippers, the same ambidextrous logic that now defines warehouse SOTA.
§ 08 · Bottlenecks

Where the research goes next.

The 2025 manipulation survey by Bai et al. frames the open problems as three bottlenecks — collection, utilization, generalization. Grasping sits squarely inside all three.

Data collection

Real grasp data is expensive — every label is a physical pick. The field leans on synthetic generation (ACRONYM, Dex-Net) and foundation-model synthesis (Grasp-Anything), but each trades realism for scale. Closing that trade-off is the central data problem.

Data utilization

Even with billion-scale corpora, models under-use them: most grasp detectors still train per-embodiment and discard cross-task structure. Shared geometry/affordance representations (GIGA) and 3D/implicit inputs are early attempts to extract more signal per sample.

Generalization

A policy that clears one bin distribution often fails on the next — new SKUs, new clutter statistics, transparent or deformable items. Generalization across objects, scenes, and embodiments is the bottleneck that decides whether a system works "from day one" in a new warehouse.

§ 09 · Methodology

How we read grasping numbers.

Grasping is the most over-claimed corner of robotics benchmarking, because a single word — “success” — hides four different measurements. We separate them.

First, detection is not picking. The 95–99% scores on Jacquard and Cornell are image-wise accuracy: whether a predicted rectangle overlaps a labeled one. That is not a robot lifting an object, and we never present it as one.

Second, simulation is not hardware. ACRONYM is sim-only; its 59% is a label-quality figure, not a pick rate. Dex-Net, Contact-GraspNet and AnyGrasp headline numbers are physical, and we tag the lane on every row.

Third, success rate is not throughput. A 93% grasp success rate and 900 MPPH answer different questions; a warehouse buys the second at a reliability it can trust. We keep GSR and MPPH in separate columns.

Fourth, vendor claims are flagged. Deployment numbers from commercial picking companies are measured under conditions they choose; we mark them and never rank them against peer-reviewed results.
