Robot Navigation

Autonomous navigation — moving through unstructured environments while avoiding obstacles — spans indoor service robots to outdoor last-mile delivery. Classical SLAM (simultaneous localization and mapping) methods like ORB-SLAM still dominate mapping, but end-to-end learning approaches trained in embodied simulators (Habitat 2.0, iGibson) show promise for semantic navigation ("go to the kitchen"). Habitat Challenge results reveal that modular pipelines (map → plan → act) consistently beat monolithic learned policies, suggesting that full end-to-end navigation is still years away from displacing classical stacks in production.


Robot navigation enables autonomous movement through environments — from indoor rooms to outdoor terrain — using SLAM, path planning, and learned policies. Vision-language navigation (VLN) is the hot frontier, where robots follow natural language instructions to reach goals in novel environments.

History

2016

End-to-end learning for self-driving (Bojarski et al., NVIDIA) maps pixels to steering

2018

Room-to-Room (R2R) dataset establishes vision-language navigation benchmark

2019

Habitat simulator enables fast training of navigation agents in photorealistic 3D scans

2020

PointNav is largely solved in simulation (~99% SPL) but struggles in real environments

2021

REVERIE and SOON extend VLN to object-finding with fine-grained language goals

2022

CoW (CLIP on Wheels) uses CLIP for zero-shot object goal navigation

2023

SayNav and LM-Nav use LLMs for semantic reasoning about navigation goals

2024

VoxPoser and SpatialVLM enable 3D-aware navigation from language

2024

Embodied AI competitions (Habitat Challenge) push toward realistic navigation+manipulation

2025

Real-world deployments expand: delivery robots, warehouse AMRs, and assistive navigation

How Robot Navigation Works

Robot Navigation Pipeline
1

Localization

The robot determines its position and orientation using GPS (outdoor), visual SLAM, or pre-built maps.
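Between GPS or map-based fixes, most robots track their pose by integrating wheel or visual odometry. A minimal sketch of that dead-reckoning step, using a unicycle motion model (the function name and parameters here are illustrative, not from any particular library):

```python
import math

def update_pose(x, y, theta, v, omega, dt):
    """Dead-reckoning odometry update: integrate linear velocity v and
    angular velocity omega over a small timestep dt (unicycle model)."""
    x_new = x + v * math.cos(theta) * dt
    y_new = y + v * math.sin(theta) * dt
    theta_new = theta + omega * dt
    return x_new, y_new, theta_new

# Drive straight along +x at 1 m/s for one second (10 steps of 0.1 s).
pose = (0.0, 0.0, 0.0)
for _ in range(10):
    pose = update_pose(*pose, v=1.0, omega=0.0, dt=0.1)
```

In practice this estimate drifts, which is why SLAM systems fuse it with loop closures or landmark observations.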

2

Environment Mapping

Sensors (LiDAR, cameras, depth) build a representation of the environment — occupancy grids, semantic maps, or neural implicit representations.
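The occupancy-grid representation mentioned above is commonly stored in log-odds form so repeated sensor readings can be fused by simple addition. A minimal sketch, with hypothetical inverse-sensor-model constants:

```python
import math

L_OCC, L_FREE = 0.85, -0.4   # hypothetical log-odds increments for hit / miss

def make_grid(width, height):
    """Occupancy grid in log-odds form; 0.0 means unknown (p = 0.5)."""
    return [[0.0] * width for _ in range(height)]

def update_cell(grid, x, y, hit):
    """Fuse one sensor reading: add log-odds for a hit, subtract for a miss."""
    grid[y][x] += L_OCC if hit else L_FREE

def occupancy_prob(grid, x, y):
    """Convert a cell's log-odds back to a probability of being occupied."""
    return 1.0 / (1.0 + math.exp(-grid[y][x]))

grid = make_grid(4, 4)
for _ in range(3):                 # three consecutive LiDAR hits on cell (1, 2)
    update_cell(grid, 1, 2, hit=True)
```

Semantic maps and neural implicit representations replace the scalar cell value with class labels or learned features, but the fusion-over-time idea is the same.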

3

Goal Specification

The navigation goal is specified as coordinates, an object to find ('go to the refrigerator'), or natural language instructions ('turn left past the couch').
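These three goal types correspond to the PointNav, ObjectNav, and VLN task families. A sketch of how a navigation stack might represent them (the class names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class PointGoal:
    """PointNav: reach metric coordinates in the robot's frame."""
    x: float
    y: float

@dataclass
class ObjectGoal:
    """ObjectNav: find any instance of an object category."""
    category: str

@dataclass
class LanguageGoal:
    """VLN: follow a free-form natural-language instruction."""
    instruction: str

goal = ObjectGoal("refrigerator")
```

Keeping the goal types distinct matters because each demands a different capability: metric planning, semantic search, or language grounding.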

4

Path Planning

A collision-free path is planned from current position to goal, using classical planners (A*, RRT) or learned policies.
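A* is the workhorse classical planner for grid maps. A self-contained sketch on a 4-connected occupancy grid (unit step cost, Manhattan heuristic):

```python
import heapq
import itertools

def astar(grid, start, goal):
    """A* search on a 4-connected occupancy grid (0 = free, 1 = obstacle).
    Cells are (row, col) tuples; returns the path as a list of cells,
    or None if the goal is unreachable."""
    def h(a, b):
        # Manhattan distance: admissible for 4-connected unit-cost moves
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    tie = itertools.count()     # tie-breaker so the heap never compares cells
    frontier = [(h(start, goal), next(tie), 0, start, None)]
    parents, best_g = {}, {start: 0}
    while frontier:
        _, _, g, cur, parent = heapq.heappop(frontier)
        if cur in parents:
            continue            # already expanded via a cheaper path
        parents[cur] = parent
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parents[cur]
            return path[::-1]
        r, c = cur
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(
                        frontier, (ng + h(nxt, goal), next(tie), ng, nxt, cur))
    return None

# A 3x3 grid with a wall forcing a detour around the right side.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
```

Sampling-based planners like RRT trade the grid for a continuous configuration space, and learned policies replace explicit search entirely, but the objective (a collision-free path from pose to goal) is the same.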

5

Obstacle Avoidance

Real-time sensor feedback enables dynamic obstacle avoidance as the robot follows the planned path.
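The simplest form of this feedback loop is a reactive safety layer that scales the commanded velocity by the distance to the nearest obstacle in the forward scan — a stripped-down stand-in for full local planners like the dynamic window approach (the thresholds below are illustrative):

```python
def safe_velocity(ranges, v_cmd, stop_dist=0.3, slow_dist=1.0):
    """Scale a commanded forward velocity by the nearest range reading:
    full stop inside stop_dist, linear slow-down between stop_dist and
    slow_dist, full speed beyond slow_dist (distances in meters)."""
    d = min(ranges)
    if d <= stop_dist:
        return 0.0
    if d >= slow_dist:
        return v_cmd
    return v_cmd * (d - stop_dist) / (slow_dist - stop_dist)
```

Production stacks run a richer local planner at high rate (re-scoring candidate velocities against the costmap) while the global planner re-plans at a lower rate.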

Current Landscape

Robot navigation in 2025 operates in two regimes. Classical methods (SLAM + path planning) are mature and deployed in warehouses, delivery, and cleaning robots, while learned navigation (VLN, semantic navigation) is advancing rapidly in research but is not yet reliable enough for production. The integration of LLMs and VLMs enables robots to understand natural language navigation instructions and reason about semantic goals. The key trend is closing the loop between navigation and manipulation — robots that can navigate to an object and then interact with it.

Key Challenges

Sim-to-real gap — navigation policies trained in simulation often fail in real buildings due to sensor noise and dynamic obstacles

Dynamic environments — people, doors, and moving objects require constant re-planning

Language grounding — connecting natural language descriptions to specific locations and landmarks

Long-horizon navigation — navigating through large buildings with many rooms requires sustained spatial reasoning

Elevator/stairs/doors — physical obstacles that require manipulation capabilities alongside navigation

Quick Recommendations

Research platform

Habitat 3.0 + Matterport3D

Most realistic simulation for indoor navigation research

Vision-language navigation

DUET / BEVBert

State-of-the-art on R2R and REVERIE benchmarks

Real-world deployment

ROS2 + Nav2 stack

Production-proven navigation stack for real robots

Zero-shot object navigation

CLIP-based semantic navigation

Navigate to novel objects without task-specific training
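The core of CLIP-style zero-shot navigation is scoring candidate exploration frontiers by the similarity between their image embeddings and the text embedding of the goal object. A minimal sketch in plain Python — the embeddings below are toy placeholders, not real CLIP outputs, and the function names are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_frontier(frontier_embeddings, goal_embedding):
    """Pick the frontier whose image embedding best matches the goal's
    text embedding (the CLIP-on-Wheels-style zero-shot scoring idea)."""
    return max(frontier_embeddings,
               key=lambda fid: cosine(frontier_embeddings[fid], goal_embedding))

# Toy 2-D embeddings: frontier "a" looks more like the goal than "b".
frontiers = {"a": [0.9, 0.1], "b": [0.1, 0.9]}
choice = best_frontier(frontiers, goal_embedding=[1.0, 0.0])
```

Because the text encoder handles arbitrary category names, no task-specific training is needed to navigate toward novel objects.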

What's Next

The frontier is socially aware navigation in human environments — robots that understand personal space, traffic flow, and social context. Expect integration with household manipulation for 'navigate-then-manipulate' tasks, and outdoor navigation in unstructured terrain for last-mile delivery and agriculture.

Benchmarks & SOTA

No datasets indexed for this task yet.


Related Tasks

Robotics

End-to-end robotics — learning perception, planning, and control in a single model — entered a new era with vision-language-action (VLA) models. Google's RT-2 (2023) showed that a web-pretrained VLM could directly output robot actions, and the open-source Open X-Embodiment dataset (2023) unified data from 22 robot types across 21 institutions. The key tension is generalization: lab demos on specific robots are plentiful, but a single policy that transfers across embodiments, tasks, and environments remains the holy grail, with π₀ (Physical Intelligence, 2024) and Google's RT-X pushing this frontier.

Robot Manipulation

Robot manipulation — grasping, placing, and using tools — is where sim-to-real and foundation models meet physical dexterity. DexNet (2017) pioneered data-driven grasp planning, but the field accelerated when contact-rich manipulation was tackled with RL in simulation (DexterousHands, 2023) and then transferred to real hardware. Current state-of-the-art combines diffusion policies (Chi et al., 2023) with large pretrained vision encoders to achieve robust 6-DOF manipulation from a handful of demonstrations, though deformable objects and multi-step assembly remain unsolved.

Sim-to-Real Transfer

Sim-to-real transfer — training policies in simulation and deploying on physical hardware — is the bridge between unlimited virtual data and messy reality. Domain randomization (Tobin et al., 2017) was the first scalable approach, and OpenAI's Rubik's cube hand (2019) showed it could work for dexterous manipulation. The modern toolkit combines photorealistic rendering (Isaac Sim, MuJoCo MJX on GPU), system identification, and real-world fine-tuning, but the gap persists for contact-rich tasks where simulation physics diverge from reality. Narrowing this gap is existential for robotics — it determines whether lab results actually work in factories and homes.
