Monocular navigation crosses robot bodies with one envelope

AgniNav is a configuration-driven local navigation framework that standardizes transfer across platforms by tying perception and planning to a shared collision envelope. Each robot is described by four numbers: height, front length, rear length, and half-width. The height parameter conditions an image-to-scan network that predicts a one-dimensional, collision-aware pseudo-laserscan from a monocular color image, while the remaining footprint parameters configure a dimension-aware local planner for collision checking. Training uses height-conditioned column-minimum scan labels generated from paired color-depth data, letting the same image supervise different safety envelopes without collecting robot-specific data. To the best of the authors' knowledge, AgniNav is the first monocular local navigation framework that jointly conditions perception and planning on a shared collision envelope for zero-retraining deployment across wheeled, quadruped, and humanoid platforms. Real-robot experiments on a Turtlebot2, Unitree Go2, and Accelerated Evolution K1 achieve 39/40, 18/20, and 18/20 successes with 0/40, 1/20, and 2/20 collisions, respectively, while running at 30 Hz on Jetson Orin.
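The height-conditioned column-minimum labels can be illustrated with a minimal sketch. It assumes a pinhole camera looking horizontally, a metric depth image, and a known camera mounting height; the function name `column_min_scan`, its parameters, and the `ground_clearance` and `max_range` defaults are hypothetical and do not come from the paper, whose exact labeling procedure is not detailed here. For each image column, the sketch keeps the nearest depth among pixels whose height above the ground falls inside the robot's vertical extent, so the same depth image can yield different labels for different robot heights.

```python
import math
from typing import List, Sequence


def column_min_scan(
    depth: Sequence[Sequence[float]],
    fy: float,
    cy: float,
    cam_height: float,
    robot_height: float,
    ground_clearance: float = 0.05,  # hypothetical: ignore points this close to the floor
    max_range: float = 10.0,         # hypothetical: value used when a column has no hit
) -> List[float]:
    """Return one range per image column (a 1D pseudo-scan label).

    depth[v][u] is the forward distance in meters at row v, column u;
    zero, negative, or non-finite values are treated as invalid.
    Assumes a level pinhole camera whose image y axis points down.
    """
    rows, cols = len(depth), len(depth[0])
    scan = [max_range] * cols
    for v in range(rows):
        for u in range(cols):
            z = depth[v][u]
            if not math.isfinite(z) or z <= 0.0:
                continue
            # Vertical offset below the optical axis, then height above ground.
            y_cam = (v - cy) * z / fy
            h = cam_height - y_cam
            # Only points inside the robot's vertical extent can collide with it.
            if ground_clearance <= h <= robot_height and z < scan[u]:
                scan[u] = z
    return scan
```

Under these assumptions, an overhanging obstacle 1.5 m above the ground is ignored when `robot_height` is 0.6 m but sets the column's range when `robot_height` is 1.6 m, which is the sense in which one image can supervise several envelopes.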
The project argues that monocular local navigation remains attractive for lightweight robots, but that earlier perception policies tied to a specific body or camera geometry hinder cross-platform transfer. AgniNav flips that dependency by funneling all body differences into a four-parameter envelope that governs both the predicted collision signal and the planner that uses it. The perception path converts a single image into a one-dimensional pseudo-laserscan that highlights near obstacles, while the planning path respects the robot’s footprint as defined by the remaining three numbers. The paired color-depth data used during training lets the system learn how a given image maps to collisions for different envelope heights, so a single data log can support multiple hardware configurations without bespoke data collection. In practice, the approach delivered high success rates on three quite different real robots while running in real time, at 30 Hz, on a Jetson Orin.
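How the footprint numbers might feed a planner's collision check can be sketched as follows. This is an illustration under stated assumptions, not the authors' planner: the `Envelope` field names, the `pose_collides` function, the `margin` parameter, and the rectangular footprint test are all hypothetical, and obstacle points are assumed to be given in a planar frame with x forward and y to the left.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Envelope:
    """Four-number collision envelope, all values in meters (hypothetical field names)."""
    height: float
    front_length: float
    rear_length: float
    half_width: float


def pose_collides(
    env: Envelope,
    pose: Tuple[float, float, float],
    obstacles: Iterable[Tuple[float, float]],
    margin: float = 0.0,  # hypothetical extra safety padding in meters
) -> bool:
    """Return True if any obstacle point lies inside the footprint at `pose`.

    pose is (x, y, yaw) of a candidate robot position; obstacles are (x, y)
    points in the same planar frame, for example derived from a pseudo-scan.
    """
    px, py, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    for ox, oy in obstacles:
        dx, dy = ox - px, oy - py
        # Rotate the offset into the body frame of the candidate pose.
        bx = c * dx + s * dy
        by = -s * dx + c * dy
        inside_length = -(env.rear_length + margin) <= bx <= env.front_length + margin
        inside_width = abs(by) <= env.half_width + margin
        if inside_length and inside_width:
            return True
    return False
```

In this sketch, swapping robots means swapping only the `Envelope` instance: a point 0.25 m to the side of a candidate pose is free space for a footprint with a 0.18 m half-width and a collision for one with a 0.30 m half-width. The dimensions here are made up for illustration and are not the measured sizes of any robot in the paper.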
For operators and investors, the key takeaway is the potential to decouple robot fleet upgrades from perception re-engineering. If a factory or campus deploys a mixed fleet, AgniNav promises that a monocular camera stack paired with a geometric envelope can be reused as is when swapping wheels for legs or moving to a newer chassis. The four-parameter envelope acts as the single contract between perception and motion, constraining both how the system sees the world and how it behaves in it. The reported success rates, 39/40 (97.5%) on the Turtlebot2 and 18/20 (90%) on each of the two legged platforms, underline the practicality of a zero-retraining path for cross-embodiment navigation in low-speed, collision-aware contexts. The researchers also report that the system runs at 30 Hz on a Jetson Orin, a non-trivial requirement for fleet-scale deployment.
Two practitioner insights emerge.
If AgniNav scales beyond the current trio of test bodies, the cross-embodiment modeling paradigm could shorten the path from lab to production for monocular navigation. Engineers will be watching whether the envelope alone can preserve safety margins in denser real-world traffic and whether the system can gracefully handle dynamic obstacles without resorting to richer sensors. In the meantime, the work offers a concrete engineering artifact to anchor cross-platform deployment: a single, measurable geometry that makes a camera-driven planner work on many bodies.
- AgniNav: Configuration-Driven Cross-Embodiment Local Planning for Robot Navigation / arXiv Humanoid Robot Query / Primary source / Published JUN 09, 2026 / Accessed JUN 09, 2026