TUESDAY, JUNE 16, 2026
Humanoids

VENOM tracks full-body motion across multiple humanoid bodies in simulation

By Sophia Chen

A single GPT-based motion tracker now follows every limb across different humanoids in simulation.

VENOM, short for Versatile Embodied Network for Omni-bodied Motion tracking, is a cross-embodiment, full-body motion tracker for humanoids in simulated environments. Rather than keeping the long-standing split between upper- and lower-body control, it uses a single unified model trained on a multi-humanoid dataset. The core idea is to learn a shared representation that applies across morphologies, so that one model can stably track the entire body even when the kinematics differ from one humanoid to the next. The authors build VENOM on a GPT-based architecture and evaluate it against baselines on a dedicated VENOM dataset that contains states, actions, and rewards from multiple humanoids. In their tests, VENOM outperforms a multinode supervised-learning model trained on the same multi-humanoid data and maintains stable full-body tracking across varied morphologies.
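The article does not say how VENOM feeds bodies with different joint sets into one network. One common way to let a single model accept several morphologies is to map each robot's joints onto a shared, canonical slot layout and carry a mask marking which slots actually exist on that body. The sketch below illustrates only that general idea, under stated assumptions: the joint names in `CANONICAL_JOINTS` and the helper `to_canonical` are hypothetical and are not taken from the VENOM paper.

```python
from typing import Dict, List, Tuple

# Hypothetical canonical skeleton shared by every supported humanoid.
# Each name owns one fixed slot, so "knee_l" means the same thing on every robot.
CANONICAL_JOINTS: List[str] = [
    "hip_l", "knee_l", "ankle_l",
    "hip_r", "knee_r", "ankle_r",
    "shoulder_l", "elbow_l",
    "shoulder_r", "elbow_r",
]
_SLOT: Dict[str, int] = {name: i for i, name in enumerate(CANONICAL_JOINTS)}


def to_canonical(joint_values: Dict[str, float]) -> Tuple[List[float], List[int]]:
    """Scatter one robot's per-joint values into the shared layout.

    Returns (values, mask): joints the robot lacks are filled with 0.0
    and flagged with mask 0 so a downstream model can ignore them.
    """
    values = [0.0] * len(CANONICAL_JOINTS)
    mask = [0] * len(CANONICAL_JOINTS)
    for name, value in joint_values.items():
        if name not in _SLOT:
            raise ValueError(f"joint {name!r} is not in the canonical layout")
        values[_SLOT[name]] = value
        mask[_SLOT[name]] = 1
    return values, mask


if __name__ == "__main__":
    # A legs-only biped without ankles and a full humanoid with arms
    # both end up as vectors of the same length.
    biped = {"hip_l": 0.1, "knee_l": -0.2, "hip_r": 0.1, "knee_r": -0.2}
    full = {name: 0.0 for name in CANONICAL_JOINTS}
    for robot in (biped, full):
        vals, m = to_canonical(robot)
        print(len(vals), m)
```

Placing joints by name rather than by position is the design choice that matters here: padding by position alone could put an ankle on one robot in the same slot as a shoulder on another.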

The work positions VENOM as a significant step beyond traditional decoupled control schemes. By training on diverse humanoid data, the model learns to map sensor observations to motion commands in a way that generalizes across body plans. Notably, the team reports that VENOM gets there without direct reward feedback during training, approaching the tracking proficiency of expert policies trained with asymmetric actor-critic reinforcement learning. The comparison suggests that, at least in simulation, a purely data-driven tracker can come close to policies learned from explicitly shaped rewards.
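Training "without direct reward feedback" is characteristic of supervised imitation (behavior cloning): the model is pushed toward recorded expert actions, and no reward enters the loss. In asymmetric actor-critic reinforcement learning, by contrast, the critic receives privileged simulator state during training while the actor sees only the observations available at deployment, and both are driven by rewards. The sketch below shows a generic masked mean-squared-error imitation loss. It assumes actions are laid out in a shared joint layout with a validity mask; `masked_imitation_loss` is a hypothetical name, and this is not presented as VENOM's actual training objective.

```python
from typing import Sequence


def masked_imitation_loss(
    predicted: Sequence[Sequence[float]],
    expert: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
) -> float:
    """Mean squared error between predicted and expert actions.

    Only entries with mask == 1 (joints that exist on that robot) count.
    No reward appears anywhere: the expert action is the only learning signal.
    """
    if not (len(predicted) == len(expert) == len(mask)):
        raise ValueError("batch sizes differ")
    total, count = 0.0, 0
    for p_row, e_row, m_row in zip(predicted, expert, mask):
        if not (len(p_row) == len(e_row) == len(m_row)):
            raise ValueError("action widths differ")
        for p, e, m in zip(p_row, e_row, m_row):
            if m:
                total += (p - e) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no joints")
    return total / count


if __name__ == "__main__":
    # The third joint is masked out, so its large error is ignored.
    print(masked_imitation_loss([[1.0, 2.0, 0.0]], [[0.0, 2.0, 5.0]], [[1, 1, 0]]))
```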

The VENOM dataset itself is a central contribution: it holds states, actions, and rewards collected from multiple humanoid platforms. It enables cross-embodiment evaluation and gives different learning strategies a common benchmark. In the reported experiments, VENOM's performance is stable across humanoid configurations, pointing to the potential for a universal motion-tracking backbone that could adapt to new bodies without bespoke tuning. That matters for operators who must onboard diverse robot forms into a shared control pipeline, since it hints at less per-robot customization and a more scalable way to manage motion-capture and imitation tasks in simulation.
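The article says the dataset holds states, actions, and rewards from several humanoids and supports cross-embodiment evaluation, but it gives no schema or split protocol. The sketch below shows one plausible arrangement, built on assumptions: a hypothetical `Transition` record tagged with the robot that produced it, a leave-one-embodiment-out split that holds an entire body out of training to test transfer to an unseen morphology, and a helper that drops the reward field to yield reward-free supervised pairs. The field names and the split are illustrative, not the paper's.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Transition:
    robot: str                  # hypothetical tag for the humanoid that produced this step
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float


def leave_one_embodiment_out(
    data: List[Transition], held_out: str
) -> Tuple[List[Transition], List[Transition]]:
    """Train on every robot except `held_out`; evaluate only on `held_out`."""
    train = [t for t in data if t.robot != held_out]
    test = [t for t in data if t.robot == held_out]
    if not test:
        raise ValueError(f"no transitions recorded for {held_out!r}")
    return train, test


def to_supervised_pairs(
    data: List[Transition],
) -> List[Tuple[Tuple[float, ...], Tuple[float, ...]]]:
    """Keep (state, action) pairs; the recorded reward is deliberately dropped."""
    return [(t.state, t.action) for t in data]


def steps_per_robot(data: List[Transition]) -> Dict[str, int]:
    """Count transitions per robot, e.g. to check how balanced the corpus is."""
    counts: Dict[str, int] = {}
    for t in data:
        counts[t.robot] = counts.get(t.robot, 0) + 1
    return counts


if __name__ == "__main__":
    data = [
        Transition("robot_a", (0.0, 0.1), (0.2,), 1.0),
        Transition("robot_a", (0.1, 0.2), (0.1,), 0.9),
        Transition("robot_b", (0.3, 0.0), (0.0,), 0.5),
    ]
    train, test = leave_one_embodiment_out(data, "robot_b")
    print(steps_per_robot(train), steps_per_robot(test))
    print(to_supervised_pairs(train)[0])
```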

From a practitioner's standpoint, the results carry two key implications. First, the quality and breadth of the multi-humanoid dataset are critical; variety in joint limits, limb lengths, and mass distributions appears essential for cross-embodiment generalization. Second, even if a single model can in principle track multiple bodies, practical deployment hinges on compute and latency. GPT-based trackers can carry substantial inference costs, so optimizing for real-time performance or distilling the model for embedded hardware will likely be necessary before real-world use. A broader concern is sim-to-real transfer: success in simulation does not guarantee smooth operation on physical robots, where sensor noise, friction, payload shifts, and actuator nonlinearities can erode tracking fidelity. Training without reward feedback is impressive, but carrying that robustness over to real hardware will likely demand careful calibration and perhaps targeted reward shaping or fine-tuning on real-world demonstrations.
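Two of the practical concerns raised here can be probed numerically before any hardware is involved. The sketch below rests entirely on assumptions rather than on anything reported for VENOM: it computes a mean per-joint position error between tracked and reference joint positions, perturbs observations with Gaussian noise as a crude stand-in for sensor noise, and checks whether a measured inference time fits inside one control period. The function names, the 50 Hz control rate, and the noise level in the usage lines are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def mean_joint_position_error(tracked: Sequence[Vec3], reference: Sequence[Vec3]) -> float:
    """Average Euclidean distance between matching tracked and reference joints."""
    if len(tracked) != len(reference) or not tracked:
        raise ValueError("need equal, non-empty joint lists")
    return sum(math.dist(a, b) for a, b in zip(tracked, reference)) / len(tracked)


def add_sensor_noise(obs: Sequence[float], std: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to each observation entry (a simplified noise model)."""
    return [x + rng.gauss(0.0, std) for x in obs]


def fits_control_budget(inference_ms: float, control_hz: float) -> bool:
    """True if one forward pass finishes within a single control period."""
    return inference_ms <= 1000.0 / control_hz


if __name__ == "__main__":
    rng = random.Random(0)
    ref = [(0.0, 0.0, 1.0), (0.2, 0.0, 0.5)]
    print(mean_joint_position_error(ref, ref))  # identical poses, so zero error
    print(add_sensor_noise([0.1, -0.2, 0.3], std=0.01, rng=rng))
    # At a hypothetical 50 Hz control rate the budget is 20 ms per step.
    print(fits_control_budget(inference_ms=15.0, control_hz=50.0))
```

A Gaussian perturbation captures only one of the listed effects; friction, payload shifts, and actuator nonlinearities would need to be modeled in the simulator itself.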

Looking ahead, VENOM points to a path where a single motion-tracking backbone can support a family of humanoid forms, reducing the friction of bringing new bodies online. Industry observers will be watching for real-world demonstrations, a broader corpus of morphologies, and methods to reduce latency and bridge the sim-to-real gap. If VENOM scales to physical robots and more diverse morphologies, it could reshape how humanoid labs and operators approach data collection, controller design, and cross-robot collaboration in motion-tracking tasks.

Deployment stage: lab research validated in simulation; pilot testing on physical humanoids is the expected next step.

Sources
  1. VENOM: Versatile Embodied Network for Omni-bodied Motion tracking
    arXiv (Humanoid/Bipedal query) / Primary source / Published June 15, 2026 / Accessed June 16, 2026
