Nemotron 3 Ultra Debuts on SageMaker JumpStart

Nemotron 3 Ultra just landed on SageMaker JumpStart, promising 5x faster inference for long-running agent workloads.

NVIDIA’s Nemotron 3 Ultra is an open large language model built for frontier reasoning and orchestration in autonomous agents, and its day-zero release on JumpStart marks a tangible step toward practical, scalable agentic AI. The model has 550 billion total parameters, of which 55 billion are active in each forward pass, and it runs in NVFP4 precision. Its hybrid Transformer-Mamba mixture-of-experts (MoE) architecture is designed to let agents plan, call tools, delegate to sub-agents, and loop across hundreds of turns without paying a proportional compute tax. In plain terms, Nemotron 3 Ultra can sustain intricate planning and iterative self-correction while keeping throughput high, even as context lengths stretch toward one million tokens. Reported benchmarks indicate the model is 5 times faster on long-running agent workflows and can cut costs by up to 30 percent on complex agentic tasks, a meaningful lever for organizations building autonomous agents that operate across many steps.
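
The parameter figures above translate into rough memory and compute numbers. The sketch below is a back-of-the-envelope estimate, not an official sizing guide. It assumes NVFP4 stores 4-bit values plus one 8-bit scale per 16-value block (about 4.5 bits per weight, ignoring per-tensor scales and any layers kept at higher precision), and it uses the common rule of thumb of roughly 2 FLOPs per active parameter per generated token. KV cache, activations, and runtime overhead are excluded.

```python
"""Back-of-the-envelope sizing for a sparse MoE model (illustrative only)."""

TOTAL_PARAMS = 550e9   # total parameters, from the article
ACTIVE_PARAMS = 55e9   # parameters active per forward pass, from the article

# Assumption: NVFP4 = 4-bit values + one 8-bit scale per 16-value block.
# Per-tensor scales and any higher-precision layers are ignored.
NVFP4_BITS_PER_WEIGHT = 4 + 8 / 16   # 4.5 bits
BF16_BITS_PER_WEIGHT = 16


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes); excludes KV cache."""
    return params * bits_per_weight / 8 / 1e9


def active_fraction(active: float, total: float) -> float:
    """Share of parameters that participate in a single forward pass."""
    return active / total


def flops_per_token(active_params: float) -> float:
    """Rule of thumb: about 2 FLOPs per active parameter per token."""
    return 2 * active_params


if __name__ == "__main__":
    print(f"active fraction : {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.0%}")
    print(f"NVFP4 weights   : ~{weight_gigabytes(TOTAL_PARAMS, NVFP4_BITS_PER_WEIGHT):,.0f} GB")
    print(f"BF16 weights    : ~{weight_gigabytes(TOTAL_PARAMS, BF16_BITS_PER_WEIGHT):,.0f} GB")
    print(f"MoE FLOPs/token : ~{flops_per_token(ACTIVE_PARAMS):.1e}")
    print(f"same-size dense FLOPs/token: ~{flops_per_token(TOTAL_PARAMS):.1e}")
```

Under these assumptions the weights take roughly 309 GB in NVFP4 versus about 1,100 GB in BF16, and only about 10 percent of the parameters participate in each token's forward pass, which is where the MoE design's per-token compute savings come from.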

The JumpStart deployment story is simple but consequential: a one-click path to stand up a model that is purpose-built for orchestrating actions rather than merely answering queries. For enterprise teams, this means less friction in moving from research to production and a more predictable cost profile when agents engage in tool use, verification loops, and multi-turn reasoning. The Nemotron 3 Ultra team emphasizes an important nuance of agentic AI: every turn adds tokens and compute, so the metrics that matter are task completion with useful accuracy, time to finish, and cost per task. The MoE design serves this goal by activating only 55B of the 550B total parameters per forward pass, preserving throughput as the agent's context grows toward one million tokens.
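
To make the cost-per-task metric concrete, here is a minimal model of an agent loop. It assumes, hypothetically, that each turn re-reads the entire accumulated context (no prefix or KV-cache reuse across turns) and that pricing is quoted per million input and output tokens. The profile fields and prices are illustrative placeholders, not JumpStart or NVIDIA figures.

```python
from dataclasses import dataclass


@dataclass
class AgentTaskProfile:
    """Hypothetical token profile of one agentic task."""
    base_prompt_tokens: int           # system prompt + tool schemas, sent every turn
    turns: int                        # plan / tool-call / verify cycles
    output_tokens_per_turn: int       # tokens generated by the model each turn
    tool_result_tokens_per_turn: int  # tool output appended to the history each turn


def token_totals(p: AgentTaskProfile) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) summed over all turns.

    Assumes the full history is re-read every turn (no prefix/KV-cache reuse).
    """
    context = p.base_prompt_tokens
    total_in = total_out = 0
    for _ in range(p.turns):
        total_in += context                   # the model reads the whole history
        total_out += p.output_tokens_per_turn
        # the history grows by the model's reply plus the tool result
        context += p.output_tokens_per_turn + p.tool_result_tokens_per_turn
    return total_in, total_out


def cost_per_task(p: AgentTaskProfile, usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Estimated USD cost of one task, with prices quoted per million tokens."""
    total_in, total_out = token_totals(p)
    return (total_in * usd_per_m_in + total_out * usd_per_m_out) / 1e6


if __name__ == "__main__":
    # Placeholder prices, not actual JumpStart or NVIDIA pricing.
    PRICE_IN, PRICE_OUT = 1.0, 4.0
    for turns in (10, 100):
        profile = AgentTaskProfile(2_000, turns, 400, 1_200)
        tin, tout = token_totals(profile)
        print(f"{turns:>3} turns: {tin:,} input / {tout:,} output tokens, "
              f"~${cost_per_task(profile, PRICE_IN, PRICE_OUT):.2f} per task")
```

Because the history is re-read on every turn, input tokens grow roughly quadratically with the number of turns. In this toy profile, going from 10 to 100 turns raises the estimated cost by far more than 10x, which is why per-turn efficiency at long context dominates the bill for agents that loop hundreds of times.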

From an engineering standpoint, the implications are significant. First, the 1M-token context length lets agent platforms extend planning horizons far beyond typical chat interactions, reducing the need to retrace steps or regenerate context during long-running tasks. Second, NVFP4 precision and MoE gating together deliver a balance of scale and efficiency that would be hard to obtain with dense models of equivalent quality. The result is an engine that can sustain planning, tool calls, and self-correction loops across hundreds of turns while keeping per-task cost in check.
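
Much of the long-context efficiency comes from the hybrid design: attention layers cache keys and values for every token, while Mamba (state-space) layers carry a fixed-size recurrent state. The sketch below compares per-sequence cache memory for an all-attention stack and a hybrid stack. Every layer count and dimension in it is a hypothetical placeholder, not Nemotron 3 Ultra's published configuration, and the Mamba state is simplified to d_inner × d_state values per layer (convolution state ignored).

```python
from dataclasses import dataclass


@dataclass
class LayerDims:
    """Hypothetical per-layer dimensions (placeholders, not the real model)."""
    n_kv_heads: int = 8
    head_dim: int = 128
    d_inner: int = 8192        # Mamba expanded inner width
    d_state: int = 128         # Mamba SSM state size per channel
    bytes_per_elem: int = 2    # BF16 cache/state


def attention_kv_bytes(seq_len: int, d: LayerDims) -> int:
    """Keys + values cached for every token: grows linearly with seq_len."""
    return 2 * d.n_kv_heads * d.head_dim * d.bytes_per_elem * seq_len


def mamba_state_bytes(d: LayerDims) -> int:
    """Recurrent SSM state: fixed size, independent of seq_len (conv state ignored)."""
    return d.d_inner * d.d_state * d.bytes_per_elem


def cache_bytes(seq_len: int, n_attention: int, n_mamba: int, d: LayerDims) -> int:
    """Total per-sequence cache for a stack of attention and Mamba layers."""
    return n_attention * attention_kv_bytes(seq_len, d) + n_mamba * mamba_state_bytes(d)


if __name__ == "__main__":
    dims = LayerDims()
    for seq_len in (8_000, 128_000, 1_000_000):
        all_attn = cache_bytes(seq_len, n_attention=60, n_mamba=0, d=dims)
        hybrid = cache_bytes(seq_len, n_attention=10, n_mamba=50, d=dims)
        print(f"{seq_len:>9,} tokens: all-attention {all_attn / 1e9:7.2f} GB, "
              f"hybrid {hybrid / 1e9:7.2f} GB")
```

With these placeholder numbers, a 60-layer all-attention stack needs about 246 GB of cache at one million tokens, while a hybrid with 10 attention layers needs about 41 GB, and the Mamba layers' share stays constant no matter how long the context grows.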

For practitioners, three takeaways stand out. First, MoE gating matters: the performance gains hinge on how well token routing across experts is managed, because latency can become irregular if routing becomes a bottleneck. Second, cost is task-dependent: although aggregate figures show up to 30 percent lower costs for complex agent work, real savings depend on token flow, tool usage, and the number of planning cycles per task. Third, deployment friction is lower but not zero: JumpStart removes much of the integration burden, yet teams still need to build tool interfaces, guardrails, and monitoring before autonomous workflows can run safely in production.
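
The routing point can be illustrated with a toy router. The sketch below uses softmax top-k gating, a common MoE routing scheme assumed here for illustration (the article does not describe Nemotron 3 Ultra's actual router), and measures load imbalance as the busiest expert's token count divided by the mean. When a few "hot" experts attract most tokens, they can become stragglers that the rest of the layer waits on, which shows up as irregular latency. The expert count, k, and bias values are hypothetical.

```python
import math
import random
from collections import Counter


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Softmax top-k gating: keep the k most probable experts and
    renormalize their gate weights so they sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def load_imbalance(assignments: list[list[tuple[int, float]]], n_experts: int) -> float:
    """Busiest expert's token count divided by the mean count (1.0 = balanced)."""
    counts = Counter(expert for token in assignments for expert, _ in token)
    loads = [counts.get(e, 0) for e in range(n_experts)]
    return max(loads) / (sum(loads) / n_experts)


def simulate(n_tokens: int, n_experts: int, k: int, hot_bias: float, seed: int = 0) -> float:
    """Route random tokens; hot_bias > 0 makes experts 0 and 1 (hypothetically) 'hot'."""
    rng = random.Random(seed)
    assignments = []
    for _ in range(n_tokens):
        logits = [rng.gauss(0.0, 1.0) for _ in range(n_experts)]
        logits[0] += hot_bias
        logits[1] += hot_bias
        assignments.append(route_top_k(logits, k))
    return load_imbalance(assignments, n_experts)


if __name__ == "__main__":
    print(f"balanced router: imbalance {simulate(2_000, 8, 2, hot_bias=0.0):.2f}")
    print(f"skewed router:   imbalance {simulate(2_000, 8, 2, hot_bias=5.0):.2f}")
```

Production MoE systems counter this with load-balancing objectives during training and capacity limits or expert parallelism at serving time; the toy router has none of these, which is exactly why its skewed case degrades so visibly.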

Separately, JumpStart is expanding beyond frontier agents. Fundamental’s NEXUS, a Large Tabular Model optimized for structured data, is now also available on JumpStart, signaling a broader push to bring enterprise-ready AI capabilities to production. For teams working with tabular data, NEXUS promises deterministic predictions and native tabular understanding, reinforcing the idea that JumpStart is becoming a multi-domain platform for enterprise AI.

As the field edges toward more capable agentic systems, Nemotron 3 Ultra embodies a practical engineering stance: push for longer context and smarter routing, but measure success in concrete production metrics such as time to task completion, cost per task, and reliability across long-running workflows.

The Robotics Briefing