Nemotron 3 Ultra lands on SageMaker JumpStart

NVIDIA Nemotron 3 Ultra lands on SageMaker JumpStart with 5x faster inference for long-running agent workflows.

The team reports that Nemotron 3 Ultra is an open large language model built for frontier reasoning and orchestration in autonomous agents, delivering five times the throughput and up to 30 percent lower cost for agentic workloads. The model is optimized for NVFP4, NVIDIA's 4-bit floating-point format, which makes it faster to run and cheaper to host. According to the team, Nemotron 3 Ultra has 550 billion total parameters, with 55 billion active per forward pass, and supports context lengths of up to one million tokens. This combination is designed to sustain planning, tool calling, and self-correction loops that span hundreds of turns, a crucial requirement for persistent, goal-directed agent behavior.
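
To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch in Python. The parameter counts come from the announcement; the figure of roughly 4.5 bits per weight for NVFP4 (a 4-bit value plus an 8-bit scale shared by each block of 16 values) is an assumption used for illustration, and real deployments may keep some layers at higher precision.

```python
# Back-of-the-envelope sizing from the figures quoted in the article.
TOTAL_PARAMS = 550e9    # total parameters (from the article)
ACTIVE_PARAMS = 55e9    # active per forward pass (from the article)

# ASSUMPTION: about 4.5 bits per weight for NVFP4 (4-bit value plus one
# 8-bit scale shared by each block of 16 values). BF16 uses 16 bits.
BITS_NVFP4 = 4 + 8 / 16
BITS_BF16 = 16


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
nvfp4_gb = weight_gb(TOTAL_PARAMS, BITS_NVFP4)
bf16_gb = weight_gb(TOTAL_PARAMS, BITS_BF16)

print(f"active fraction:        {active_fraction:.0%}")
print(f"weights at ~4.5 bits:   {nvfp4_gb:.0f} GB")
print(f"weights at 16 bits:     {bf16_gb:.0f} GB")
```

This covers weight storage only. Memory for long-context state, activations, and runtime overhead comes on top and depends on the serving stack.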

Architecturally, Nemotron 3 Ultra is a hybrid Transformer-Mamba mixture-of-experts (MoE) design. The MoE approach activates only 55 billion of the full 550 billion parameters, or 10 percent, on each forward pass, preserving high throughput even as the conversation grows longer. In practice, this means agents can persist through multi-turn planning while still drawing on a large pool of experts to handle diverse tasks. Benchmarks indicate the model achieves its claimed speed and cost targets while maintaining useful accuracy across agentic tasks such as tool calls and sub-agent delegation. The team reports that such a configuration can deliver frontier intelligence at a fraction of the compute cost of comparable dense models, a critical lever for real-world deployment under budget and latency constraints.
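
The idea of activating only a subset of experts per token can be illustrated with a toy top-k router in Python. This is a generic sketch of top-k gating, not Nemotron 3 Ultra's actual routing scheme: the expert count, the value of k, the scalar "experts", and the random gate scores are all hypothetical, chosen only to mirror the 1-in-10 active ratio. In a real model the gate scores come from a learned layer applied to each token.

```python
import math
import random


def top_k_route(gate_logits, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)  # subtract max for stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits, k):
    """Run only the selected experts and mix their outputs by router weight."""
    routes = top_k_route(gate_logits, k)
    y = sum(w * experts[i](x) for i, w in routes)
    return y, [i for i, _ in routes]


# Toy setup (hypothetical): 10 scalar "experts", 1 active per token.
NUM_EXPERTS, K = 10, 1
experts = [lambda x, s=s: s * x for s in range(1, NUM_EXPERTS + 1)]

random.seed(0)
gate_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # stand-in scores
output, used = moe_forward(2.0, experts, gate_logits, K)
print(f"experts used: {used} ({len(used)} of {NUM_EXPERTS} evaluated)")
```

The nine unselected experts are never called, which is where the compute saving comes from. The sort and weight computation are the routing overhead, small here but not free at scale.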

One-click deployment on Amazon SageMaker JumpStart further lowers the bar for adoption. The launch positions Nemotron 3 Ultra as a go-to option for teams building long-horizon autonomous workflows, where agents must continually plan, act, verify results, and adjust course. The NVFP4 precision choice is part of the engineering push to reduce hosting costs without sacrificing the depth of reasoning required for multi-turn tasks. In the context of agentic AI, where a single misstep can cascade into hundreds of wasted tokens, the ability to sustain hundreds of turns with a predictable compute footprint matters.

From a practitioner's perspective, the Nemotron 3 Ultra release highlights several concrete tradeoffs and watch-outs. First, the MoE routing layer is what enables the 55B active parameter footprint to deliver high throughput, but it also imposes routing overhead and memory management considerations that teams must address in production. Second, the context length of up to one million tokens shifts the bottleneck toward memory bandwidth and offload strategies, making caching policies and memory architecture critical for stable latency. Third, while cost-per-task improves, the exact economics will hinge on the task mix, how aggressively agents call tools, and how often self-correction loops must be executed. Finally, deploying on JumpStart reduces setup friction, but teams should still validate end-to-end task completion time and accuracy in their own workloads, especially when orchestrating multiple sub-agents with external tools.
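
The third point, that economics depend on the task mix, can be made concrete with a simple cost model in Python. Everything here is a hypothetical sketch: the `TaskProfile` fields, the hourly instance price, the sustained throughput, and both example workloads are placeholder assumptions, not published figures. It also prices all tokens at a single blended rate, which a real deployment would want to refine.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical shape of one agent task; every field is an assumption."""
    turns: int                     # plan/act turns per task
    tokens_per_turn: int           # tokens processed per turn, before tools
    tool_calls_per_turn: float     # average tool calls issued per turn
    tokens_per_tool_result: int    # tokens each tool result adds
    retry_rate: float              # fraction of turns redone by self-correction


def usd_per_million_tokens(instance_usd_per_hour: float,
                           tokens_per_second: float) -> float:
    """Blended token price implied by an hourly cost and sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return instance_usd_per_hour / tokens_per_hour * 1e6


def tokens_per_task(p: TaskProfile) -> float:
    """Total tokens for one task, including tool results and retried turns."""
    per_turn = p.tokens_per_turn + p.tool_calls_per_turn * p.tokens_per_tool_result
    return p.turns * (1 + p.retry_rate) * per_turn


def cost_per_task(p: TaskProfile, price_per_million: float) -> float:
    """Estimated USD cost of one task at a blended per-token price."""
    return tokens_per_task(p) * price_per_million / 1e6


# Placeholder numbers for illustration only.
price = usd_per_million_tokens(instance_usd_per_hour=100.0, tokens_per_second=5000.0)
light = TaskProfile(turns=50, tokens_per_turn=2000, tool_calls_per_turn=0.5,
                    tokens_per_tool_result=500, retry_rate=0.05)
heavy = TaskProfile(turns=300, tokens_per_turn=2000, tool_calls_per_turn=2.0,
                    tokens_per_tool_result=1500, retry_rate=0.25)

for name, profile in (("light", light), ("heavy", heavy)):
    print(f"{name}: {tokens_per_task(profile):,.0f} tokens, "
          f"${cost_per_task(profile, price):.2f} per task")
```

Plugging in measured throughput and real task traces in place of the placeholders is what turns this into a usable estimate. The structure simply shows how tool-call volume and retry rate multiply through to cost.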

Looking ahead, industry watchers will want to see how Nemotron 3 Ultra performs in live agentic scenarios: how often tool calls succeed on the first try, how resilient planning loops are to noisy tool responses, and how cost scales as task complexity grows. For now, the combination of a massive but selectively active parameter count, a long-context hybrid MoE architecture, and a one-click deployment path on SageMaker JumpStart makes Nemotron 3 Ultra a noteworthy engineering choice for teams building autonomous agents that must think, plan, and operate over hundreds of turns.

The Robotics Briefing