SUNDAY, JUNE 7, 2026
AI & Machine Learning · 3 min read

SFT and DPO boost tool calling accuracy on SageMaker

By Alexander Cole

Pairing two fine-tuning techniques cuts tool-calling errors.

As AI agents move from pilot programs to production, picking the right tool at the right moment becomes critical. The latest guidance from AWS shows how pairing supervised fine-tuning with direct preference optimization can lift a small language model’s tool-calling accuracy, using Amazon SageMaker AI training jobs to run the training. The post walks through why tool selection and parameter formatting matter, and how to measure improvement on realistic tasks.

The core idea is engineering-first. Tool calling is not a cosmetic feature; it drives task completion time, reliability, and support costs. When an agent routes a request to the wrong tool or formats its parameters incorrectly, workflows stall, error rates rise, and users notice. The team reports that their approach targets the heart of the problem: teaching the model which tool to invoke and how to format the call within a given workflow. Training jobs on Amazon SageMaker AI serve as the testbed, so engineers can focus on the training code rather than infrastructure management. In this setup, the model is evaluated on whether it calls tools correctly and keeps multi-step workflows intact, not merely on whether it produces plausible text.

The recipe blends two complementary techniques. Supervised fine-tuning (SFT) trains the model on a high-quality dataset that mirrors its intended function, with explicit examples of how to interact with tools, how to phrase calls, and what constraints to observe. Curating this data is the backbone of the approach: it teaches the model the specifics of tool usage, from recognizing tool names to following argument formats. Direct preference optimization (DPO) then refines these interactions by training on preference pairs, each consisting of a preferred and a rejected response to the same prompt, which can come from human feedback or from clearly defined objectives. In practice, DPO nudges the model toward preferred behaviors through a “like this, not like that” pattern, aligning outputs with target outcomes more tightly than pure imitation would.
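To make the two stages concrete, here is a minimal sketch of what the training records and the DPO objective can look like. The record layout, the tool names (`get_weather`, `search_web`), and the `beta` default are illustrative assumptions, not the format used in the AWS post. The loss itself is the standard DPO formulation: the negative log-sigmoid of the scaled difference between how much the policy, relative to a frozen reference model, favors the chosen response over the rejected one.

```python
import math

# Hypothetical SFT record: a prompt paired with the exact tool call to imitate.
sft_example = {
    "prompt": "What's the weather in Paris tomorrow?",
    "completion": {"tool": "get_weather",
                   "arguments": {"city": "Paris", "day": "tomorrow"}},
}

# Hypothetical DPO record: same prompt, one preferred and one rejected call.
dpo_example = {
    "prompt": "What's the weather in Paris tomorrow?",
    "chosen": {"tool": "get_weather",
               "arguments": {"city": "Paris", "day": "tomorrow"}},
    "rejected": {"tool": "search_web",
                 "arguments": {"query": "Paris weather"}},
}


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given summed sequence log-probs.

    beta is an assumed illustrative value; it scales how strongly the policy
    is pushed away from the reference model.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) == softplus(-logits)
    return _softplus(-logits)
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2; it falls as the policy raises the chosen call's likelihood relative to the rejected one. In a real training run, the log-probabilities would come from the model being trained and a frozen copy of it, and a library implementation would typically replace this hand-written function.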

Benchmarking is less glamorous than a flashy demo but far more relevant for production. The team reports evaluating tool-calling accuracy and comparing a base model against several fine-tuned variants. The emphasis is practical: how often does the agent pick the correct tool, how accurately does it format parameters, and how robust is the chain of calls when a task spans multiple steps? The takeaway from the published guidance is clear: the combined SFT and DPO approach yields measurable gains over the unmodified base model, at least in the SageMaker training job scenario used for illustration. That translates to fewer tool selection mistakes, fewer broken workflows, and lower downstream support needs when teams push agents toward real tasks.
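The three questions above map naturally onto three metrics. The sketch below is one possible way to compute them, not the evaluation harness from the AWS post; the `ToolCall` type, the `score_calls` function, and the exact-match rule for arguments are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical container for one tool invocation."""
    name: str
    arguments: dict = field(default_factory=dict)


def score_calls(predicted: list, expected: list) -> dict:
    """Compare a predicted chain of tool calls with the expected chain.

    Returns tool-selection accuracy, argument accuracy (correct tool AND
    exactly matching arguments), and whether the whole chain succeeded.
    """
    if not expected:
        raise ValueError("expected chain must contain at least one call")

    tool_hits = 0
    argument_hits = 0
    for i, gold in enumerate(expected):
        # A missing prediction at this step counts as a miss.
        pred = predicted[i] if i < len(predicted) else None
        if pred is not None and pred.name == gold.name:
            tool_hits += 1
            if pred.arguments == gold.arguments:
                argument_hits += 1

    n = len(expected)
    return {
        "tool_accuracy": tool_hits / n,
        "argument_accuracy": argument_hits / n,
        # The chain succeeds only if every step matches and no extra calls exist.
        "chain_success": argument_hits == n and len(predicted) == n,
    }
```

Running the same function over outputs from the base model and from each fine-tuned variant gives a like-for-like comparison. Exact matching on arguments is deliberately strict; a production harness might normalize values such as dates or casing before comparing.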

For practitioners, the article offers several tangible implications. First, frame the problem as an engineering constraint and prototype around your current tool stack; the SageMaker example is a concrete blueprint you can adapt. Second, expect a tradeoff between data quality and effort. SFT requires curated examples that cover tool-specific language and constraints, while DPO requires ongoing feedback or objective tuning, which can be resource-intensive but pays off in reliable behavior. Third, anticipate failure modes common to tool calls: tool selection errors, misformatted or missing parameters, and brittle multi-step workflows when tool chains are interrupted. Finally, watch how well the approach generalizes beyond the initial tool set. The philosophy is transferable, but the implementation details (dataset design, preference signals, and evaluation criteria) will need tailoring to each production environment.
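The formatting-related failure modes listed above can be caught mechanically before a call ever reaches a tool. The following sketch checks a proposed call against a small tool registry and labels what is wrong with it. The registry layout, the tool names, and the failure labels are hypothetical and chosen only for illustration.

```python
# Hypothetical tool registry: parameter names mapped to expected Python types.
TOOLS = {
    "get_weather": {"required": {"city": str}, "optional": {"day": str}},
    "create_ticket": {"required": {"title": str, "priority": int}, "optional": {}},
}


def classify_call(call: dict, tools: dict = TOOLS) -> list:
    """Return a list of failure labels; an empty list means the call is well-formed."""
    name = call.get("tool")
    if name not in tools:
        return ["unknown_tool"]

    spec = tools[name]
    args = call.get("arguments", {})
    problems = []

    # Missing required parameters.
    for param in spec["required"]:
        if param not in args:
            problems.append(f"missing_parameter:{param}")

    # Unexpected parameters and wrongly typed values.
    allowed = {**spec["required"], **spec["optional"]}
    for param, value in args.items():
        if param not in allowed:
            problems.append(f"unexpected_parameter:{param}")
        elif not isinstance(value, allowed[param]):
            problems.append(f"wrong_type:{param}")

    return problems
```

A check like this catches unknown tools and malformed parameters, but not a well-formed call to the wrong tool; detecting that still requires labeled examples. Calls that fail validation are also natural candidates for the rejected side of a preference pair.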

In short, the post frames a pragmatic path to more reliable AI agents: train with clear, tool-specific data, then nudge behavior with human-guided preferences. The result is not a magic wand but an engineering win, one that reduces erroneous tool calls and tightens end-to-end automation.

Sources
  1. Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI
    AWS Machine Learning / Primary / Published JUN 03, 2026 / Accessed JUN 07, 2026
