Tool calling gets sharper with SFT and DPO

Fine-tuning helps a small language model call the right tool more often on the first try.

AI agents can juggle multi-step tasks, but the difference between finishing a job and flubbing it often comes down to two things: selecting the right tool and passing it correctly formatted parameters. A blog post from Amazon SageMaker AI argues that when agents pick the wrong tool or format parameters incorrectly, task times creep upward, errors spike, and the system becomes harder to support. As production deployments proliferate, reliably choosing and invoking the right tool becomes a core engineering constraint, not a nicety.

The team reports that combining Supervised Fine-Tuning (SFT) with Direct Preference Optimization (DPO) can lift tool calling accuracy for a small language model. The approach starts with SFT, which trains the model on explicit examples of how to interact with specific tools, including the expected commands and constraints. This gives the model a foundation in tool language and usage patterns. The second stage, DPO, trains directly on preference pairs: for each prompt, a preferred response and a rejected one, drawn from human feedback or predefined objectives. The model learns to rank the desired tool usage above less desirable alternatives, which shapes its behavior toward the target workflows rather than just mimicking examples.
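To make the DPO step concrete, here is a minimal sketch of the standard DPO loss for a single preference pair, written in plain Python. It is not the blog's training code. It assumes the sequence log-probabilities of the preferred ("chosen") and rejected tool calls have already been computed under both the model being trained and a frozen reference copy; the function name, argument names, and the default beta of 0.1 are illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities. All names and the default
    beta are assumptions made for this illustration.
    """
    # How much more the policy favors each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), computed in a numerically stable form.
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical log-probabilities: the policy has moved toward the chosen call
# and away from the rejected one, relative to the reference.
improved = dpo_loss(-2.0, -8.0, -5.0, -5.0)
# The policy is identical to the reference, so the pair is a coin flip.
untrained = dpo_loss(-5.0, -5.0, -5.0, -5.0)
print(improved < untrained)
```

The loss falls as the model assigns relatively more probability to the preferred tool call than the reference does, which is the "prefer this over that" pressure described above.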

According to the post, the gains come from more than bigger datasets or longer training runs. The evaluation centers on tool calling accuracy and contrasts a base model against several fine-tuned variants. The worked example runs on Amazon SageMaker AI training jobs, a realistic, production-oriented setting in which engineers can focus on training code rather than infrastructure. That makes the approach attractive for teams that want more reliable agent-based automation without getting bogged down in tooling complexity.
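One way to measure tool calling accuracy is to compare each model output against a reference call and track tool selection separately from parameter correctness. The sketch below is an assumption about how such a metric could be computed, not the blog's evaluation harness. It assumes each call is a JSON object with "name" and "arguments" fields; the tool names in the sample data are hypothetical.

```python
import json


def score_tool_calls(predictions, references):
    """Return tool-selection and exact-call accuracy over paired examples.

    Each item is a JSON string such as {"name": ..., "arguments": {...}}.
    This layout is an assumption for illustration.
    """
    tool_hits = 0
    exact_hits = 0
    for pred_text, ref_text in zip(predictions, references):
        ref = json.loads(ref_text)
        try:
            pred = json.loads(pred_text)
        except json.JSONDecodeError:
            continue  # unparseable output is a miss on both metrics
        if not isinstance(pred, dict):
            continue  # valid JSON that is not a call object is also a miss
        if pred.get("name") == ref["name"]:
            tool_hits += 1
            if pred.get("arguments") == ref["arguments"]:
                exact_hits += 1
    total = len(references)
    return {
        "tool_selection_accuracy": tool_hits / total if total else 0.0,
        "exact_call_accuracy": exact_hits / total if total else 0.0,
    }


# Hypothetical evaluation set of four requests.
references = [
    '{"name": "get_weather", "arguments": {"city": "Paris"}}',
    '{"name": "get_time", "arguments": {"tz": "UTC"}}',
    '{"name": "get_weather", "arguments": {"city": "Oslo"}}',
    '{"name": "get_weather", "arguments": {"city": "Lima"}}',
]
predictions = [
    '{"name": "get_weather", "arguments": {"city": "Paris"}}',  # correct
    '{"name": "get_time", "arguments": {"tz": "utc"}}',         # right tool, wrong parameter
    'I think the weather is nice',                              # not a tool call
    '{"name": "search", "arguments": {"q": "Lima weather"}}',   # wrong tool
]
print(score_tool_calls(predictions, references))
```

Reporting the two numbers separately shows whether a fine-tuned variant improved at choosing tools, at formatting parameters, or at both.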

From a practitioner's perspective, the core takeaway is that aligning the model with tool semantics matters as much as model size. The team reports that SFT helps the model recognize tool-specific language and constraints, while DPO steers its outputs toward preferred tool usage patterns. Combined, the two methods reduce the likelihood of selecting the wrong tool, misformatting parameters, or breaking a workflow chain, any of which can derail a multi-step task.

Four concrete practitioner insights emerge for teams considering this path. First, treat tool calling accuracy as a first-class metric for production-ready agents. Small improvements in how a model maps requests to tools can cascade into faster task completion and lower support costs, especially as the tool catalog grows. Second, invest in curated training data that mirrors real-world tool interactions. SFT’s strength rests on high-quality examples that reflect the exact commands, parameter schemas, and sequencing rules agents must follow. Third, design objective-aligned feedback loops for DPO rather than relying on generic loss functions. The “like this, not like that” signal helps the model prefer correct tool interactions in edge cases where several tools could seem applicable. Fourth, consider the production context early. Using SageMaker AI training jobs, as the example does, can simplify pipelines by letting teams concentrate on training code while still preparing the model for deployment.
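The second and third insights both come down to how training records are built. The sketch below shows one possible shape for an SFT example and for a “like this, not like that” preference pair. The field names ("prompt", "completion", "chosen", "rejected"), the helper functions, and the tools in the sample are all assumptions for illustration; the blog's actual dataset format may differ.

```python
import json


def make_sft_example(user_request, tool_call):
    """Pair a request with the one correct tool call for supervised fine-tuning."""
    return {
        "prompt": user_request,
        "completion": json.dumps(tool_call, sort_keys=True),
    }


def make_preference_pair(user_request, correct_call, wrong_call):
    """Build a preference record: the correct call is chosen, the flawed one rejected."""
    if correct_call == wrong_call:
        raise ValueError("chosen and rejected calls must differ")
    return {
        "prompt": user_request,
        "chosen": json.dumps(correct_call, sort_keys=True),
        "rejected": json.dumps(wrong_call, sort_keys=True),
    }


# Hypothetical edge case: two tools look applicable, but only one fits the request.
request = "Cancel order 1042 and refund the customer."
correct = {"name": "cancel_order", "arguments": {"order_id": 1042, "refund": True}}
tempting = {"name": "update_order", "arguments": {"order_id": 1042, "status": "cancelled"}}

sft_record = make_sft_example(request, correct)
dpo_record = make_preference_pair(request, correct, tempting)
print(dpo_record["chosen"])
print(dpo_record["rejected"])
```

Rejected calls are most useful when they are plausible mistakes, such as a near-miss tool or a wrongly typed parameter, since those are the cases where the preference signal has something to correct.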

The upshot for product teams is pragmatic: aligning agents to tool semantics with SFT and DPO can yield measurable gains in reliability and speed for tool-driven automation. It offers a controllable path from pilot to production in which the bottleneck is not model size but the quality of tool interaction. As tool catalogs expand and workflows become more intricate, ongoing fine-tuning and evaluation will be essential to keep the system from drifting away from its intended behavior.

The Robotics Briefing