
February 18, 2026 7 min read AI Automation

Anthropic Just Killed Tool Calling - Here's What You Need to Know

Traditional JSON-based tool calling is becoming obsolete. Anthropic's Sonnet 4.6 release makes programmatic tool calling generally available, with reported token savings of 30-80% and improved agent accuracy. Learn how this fundamentally changes AI agent architecture and why it's likely to become the new industry standard.


The Tool Calling Problem

AI agents have been struggling with inefficient tool calling architectures that waste tokens and reduce accuracy. Traditional approaches load all tool definitions, intermediate results, and conversation history into the context window - polluting it with unnecessary data.

As shown at 2:15 in the video, this becomes especially problematic with protocols like MCP where multiple tools are available. Each tool call adds more data to the context window, creating a snowball effect that consumes valuable tokens with irrelevant information.

Context pollution problem: A typical agent might use 70-80% of its context window just for tool definitions and intermediate results, leaving little room for actual problem-solving. This forces developers to choose between expensive larger context windows or reduced functionality.
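To make that squeeze concrete, here is a small back-of-the-envelope calculation. The 70-80% overhead range comes from the paragraph above; the 200,000-token window is an assumed figure chosen purely for illustration, and `remaining_context` is a hypothetical helper, not part of any SDK.

```python
def remaining_context(window_tokens: int, overhead_percent: int) -> int:
    """Tokens left for actual problem-solving after tool overhead is paid."""
    if not 0 <= overhead_percent <= 100:
        raise ValueError("overhead_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return window_tokens * (100 - overhead_percent) // 100


WINDOW = 200_000  # assumed context window size, for illustration only

for percent in (70, 80):
    left = remaining_context(WINDOW, percent)
    print(f"{percent}% overhead leaves {left:,} of {WINDOW:,} tokens")
```

Under these assumptions, only 40,000 to 60,000 tokens remain for the task itself.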

Programmatic Solution

Anthropic's programmatic tool calling solves this by letting agents write and execute code in a sandbox environment. Instead of JSON-based tool calling that loads everything into context, the agent:

Identifies needed tools
Writes code to invoke them in sequence
Executes the code in a sandbox
Only receives final processed results
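The four steps above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not Anthropic's implementation: `list_orders` is a made-up tool, `AGENT_SCRIPT` stands in for code the model might write, and `exec()` is only a placeholder for a real isolated sandbox (it provides no isolation at all).

```python
import json


# Hypothetical tool for illustration; a real agent would call an external service.
def list_orders(customer_id: str) -> list[dict]:
    return [
        {"id": i, "customer": customer_id, "total": 10 * i, "notes": "x" * 200}
        for i in range(1, 51)
    ]


def traditional_flow(customer_id: str) -> str:
    """JSON-style calling: the full raw tool result lands in the context."""
    return json.dumps(list_orders(customer_id))


# Step 2: code the model might write to call tools and post-process the output.
AGENT_SCRIPT = """
orders = list_orders("c-42")
large = [o for o in orders if o["total"] >= 400]
result = {"count": len(large), "revenue": sum(o["total"] for o in large)}
"""


def programmatic_flow(script: str) -> str:
    """Steps 3-4: run the script and hand back only its final result."""
    # NOTE: exec() is NOT a sandbox; it stands in for an isolated runtime.
    namespace = {"list_orders": list_orders}  # step 1: expose only needed tools
    exec(script, namespace)
    return json.dumps(namespace["result"])


raw = traditional_flow("c-42")
summary = programmatic_flow(AGENT_SCRIPT)
print(f"traditional: {len(raw):,} characters enter the context")
print(f"programmatic: {len(summary)} characters enter the context")
print(summary)
```

The 50 raw order records never reach the model in the programmatic path; only the two-field summary does.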

This approach leverages LLMs' natural coding ability while avoiding their weakness with synthetic JSON formats. As the transcript explains at 4:30, "LLMs are trained on billions of lines of code... they can produce and understand code but barely any synthetic JSON tool calling formats."

Token savings: Cloudflare reported 30-80% reductions in token usage, while Anthropic saw 24% fewer input tokens alongside 11% accuracy improvements in benchmarks.

Industry Adoption Timeline

Programmatic tool calling didn't emerge overnight. The transcript reveals a clear progression:

September 2025: Cloudflare publishes "Code Mode: The Better Way to Use MCP"
November 2025: Anthropic releases "Code Execution with MCP" paper
Late November 2025: Anthropic adds advanced tool use including tool search
December 2025: Open-source implementations emerge (Block's Goose, LiteLLM)
February 2026: Anthropic makes programmatic calling GA with Sonnet 4.6

This mirrors Anthropic's previous innovations like MCP and agent skills that became industry standards. At 6:45, the video notes: "Just like anything released by Anthropic, the usage kind of exploded within the open source community."

Benchmark Results

Anthropic tested programmatic tool calling on two key benchmarks with impressive results:

BrowseComp: Tests navigation ability across websites. Sonnet improved from 33% to 46% accuracy (a 13-point gain) while using fewer tokens.

The second benchmark, DeepSearchQA (finding multiple correct answers), saw F1 scores improve from 52% to 59% for Sonnet 4.6. Both benchmarks demonstrated that programmatic calling doesn't just save tokens - it actually improves agent capabilities.
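Because "13 points" and "13%" are easy to confuse, it helps to separate the absolute gain (percentage points) from the relative gain. The short calculation below uses only the rounded scores quoted above; `gains` is a helper written for this article.

```python
def gains(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    points = after_pct - before_pct
    relative = points / before_pct * 100
    return points, relative


for name, before, after in [
    ("First benchmark (accuracy)", 33, 46),
    ("Second benchmark (F1)", 52, 59),
]:
    points, relative = gains(before, after)
    print(f"{name}: +{points} points, {relative:.1f}% relative improvement")
```

So the 13-point accuracy jump is roughly a 39% relative improvement, and the 7-point F1 gain is roughly a 13% relative improvement.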

However, as noted at 9:20 in the video: "Token cost will vary depending on how much code the model needs to write... Opus 4.6 saw increased price-weighted tokens because it wrote more filtering code." This highlights the importance of balancing code complexity against token savings.
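A rough sketch shows how that can happen. Output tokens usually cost more than input tokens, so extra generated code can cancel out input savings. Every number here is invented for illustration except the 24% input reduction quoted earlier: the 5x output weight, the token counts, and the `price_weighted_tokens` helper are all assumptions, not Anthropic's pricing or methodology.

```python
def price_weighted_tokens(
    input_tokens: int,
    output_tokens: int,
    input_weight: int = 1,   # assumed relative price of an input token
    output_weight: int = 5,  # assumed relative price of an output token
) -> int:
    """Combine input and output tokens into one cost-like number."""
    return input_tokens * input_weight + output_tokens * output_weight


# Illustrative scenarios; input drops 24% (100,000 -> 76,000) when filtering.
baseline = price_weighted_tokens(100_000, 2_000)       # no filtering code
light_filter = price_weighted_tokens(76_000, 3_000)    # short filter script
heavy_filter = price_weighted_tokens(76_000, 8_000)    # long filter script

print(f"baseline:     {baseline:,}")
print(f"light filter: {light_filter:,}")
print(f"heavy filter: {heavy_filter:,}")
```

With these made-up figures the light filter comes out cheaper than the baseline (91,000 vs 110,000), while the heavy filter ends up more expensive (116,000) despite identical input savings.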

Implementation Details

For web search (one of the first generally available implementations), Anthropic automatically applies dynamic filtering when:

Using their search API
Data fetching is enabled
The agent writes filtering code

The transcript explains at 10:30: "Claude writes code to do post-processing on query results... only putting relevant results in the context window." This happens before injection, preventing pollution.
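The kind of post-processing described here might look like the sketch below. It is an assumption about the general shape of such filtering code, not what Claude actually generates: `SearchResult` and `filter_results` are hypothetical names, the example.com pages are invented, and keyword matching is a deliberately simple stand-in for whatever relevance logic the model writes.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    url: str
    text: str


def filter_results(
    results: list[SearchResult], keywords: list[str], max_chars: int = 200
) -> list[SearchResult]:
    """Keep only sentences mentioning a keyword; drop pages with no matches."""
    lowered = [k.lower() for k in keywords]
    kept = []
    for result in results:
        sentences = [s.strip() for s in result.text.split(".") if s.strip()]
        relevant = [s for s in sentences if any(k in s.lower() for k in lowered)]
        if relevant:
            # Cap snippet length so one page cannot flood the context.
            snippet = ". ".join(relevant)[:max_chars]
            kept.append(SearchResult(result.url, snippet))
    return kept


raw_results = [
    SearchResult(
        "https://example.com/a",
        "Cookie banner text. Sonnet pricing changed in February. "
        "Subscribe to our newsletter.",
    ),
    SearchResult("https://example.com/b", "Unrelated navigation links. Footer."),
]

for item in filter_results(raw_results, ["pricing"]):
    print(item.url, "->", item.text)
```

Only the filtered snippets would be passed on to the model; the banner, newsletter, and footer text never enter the context window.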

For custom tools, developers provide:

Tool name and description
Input/output schemas
Code execution capability
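As a rough picture of those three ingredients, the sketch below models a tool specification and checks it for obvious gaps. This is not Anthropic's wire format: `ToolSpec`, `callable_from_code`, and the `get_weather` tool are hypothetical names used for illustration, so consult the official documentation for the actual request fields.

```python
from dataclasses import dataclass


@dataclass
class ToolSpec:
    name: str
    description: str
    input_schema: dict   # JSON Schema describing the arguments
    output_schema: dict  # JSON Schema describing the return value
    callable_from_code: bool = True  # hypothetical flag: exposed to the sandbox

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the spec looks OK."""
        issues = []
        if not self.name.strip():
            issues.append("name is empty")
        if not self.description.strip():
            issues.append("description is empty")
        for label, schema in (
            ("input_schema", self.input_schema),
            ("output_schema", self.output_schema),
        ):
            if schema.get("type") != "object":
                issues.append(f"{label} should be a JSON Schema object type")
        if not self.callable_from_code:
            issues.append("tool is not exposed to the code execution environment")
        return issues


weather_tool = ToolSpec(
    name="get_weather",
    description="Return the current temperature for a city.",
    input_schema={"type": "object", "properties": {"city": {"type": "string"}}},
    output_schema={"type": "object", "properties": {"temp_c": {"type": "number"}}},
)
print(weather_tool.problems())
```

A clear output schema matters more here than in JSON-style calling, because the model's generated code has to know what shape of data each tool returns.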

Anthropic provides detailed documentation and cookbooks, making adoption easier. As with previous innovations, this will likely see rapid ecosystem adoption.

Watch the Full Tutorial

For a deeper dive into programmatic tool calling, including specific examples of dynamic filtering in action (demonstrated at 7:15), watch the full video tutorial below.

Anthropic programmatic tool calling tutorial

Key Takeaways

Programmatic tool calling represents a fundamental shift in how AI agents interact with tools. By leveraging LLMs' natural coding ability instead of forcing JSON-based calling, developers can achieve both cost savings and accuracy improvements.

In summary: programmatic tool calling can reduce token usage by 30-80%, improves accuracy by up to 13 percentage points in Anthropic's benchmarks, and will likely become an industry standard - just as Anthropic's previous innovations have. Early adopters can gain a significant competitive advantage.

Frequently Asked Questions

Common questions about programmatic tool calling

What is programmatic tool calling?

Programmatic tool calling is a new approach where AI agents write and execute code to invoke specific tools in a sandbox environment, rather than using traditional JSON-based tool calling.

This method can reduce token usage by 30-80% while improving accuracy, since LLMs are better at writing code than at producing synthetic JSON formats.

Eliminates context window pollution
Leverages LLMs' natural coding ability
Only returns final processed results

How does programmatic tool calling save tokens?

Traditional tool calling loads all tool definitions and intermediate results into the context window. Programmatic calling keeps intermediate processing within a sandbox, only returning final results to the LLM.

Anthropic saw 24% fewer input tokens and 11% accuracy improvements in benchmarks. Cloudflare reported even greater savings of 30-80% in some implementations.

No tool definition overhead
Intermediate results stay in sandbox
Only relevant data enters context

Which companies are adopting this approach?

Anthropic and Cloudflare have both implemented programmatic tool calling. Cloudflare reported 30-80% token savings, while Anthropic's Sonnet 4.6 shows a 13-point accuracy improvement in browser navigation benchmarks.

OpenAI's GPT 5.2 also added support for 20+ tools using this method, and Google's Gemini has offered similar capabilities since Gemini 2.0.

Anthropic leads with Sonnet 4.6 release
Cloudflare showed early success
Open-source projects rapidly adopting

Does this always reduce token costs?

Not always. Anthropic found that while Sonnet 4.6 reduced price-weighted tokens, Opus 4.6 saw increased costs because it wrote more filtering code.

The tradeoff depends on how much code the model writes versus how much irrelevant data it filters. Complex filtering may increase costs despite reducing final output tokens.

Code writing has its own cost
Simple filters save most tokens
Balance complexity against savings

What benchmarks show improvements?

BrowseComp (testing navigation ability) showed Sonnet improving from 33% to 46% accuracy. DeepSearchQA (finding multiple correct answers) improved F1 scores from 52% to 59%.

Both benchmarks saw significant token reductions alongside accuracy gains. BrowseComp's 13-point accuracy jump is particularly notable - equivalent to a major model upgrade.

BrowseComp: +13 points accuracy
DeepSearchQA: +7 points F1
24% fewer input tokens

How do I implement this with Anthropic's API?

For web search, simply enable data fetching in Anthropic's search API - the model will automatically use dynamic filtering.

For custom tools, provide tool definitions with code execution capabilities in a sandbox environment. Anthropic provides detailed documentation and cookbooks for implementation.

Web search: Enable data fetching
Custom tools: Define with code execution
Use provided sandbox environment

Will this become an industry standard?

Given Anthropic's track record with MCP and agent skills (which became industry standards), and rapid adoption by Cloudflare and open-source projects like LiteLLM, programmatic tool calling will likely see wide adoption.

OpenAI and Google have already implemented similar capabilities, suggesting this approach will become the norm for efficient agent architectures.

Follows MCP adoption pattern
Multiple major players implementing
Solves fundamental efficiency problem

How can GrowwStacks help implement this for your business?

GrowwStacks helps businesses implement programmatic tool calling and other AI agent optimizations. We design custom solutions that reduce token costs while improving accuracy, integrating with your existing tools.

Our team can implement Anthropic's latest features or build hybrid solutions combining multiple providers. We'll analyze your specific workflows to maximize efficiency gains from these new approaches.

Custom programmatic calling implementations
Token cost reduction analysis
Free 30-minute consultation

Ready to Reduce Your AI Agent Costs by 30-80%?

Traditional tool calling wastes tokens and reduces accuracy. GrowwStacks can implement Anthropic's programmatic calling to optimize your AI workflows.
