Anthropic is making another big move: it has launched Claude Sonnet 4.5 and is boldly calling it the world’s strongest coding model. The release is also a full “combo meal” of updates. Claude Code got an upgrade, and there is a brand-new Claude Agent SDK, a VS Code extension, and a handful of other new features.

How Strong Is Sonnet 4.5?

Let’s talk about the star of the show: Claude Sonnet 4.5. Anthropic says this new version is more stable and reliable when executing instructions and refactoring code. How stable? Let the numbers speak.

On SWE-bench Verified, a widely used benchmark built from real-world software engineering tasks, Sonnet 4.5 scored 77.2%, rising to 82% with parallel test-time compute. What does that mean? It’s like scoring 77 to 82 points on a 100-point exam: solid A-student territory.

What’s even more interesting is that in certain specific scenarios, such as financial services problems, Sonnet 4.5 actually outperforms Anthropic’s own flagship model, Opus 4.1. It’s like the second child in the family suddenly surpassing the eldest in one subject, which is quite unexpected.

On OSWorld, a benchmark that tests models on real-world computer tasks, Sonnet 4.5 really stands out with a 61.4% success rate. For comparison, the previous-generation Sonnet 4 only hit 43.9%, and Anthropic’s own flagship Opus 4.1 sits at around 44%. That is a gain of more than 17 percentage points in a single generation.

Running 30 Hours Without Fatigue?

For complex tasks that require long runtimes, Anthropic says Sonnet 4.5 can now work continuously for up to 30 hours, far exceeding Opus 4’s 7 hours. It’s like a runner who used to tire after 7 kilometers and can now cover 30 while staying in peak condition.

Anthropic officially states that Sonnet 4.5 can “maintain focus and high performance” throughout the entire run. That is a hard claim to check from the outside, so whether it actually holds up will depend on real-world user experience.

How Does It Compare to Other Models?

In most coding benchmarks, Sonnet 4.5 beats mainstream competitors like GPT-5 and Gemini 2.5 Pro. However, it is not dominant everywhere: in visual reasoning tasks, Anthropic’s model still trails the competition.

New Features and Pricing

This update brings many new features, including several advanced capabilities previously exclusive to Claude Code, such as virtual machine access, memory management, stronger context control, and multi-agent support.

Pricing-wise, Sonnet 4.5 remains consistent with the previous Sonnet 4: $3 per million input tokens and $15 per million output tokens.
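
To make those rates concrete, here is a minimal cost estimate in Python. The two prices are the ones quoted above; the helper name and the example token counts are made up purely for illustration.

```python
# Prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request (hypothetical helper)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK
    return input_cost + output_cost


# Made-up example: 200,000 tokens in, 20,000 tokens out.
cost = estimate_cost(200_000, 20_000)
print(f"${cost:.2f}")  # 0.60 for input + 0.30 for output = $0.90
```

Note that output tokens cost five times as much as input tokens, so a task that generates a lot of code will be dominated by the output side of the bill.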

Here’s an interesting detail: Anthropic revealed that Sonnet 4.5 is their first model capable of completely rebuilding the Claude.ai website application. The entire process took about 5.5 hours and involved over 3,000 tool calls. It’s like having AI build itself a house—pretty cool indeed.

What’s New in Claude Code?

Claude Code, the coding assistant, has naturally been upgraded to the latest Sonnet 4.5 model. Beyond the underlying model improvements, there are several noteworthy new features.

First is the native Visual Studio Code extension. Developers can see Claude Code’s changes in real-time through inline diffs. It’s like collaborative document editing where you can see exactly what others changed—crystal clear.

In the terminal, Claude Code’s status display is also clearer, with a new searchable prompt history. Want to find a question you asked before? Just search for it.

Another practical new feature is the “checkpoint” mechanism. If Claude Code goes off track, developers can more easily roll back to a previous state. It’s like saving your game—if something goes wrong, you can reload from a save point.
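
The save-and-reload idea can be sketched in a few lines of Python. This is a toy illustration of the general checkpoint pattern, not how Claude Code actually implements it; the class name, its methods, and the in-memory “workspace” are all hypothetical.

```python
import copy


class CheckpointStore:
    """Toy checkpoint store: snapshots a workspace and restores it on demand.

    Hypothetical illustration only; not Claude Code's implementation.
    """

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files  # current workspace: path -> file contents
        self._snapshots: list[dict[str, str]] = []

    def save(self) -> int:
        """Snapshot the current files and return the checkpoint id."""
        self._snapshots.append(copy.deepcopy(self.files))
        return len(self._snapshots) - 1

    def rollback(self, checkpoint_id: int) -> None:
        """Restore the workspace to the given checkpoint."""
        self.files = copy.deepcopy(self._snapshots[checkpoint_id])
        # Discard snapshots taken after the one we restored.
        del self._snapshots[checkpoint_id + 1:]


workspace = CheckpointStore({"app.py": "print('hello')"})
cp = workspace.save()                      # save point before a risky edit
workspace.files["app.py"] = "broken edit"  # the agent goes off track
workspace.rollback(cp)                     # reload from the save point
print(workspace.files["app.py"])
```

After the rollback, the file is back to its original contents, just like reloading a saved game.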

The All-New Claude Agent SDK

If you’re interested in building your own AI agents based on Claude Code’s underlying capabilities, then the newly launched Claude Agent SDK is worth checking out.

This SDK uses the same underlying architecture as Claude Code but gives developers much more freedom to build various types of agents. It provides a complete suite of core functions including agent orchestration, memory management, context control, tool invocation, and permission management.
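
To give a feel for what “tool invocation” and “permission management” mean in an agent, here is a small conceptual sketch in plain Python. It does not use the Claude Agent SDK, and every name in it (the tool registry, the allow-list, the functions) is an assumption invented for illustration. In a real agent the model would choose the next tool call; here the plan is hard-coded.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"(contents of {path})",
    "delete_file": lambda path: f"deleted {path}",
}

# Hypothetical permission policy: only these tools are allowed to run.
ALLOWED_TOOLS = {"read_file"}


def run_tool(name: str, argument: str) -> str:
    """Invoke a tool, but only if the permission policy allows it."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    if name not in ALLOWED_TOOLS:
        return f"denied: {name!r} is not permitted"
    return TOOLS[name](argument)


def run_agent(plan: list[tuple[str, str]]) -> list[str]:
    """Run each planned tool call in order and collect the results."""
    return [run_tool(name, argument) for name, argument in plan]


results = run_agent([("read_file", "notes.txt"), ("delete_file", "notes.txt")])
for line in results:
    print(line)
```

The read goes through, while the delete is refused by the allow-list. That is the basic shape of a permission check sitting between an agent and its tools.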

On the API side, developers get a memory tool that helps agents keep their context coherent across long-running tasks. Anthropic has also added automatic context management, which lets Claude adjust what stays in the context window as needed and clear out stale data.

It’s like giving AI a smart notebook that can remember important things while automatically cleaning up unimportant content, keeping the brain fresh.
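
Here is a toy version of that “smart notebook” in Python. It is only a sketch of the general idea, not Anthropic’s memory tool or context management API: the class and method names are hypothetical, and word counts stand in for real token counts.

```python
from collections import deque


class ContextManager:
    """Toy context window with a separate long-term memory.

    Hypothetical illustration; word counts stand in for token counts.
    """

    def __init__(self, max_words: int) -> None:
        self.max_words = max_words
        self.messages: deque[str] = deque()  # recent conversation, trimmed
        self.memory: dict[str, str] = {}     # important facts, never trimmed

    def remember(self, key: str, value: str) -> None:
        """Store an important fact outside the trimmed window."""
        self.memory[key] = value

    def add(self, message: str) -> None:
        """Append a message, then drop the oldest ones while over budget."""
        self.messages.append(message)
        while self._size() > self.max_words and len(self.messages) > 1:
            self.messages.popleft()

    def _size(self) -> int:
        return sum(len(m.split()) for m in self.messages)


ctx = ContextManager(max_words=8)
ctx.remember("goal", "migrate the billing service")
ctx.add("step one finished without errors")  # 5 words, fits in the budget
ctx.add("step two finished with warnings")   # 10 words total, oldest is dropped
print(list(ctx.messages))
print(ctx.memory["goal"])
```

The older message is dropped once the budget is exceeded, but the goal stored in memory survives, which is the split between “outdated data” and “important things” described above.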

Final Thoughts

The release of Claude Sonnet 4.5 marks another push by Anthropic in the coding AI field. Looking at the benchmark data, there are indeed some solid improvements this time. But at the end of the day, whether a model is good or not depends on how it performs in actual projects.

For developers, there are more and more AI coding tools to choose from, and competition is getting fiercer. That’s obviously good news for users—better tools, same prices, more choices.

As for whether Sonnet 4.5 can firmly hold the title of “world’s strongest coding model,” that remains to be validated by time and users. But based on the current data and features, developers will find it well worth trying out.