Have you noticed that the same prompt produces different results with Claude 3.5 Sonnet versus Claude 3 Opus?

This isn’t an illusion. Just as two students with different personalities (one impatient, one patient) need different teaching methods, Claude Code has specific “teaching methods” for each model. Today we’re talking about this secret of “teaching according to the model.”

Two Students: Sonnet and Opus

Claude 3.5 Sonnet and Claude 3 Opus are like two students with different personalities.

Sonnet is the “clever but careless” student:

  • Fast response, answers immediately
  • Inexpensive, so you can use it without hesitation
  • Sensitive to instructions, veers off track if unclear
  • Good for daily tasks, handles most assignments

Opus is the “steady but slow” student:

  • Most capable, thinks through complex problems
  • Slow response, you have to wait
  • Expensive, so use it sparingly
  • Good for difficult problems, overkill for simple tasks

It’s like having two calculators at home: one is fast but easy to mis-key, the other is slow but accurate. Which do you use for simple bills? Which for taxes?

Different “Scripts”

Since the two students have different personalities, their “scripts” must also differ.

Sonnet’s Script—be specific, give examples, specify format:

Sonnet tends to “overthink” or “underthink,” so you need to give it clear instructions.

Bad prompt (for Sonnet):

Help me optimize this code

Good prompt (for Sonnet):

Please optimize this Go code with the following requirements:
1. Reduce memory allocations
2. Use more efficient algorithms
3. Keep original functionality unchanged
4. Output the optimized code with explanations

Code:
[your code]

See? For Sonnet you need to:

  • Clear steps: Tell it what to do first, then what next
  • Few-shot examples: Show it a few examples
  • Strict format: Specify output format clearly
  • Scope limits: Tell it where the boundaries are

Opus’s Script—delegate tasks and give it space:

Opus is highly capable, so you can give it more autonomy.

For Opus, a prompt can be as simple as:

This is a complex distributed systems problem.

Background: [description]
Constraints: [conditions]
Goal: [effect to achieve]

Please analyze and propose solutions.

For Opus you can:

  • Concise explanations: It understands implied meaning
  • Open questions: Let it explore solutions independently
  • Flexible format: Don’t enforce output format
  • Complex reasoning: Give it problems requiring deep thought
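The two “scripts” above can be sketched as a tiny prompt builder that branches on the target model. Everything here (the function name, fields, and model-name matching) is hypothetical, purely to illustrate the contrast:

```python
def build_prompt(task: str, model: str, steps=None, output_format=None) -> str:
    """Build a prompt in a model-appropriate style (illustrative sketch)."""
    if "sonnet" in model:
        # Sonnet style: explicit numbered steps, pinned output format.
        lines = [f"Please complete the following task: {task}", "Requirements:"]
        for i, step in enumerate(steps or [], 1):
            lines.append(f"{i}. {step}")
        if output_format:
            lines.append(f"Output format: {output_format}")
        return "\n".join(lines)
    # Opus style: state the task, leave the approach open.
    return f"{task}\n\nPlease analyze the problem and propose solutions."

sonnet_prompt = build_prompt(
    "Optimize this Go code", "claude-3-5-sonnet",
    steps=["Reduce memory allocations", "Keep original functionality unchanged"],
    output_format="optimized code with explanations",
)
```

The point is structural: the Sonnet branch enumerates steps and pins the output format, while the Opus branch states the task and leaves the approach open.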

Underlying Principles

Why do different models need different prompts?

It’s like different cars have different “personalities”:

  • Sports car (Sonnet): Fast acceleration but prone to skidding. You need precise throttle control and can’t turn wildly.

  • Off-road vehicle (Opus): Powerful, good at climbing. You can let it find its own path, but have to be patient as it slowly makes its way up.

Technically speaking, Claude Code injects different “tuning instructions” into system prompts based on the selected model:

Sonnet’s extra instructions:

Please carefully follow instructions and execute step by step.
If uncertain, ask rather than guess.
Use provided tools to complete tasks.

Opus’s extra instructions:

You may decide autonomously how to complete tasks.
If a task is complex, you may plan first then execute.
You may use tools or answer directly.

These instructions get merged into system prompts, affecting model behavior.
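The merging step can be sketched roughly like this. The instruction strings below paraphrase the examples above; this is not Claude Code’s actual implementation, just a minimal illustration of per-model tuning:

```python
BASE_SYSTEM_PROMPT = "You are a coding assistant."

# Hypothetical per-model tuning snippets, paraphrasing the examples above.
MODEL_TUNING = {
    "claude-3-5-sonnet": (
        "Carefully follow instructions and execute step by step. "
        "If uncertain, ask rather than guess."
    ),
    "claude-3-opus": (
        "You may decide autonomously how to complete tasks. "
        "If a task is complex, plan first, then execute."
    ),
}

def system_prompt_for(model: str) -> str:
    # Merge the base prompt with any model-specific tuning;
    # unknown models just get the base prompt.
    tuning = MODEL_TUNING.get(model, "")
    return f"{BASE_SYSTEM_PROMPT}\n\n{tuning}".strip()
```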

A/B Testing: Data Speaks

Claude Code’s prompts aren’t designed on a whim—they’re “discovered” through A/B testing.

What is A/B testing? Simply put, it’s a “controlled experiment”:

Testing Process:

1. Design two prompts (A and B)
2. Divide users into two groups (50% use A, 50% use B)
3. Run for a period (e.g., one week)
4. Check the data: which group has a higher completion rate, which users are more satisfied
5. Roll the winner out to everyone

What Metrics to Test:

  • Task Completion Rate: Did the user’s problem get solved?
  • User Satisfaction: High user ratings?
  • Error Rate: How many times did AI mess up?
  • Token Consumption: Was cost high?
  • Response Time: Did users wait long?
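The steps above can be sketched in a few lines, assuming a deterministic hash-based 50/50 split and task completion rate as the metric. The function names and data here are illustrative toy examples, not Anthropic’s tooling:

```python
import hashlib

def ab_bucket(user_id: str, experiment: str) -> str:
    """Deterministic 50/50 split: the same user always lands in the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return "A" if digest[0] < 128 else "B"

def completion_rate(outcomes: list) -> float:
    """Fraction of tasks that were solved (True = solved)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Toy data: did each user's task get solved under prompt A vs prompt B?
results = {"A": [True, True, False, True], "B": [True, True, True, True]}
winner = max(results, key=lambda group: completion_rate(results[group]))
```

Real experiments would also check statistical significance before declaring a winner, but the bucketing-and-compare shape is the same.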

A Real Example:

Anthropic tested two tool descriptions:

Version A: “Search file contents”
Version B: “Search file contents, use when you need to find specific text patterns”

Result: Version B’s tool call accuracy improved by 15%. Why? Because B explicitly told the model “when to use it.”

This is the value of A/B testing—let data speak, not guesswork.

Prompt Version Management

Claude Code’s prompts have “version numbers,” just like software.

Version Naming: prompts-v1.2.3

  • v1: Major version, prompt structure changed
  • .2: Minor version, added several modules
  • .3: Patch, optimized a few wordings
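Assuming the `prompts-vX.Y.Z` naming above, a tag can be parsed into a tuple of integers so versions compare correctly; note that `v1.10.0` must sort above `v1.2.3`, which a plain string comparison gets wrong:

```python
def parse_version(tag: str) -> tuple:
    """Parse 'prompts-v1.2.3' into (1, 2, 3) for correct comparison."""
    major, minor, patch = tag.removeprefix("prompts-v").split(".")
    return int(major), int(minor), int(patch)

# Tuples compare element by element, so ten minor versions beat two.
assert parse_version("prompts-v1.10.0") > parse_version("prompts-v1.2.3")
```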

Why Version Management?

Imagine if you changed a prompt and AI performance got worse—what do you do?

With version management, you can:

  • Rollback: Quickly return to previous version
  • Compare: See what changed that caused the problem
  • Gradual Rollout: First trial with small user percentage

Gradual Rollout Process:

1. New version development complete
2. Internal testing (employees use it first)
3. Small-scale public test (1% of users)
4. Expand the scope (10% of users)
5. Full rollout (100% of users)
6. Monitor the data; roll back if problems appear
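The percentage gates in a rollout like this are commonly implemented by hashing each user into a stable bucket, so the same user always sees the same version and raising the percentage only ever adds users. A minimal sketch (not Claude Code’s actual mechanism):

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Stable percentage gate: hash user+feature into a 0-99 bucket.

    Users whose bucket falls below `percent` get the new version.
    Raising percent from 1 to 10 to 100 only adds users, so nobody
    flip-flops between versions as the rollout expands.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent
```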

This is like launching a new drug: animal trials first, then human trials, and only then approval for sale.

Practical: How to Choose Models

With this understanding, how do you choose a model in actual use?

Daily Development (Use Sonnet):

  • Write simple functions
  • Read code to understand logic
  • Search and find
  • Format code

Complex Tasks (Use Opus):

  • Architecture design
  • Complex bug investigation
  • Algorithm optimization
  • Cross-file refactoring

Cost Considerations:

  • Sonnet is cheap, so use it freely
  • Opus is expensive, so save it for critical moments
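To make the cost gap concrete, here is a small calculation using Anthropic’s published per-million-token prices at the time of writing (Claude 3.5 Sonnet: $3 input / $15 output; Claude 3 Opus: $15 / $75). Treat the figures as illustrative, since pricing changes:

```python
# Illustrative per-million-token prices in USD; check current pricing.
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 15.00, "output": 75.00},
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the prices above."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The same 10k-in / 2k-out request costs 5x more on Opus:
sonnet_cost = cost("sonnet", 10_000, 2_000)  # $0.06
opus_cost = cost("opus", 10_000, 2_000)      # $0.30
```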

Specify Model in CLAUDE.md:

You can write in CLAUDE.md:

Use Claude 3.5 Sonnet by default
Use Claude 3 Opus for complex tasks

Claude Code reads this configuration and automatically selects the appropriate model.
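A routing rule like “Sonnet by default, Opus for complex tasks” could be sketched as follows. The keyword heuristic is purely hypothetical, standing in for whatever signal actually marks a task as complex:

```python
# Hypothetical sketch of "default model + escalate for complex tasks".
COMPLEX_KEYWORDS = ("architecture", "refactor", "design", "investigate")

def choose_model(task: str,
                 default: str = "claude-3-5-sonnet",
                 complex_model: str = "claude-3-opus") -> str:
    """Route a task description to a model (crude keyword heuristic)."""
    if any(kw in task.lower() for kw in COMPLEX_KEYWORDS):
        return complex_model
    return default
```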

Implications for You

Understanding model tuning helps you:

1. Choose the Right Tool:

  • Use Sonnet for simple tasks, saves money and is faster
  • Use Opus for complex tasks, worth the cost

2. Adjust Prompts:

  • For Sonnet: be specific and provide steps
  • For Opus: give it space and give it challenges

3. Understand “Why They’re Different”:

  • Same request getting different results from different models is normal
  • Not the model being broken—their “personalities” are different

4. Provide Feedback:

  • If you notice a model behaving abnormally, you can report it to Anthropic
  • Your feedback might become A/B testing data

Summary

Model-specific tuning embodies a core engineering idea: there is no best model, only the most appropriate usage.

  • Sonnet is like a sports car: fast and economical, but it demands precise control
  • Opus is like an off-road vehicle: it can climb hills and cross ravines
  • A/B testing ensures prompts actually work
  • Version management allows optimization rollbacks

Understanding this lets you choose the appropriate model for tasks and get the best results at minimal cost.