Have you noticed that the same prompt produces different results with Claude 3.5 Sonnet versus Claude 3 Opus?

This isn’t an illusion. Just as two students with different personalities (one impatient, one patient) need different teaching methods, Claude Code has specific “teaching methods” for each model. Today we’re talking about this secret of “teaching according to the model.”

Two Students: Sonnet and Opus

Claude 3.5 Sonnet and Claude 3 Opus are like two students with different personalities.

Sonnet is the “clever but careless” student:

  • Fast response, answers immediately
  • Inexpensive, so you can use it without hesitation
  • Sensitive to instructions, veers off track if unclear
  • Good for daily tasks, handles most assignments

Opus is the “steady but slow” student:

  • Most capable, thinks through complex problems
  • Slow response, you have to wait
  • Expensive, so use it sparingly
  • Good for difficult problems, overkill for simple tasks

It’s like having two calculators at home: one is fast but easy to mis-key, the other is slow but accurate. Which do you use for simple bills? Which for taxes?

Different “Scripts”

Since the two students have different personalities, their “scripts” must also differ.

Sonnet’s Script—be specific, give examples, specify format:

Sonnet tends to “overthink” or “underthink,” so you need to give it clear instructions.

Bad prompt (for Sonnet):

Help me optimize this code

Good prompt (for Sonnet):

Please optimize this Go code with the following requirements:
1. Reduce memory allocations
2. Use more efficient algorithms
3. Keep original functionality unchanged
4. Output the optimized code with explanations

Code:
[your code]

See? For Sonnet you need to:

  • Clear steps: Tell it what to do first, then what next
  • Few-shot examples: Show it a few examples
  • Strict format: Specify output format clearly
  • Scope limits: Tell it where the boundaries are

Opus’s Script—delegate tasks and give it space:

Opus is highly capable, so you can give it more autonomy.

For Opus, a prompt can be as simple as:

This is a complex distributed systems problem.

Background: [description]
Constraints: [conditions]
Goal: [effect to achieve]

Please analyze and propose solutions.

For Opus you can:

  • Concise explanations: It understands implied meaning
  • Open questions: Let it explore solutions independently
  • Flexible format: Don’t enforce output format
  • Complex reasoning: Give it problems requiring deep thought
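The two “scripts” above can be sketched as a tiny prompt builder that branches on the target model. Everything here (the function name, fields, and model-name matching) is hypothetical, purely to illustrate the contrast:

```python
def build_prompt(task: str, model: str, steps=None, output_format=None) -> str:
    """Build a prompt in a model-appropriate style (illustrative sketch)."""
    if "sonnet" in model:
        # Sonnet style: explicit numbered steps, pinned output format.
        lines = [f"Please complete the following task: {task}", "Requirements:"]
        for i, step in enumerate(steps or [], 1):
            lines.append(f"{i}. {step}")
        if output_format:
            lines.append(f"Output format: {output_format}")
        return "\n".join(lines)
    # Opus style: state the task, leave the approach open.
    return f"{task}\n\nPlease analyze the problem and propose solutions."

sonnet_prompt = build_prompt(
    "Optimize this Go code", "claude-3-5-sonnet",
    steps=["Reduce memory allocations", "Keep original functionality unchanged"],
    output_format="optimized code with explanations",
)
```

The point is structural: the Sonnet branch enumerates steps and pins the output format, while the Opus branch states the task and leaves the approach open.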

Underlying Principles

Why do different models need different prompts?

It’s like different cars have different “personalities”:

  • Sports car (Sonnet): Fast acceleration but prone to skidding. You need precise throttle control and can’t turn wildly.

  • Off-road vehicle (Opus): Powerful, good at climbing. You can let it find its own path, but have to be patient as it slowly makes its way up.

Technically speaking, Claude Code injects different “tuning instructions” into system prompts based on the selected model:

Sonnet’s extra instructions:

Please carefully follow instructions and execute step by step.
If uncertain, ask rather than guess.
Use provided tools to complete tasks.

Opus’s extra instructions:

You may decide autonomously how to complete tasks.
If a task is complex, you may plan first then execute.
You may use tools or answer directly.

These instructions get merged into system prompts, affecting model behavior.
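The merging step can be sketched roughly like this. The instruction strings below paraphrase the examples above; this is not Claude Code’s actual implementation, just a minimal illustration of per-model tuning:

```python
BASE_SYSTEM_PROMPT = "You are a coding assistant."

# Hypothetical per-model tuning snippets, paraphrasing the examples above.
MODEL_TUNING = {
    "claude-3-5-sonnet": (
        "Carefully follow instructions and execute step by step. "
        "If uncertain, ask rather than guess."
    ),
    "claude-3-opus": (
        "You may decide autonomously how to complete tasks. "
        "If a task is complex, plan first, then execute."
    ),
}

def system_prompt_for(model: str) -> str:
    # Merge the base prompt with any model-specific tuning;
    # unknown models just get the base prompt.
    tuning = MODEL_TUNING.get(model, "")
    return f"{BASE_SYSTEM_PROMPT}\n\n{tuning}".strip()
```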

A/B Testing: Data Speaks

Claude Code’s prompts aren’t designed on a whim—they’re “discovered” through A/B testing.

What is A/B testing? Simply put, it’s a “controlled experiment”:

Testing Process:

1. Design two prompts (A and B)
2. Divide users into two groups (50% use A, 50% use B)
3. Run for a period (e.g., one week)
4. Check the data: which group has a higher completion rate, which users are more satisfied
5. Roll the winner out to everyone

What Metrics to Test:

  • Task Completion Rate: Did the user’s problem get solved?
  • User Satisfaction: High user ratings?
  • Error Rate: How many times did AI mess up?
  • Token Consumption: Was cost high?
  • Response Time: Did users wait long?
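The steps above can be sketched in a few lines, assuming a deterministic hash-based 50/50 split and task completion rate as the metric. The function names and data here are illustrative toy examples, not Anthropic’s tooling:

```python
import hashlib

def ab_bucket(user_id: str, experiment: str) -> str:
    """Deterministic 50/50 split: the same user always lands in the same group."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return "A" if digest[0] < 128 else "B"

def completion_rate(outcomes: list) -> float:
    """Fraction of tasks that were solved (True = solved)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Toy data: did each user's task get solved under prompt A vs prompt B?
results = {"A": [True, True, False, True], "B": [True, True, True, True]}
winner = max(results, key=lambda group: completion_rate(results[group]))
```

Real experiments would also check statistical significance before declaring a winner, but the bucketing-and-compare shape is the same.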

A Real Example:

Anthropic tested two tool descriptions:

Version A: “Search file contents”
Version B: “Search file contents, use when you need to find specific text patterns”

Result: Version B’s tool call accuracy improved by 15%. Why? Because B explicitly told the model “when to use it.”

This is the value of A/B testing—let data speak, not guesswork.

Prompt Version Management

Claude Code’s prompts have “version numbers,” just like software.

Version Naming: prompts-v1.2.3

  • v1: Major version, prompt structure changed
  • .2: Minor version, added several modules
  • .3: Patch, optimized a few wordings
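Assuming the `prompts-vX.Y.Z` naming above, a tag can be parsed into a tuple of integers so versions compare correctly; note that `v1.10.0` must sort above `v1.2.3`, which a plain string comparison gets wrong:

```python
def parse_version(tag: str) -> tuple:
    """Parse 'prompts-v1.2.3' into (1, 2, 3) for correct comparison."""
    major, minor, patch = tag.removeprefix("prompts-v").split(".")
    return int(major), int(minor), int(patch)

# Tuples compare element by element, so ten minor versions beat two.
assert parse_version("prompts-v1.10.0") > parse_version("prompts-v1.2.3")
```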

Why Version Management?

Imagine if you changed a prompt and AI performance got worse—what do you do?

With version management, you can:

  • Rollback: Quickly return to previous version
  • Compare: See what changed that caused the problem
  • Gradual Rollout: First trial with small user percentage

Gradual Rollout Process:

1. New version development complete
2. Internal testing (employees use it first)
3. Small-scale public test (1% of users)
4. Expand the scope (10% of users)
5. Full rollout (100% of users)
6. Monitor the data; roll back if problems appear
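The percentage gates in a rollout like this are commonly implemented by hashing each user into a stable bucket, so the same user always sees the same version and raising the percentage only ever adds users. A minimal sketch (not Claude Code’s actual mechanism):

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Stable percentage gate: hash user+feature into a 0-99 bucket.

    Users whose bucket falls below `percent` get the new version.
    Raising percent from 1 to 10 to 100 only adds users, so nobody
    flip-flops between versions as the rollout expands.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent
```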

This is like launching a new drug: animal trials first, then human trials, and only then approval for sale.

Practical: How to Choose Models

With this understanding, how do you choose a model in actual use?

Daily Development (Use Sonnet):

  • Write simple functions
  • Read code to understand logic
  • Search and find
  • Format code

Complex Tasks (Use Opus):

  • Architecture design
  • Complex bug investigation
  • Algorithm optimization
  • Cross-file refactoring

Cost Considerations:

  • Sonnet is cheap, so use it freely
  • Opus is expensive, so save it for critical moments
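To make the cost gap concrete, here is a small calculation using Anthropic’s published per-million-token prices at the time of writing (Claude 3.5 Sonnet: $3 input / $15 output; Claude 3 Opus: $15 / $75). Treat the figures as illustrative, since pricing changes:

```python
# Illustrative per-million-token prices in USD; check current pricing.
PRICES = {
    "sonnet": {"input": 3.00, "output": 15.00},
    "opus": {"input": 15.00, "output": 75.00},
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the prices above."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The same 10k-in / 2k-out request costs 5x more on Opus:
sonnet_cost = cost("sonnet", 10_000, 2_000)  # $0.06
opus_cost = cost("opus", 10_000, 2_000)      # $0.30
```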

Specify Model in CLAUDE.md:

You can write in CLAUDE.md:

Use Claude 3.5 Sonnet by default
Use Claude 3 Opus for complex tasks

Claude Code reads this configuration and automatically selects the appropriate model.
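A routing rule like “Sonnet by default, Opus for complex tasks” could be sketched as follows. The keyword heuristic is purely hypothetical, standing in for whatever signal actually marks a task as complex:

```python
# Hypothetical sketch of "default model + escalate for complex tasks".
COMPLEX_KEYWORDS = ("architecture", "refactor", "design", "investigate")

def choose_model(task: str,
                 default: str = "claude-3-5-sonnet",
                 complex_model: str = "claude-3-opus") -> str:
    """Route a task description to a model (crude keyword heuristic)."""
    if any(kw in task.lower() for kw in COMPLEX_KEYWORDS):
        return complex_model
    return default
```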

Implications for You

Understanding model tuning helps you:

1. Choose the Right Tool:

  • Use Sonnet for simple tasks, saves money and is faster
  • Use Opus for complex tasks, worth the cost

2. Adjust Prompts:

  • For Sonnet: be specific and provide steps
  • For Opus: give it space and give it challenges

3. Understand “Why They’re Different”:

  • Same request getting different results from different models is normal
  • Not the model being broken—their “personalities” are different

4. Provide Feedback:

  • If you notice a model behaving abnormally, you can report it to Anthropic
  • Your feedback might become A/B testing data

Summary

Model-specific tuning embodies a core engineering idea: there is no best model, only the most appropriate usage.

  • Sonnet is like a sports car: fast and economical, but it demands precise control
  • Opus is like an off-road vehicle: it can climb hills and cross ravines
  • A/B testing ensures prompts actually work
  • Version management allows optimization rollbacks

Understanding this lets you choose the appropriate model for tasks and get the best results at minimal cost.