Have you noticed that with Claude Code, responses are sometimes fast and sometimes slow, as if the AI were contemplating life? This often comes down to the “cache hit rate”: on a cache hit, the server reuses work it has already done on the prompt; on a miss, it has to process the prompt all over again.

Today we’ll discuss Claude Code’s seven “cache optimization secrets” and how it pushes cache hit rates to the extreme.

Core Philosophy of Cache Optimization

The essence of cache optimization is: Reduce change, maintain stability.

Imagine ordering at a restaurant:

  • Cache hit: Waiter remembers what you like, orders directly
  • Cache miss: Has to ask what you want every time

API caching works the same way. Every request sends a large chunk of “system prompt + tool descriptions” to the server. The cache matches on the prompt prefix, so if any of this content changes, the server must reprocess everything from the change onward, wasting time and money.

So the optimization direction is: Keep this content as unchanged as possible, or minimize the impact of changes.
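A toy model makes the “prefix” idea concrete. This sketch is an illustration, not Anthropic’s server implementation: it treats a request as an ordered list of text blocks and counts how many leading blocks are reused, ignoring real details such as cache_control breakpoints, minimum cacheable lengths, and expiry. The class PrefixCache is hypothetical.

```typescript
// Toy model of prefix-based prompt caching (illustration only).
// A cached entry is reused only if every block up to it is byte-identical.
class PrefixCache {
  private seen = new Set<string>();

  // Returns how many leading blocks were served from cache, then records
  // every prefix of this request for later requests.
  lookup(blocks: string[]): number {
    let prefix = "";
    let hits = 0;
    let stillMatching = true;
    for (let i = 0; i < blocks.length; i++) {
      prefix += "\u0000" + blocks[i]; // separator keeps block boundaries distinct
      if (stillMatching && this.seen.has(prefix)) {
        hits++;
      } else {
        stillMatching = false; // after the first difference, nothing later can hit
      }
      this.seen.add(prefix);
    }
    return hits;
  }
}

const cache = new PrefixCache();
const toolDescriptions = "Bash: run a shell command. Read: read a file.";
cache.lookup([toolDescriptions, "Today is 2026-04-01", "user: hello"]);
// Same tools, one changed character in the date: only the first block is reused.
const reused = cache.lookup([toolDescriptions, "Today is 2026-04-02", "user: hello"]);
console.log(reused); // 1
```

The earlier a change happens, the fewer blocks survive, which is why every secret below tries to keep the front of the request fixed.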

Secret #1: Date Memoization - Don’t Invalidate Across Midnight

A problem that troubles many AI applications: crossing midnight.

Say you ask the AI a question at 11:59 PM, and the system prompt reads “Today is 2026-04-01”. Then you ask another at 12:01 AM, and the date becomes “2026-04-02”.

This one-character change alone invalidates the entire system prompt cache: about 11,000 tokens need recalculation.

Claude Code’s solution: Date memoization.

getSessionStartDate: remember the first date obtained and reuse it for the rest of the session.

This way, even after midnight, the system prompt still shows “yesterday’s” date, and the cache stays valid.
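A minimal sketch of the memoization, assuming an injectable clock so that crossing midnight can be simulated. Only the name getSessionStartDate comes from the article; makeGetSessionStartDate, formatDay, and the prompt wording are hypothetical.

```typescript
// Sketch of session-level date memoization (helper names are hypothetical).
type Clock = () => Date;

function pad2(n: number): string {
  return (n < 10 ? "0" : "") + n;
}

// Formats a date as YYYY-MM-DD in local time.
function formatDay(d: Date): string {
  return `${d.getFullYear()}-${pad2(d.getMonth() + 1)}-${pad2(d.getDate())}`;
}

function makeGetSessionStartDate(clock: Clock): () => string {
  let memo: string | undefined;
  return () => {
    if (memo === undefined) {
      memo = formatDay(clock()); // the first call fixes the date for the session
    }
    return memo;
  };
}

// Usage: simulate a session that crosses midnight.
let now = new Date(2026, 3, 1, 23, 59); // 11:59 PM, April 1 (months are 0-based)
const getSessionStartDate = makeGetSessionStartDate(() => now);
const systemPrompt = (): string => `Today is ${getSessionStartDate()}.`;

console.log(systemPrompt()); // Today is 2026-04-01.
now = new Date(2026, 3, 2, 0, 1); // 12:01 AM, April 2
console.log(systemPrompt()); // Today is 2026-04-01. (unchanged)
```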

How does the AI find out that the date has changed? Claude Code appends the new date at the tail of the messages, and changes at the tail don’t affect the cached prefix.
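The article doesn’t say how the new date is worded or attached, so the helper below is a hypothetical sketch: the system prompt keeps the memoized date, and the note travels with the newest user turn.

```typescript
// Hypothetical helper: builds the content blocks of the newest user turn.
// The system prompt keeps the memoized session date; if the calendar day has
// moved on, a short note is added here, at the tail of the conversation.
function buildLatestUserTurn(userText: string, sessionDate: string, today: string): string[] {
  const blocks: string[] = [userText];
  if (today !== sessionDate) {
    blocks.push(`Note: the current date is now ${today}.`);
  }
  return blocks;
}

const sessionDate = "2026-04-01";
const system = `Today is ${sessionDate}.`; // cached prefix: never rewritten

// Before midnight: no note. After midnight: the note rides along at the tail.
console.log(system, buildLatestUserTurn("Fix the failing test", sessionDate, "2026-04-01"));
console.log(system, buildLatestUserTurn("Now update the changelog", sessionDate, "2026-04-02"));
```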

It’s like telling the waiter, “The usual, plus dessert today”: the main course doesn’t need to be ordered all over again.

Secret #2: Monthly Granularity - From Daily to Monthly Changes

That handles the system prompt, but tool prompts also need time information, and a full date (2026-04-01) changes every day.

Solution: Use only the month (April 2026).

The change frequency drops from daily to monthly: roughly 30 times fewer changes.

It’s like buying a monthly pass instead of daily tickets.

The division of labor for the two time precisions:

  • System prompt: day precision, but memoized (fixed for the whole session)
  • Tool prompt: month precision (changes at most once a month)
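A small sketch of the two precisions, counting how many distinct strings each one produces over April 2026. The helper names and the tool-prompt wording are hypothetical.

```typescript
// Sketch of day vs month precision (helper names are hypothetical).
const MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"];

// Day precision, e.g. "2026-04-01": changes every day.
function dayString(d: Date): string {
  const mm = d.getMonth() + 1;
  const dd = d.getDate();
  return `${d.getFullYear()}-${mm < 10 ? "0" : ""}${mm}-${dd < 10 ? "0" : ""}${dd}`;
}

// Month precision, e.g. "April 2026": changes once a month.
function monthString(d: Date): string {
  return `${MONTH_NAMES[d.getMonth()]} ${d.getFullYear()}`;
}

const toolPrompt = (d: Date): string => `The current month is ${monthString(d)}.`;

// Count the distinct strings each precision yields across April 2026.
const days = new Set<string>();
const months = new Set<string>();
for (let day = 1; day <= 30; day++) {
  const d = new Date(2026, 3, day);
  days.add(dayString(d));
  months.add(toolPrompt(d));
}
console.log(days.size, months.size); // 30 1
```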

Secret #3: Moving Agent List from Tool Description to Message Attachment

This is the highest-impact optimization: it eliminates the 10.2% of full cache-rebuild cost that came from agent-list changes.

Where’s the problem? AgentTool descriptions list all available agents (including those from MCP servers). But this list is dynamic:

  • MCP servers connect asynchronously
  • Plugin refreshes
  • Permission mode changes

Each list change alters the AgentTool schema, invalidating the entire tool schema cache.

Solution: Move the dynamic list out of tool descriptions, into message attachments.

Tool descriptions become static and only describe functionality; the list of available agents is appended as an attachment at the tail of the messages.
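A sketch of the split, assuming a simplified request shape with a tools list (part of the cached prefix) and tail blocks (appended after the conversation). The tool name follows the article; the description text, AgentInfo, and the helpers are hypothetical.

```typescript
// Sketch: static tool description + dynamic agent list at the tail.
interface AgentInfo {
  name: string;
  description: string;
}

interface ToolDef {
  name: string;
  description: string;
}

// Static: never mentions specific agents, so its bytes never change.
const AGENT_TOOL: ToolDef = {
  name: "AgentTool",
  description: "Launch a sub-agent for a task. Available agents are listed in the latest attachment.",
};

// Dynamic: rebuilt whenever MCP servers, plugins, or permissions change.
function agentListAttachment(agents: AgentInfo[]): string {
  const lines = agents.map((a) => `- ${a.name}: ${a.description}`);
  return `Available agents:\n${lines.join("\n")}`;
}

interface RequestShape {
  tools: ToolDef[]; // part of the cached prefix
  tailBlocks: string[]; // appended after the conversation so far
}

function buildRequest(agents: AgentInfo[], userText: string): RequestShape {
  return { tools: [AGENT_TOOL], tailBlocks: [userText, agentListAttachment(agents)] };
}

const before = buildRequest([{ name: "reviewer", description: "Reviews diffs" }], "Review my PR");
const after = buildRequest(
  [{ name: "reviewer", description: "Reviews diffs" }, { name: "db-expert", description: "From an MCP server" }],
  "Review my PR",
);
// Tool definitions are identical; only the tail differs.
console.log(JSON.stringify(before.tools) === JSON.stringify(after.tools)); // true
```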

It’s like:

  • Before: Menu printed with “Today’s specials depend on chef’s mood”
  • Now: Menu fixed, waiter tells you today’s specials verbally

The menu stays the same, so the cache stays stable; the specials are announced at the tail and don’t affect the prefix.

Secret #4: Skill List Budget - 1% Cap Control

SkillTool faces a similar problem: skill list changes dynamically.

Solution: Strictly cap the skill list at 1% of the context window.

For a 200K-token window, 1% is 2,000 tokens, or about 8,000 characters at roughly 4 characters per token. If the list exceeds this budget, it’s truncated.

Benefits:

  1. Limits the size of the tool description, so less content has to match byte for byte
  2. Once the budget is full, new skills aren’t added, so the list doesn’t change and the cache doesn’t break
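A sketch of the budget, assuming roughly 4 characters per token to turn 1% of the window into characters, and assuming new skills are appended at the end of the list. skillBudgetChars and renderSkillList are hypothetical names. Stopping at the first skill that would overflow, rather than skipping it and trying later ones, is what keeps the rendered list stable once it’s full.

```typescript
// Sketch of a 1% skill-list budget. CHARS_PER_TOKEN is an assumed heuristic.
const CHARS_PER_TOKEN = 4;

function skillBudgetChars(contextWindowTokens: number): number {
  // 1% of the window in tokens, converted to characters.
  return Math.floor((contextWindowTokens * CHARS_PER_TOKEN) / 100);
}

// Renders skills in their existing order and stops at the first one that
// would overflow the budget, so skills appended later leave the text unchanged.
function renderSkillList(skills: string[], budgetChars: number): string {
  const lines: string[] = [];
  let used = 0;
  for (const skill of skills) {
    const line = `- ${skill}`;
    const cost = line.length + 1; // +1 for the newline separator
    if (used + cost > budgetChars) break;
    lines.push(line);
    used += cost;
  }
  return lines.join("\n");
}

console.log(skillBudgetChars(200000)); // 8000
console.log(renderSkillList(["commit", "review-pr", "deploy"], 20)); // - commit
```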

It’s like your closet: once it’s full, you stop buying new clothes and keep it organized.

Secret #5: $TMPDIR Placeholder - Eliminating User Differences

The BashTool prompt needs to tell the AI where the temp directory is. It’s usually /private/tmp/claude-{UID}/, and the UID differs from user to user.

This means different users see different prompts and can’t share a global cache.

Solution: Use $TMPDIR placeholder instead of specific paths.

Claude Code’s sandbox sets the $TMPDIR environment variable, so when the AI refers to the temp directory as $TMPDIR, its commands work just the same.

Yet every user sees an identical prompt (they all say $TMPDIR), so the cache can be shared.
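A sketch contrasting the two approaches, assuming the sandbox exports TMPDIR per user. The path pattern is from the article; the function names and prompt wording are hypothetical, and expandTmpdir only imitates what the shell does when a command runs.

```typescript
// Sketch: per-user path vs $TMPDIR placeholder in a tool prompt.
function bashToolPrompt(uid: number, usePlaceholder: boolean): string {
  const tmp = usePlaceholder ? "$TMPDIR" : `/private/tmp/claude-${uid}/`;
  return `Write temporary files under ${tmp}.`;
}

// Imitates the shell at run time: the per-user value is substituted only when
// a command executes, never in the cached prompt. Assumes the sandbox set TMPDIR.
function expandTmpdir(command: string, env: Record<string, string>): string {
  return command.replace(/\$TMPDIR\b/g, () => env["TMPDIR"]);
}

console.log(bashToolPrompt(501, false) === bashToolPrompt(502, false)); // false: no shared cache
console.log(bashToolPrompt(501, true) === bashToolPrompt(502, true)); // true: one shared prompt
console.log(expandTmpdir("mkdir -p $TMPDIR/build", { TMPDIR: "/private/tmp/claude-501" }));
// mkdir -p /private/tmp/claude-501/build
```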

It’s like:

  • Before: Menu printed with “Our address: Beijing Chaoyang District xxx” (each branch prints a different one)
  • Now: Menu printed with “Our address: See store sign” (all branches share same menu)

Secret #6: Conditional Paragraph Omission - Better Silent Than Sorry

Some paragraphs in system prompts are conditional: an explanation is added only when a feature is enabled.

Problem: If the condition changes mid-session (for example, after a remote config update), the paragraph appears or disappears, the prompt changes, and the cache breaks.

Solution: Better silent than sorry.

Specific approaches:

  • If it’s unimportant, always include it or always omit it
  • If it must be conditional, put it in the dynamic region (which doesn’t participate in the global cache)
  • Use the attachment mechanism instead of inline conditionals
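A sketch of the first two approaches, assuming the system prompt is split into a globally cached static prefix and a dynamic region. The flag name, section texts, and types are hypothetical.

```typescript
// Sketch: keep conditional text out of the globally cached prefix.
interface Features {
  experimentalReviewMode: boolean; // hypothetical remote-config flag
}

interface SystemPromptParts {
  staticPrefix: string; // identical for every user and session: globally cacheable
  dynamicRegion: string; // may vary: excluded from the global cache
}

function buildSystemPrompt(features: Features): SystemPromptParts {
  const staticSections = [
    "You are a coding assistant.",
    // Unimportant guidance is always included instead of toggled.
    "Prefer small, reviewable changes.",
  ];
  const dynamicSections: string[] = [];
  if (features.experimentalReviewMode) {
    // A paragraph that must stay conditional goes into the dynamic region.
    dynamicSections.push("Review mode is enabled: summarize risks before editing files.");
  }
  return {
    staticPrefix: staticSections.join("\n"),
    dynamicRegion: dynamicSections.join("\n"),
  };
}

// A remote config flip mid-session changes only the dynamic region.
const off = buildSystemPrompt({ experimentalReviewMode: false });
const on = buildSystemPrompt({ experimentalReviewMode: true });
console.log(off.staticPrefix === on.staticPrefix); // true
console.log(off.dynamicRegion === on.dynamicRegion); // false
```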

It’s like answering a friend’s invitation: if you’re not sure you can come, don’t commit; that beats committing and then backing out.

Secret #7: Tool Schema Cache - Session-Level Locking

Tool schema generation involves multiple runtime decisions:

  • GrowthBook feature flags
  • Dynamic tool descriptions
  • MCP tool schemas

If the schema were regenerated on every request, any GrowthBook change could change the schema.

Solution: Session-level caching.

TOOL_SCHEMA_CACHE = new Map()

First request: Generate schema, store in cache
Subsequent requests: Use cache directly, no regeneration
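A sketch of the lookup-or-generate logic, assuming the cache lives for one process (one session). TOOL_SCHEMA_CACHE comes from the article; the flag reader stands in for a feature-flag client such as GrowthBook, and the flag name and schema contents are hypothetical. The key here is the tool name alone, which is only safe when the name fully determines the schema.

```typescript
// Sketch of session-level tool schema caching.
interface ToolSchema {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

type FlagReader = (flag: string) => boolean; // stand-in for a feature-flag client

// Lives for the lifetime of the process, i.e. one session.
const TOOL_SCHEMA_CACHE = new Map<string, ToolSchema>();

function generateSchema(toolName: string, readFlag: FlagReader): ToolSchema {
  const detailed = readFlag("detailed_tool_docs"); // hypothetical flag
  return {
    name: toolName,
    description: detailed
      ? "Runs a shell command. Use it for builds, tests, and git."
      : "Runs a shell command.",
    input_schema: { type: "object", properties: { command: { type: "string" } } },
  };
}

function getToolSchema(toolName: string, readFlag: FlagReader): ToolSchema {
  const cached = TOOL_SCHEMA_CACHE.get(toolName);
  if (cached !== undefined) return cached; // subsequent requests: reuse
  const schema = generateSchema(toolName, readFlag); // first request: generate
  TOOL_SCHEMA_CACHE.set(toolName, schema);
  return schema;
}

let flagValue = false;
const first = getToolSchema("Bash", () => flagValue);
flagValue = true; // remote config refreshes mid-session
const second = getToolSchema("Bash", () => flagValue);
console.log(first === second, second.description); // true Runs a shell command.
```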

This way, even if GrowthBook refreshes mid-session, schema stays fixed.

It’s like carefully reading the menu on your first restaurant visit; subsequent visits you order your familiar dishes.

A detail: the StructuredOutput tool’s name is fixed, but different calls pass different schemas. So the cache key can’t use the name alone; it must include the schema content.

There was once a bug: caching by name only caused different workflows to use the wrong schemas, and the error rate skyrocketed from 5.4% to 51%.
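A sketch of a key that includes the schema content, assuming schemas are plain JSON. Object keys are serialized in sorted order so that equivalent schemas written with different key orders share one entry. All names here are hypothetical.

```typescript
// Sketch: cache rendered tool definitions under name + schema content.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes JSON with object keys sorted, so key order doesn't affect the key.
function stableStringify(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  const obj = value as { [key: string]: Json };
  const keys = Object.keys(obj).sort();
  return `{${keys.map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(",")}}`;
}

function schemaCacheKey(toolName: string, schema: Json): string {
  return `${toolName}:${stableStringify(schema)}`;
}

const renderedToolCache = new Map<string, string>();

// Stand-in for expensive rendering of a tool definition.
function getRenderedTool(toolName: string, schema: Json): string {
  const key = schemaCacheKey(toolName, schema);
  const hit = renderedToolCache.get(key);
  if (hit !== undefined) return hit;
  const rendered = `${toolName} expects output matching ${stableStringify(schema)}`;
  renderedToolCache.set(key, rendered);
  return rendered;
}

const invoiceSchema: Json = { type: "object", properties: { total: { type: "number" } } };
const ticketSchema: Json = { type: "object", properties: { title: { type: "string" } } };

// Same tool name, different schemas: two separate entries, no cross-talk.
console.log(getRenderedTool("StructuredOutput", invoiceSchema));
console.log(getRenderedTool("StructuredOutput", ticketSchema));
console.log(renderedToolCache.size); // 2
```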

The Common Essence of Seven Secrets

Reviewing these optimizations reveals a decision flow:

Discover dynamic content
Must be in prefix?
    ↓No→ Move to message tail/attachment
    ↓Yes
Can eliminate differences?
    ↓Yes→ Use placeholder/standardization
    ↓No
Can reduce change frequency?
    ↓Yes→ Memoize/reduce precision/session cache
    ↓No
Can limit change magnitude?
    ↓Yes→ Budget control/conditional omission
    ↓No
Mark as dynamic region (doesn't participate in global cache)
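The flow can also be written as a plain function, which is handy as a review checklist. The field names and remedy labels are hypothetical; the order of the questions follows the flow.

```typescript
// Sketch encoding the decision flow as a function (names are hypothetical).
interface DynamicContent {
  mustBeInPrefix: boolean;
  differencesCanBeEliminated: boolean; // e.g. per-user paths
  frequencyCanBeReduced: boolean; // e.g. dates, feature flags
  magnitudeCanBeLimited: boolean; // e.g. growing lists
}

type Remedy =
  | "move to message tail/attachment"
  | "placeholder/standardization"
  | "memoize/reduce precision/session cache"
  | "budget control/conditional omission"
  | "mark as dynamic region";

function chooseRemedy(c: DynamicContent): Remedy {
  if (!c.mustBeInPrefix) return "move to message tail/attachment";
  if (c.differencesCanBeEliminated) return "placeholder/standardization";
  if (c.frequencyCanBeReduced) return "memoize/reduce precision/session cache";
  if (c.magnitudeCanBeLimited) return "budget control/conditional omission";
  return "mark as dynamic region";
}

// The agent list did not need to live in the tool schema:
console.log(chooseRemedy({ mustBeInPrefix: false, differencesCanBeEliminated: false, frequencyCanBeReduced: false, magnitudeCanBeLimited: false }));
// move to message tail/attachment

// The temp directory must be in the prompt, but its per-user part can go:
console.log(chooseRemedy({ mustBeInPrefix: true, differencesCanBeEliminated: true, frequencyCanBeReduced: false, magnitudeCanBeLimited: false }));
// placeholder/standardization
```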

Four core principles:

1. Push Dynamic Content Toward Request Tail

The earlier content sits in the request, the more damage a change to it does, because everything after it must be reprocessed. So:

  • Date memoization locks system prompt
  • Agent list moves to message attachment
  • Conditional-paragraph rules keep the prefix from jittering

2. Reduce Change Frequency

If it must be in the prefix, reduce how often it changes:

  • Date changes from daily to monthly
  • Skill list controlled by budget
  • Schema cache from per-request to per-session

3. Eliminate User Dimension Differences

A global cache requires that all users see an identical prefix:

  • $TMPDIR placeholder eliminates UID differences
  • Date memoization eliminates timezone differences

4. Measure First, Then Optimize

Every optimization came from data:

  • 10.2% cost from agent list changes
  • 77% of tool changes are single schema changes
  • GrowthBook flag flips breaking the cache

Without monitoring, these optimizations wouldn’t be discovered.

Practical: How to Improve Your Cache Hit Rate

With these patterns in mind, here’s how to optimize your own AI application:

1. Audit System Prompts

Find dynamic content:

  • Date/time → Memoize or reduce precision
  • Username/paths → Use placeholders
  • Config values → Move to dynamic region or attachments
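A rough sketch of an automated first pass, assuming simple regexes are good enough to flag the obvious cases. The patterns and names are assumptions; config values and anything the patterns miss still need a manual review.

```typescript
// Sketch of a prompt audit that flags obvious dynamic content.
interface Finding {
  kind: string;
  match: string;
  advice: string;
}

interface Check {
  kind: string;
  pattern: RegExp;
  advice: string;
}

const CHECKS: Check[] = [
  { kind: "date", pattern: /\b\d{4}-\d{2}-\d{2}\b/g, advice: "Memoize or reduce precision" },
  { kind: "time", pattern: /\b\d{1,2}:\d{2}(?::\d{2})?\b/g, advice: "Remove, or memoize" },
  { kind: "home directory", pattern: /\/(?:Users|home)\/[^\s/]+/g, advice: "Use a placeholder" },
  { kind: "per-user temp dir", pattern: /\/tmp\/claude-\d+/g, advice: "Use $TMPDIR" },
];

function auditPrompt(prompt: string): Finding[] {
  const findings: Finding[] = [];
  for (const check of CHECKS) {
    const matches: string[] = prompt.match(check.pattern) ?? [];
    for (const m of matches) {
      findings.push({ kind: check.kind, match: m, advice: check.advice });
    }
  }
  return findings;
}

console.log(auditPrompt("Today is 2026-04-01. Work in /Users/alice/repo and /private/tmp/claude-501/."));
```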

2. Lock Tool Schemas

Tool definitions should remain unchanged within a session. If dynamic changes are necessary:

  • Consider message attachments instead
  • Use session-level caching
  • Delay-load MCP tools

3. Monitor cache_read_input_tokens

This is the key metric for judging cache effectiveness:

  • If it declines consistently, something is breaking the cache
  • Correlate with logs to find the source of the change
  • Apply the matching optimization pattern
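A sketch of the monitoring logic. The usage field names match the usage object returned by the Anthropic Messages API; the hit-rate formula, the 0.5 threshold, and the helper names are assumptions for illustration.

```typescript
// Sketch of cache monitoring from per-request usage data.
interface Usage {
  input_tokens: number; // input neither read from nor written to the cache
  cache_creation_input_tokens?: number | null; // written to the cache on this request
  cache_read_input_tokens?: number | null; // served from the cache
}

function cacheHitRate(u: Usage): number {
  const read = u.cache_read_input_tokens ?? 0;
  const created = u.cache_creation_input_tokens ?? 0;
  const total = read + created + u.input_tokens;
  return total === 0 ? 0 : read / total;
}

// Returns indices of requests (after the first, which is expected to miss)
// whose hit rate fell below the threshold: candidates for log correlation.
function findCacheBreaks(usages: Usage[], minRate = 0.5): number[] {
  const breaks: number[] = [];
  for (let i = 1; i < usages.length; i++) {
    if (cacheHitRate(usages[i]) < minRate) breaks.push(i);
  }
  return breaks;
}

const usageHistory: Usage[] = [
  { input_tokens: 40, cache_creation_input_tokens: 11000, cache_read_input_tokens: 0 },
  { input_tokens: 35, cache_creation_input_tokens: 120, cache_read_input_tokens: 11000 },
  { input_tokens: 30, cache_creation_input_tokens: 11150, cache_read_input_tokens: 0 }, // prefix changed
];
console.log(findCacheBreaks(usageHistory)); // flags request index 2
```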

4. Understand Prefix Ordering

Any change to content before a cache_control breakpoint invalidates that breakpoint’s cache. When constructing requests:

  • Put most stable content first
  • Put dynamic content later or at tail
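A sketch of a request body ordered this way. The cache prefix covers tools, then system, then messages, and the cache_control: { type: "ephemeral" } markers follow Anthropic’s prompt-caching documentation; the type names, model ID, tool definitions, and prompt texts are placeholders, and the sketch only builds the body without sending it.

```typescript
// Sketch of a Messages API request body ordered for caching.
interface EphemeralCacheControl {
  type: "ephemeral";
}

interface ToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
  cache_control?: EphemeralCacheControl;
}

interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: EphemeralCacheControl;
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface MessagesRequestBody {
  model: string;
  max_tokens: number;
  tools: ToolDefinition[];
  system: SystemTextBlock[];
  messages: ChatMessage[];
}

function buildRequestBody(
  model: string,
  tools: ToolDefinition[],
  stableSystem: string,
  volatileSystem: string,
  messages: ChatMessage[],
): MessagesRequestBody {
  const last = tools.length - 1;
  return {
    model,
    max_tokens: 1024,
    // A breakpoint on the last tool covers every (static) tool definition.
    tools: tools.map((t, i): ToolDefinition => (i === last ? { ...t, cache_control: { type: "ephemeral" } } : t)),
    system: [
      // Most stable content first, followed by a breakpoint...
      { type: "text", text: stableSystem, cache_control: { type: "ephemeral" } },
      // ...then content that may change; editing it leaves the breakpoint above valid.
      { type: "text", text: volatileSystem },
    ],
    messages,
  };
}

const body = buildRequestBody(
  "your-model-id", // placeholder: substitute a real model ID
  [
    { name: "Read", description: "Read a file.", input_schema: { type: "object" } },
    { name: "Bash", description: "Run a shell command.", input_schema: { type: "object" } },
  ],
  "You are a coding assistant. Follow the repository's conventions.",
  "Session notes that may change.",
  [{ role: "user", content: "Fix the failing test." }],
);
console.log(JSON.stringify(body, null, 2));
```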

Common Pitfalls

| Pitfall | Cause | Solution |
| --- | --- | --- |
| Timestamps embedded in system prompt | Changes every request | Memoize |
| Dynamic tool list | MCP connection changes | Attachment mechanism |
| User-specific paths | Different for different users | Environment variable placeholders |
| Feature flags affecting schema | Remote config refresh | Session-level cache |
| Frequent model switching | Model is part of cache key | Keep model selection fixed |

Summary

Cache optimization is Claude Code’s key to cost reduction:

  • Seven secrets: Date memoization, monthly granularity, agent list attachment, skill budget, $TMPDIR placeholder, conditional omission, schema caching
  • Core philosophy: Reduce change, maintain stability
  • Four principles: Push to suffix, reduce frequency, eliminate differences, data-driven
  • Practical points: Audit prompts, lock schemas, monitor metrics, understand prefix

It’s like running a restaurant:

  • Stable menu, customers order quickly (cache hit)
  • Special info told verbally, doesn’t affect menu (attachment mechanism)
  • Daily ingredients fresh, but menu unchanged (memoization)
  • Use data to find out which dishes are popular (measurement-driven)

Understanding these patterns lets you:

  • Diagnose cache issues in your AI application
  • Apply corresponding optimization strategies
  • Significantly reduce API costs

Next up: Permission System - Installing “Safety Brakes” on AI.