Cache Optimization Patterns - Making Cache Hit Rates Soar

Table of Contents
- Core Philosophy of Cache Optimization
- Secret #1: Date Memoization - Don’t Invalidate Across Midnight
- Secret #2: Monthly Granularity - From Daily to Monthly Changes
- Secret #3: Moving Agent List from Tool Description to Message Attachment
- Secret #4: Skill List Budget - 1% Cap Control
- Secret #5: $TMPDIR Placeholder - Eliminating User Differences
- Secret #6: Conditional Paragraph Omission - Better Silent Than Sorry
- Secret #7: Tool Schema Cache - Session-Level Locking
- The Common Essence of Seven Secrets
- Practical: How to Improve Your Cache Hit Rate
- Common Pitfalls
- Summary
Have you noticed that Claude Code sometimes responds quickly and sometimes slowly, as if the AI were contemplating life? This often comes down to the “cache hit rate”: a cache hit lets the server reuse work it has already done, while a miss forces it to reprocess the whole prompt.
Today we’ll discuss Claude Code’s seven “cache optimization secrets” and how it pushes cache hit rates to the extreme.
Core Philosophy of Cache Optimization
The essence of cache optimization is: Reduce change, maintain stability.
Imagine ordering at a restaurant:
- Cache hit: Waiter remembers what you like, orders directly
- Cache miss: Has to ask what you want every time
API caching works the same way. Every request sends a large chunk of “system prompt + tool descriptions” to the server. If this content changes, the server must reprocess it, wasting time and money.
So the optimization direction is: Keep this content as unchanged as possible, or minimize the impact of changes.
Secret #1: Date Memoization - Don’t Invalidate Across Midnight
A problem that plagues many AI applications: crossing midnight.
Say you ask the AI a question at 11:59 PM and the system prompt reads “Today is 2026-04-01”. At 12:01 AM you ask another, and the date becomes “2026-04-02”.
Just this one character change invalidates the entire system prompt cache - about 11,000 tokens need recalculation.
Claude Code’s solution: Date memoization.
getSessionStartDate = Remember the first date obtained, reuse it for the rest of the session
This way, even after midnight, the date in the system prompt remains “yesterday” - cache doesn’t invalidate.
How does AI know “today” changed? Claude Code appends the new date at the message tail. Tail changes don’t affect prefix cache.
It’s like telling the waiter: “I always order the same dish,” just with dessert added today - the main course doesn’t need reintroduction.
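A minimal TypeScript sketch of the idea, under stated assumptions: only the name getSessionStartDate comes from the text above. The helpers buildSystemPrompt and buildDateChangeNote, the UTC formatting, and the note wording are hypothetical illustrations, not Claude Code's actual implementation.

```typescript
// Hypothetical sketch: memoize the session start date so the prompt prefix stays fixed.
let sessionStartDate: string | null = null;

// Format a Date as YYYY-MM-DD in UTC (a simplification; a real tool may use local time).
function toIsoDate(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Returns the date seen on the first call and keeps returning it afterwards.
function getSessionStartDate(now: Date): string {
  if (sessionStartDate === null) {
    sessionStartDate = toIsoDate(now);
  }
  return sessionStartDate;
}

// The cacheable prefix only ever contains the memoized date.
function buildSystemPrompt(now: Date): string {
  return `Today is ${getSessionStartDate(now)}.`;
}

// The tail note is produced only once the real date has moved past the memoized one.
function buildDateChangeNote(now: Date): string | null {
  const today = toIsoDate(now);
  return today === getSessionStartDate(now) ? null : `Note: the date is now ${today}.`;
}
```

The prefix stays byte-identical across midnight, and the note (when present) would be appended to the end of the message list, where it cannot disturb the cached prefix.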
Secret #2: Monthly Granularity - From Daily to Monthly Changes
With the system prompt handled, tool prompts still need time information, and a full date (2026-04-01) changes every day.
Solution: Use only the month (April 2026).
Change frequency drops from daily to monthly, so the text changes roughly 30x less often.
It’s like buying a monthly pass instead of daily tickets.
The division of labor for the two time precisions:
- System prompt: Accurate to day, but memoized (once per session)
- Tool prompt: Accurate to month (once per month)
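As a rough illustration of the coarser precision, the sketch below formats a date at month granularity. The function name formatMonthYear and the use of UTC are assumptions made for the example.

```typescript
// Month names for formatting; index 0 is January.
const MONTH_NAMES = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

// Hypothetical helper: render only month and year, so the text changes once per month.
function formatMonthYear(d: Date): string {
  return `${MONTH_NAMES[d.getUTCMonth()]} ${d.getUTCFullYear()}`;
}
```

Every day in April 2026 yields the same string, so a tool prompt built from it stays byte-identical for the whole month.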
Secret #3: Moving Agent List from Tool Description to Message Attachment
This is the highest-impact optimization - eliminating 10.2% of full cache reconstruction cost.
Where’s the problem? AgentTool descriptions list all available agents (including those from MCP servers). But this list is dynamic:
- MCP servers connect asynchronously
- Plugin refreshes
- Permission mode changes
Each list change alters the AgentTool schema, invalidating the entire tool schema cache.
Solution: Move the dynamic list out of tool descriptions, into message attachments.
Tool descriptions become static, only describing functionality; available agent list appended as attachment at message tail.
It’s like:
- Before: Menu printed with “Today’s specials depend on chef’s mood”
- Now: Menu fixed, waiter tells you today’s specials verbally
Menu stays the same, cache stable; special info goes at tail, doesn’t affect prefix.
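The split between a static description and a tail attachment could look roughly like this. The Message shape, the description text, and the function names are hypothetical; the real AgentTool and attachment format are not shown in this article.

```typescript
// Hypothetical message shape for the sketch.
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Static description: it never names specific agents, so it is identical on every request.
const AGENT_TOOL_DESCRIPTION =
  "Launch a sub-agent to handle a task. Available agents are listed in a message attachment.";

// Render the dynamic agent list as a message instead of baking it into the tool schema.
function buildAgentListAttachment(agents: string[]): Message {
  const sorted = [...agents].sort(); // stable ordering avoids spurious diffs
  return {
    role: "user",
    content: `Available agents:\n${sorted.map((a) => `- ${a}`).join("\n")}`,
  };
}

// Append the attachment at the tail, leaving the earlier messages untouched.
function buildMessages(history: Message[], agents: string[]): Message[] {
  return [...history, buildAgentListAttachment(agents)];
}
```

When an MCP server connects and the list grows, only the final message differs; the tool description and the earlier history are unchanged.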
Secret #4: Skill List Budget - 1% Cap Control
SkillTool faces a similar problem: skill list changes dynamically.
Solution: Strictly cap at 1% of context window.
For a 200K-token window, 1% is 2,000 tokens, or about 8,000 characters assuming roughly 4 characters per token. If the list exceeds this, it’s truncated.
Benefits:
- Limits tool description size, reducing byte-match difficulty
- When budget is full, new skills aren’t added, list doesn’t change, cache doesn’t break
It’s like your closet: when full, stop buying new clothes, keep it organized.
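The budget arithmetic and truncation can be sketched as follows. The constant of 4 characters per token is an assumption chosen to match the 8,000-character figure, and skillBudgetChars and buildSkillList are hypothetical names.

```typescript
const BUDGET_PERCENT = 1;  // cap at 1% of the context window
const CHARS_PER_TOKEN = 4; // assumed average, not an exact tokenizer figure

// Convert a context window size in tokens into a character budget.
function skillBudgetChars(contextWindowTokens: number): number {
  return Math.floor((contextWindowTokens * BUDGET_PERCENT) / 100) * CHARS_PER_TOKEN;
}

// Add skills in order until the next line would exceed the budget, then stop.
function buildSkillList(skills: string[], budgetChars: number): string {
  const lines: string[] = [];
  let used = 0;
  for (const skill of skills) {
    const line = `- ${skill}`;
    const cost = line.length + (lines.length > 0 ? 1 : 0); // +1 for the joining newline
    if (used + cost > budgetChars) break;
    lines.push(line);
    used += cost;
  }
  return lines.join("\n");
}
```

Once the budget is exhausted, appending more skills to the input leaves the rendered list unchanged, which is the cache-stability property described above.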
Secret #5: $TMPDIR Placeholder - Eliminating User Differences
BashTool prompts need to tell AI where the temp directory is. Usually /private/tmp/claude-{UID}/, with different UIDs for different users.
This means: Different users see different prompts, can’t share global cache.
Solution: Use $TMPDIR placeholder instead of specific paths.
Claude Code’s sandbox sets the $TMPDIR variable, so when the AI refers to the temp directory as $TMPDIR, commands work the same as with the literal path.
And because every user’s prompt now says $TMPDIR, all users see identical prompts and the cache can be shared.
It’s like:
- Before: Menu printed with “Our address: Beijing Chaoyang District xxx” (each branch prints different)
- Now: Menu printed with “Our address: See store sign” (all branches share same menu)
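A small sketch of the substitution, assuming a hypothetical helper normalizeTmpDir that rewrites a user-specific path into the placeholder before the prompt is sent:

```typescript
// Hypothetical helper: replace every occurrence of the user's real temp directory
// with the literal text $TMPDIR so the prompt is identical for all users.
function normalizeTmpDir(prompt: string, actualTmpDir: string): string {
  return prompt.split(actualTmpDir).join("$TMPDIR");
}
```

Two users with different UIDs then produce the same prompt text, and the shell resolves $TMPDIR to the right directory at execution time.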
Secret #6: Conditional Paragraph Omission - Better Silent Than Sorry
Some paragraphs in system prompts are conditional: add an explanation when a feature is enabled.
Problem: If this condition changes mid-session (for example, through a remote config update), the paragraph appears or disappears, the prompt changes, and the cache breaks.
Solution: Better silent than sorry.
Specific approaches:
- If unimportant, always include or always omit
- If must be conditional, put in dynamic region (doesn’t participate in global cache)
- Use attachment mechanism instead of inline conditionals
It’s like promising a friend: “Come if you can, if unsure don’t commit” - better than committing then backing out.
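One way to picture the "dynamic region" approach is to build the prompt in two parts, with conditional text confined to the second. The PromptParts shape, the buildPrompt function, and the prompt wording are hypothetical.

```typescript
// Hypothetical split between the globally cacheable prefix and a dynamic region.
interface PromptParts {
  staticPrefix: string;  // identical regardless of feature flags
  dynamicSuffix: string; // may change mid-session without touching the prefix
}

function buildPrompt(featureEnabled: boolean): PromptParts {
  const staticPrefix =
    "You are a coding assistant.\nFollow the project's conventions.";
  // The conditional paragraph lives only in the dynamic region.
  const dynamicSuffix = featureEnabled
    ? "Feature X is enabled: prefer the new workflow."
    : "";
  return { staticPrefix, dynamicSuffix };
}
```

Flipping the flag changes only dynamicSuffix, so whatever cache covers staticPrefix is unaffected.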
Secret #7: Tool Schema Cache - Session-Level Locking
Tool schema generation involves multiple runtime decisions:
- GrowthBook feature flags
- Dynamic tool descriptions
- MCP tool schemas
If regenerated every request, any GrowthBook change causes schema changes.
Solution: Session-level caching.
TOOL_SCHEMA_CACHE = new Map()
First request: Generate schema, store in cache
Subsequent requests: Use cache directly, no regeneration
This way, even if GrowthBook refreshes mid-session, schema stays fixed.
It’s like carefully reading the menu on your first restaurant visit; subsequent visits you order your familiar dishes.
A detail: StructuredOutput tool name is fixed, but different calls pass different schemas. So cache key can’t use name alone - must include schema content.
There was once a bug here: caching by name only caused different workflows to pick up the wrong schemas, and the error rate skyrocketed from 5.4% to 51%.
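A sketch of a session-level cache whose key includes schema content when a caller supplies one. Only the name TOOL_SCHEMA_CACHE comes from the text; the ToolSchema shape, cacheKey, and getToolSchema are assumptions for illustration.

```typescript
type JsonSchema = Record<string, unknown>;

// Hypothetical tool schema shape for the sketch.
interface ToolSchema {
  name: string;
  description: string;
  input_schema: JsonSchema;
}

// Lives for the whole session; entries are never regenerated.
const TOOL_SCHEMA_CACHE = new Map<string, ToolSchema>();

// Key by name alone for ordinary tools, and by name plus schema content when the
// caller passes its own schema (as with StructuredOutput).
// Note: JSON.stringify is sensitive to property order; a real key might canonicalize first.
function cacheKey(name: string, inputSchema?: JsonSchema): string {
  return inputSchema === undefined ? name : `${name}:${JSON.stringify(inputSchema)}`;
}

// Generate on first use, then serve the stored copy for the rest of the session.
function getToolSchema(
  name: string,
  generate: () => ToolSchema,
  inputSchema?: JsonSchema,
): ToolSchema {
  const key = cacheKey(name, inputSchema);
  const cached = TOOL_SCHEMA_CACHE.get(key);
  if (cached !== undefined) return cached;
  const fresh = generate();
  TOOL_SCHEMA_CACHE.set(key, fresh);
  return fresh;
}
```

If a feature flag flips between two calls, the second call still returns the first schema, while two StructuredOutput calls with different schemas get separate entries.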
The Common Essence of Seven Secrets
Reviewing these optimizations, a decision flow emerges:
Discover dynamic content
↓
Must be in prefix?
↓No→ Move to message tail/attachment
↓Yes
Can eliminate differences?
↓Yes→ Use placeholder/standardization
↓No
Can reduce change frequency?
↓Yes→ Memoize/reduce precision/session cache
↓No
Can limit change magnitude?
↓Yes→ Budget control/conditional omission
↓No
Mark as dynamic region (doesn't participate in global cache)
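The flow above can also be written as a small function. The field names and strategy labels below are hypothetical shorthand for the questions and outcomes in the diagram.

```typescript
// Answers to the four questions in the decision flow.
interface DynamicContent {
  mustBeInPrefix: boolean;
  canEliminateDifferences: boolean;
  canReduceFrequency: boolean;
  canLimitMagnitude: boolean;
}

type Strategy =
  | "move-to-tail"       // message tail / attachment
  | "placeholder"        // placeholder / standardization
  | "memoize-or-coarsen" // memoize, reduce precision, session cache
  | "budget-or-omit"     // budget control, conditional omission
  | "dynamic-region";    // excluded from the global cache

// Walk the questions in the same order as the diagram and return the first match.
function chooseStrategy(c: DynamicContent): Strategy {
  if (!c.mustBeInPrefix) return "move-to-tail";
  if (c.canEliminateDifferences) return "placeholder";
  if (c.canReduceFrequency) return "memoize-or-coarsen";
  if (c.canLimitMagnitude) return "budget-or-omit";
  return "dynamic-region";
}
```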
Four core principles:
1. Push Dynamic Content Toward Request Tail
Content earlier causes more damage when it changes. So:
- Date memoization locks system prompt
- Agent list moves to message attachment
- Conditional paragraphs ensure prefix doesn’t jitter
2. Reduce Change Frequency
If it must be in prefix, reduce change frequency:
- Date changes from daily to monthly
- Skill list controlled by budget
- Schema cache from per-request to per-session
3. Eliminate User Dimension Differences
Global cache requires all users see identical prefix:
- $TMPDIR placeholder eliminates UID differences
- Date memoization eliminates timezone differences
4. Measure First, Then Optimize
Every optimization came from data:
- 10.2% cost from agent list changes
- 77% of tool changes are single schema changes
- GrowthBook flips causing interruption
Without monitoring, these optimizations wouldn’t be discovered.
Practical: How to Improve Your Cache Hit Rate
Understanding these patterns, here’s how to optimize your own AI application:
1. Audit System Prompts
Find dynamic content:
- Date/time → Memoize or reduce precision
- Username/paths → Use placeholders
- Config values → Move to dynamic region or attachments
2. Lock Tool Schemas
Tool definitions should remain unchanged within a session. If dynamic changes are necessary:
- Consider message attachments instead
- Use session-level caching
- Delay-load MCP tools
3. Monitor cache_read_input_tokens
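This is the most direct metric for judging cache effectiveness (the usage data also reports cache_creation_input_tokens, which counts tokens written to the cache):
- If it keeps declining, cache interruptions are occurring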
- Correlate with logs to find change source
- Apply appropriate optimization patterns
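A rough monitoring sketch based on the usage fields returned with each API response. The Usage interface lists only the three fields used here; cacheHitRatio, findCacheBreaks, and the 50% drop threshold are assumptions for illustration.

```typescript
// Subset of the usage fields reported per response.
interface Usage {
  input_tokens: number;                // uncached input tokens
  cache_read_input_tokens: number;     // tokens served from the cache
  cache_creation_input_tokens: number; // tokens written to the cache
}

// Fraction of all input tokens that were served from the cache.
function cacheHitRatio(u: Usage): number {
  const total =
    u.input_tokens + u.cache_read_input_tokens + u.cache_creation_input_tokens;
  return total === 0 ? 0 : u.cache_read_input_tokens / total;
}

// Return the indices of requests whose cache reads fell sharply versus the previous request.
function findCacheBreaks(history: Usage[], dropThreshold = 0.5): number[] {
  const breaks: number[] = [];
  for (let i = 1; i < history.length; i++) {
    const prev = history[i - 1];
    const cur = history[i];
    if (prev === undefined || cur === undefined) continue;
    const dropped =
      cur.cache_read_input_tokens < prev.cache_read_input_tokens * dropThreshold;
    if (prev.cache_read_input_tokens > 0 && dropped) breaks.push(i);
  }
  return breaks;
}
```

The returned indices point at the requests to correlate with logs, for example a config refresh or an MCP connection that happened just before.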
4. Understand Prefix Ordering
Any change to content before a cache_control breakpoint invalidates that breakpoint’s cache. When constructing requests:
- Put most stable content first
- Put dynamic content later or at tail
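To see why ordering matters, the sketch below counts how many leading blocks two consecutive requests share. It is a simplified model that treats the request as an ordered list of text blocks and assumes only an unchanged leading run can be reused; reusablePrefixBlocks is a hypothetical name.

```typescript
// Count the leading blocks that are identical in both requests.
// In this simplified model, only that shared run can be served from cache.
function reusablePrefixBlocks(previous: string[], current: string[]): number {
  let i = 0;
  while (i < previous.length && i < current.length && previous[i] === current[i]) {
    i++;
  }
  return i;
}
```

With stable content first, a changed date at the end still leaves the system prompt and tools reusable; with the date first, nothing after it can be reused.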
Common Pitfalls
| Pitfall | Cause | Solution |
|---|---|---|
| Timestamps embedded in system prompt | Changes every request | Memoize |
| Dynamic tool list | MCP connection changes | Attachment mechanism |
| User-specific paths | Different for different users | Environment variable placeholders |
| Feature flags affecting schema | Remote config refresh | Session-level cache |
| Frequent model switching | Model is part of cache key | Pin the model for the session |
Summary
Cache optimization is Claude Code’s key to cost reduction:
- Seven secrets: Date memoization, monthly granularity, agent list attachment, skill budget, $TMPDIR placeholder, conditional omission, schema caching
- Core philosophy: Reduce change, maintain stability
- Four principles: Push to suffix, reduce frequency, eliminate differences, data-driven
- Practical points: Audit prompts, lock schemas, monitor metrics, understand prefix
It’s like running a restaurant:
- Stable menu, customers order quickly (cache hit)
- Special info told verbally, doesn’t affect menu (attachment mechanism)
- Daily ingredients fresh, but menu unchanged (memoization)
- Use data on which dishes are popular to decide what to optimize (measurement-driven)
Understanding these patterns lets you:
- Diagnose cache issues in your AI application
- Apply corresponding optimization strategies
- Significantly reduce API costs
Next up: Permission System - Installing “Safety Brakes” on AI.
