Cache Optimization Patterns - Making Cache Hit Rates Soar

Have you noticed that Claude Code sometimes responds quickly and sometimes slowly, as if the AI were contemplating life? This often comes down to the “cache hit rate”: on a cache hit the server reuses work it has already done; on a miss it has to recompute everything.

Today we’ll discuss Claude Code’s seven “cache optimization secrets” and how it pushes cache hit rates to the extreme.

Core Philosophy of Cache Optimization

The essence of cache optimization is: Reduce change, maintain stability.

Imagine ordering at a restaurant:

Cache hit: Waiter remembers what you like, orders directly
Cache miss: Has to ask what you want every time

API cache works the same way. Every request sends a large chunk of “system prompt + tool descriptions” to the server. If this content changes, the server must reprocess - time and money wasted.

So the optimization direction is: Keep this content as unchanged as possible, or minimize the impact of changes.

Secret #1: Date Memoization - Don’t Invalidate Across Midnight

A problem that plagues many AI applications: crossing midnight.

Say you ask the AI a question at 11:59 PM, and the system prompt reads “Today is 2026-04-01”. At 12:01 AM you ask another, and the date becomes “2026-04-02”.

Just this one character change invalidates the entire system prompt cache - about 11,000 tokens need recalculation.

Claude Code’s solution: Date memoization.

getSessionStartDate = Remember the first date obtained, use this forever

This way, even after midnight, the date in the system prompt remains “yesterday” - cache doesn’t invalidate.

How does AI know “today” changed? Claude Code appends the new date at the message tail. Tail changes don’t affect prefix cache.

It’s like telling the waiter: “I always order the same dish,” just with dessert added today - the main course doesn’t need reintroduction.
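A minimal sketch of the idea in TypeScript. Only the name getSessionStartDate comes from the text above; the injectable clock, the YYYY-MM-DD format, and the dateChangeNote helper are assumptions made for illustration, not Claude Code's actual implementation.

```typescript
// Hypothetical sketch: memoize the first date seen in a session.
type Clock = () => Date;

function formatDay(d: Date): string {
  // YYYY-MM-DD in UTC, which keeps the example deterministic
  return d.toISOString().slice(0, 10);
}

function createSessionDate(clock: Clock) {
  let sessionStartDate: string | undefined;

  // Returns the same string for the whole session, even after midnight.
  function getSessionStartDate(): string {
    if (sessionStartDate === undefined) {
      sessionStartDate = formatDay(clock());
    }
    return sessionStartDate;
  }

  // Returns a note for the message tail only when the real date has moved on.
  function dateChangeNote(): string | null {
    const today = formatDay(clock());
    return today === getSessionStartDate() ? null : `The date is now ${today}.`;
  }

  return { getSessionStartDate, dateChangeNote };
}

// Usage: simulate a session that crosses midnight.
let now = new Date("2026-04-01T23:59:00Z");
const session = createSessionDate(() => now);
const before = `Today is ${session.getSessionStartDate()}`;
now = new Date("2026-04-02T00:01:00Z");
const after = `Today is ${session.getSessionStartDate()}`;
const note = session.dateChangeNote();
```

The system prompt line is identical before and after midnight, while the note carries the new date and can be appended at the message tail.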

Secret #2: Monthly Granularity - From Daily to Monthly Changes

System prompt handled, but tool prompts also need time info. Using full dates (2026-04-01) changes daily.

Solution: Use only month (April 2026).

Change frequency drops from daily to monthly - roughly 30 times fewer cache-breaking changes.

It’s like buying a monthly pass instead of daily tickets.

The division of labor for the two time precisions:

System prompt: Accurate to day, but memoized (once per session)
Tool prompt: Accurate to month (once per month)
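As a rough illustration, a month-level formatter could look like this. The function name formatMonth and the use of UTC are assumptions for the sketch.

```typescript
// Hypothetical helper: month-level date string for tool prompts.
const MONTHS = [
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December",
];

function formatMonth(d: Date): string {
  // Only month and year are used, so the output changes once per month.
  return `${MONTHS[d.getUTCMonth()]} ${d.getUTCFullYear()}`;
}

const first = formatMonth(new Date("2026-04-01T00:00:00Z"));
const last = formatMonth(new Date("2026-04-30T23:59:59Z"));
const next = formatMonth(new Date("2026-05-01T00:00:00Z"));
```

Every day in April produces the same string, so the tool prompt only changes at the month boundary.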

Secret #3: Moving Agent List from Tool Description to Message Attachment

This is the highest-impact optimization - eliminating 10.2% of full cache reconstruction cost.

Where’s the problem? AgentTool descriptions list all available agents (including those from MCP servers). But this list is dynamic:

MCP servers connect asynchronously
Plugin refreshes
Permission mode changes

Each list change alters the AgentTool schema, invalidating the entire tool schema cache.

Solution: Move the dynamic list out of tool descriptions, into message attachments.

Tool descriptions become static, only describing functionality; available agent list appended as attachment at message tail.

It’s like:

Before: Menu printed with “Today’s specials depend on chef’s mood”
Now: Menu fixed, waiter tells you today’s specials verbally

Menu stays the same, cache stable; special info goes at tail, doesn’t affect prefix.
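The split can be sketched as follows. The AgentTool name comes from the text above; the simplified ToolDef and Message shapes, the description wording, and the helper names are hypothetical.

```typescript
// Simplified, hypothetical request shapes for illustration only.
interface ToolDef { name: string; description: string; }
interface Message { role: "user" | "assistant"; content: string; }

// Static: never names specific agents, so its bytes never change.
const AGENT_TOOL: ToolDef = {
  name: "AgentTool",
  description: "Launch a sub-agent. Available agents are listed in a message attachment.",
};

// Dynamic: the current agent list travels at the message tail instead.
function withAgentAttachment(messages: Message[], agents: string[]): Message[] {
  const attachment: Message = {
    role: "user",
    content: `Available agents: ${[...agents].sort().join(", ")}`,
  };
  return [...messages, attachment];
}

function buildRequest(messages: Message[], agents: string[]) {
  return { tools: [AGENT_TOOL], messages: withAgentAttachment(messages, agents) };
}

const conversation: Message[] = [{ role: "user", content: "Review my PR" }];
const r1 = buildRequest(conversation, ["general"]);
// An MCP server finishes connecting and contributes a new agent:
const r2 = buildRequest(conversation, ["general", "mcp-reviewer"]);
```

The tools array is byte-identical in both requests; only the last message differs.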

Secret #4: Skill List Budget - 1% Cap Control

SkillTool faces a similar problem: skill list changes dynamically.

Solution: Strictly cap at 1% of context window.

For a 200K-token window, that’s 2,000 tokens, or about 8,000 characters at roughly 4 characters per token. If the list exceeds this, it’s truncated.

Benefits:

Limits tool description size, so fewer bytes have to match exactly for a cache hit
When budget is full, new skills aren’t added, list doesn’t change, cache doesn’t break

It’s like your closet: when full, stop buying new clothes, keep it organized.
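A small sketch of the budget arithmetic and truncation. The 4-characters-per-token ratio is an assumption chosen so that 1% of a 200K window comes out at 8,000 characters, and both function names are hypothetical.

```typescript
// Assumption: about 4 characters per token.
const CHARS_PER_TOKEN = 4;

function skillBudgetChars(contextWindowTokens: number): number {
  // 1% of the context window, converted to characters
  return Math.floor(contextWindowTokens / 100) * CHARS_PER_TOKEN;
}

// Keep whole entries, in order, until the next one would exceed the budget.
function truncateSkillList(skills: string[], budgetChars: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const skill of skills) {
    const cost = skill.length + 1; // +1 for the newline separator
    if (used + cost > budgetChars) break;
    kept.push(skill);
    used += cost;
  }
  return kept;
}

const budget = skillBudgetChars(200000);
const skills = Array.from({ length: 500 }, (_, i) => `skill-${i}: does thing number ${i}`);
const listed = truncateSkillList(skills, budget);
// A newly installed skill lands past the cutoff, so the rendered list is unchanged.
const listedAfterInstall = truncateSkillList([...skills, "new-skill: just installed"], budget);
```

Once the budget is full, installing another skill does not alter the rendered list, so the tool description stays byte-identical.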

Secret #5: $TMPDIR Placeholder - Eliminating User Differences

BashTool prompts need to tell AI where the temp directory is. Usually /private/tmp/claude-{UID}/, with different UIDs for different users.

This means: Different users see different prompts, can’t share global cache.

Solution: Use $TMPDIR placeholder instead of specific paths.

Claude Code’s sandbox sets the $TMPDIR variable, so when the AI refers to the temp directory as $TMPDIR, everything works the same as before.

The difference is that all users now see identical prompts - they all say $TMPDIR - so the cache can be shared.

It’s like:

Before: Menu printed with “Our address: Beijing Chaoyang District xxx” (each branch prints different)
Now: Menu printed with “Our address: See store sign” (all branches share same menu)
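To make the difference concrete, here is a sketch comparing the two prompt styles across three users. The prompt wording and function names are made up for the example; only the path pattern and the $TMPDIR placeholder come from the text.

```typescript
// Before: the UID leaks into the prompt, so every user gets different bytes.
function bashPromptWithPath(uid: number): string {
  return `Write temporary files under /private/tmp/claude-${uid}/.`;
}

// After: the prompt names the variable; the sandbox resolves it at run time.
function bashPromptWithPlaceholder(): string {
  return "Write temporary files under $TMPDIR.";
}

const uids = [501, 502, 503];
const perUser = new Set(uids.map((uid) => bashPromptWithPath(uid)));
const shared = new Set(uids.map(() => bashPromptWithPlaceholder()));
```

Three users produce three distinct prompts with concrete paths, but a single prompt with the placeholder.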

Secret #6: Conditional Paragraph Omission - Better Silent Than Sorry

Some paragraphs in system prompts are conditional: add an explanation when a feature is enabled.

Problem: If the condition changes mid-session (for example, after a remote config update), the paragraph appears or disappears, the prompt changes, and the cache breaks.

Solution: Better silent than sorry.

Specific approaches:

If unimportant, always include or always omit
If must be conditional, put in dynamic region (doesn’t participate in global cache)
Use attachment mechanism instead of inline conditionals

It’s like promising a friend: “Come if you can, if unsure don’t commit” - better than committing then backing out.
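One way to picture the second approach is a prompt builder that routes every conditional paragraph to a separate dynamic region. This is a sketch under assumptions: the Paragraph shape, the region names, and the example texts are hypothetical.

```typescript
// Hypothetical shape: a paragraph is conditional if it has an `enabled` check.
interface Paragraph { text: string; enabled?: () => boolean; }

function buildSystemPrompt(paragraphs: Paragraph[]) {
  const staticRegion: string[] = [];  // cached prefix: unconditional text only
  const dynamicRegion: string[] = []; // excluded from the shared cache
  for (const p of paragraphs) {
    if (p.enabled === undefined) staticRegion.push(p.text);
    else if (p.enabled()) dynamicRegion.push(p.text);
  }
  return {
    staticRegion: staticRegion.join("\n"),
    dynamicRegion: dynamicRegion.join("\n"),
  };
}

let remoteFlag = false;
const paragraphs: Paragraph[] = [
  { text: "You are a coding assistant." },
  { text: "Beta feature: you may use the notebook tool.", enabled: () => remoteFlag },
  { text: "Prefer small, reviewable changes." },
];

const beforeFlip = buildSystemPrompt(paragraphs);
remoteFlag = true; // remote config update arrives mid-session
const afterFlip = buildSystemPrompt(paragraphs);
```

The flag flip changes only the dynamic region; the static region that forms the cached prefix stays the same.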

Secret #7: Tool Schema Cache - Session-Level Locking

Tool schema generation involves multiple runtime decisions:

GrowthBook feature flags
Dynamic tool descriptions
MCP tool schemas

If regenerated every request, any GrowthBook change causes schema changes.

Solution: Session-level caching.

TOOL_SCHEMA_CACHE = new Map()

First request: Generate schema, store in cache
Subsequent requests: Use cache directly, no regeneration

This way, even if GrowthBook refreshes mid-session, schema stays fixed.

It’s like carefully reading the menu on your first restaurant visit; subsequent visits you order your familiar dishes.

A detail: StructuredOutput tool name is fixed, but different calls pass different schemas. So cache key can’t use name alone - must include schema content.

There was once a bug: the cache was keyed by name only, so different workflows received the wrong schemas, and the error rate skyrocketed from 5.4% to 51%.
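A sketch of a session-level cache keyed by name plus schema content. TOOL_SCHEMA_CACHE and StructuredOutput are named in the text; the Tool shape, the render callback, and the key format are assumptions. JSON.stringify as a key also assumes a stable property order, which a real implementation would want to guarantee.

```typescript
type JsonSchema = Record<string, unknown>;
interface Tool { name: string; inputSchema: JsonSchema; }

// Lives for the whole session: the first rendered schema wins.
const TOOL_SCHEMA_CACHE = new Map<string, string>();

function cacheKey(tool: Tool): string {
  // Name alone is not enough: StructuredOutput reuses one name for many schemas.
  return `${tool.name}:${JSON.stringify(tool.inputSchema)}`;
}

function getToolSchema(tool: Tool, render: (t: Tool) => string): string {
  const key = cacheKey(tool);
  let cached = TOOL_SCHEMA_CACHE.get(key);
  if (cached === undefined) {
    cached = render(tool); // only runs on the first request for this key
    TOOL_SCHEMA_CACHE.set(key, cached);
  }
  return cached;
}

// A renderer whose output depends on a (hypothetical) remote feature flag.
let flagVerbose = false;
const render = (t: Tool) =>
  JSON.stringify({ name: t.name, verbose: flagVerbose, schema: t.inputSchema });

const bash: Tool = { name: "Bash", inputSchema: { type: "object" } };
const firstRender = getToolSchema(bash, render);
flagVerbose = true; // the flag flips mid-session
const secondRender = getToolSchema(bash, render);

const outA: Tool = { name: "StructuredOutput", inputSchema: { required: ["title"] } };
const outB: Tool = { name: "StructuredOutput", inputSchema: { required: ["score"] } };
const schemaA = getToolSchema(outA, render);
const schemaB = getToolSchema(outB, render);
```

The flag flip does not change the Bash schema within the session, and the two StructuredOutput calls get separate cache entries despite sharing a name.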

The Common Essence of Seven Secrets

Reviewing these optimizations, a decision flow emerges:

Discover dynamic content
    ↓
Must be in prefix?
    ↓No→ Move to message tail/attachment
    ↓Yes
Can eliminate differences?
    ↓Yes→ Use placeholder/standardization
    ↓No
Can reduce change frequency?
    ↓Yes→ Memoize/reduce precision/session cache
    ↓No
Can limit change magnitude?
    ↓Yes→ Budget control/conditional omission
    ↓No
Mark as dynamic region (doesn't participate in global cache)
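The same flow, written as a small function. The field and strategy names are invented for this sketch; the order of the checks follows the diagram above.

```typescript
interface DynamicContent {
  mustBeInPrefix: boolean;
  canEliminateDifferences: boolean;
  canReduceFrequency: boolean;
  canLimitMagnitude: boolean;
}

type Strategy =
  | "move-to-tail-or-attachment"
  | "placeholder-or-standardization"
  | "memoize-or-reduce-precision"
  | "budget-or-conditional-omission"
  | "dynamic-region";

// Checks run in the same order as the decision flow.
function chooseStrategy(c: DynamicContent): Strategy {
  if (!c.mustBeInPrefix) return "move-to-tail-or-attachment";
  if (c.canEliminateDifferences) return "placeholder-or-standardization";
  if (c.canReduceFrequency) return "memoize-or-reduce-precision";
  if (c.canLimitMagnitude) return "budget-or-conditional-omission";
  return "dynamic-region";
}

// Agent list: does not have to live in the prefix.
const agentListChoice = chooseStrategy({
  mustBeInPrefix: false, canEliminateDifferences: false,
  canReduceFrequency: false, canLimitMagnitude: false,
});
// Temp directory path: must be in the prefix, but the difference can be removed.
const tmpPathChoice = chooseStrategy({
  mustBeInPrefix: true, canEliminateDifferences: true,
  canReduceFrequency: false, canLimitMagnitude: false,
});
// Nothing applies: fall through to the dynamic region.
const fallbackChoice = chooseStrategy({
  mustBeInPrefix: true, canEliminateDifferences: false,
  canReduceFrequency: false, canLimitMagnitude: false,
});
```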

Four core principles:

1. Push Dynamic Content Toward Request Tail

Content earlier causes more damage when it changes. So:

Date memoization locks system prompt
Agent list moves to message attachment
Conditional paragraphs ensure prefix doesn’t jitter

2. Reduce Change Frequency

If it must be in prefix, reduce change frequency:

Date changes from daily to monthly
Skill list controlled by budget
Schema cache from per-request to per-session

3. Eliminate User Dimension Differences

Global cache requires all users see identical prefix:

$TMPDIR placeholder eliminates UID differences
Date memoization eliminates timezone differences

4. Measure First, Then Optimize

Every optimization came from data:

10.2% cost from agent list changes
77% of tool changes are single schema changes
GrowthBook flips causing interruption

Without monitoring, these optimizations wouldn’t be discovered.

Practical: How to Improve Your Cache Hit Rate

Understanding these patterns, here’s how to optimize your own AI application:

1. Audit System Prompts

Find dynamic content:

Date/time → Memoize or reduce precision
Username/paths → Use placeholders
Config values → Move to dynamic region or attachments
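A quick audit can start with a few regular expressions run over the rendered prompt. The patterns below are illustrative assumptions, not an exhaustive list; adapt them to whatever your prompts actually contain.

```typescript
// Hypothetical patterns for content that tends to vary per request or per user.
const VOLATILE_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "full date", pattern: /\b\d{4}-\d{2}-\d{2}\b/ },
  { label: "clock time", pattern: /\b\d{1,2}:\d{2}(:\d{2})?\b/ },
  { label: "user-specific path", pattern: /\/(Users|home)\/[^\/\s]+/ },
  { label: "per-UID temp dir", pattern: /\/tmp\/[\w-]*\d+/ },
];

// Returns the labels of every pattern found in the prompt.
function auditPrompt(prompt: string): string[] {
  return VOLATILE_PATTERNS
    .filter(({ pattern }) => pattern.test(prompt))
    .map(({ label }) => label);
}

const findings = auditPrompt(
  "Today is 2026-04-01. Scratch space: /private/tmp/claude-501/. Home: /Users/alice"
);
const clean = auditPrompt(
  "Use $TMPDIR for scratch files. The current month is April 2026."
);
```

The first prompt trips three of the checks; the second, which uses a placeholder and month-level precision, trips none.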

2. Lock Tool Schemas

Tool definitions should remain unchanged within a session. If dynamic changes are necessary:

Consider message attachments instead
Use session-level caching
Delay-load MCP tools

3. Monitor cache_read_input_tokens

This is the most direct metric for judging cache effectiveness:

If it is consistently declining, cache interruptions are occurring
Correlate with logs to find change source
Apply appropriate optimization patterns
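A sketch of how the usage numbers could be turned into a hit ratio and a simple break detector. The three field names are the ones the Anthropic API reports in its usage object; the token counts, the 0.5 drop factor, and the function names are made up for illustration.

```typescript
interface Usage {
  input_tokens: number;                // uncached input tokens
  cache_creation_input_tokens: number; // tokens written to the cache
  cache_read_input_tokens: number;     // tokens served from the cache
}

// Share of all input tokens that were served from the cache.
function cacheHitRatio(u: Usage): number {
  const total = u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  return total === 0 ? 0 : u.cache_read_input_tokens / total;
}

// Flags request indices where the hit ratio falls sharply versus the previous request.
function findCacheBreaks(usageLog: Usage[], dropFactor = 0.5): number[] {
  const breaks: number[] = [];
  let prev = 0;
  usageLog.forEach((u, i) => {
    const curr = cacheHitRatio(u);
    if (i > 0 && prev > 0 && curr < prev * dropFactor) breaks.push(i);
    prev = curr;
  });
  return breaks;
}

// Illustrative numbers only.
const usageLog: Usage[] = [
  { input_tokens: 50, cache_creation_input_tokens: 11000, cache_read_input_tokens: 0 },
  { input_tokens: 50, cache_creation_input_tokens: 200, cache_read_input_tokens: 11000 },
  { input_tokens: 50, cache_creation_input_tokens: 300, cache_read_input_tokens: 11200 },
  { input_tokens: 50, cache_creation_input_tokens: 11400, cache_read_input_tokens: 0 },
];
const breaks = findCacheBreaks(usageLog);
```

The first request is a cold start and is not flagged; the fourth request, where reads drop to zero and creation spikes, is the one to correlate with logs.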

4. Understand Prefix Ordering

Any change to content before a cache_control breakpoint invalidates that breakpoint’s cache. When constructing requests:

Put most stable content first
Put dynamic content later or at tail
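The ordering rule can be sketched with system prompt blocks. The block shape with cache_control of type "ephemeral" follows the Anthropic Messages API; the cachedPrefix helper is a hypothetical stand-in for what the server would treat as the cacheable prefix, and it models only the system blocks.

```typescript
interface TextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Stable text first, with the breakpoint; dynamic text after it.
function buildSystem(stableText: string, dynamicText: string): TextBlock[] {
  return [
    { type: "text", text: stableText, cache_control: { type: "ephemeral" } },
    { type: "text", text: dynamicText },
  ];
}

// Hypothetical model: everything up to and including the last breakpoint.
function cachedPrefix(blocks: TextBlock[]): string {
  let lastBreakpoint = -1;
  blocks.forEach((block, i) => {
    if (block.cache_control !== undefined) lastBreakpoint = i;
  });
  return JSON.stringify(blocks.slice(0, lastBreakpoint + 1));
}

const STABLE = "You are a coding assistant. Follow the project conventions.";
const requestA = buildSystem(STABLE, "Current branch: main");
const requestB = buildSystem(STABLE, "Current branch: feature/cache");

// Counter-example: dynamic text placed before the breakpoint.
const badOrder: TextBlock[] = [
  { type: "text", text: "Current branch: feature/cache" },
  { type: "text", text: STABLE, cache_control: { type: "ephemeral" } },
];
```

With the stable block first, both requests share the same cacheable prefix. In the counter-example, the dynamic text sits inside the prefix, so any change to it breaks the match.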

Common Pitfalls

Pitfall	Cause	Solution
Timestamps embedded in system prompt	Changes every request	Memoize
Dynamic tool list	MCP connection changes	Attachment mechanism
User-specific paths	Different for different users	Environment variable placeholders
Feature flags affecting schema	Remote config refresh	Session-level cache
Frequent model switching	Model is part of cache key	Pin one model per session

Summary

Cache optimization is Claude Code’s key to cost reduction:

Seven secrets: Date memoization, monthly granularity, agent list attachment, skill budget, $TMPDIR placeholder, conditional omission, schema caching
Core philosophy: Reduce change, maintain stability
Four principles: Push to suffix, reduce frequency, eliminate differences, data-driven
Practical points: Audit prompts, lock schemas, monitor metrics, understand prefix

It’s like running a restaurant:

Stable menu, customers order quickly (cache hit)
Special info told verbally, doesn’t affect menu (attachment mechanism)
Daily ingredients fresh, but menu unchanged (memoization)
Use data to optimize, which dishes are popular (measurement-driven)

Understanding these patterns lets you:

Diagnose cache issues in your AI application
Apply corresponding optimization strategies
Significantly reduce API costs

Next up: Permission System - Installing “Safety Brakes” on AI.
