Shortcomings - What Claude Code Still Does Imperfectly

After discussing so many of Claude Code’s strengths, you might think it’s flawless. But the truth is: no system is perfect, only systems that keep improving.
Today, let’s talk about Claude Code’s shortcomings - not to pour cold water, but to objectively understand its boundaries so you can use it better.
Shortcoming 1: Context Window Ceiling
Have you run into this: the project is so large that Claude Code can’t read all the code at once, only piece by piece, leaving it with a limited global view?
Is 200K Tokens Really Enough?
Claude Code’s context window is 200K tokens. That sounds like a lot, but actual consumption runs faster than you might expect:
- System prompts: 15-20K
- Skill list (100 skills): ~8K
- One file read (2000 lines): 5-20K
- Code search results (10 results): 10-30K
- After several rounds of tool calls: already half used
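The budget arithmetic above can be sketched as a quick check. The numbers below are the illustrative estimates from the list, not measurements of Claude Code’s internals:

```python
# Rough illustration of how a 200K-token window fills up.
# All figures are assumptions drawn from the estimates above.

WINDOW = 200_000

consumers = {
    "system prompt": 18_000,
    "skill list (100 skills)": 8_000,
    "file read (2000 lines)": 12_000,
    "search results (10 hits)": 20_000,
}

def remaining(window: int, used: dict[str, int]) -> int:
    """Tokens left after the fixed overheads above."""
    return window - sum(used.values())

left = remaining(WINDOW, consumers)
print(f"free after one round of tools: {left} tokens "
      f"({left / WINDOW:.0%} of the window)")
```

With these numbers, a single round of tool use already eats roughly 30% of the window before any real conversation happens.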
For large codebases (like Linux kernel, Chromium browser), 200K can’t even fit the “directory tree,” let alone detailed content.
Limitations of Existing Solutions
Claude Code uses various tricks to mitigate this problem:
- Compression: turns old conversations into summaries, freeing space
- Paginated reading: large files read partially only
- Selective restoration: after compression, only restore recently used files
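A threshold-triggered compaction of the kind described might look roughly like this. The 0.8 threshold, the keep-half policy, and the summary stub are all assumptions for illustration, not Claude Code’s actual strategy:

```python
# Sketch of threshold-triggered compaction: once context use crosses a
# ratio of the window, older turns are replaced with a summary stub.

WINDOW = 200_000
THRESHOLD = 0.8   # assumed trigger ratio

def summarize(turns: list[str]) -> str:
    # Stand-in for an LLM-generated summary of the dropped turns.
    return f"[summary of {len(turns)} earlier turns]"

def maybe_compact(turns: list[str], sizes: list[int]) -> list[str]:
    if sum(sizes) / WINDOW < THRESHOLD:
        return turns                    # plenty of room, keep everything
    keep = len(turns) // 2              # keep the recent half verbatim
    return [summarize(turns[:-keep])] + turns[-keep:]
```

The key property: compaction frees space but is lossy, which is exactly why it “makes do” rather than expands capacity.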
But these are all ways of “making do within limited space,” not true capacity expansion. Once a project exceeds a certain scale, Claude Code can only peer at it through a keyhole, never seeing the whole picture.
This is like: giving you a 200-page capacity folder to organize a library - no matter how skilled the technique, what doesn’t fit simply doesn’t fit.
Shortcoming 2: Tool Latency Accumulation
Have you felt that Claude Code sometimes “thinks” for quite a while? On complex tasks especially, a single round of conversation can take dozens of seconds.
Where Does Latency Come From
Every tool call has latency:
- API round-trip: send request → model generates → return result (2-10 seconds)
- Tool execution: read file, search code, execute command (0.1-5 seconds)
- Multi-round iteration: complex tasks require multiple tool calls (10-50 rounds)
A task like “help me refactor this module” might need:
- Search related files (3-5 rounds)
- Read critical code (5-10 rounds)
- Edit multiple files (5-10 rounds)
- Run tests to verify (2-5 rounds)
At 2-10 seconds per round, the total adds up to several minutes. And that’s the smooth case - if errors occur midway and retries are needed, it takes even longer.
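A back-of-envelope estimate for the refactor task above. The round counts and per-round seconds are midpoints of the ranges given in the text, purely for illustration:

```python
# Back-of-envelope latency for a "refactor this module" task.
# (rounds, seconds per round) — illustrative midpoints, not measurements.

phases = {
    "search files": (4, 4),
    "read code":    (8, 4),
    "edit files":   (8, 5),
    "run tests":    (3, 8),
}

total = sum(rounds * secs for rounds, secs in phases.values())
print(f"~{total} s ≈ {total / 60:.1f} min, assuming no retries")
```

Even with optimistic numbers the wall-clock time lands in the minutes, which matches everyday experience.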
Limitations of Parallelization
Claude Code supports parallel tool calls (send multiple tools at once), but this only solves “width” not “depth.” If tasks have dependencies (must find file before editing), parallelization doesn’t help.
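The width-versus-depth distinction can be made concrete with a small `asyncio` sketch. The tool names and delays are hypothetical; the point is only that independent calls overlap while dependent ones serialize:

```python
# Parallel tool calls help independent work ("width") but not chained
# work ("depth"). Tool names and delays are illustrative.
import asyncio

async def call_tool(name: str, delay: float = 0.1) -> str:
    await asyncio.sleep(delay)   # stands in for the API round-trip
    return f"{name}: done"

async def independent() -> list[str]:
    # Three reads with no dependencies: total time ≈ max(delays), not the sum.
    return await asyncio.gather(
        call_tool("read a.py"), call_tool("read b.py"), call_tool("read c.py"))

async def dependent() -> str:
    # Must find the file before editing it: latencies add up serially.
    found = await call_tool("grep for symbol")
    return await call_tool(f"edit ({found})")
```

`independent()` finishes in one round-trip’s time regardless of fan-out; `dependent()` always pays for both steps in sequence.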
This is like: you have a super-smart consultant, but each question takes a few seconds to answer, and they can only process one step at a time. A smart brain is dragged down by slow “hands and feet.”
Shortcoming 3: Cost-Quality Tradeoff
Have you thought about how much it costs to write code with Claude Code?
Real Cost of Token Consumption
Claude Code API calls aren’t free:
- Input tokens: dominate the bill - every request resends a large context, so input volume is huge
- Output tokens: pricier per token, but far smaller in volume
- Cache hit: 90% cheaper
- Cache miss: full price
One moderately complex task might consume:
- Input: 500K-2M tokens
- Output: 50K-200K tokens
- At current prices: a few jiao to a few yuan (roughly a few cents to a dollar)
Doesn’t sound like much? But if you use it writing code for 8 hours every day:
- Daily: dozens to hundreds of tasks
- Monthly: hundreds to thousands of yuan (roughly tens to hundreds of dollars)
Complexity from Optimization
To control costs, Claude Code does lots of optimizations:
- Prompt caching (cached prefix reads are ~90% cheaper, though creating the cache costs a bit extra)
- Smart compression (reduces input tokens)
- Token budget (limits tool result size)
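A per-tool token budget like the one mentioned might be sketched as follows. The 4-characters-per-token heuristic and the budget number are illustrative assumptions, not Claude Code’s actual logic:

```python
# Sketch of a per-tool token budget: oversized tool results are clipped
# before entering the context. Numbers are illustrative assumptions.

def estimate_tokens(text: str) -> int:
    return len(text) // 4   # crude heuristic, not a real tokenizer

def clip_result(text: str, budget_tokens: int = 2_000) -> str:
    if estimate_tokens(text) <= budget_tokens:
        return text
    keep_chars = budget_tokens * 4
    return text[:keep_chars] + "\n[... truncated to fit token budget ...]"
```

This is exactly the kind of optimization that saves tokens but adds complexity: the clipping marker, the heuristic, and the budget all need tuning and maintenance.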
But these optimizations also increase system complexity. Cache break detection, compression strategy tuning, budget allocation - none of these are “free,” requiring engineering investment and maintenance costs.
This is like: driving a high-performance car, but fuel consumption isn’t low. You can use various tricks to save gas, but either sacrifice speed or increase complexity.
Shortcoming 4: Lack of Offline Capability
Have you ever lost your network connection and found Claude Code completely unusable?
Limitations of Complete Cloud Dependency
Claude Code is “cloud-native” - all model inference happens on Anthropic’s servers. This means:
- No network means no work: can’t use offline locally
- API failure means stoppage: if Anthropic servers have issues, you suffer too
- Data must be uploaded: code must be sent to the cloud to process
For certain scenarios, this is a hard limitation:
- On a plane with no WiFi, you can’t code even if you want to
- On an isolated company intranet, you can’t reach external services
- With sensitive code, you may not want to upload it to a third party
Gap with Local Models
Someone might say: why not run an open-source model locally?
Theoretically possible, but practically the gap is large:
- Capability gap: local models’ code ability usually weaker than Claude
- Tool integration: no tool system as complete as Claude Code
- Context length: local models usually can’t support 200K context
This is like: you have a super-smart remote assistant, but they can only work remotely. Once the network cuts out, you’re on your own.
Shortcoming 5: Complex Logic Limitations
Have you noticed that Claude Code handles simple tasks smoothly, but “flops” when encountering complex algorithms?
Model Capability Boundaries
Claude Code is powered by large language models, which have inherent limitations:
- Weak symbolic reasoning: complex math proofs, algorithm derivations often error-prone
- Poor long-range dependencies: complex logic relationships spanning multiple files easily “forgotten”
- Boundary condition blind spots: easily miss exceptional cases, boundary conditions
For example:
- “Implement a red-black tree” - might write the basic structure, but balance operations often error-prone
- “Optimize this SQL query” - might give suggestions, but complex query plan analysis not necessarily accurate
- “Refactor this concurrency module” - might introduce race conditions
Necessity of Verification Mechanisms
Because of these limitations, Claude Code needs:
- Verification Agent: specifically verifies implementation correctness
- Test requirements: run tests to verify modifications
- YOLO Classifier: requests confirmation for high-risk operations
These aren’t “icing on the cake,” but “necessary safety nets.”
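A safeguard in the spirit of the list above can be sketched as a simple risk gate: high-risk commands are flagged for explicit confirmation. The pattern list is illustrative and not the actual classifier’s logic:

```python
# Sketch of a risk gate: flag commands that should require explicit
# user confirmation. Patterns are illustrative examples, not the real list.
import re

HIGH_RISK = [
    r"\brm\s+-rf\b",           # recursive force delete
    r"\bgit\s+push\s+--force\b",  # history rewrite on a remote
    r"\bdrop\s+table\b",       # destructive SQL
]

def needs_confirmation(command: str) -> bool:
    """True if the command matches any high-risk pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in HIGH_RISK)
```

The design choice: pattern matching is cheap and predictable, but a real classifier must also weigh context (a `rm -rf` inside a scratch directory is very different from one at the repo root).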
This is like: you hired a smart but somewhat careless assistant. Handles daily affairs well, but for important documents you must review them yourself.
Shortcoming 6: Memory System Boundaries
Have you noticed that Claude Code’s “memory” is sometimes unreliable? After switching sessions, some details are forgotten.
Limitations of Cross-Session Memory
Claude Code has a cross-session memory system (Memdir, Extract Memories, Auto-Dream), but it has boundaries:
- Granularity issue: memory is coarse-grained topic files, not fine-grained conversation records
- Latency issue: Auto-Dream is overnight consolidation, new information doesn’t take effect immediately
- Accuracy issue: automatic extraction might miss critical information or extract incorrectly
- Privacy issue: sensitive information might be written to memory files
Gap with “Real Memory”
Human memory is:
- Immediate: what was just learned is remembered right away
- Fine-grained: specific details can be recalled
- Richly associative: related content connects automatically
Claude Code’s memory is:
- Delayed: takes effect only after the next session or overnight consolidation
- Coarse-grained: summaries only, no details
- Passively retrieved: no automatic association
This is like: your assistant has a notebook recording important matters, but after shift change, the new assistant can only see summaries from the notebook, not the detailed discussions from before.
How to View These Shortcomings
Shortcomings Are Results of Design Tradeoffs
These limitations aren’t “bugs,” but results of design tradeoffs:
| Limitation | Design Choice | What If Flipped |
|---|---|---|
| Limited context | Controllable cost and latency | Unlimited context = unlimited cost + latency |
| Cloud dependent | Use strongest models | Local running = greatly reduced capability |
| Non-zero cost | High-quality service | Free = unsustainable service quality |
| Complex logic weak | Strong general capability | Specialized symbolic reasoning = weaker natural language |
Each “shortcoming” is the price of being sufficient along another dimension.
Using the Right Scenario Matters Most
Claude Code is suitable for:
- Medium-sized projects (main code fits in context)
- Iterative development (can accept multi-round latency)
- Cost-sensitive but controllable (willing to pay for efficiency)
- Has network environment (cloud dependency acceptable)
- Assisted not replaced (humans still review)
Claude Code is not suitable for:
- Very large projects (need global understanding)
- Extremely high real-time requirements (latency unacceptable)
- Extremely cost-sensitive (free is the only option)
- Completely offline environments (can’t connect to network)
- Zero-error scenarios (cannot have any errors)
Future Improvement Directions
Short-Term Achievable
- Larger context windows: as models upgrade, context might expand to 500K or even 1M
- Faster inference speed: optimize model architecture and inference infrastructure
- Better local model support: maybe someday can run near-cloud-quality models locally
- Smarter memory systems: more precise extraction, more timely consolidation
Long-Term Potentially Achievable
- True persistent state: like humans maintaining complete context across sessions
- Zero-latency tool calls: local execution + cloud inference hybrid architecture
- Enhanced symbolic reasoning: combining neural networks and symbolic systems
- Cost approaching zero: marginal cost reduction from technological progress
Summary
Claude Code’s six major shortcomings:
| Shortcoming | Core Manifestation | Coping Strategy |
|---|---|---|
| Context ceiling | 200K tokens can’t hold large projects | Modular development, batch processing |
| Tool latency accumulation | Complex tasks need multiple rounds, time accumulates | Parallelization, task splitting |
| Cost-quality tradeoff | High quality = high cost | Cache optimization, budget control |
| Lack of offline capability | No network means no work | Plan ahead, offline backup plan |
| Complex logic limitations | Algorithms, boundary conditions easily err | Verification mechanisms, human review |
| Memory system boundaries | Cross-session memory coarse-grained, delayed | Proactive memory management, CLAUDE.md supplement |
These shortcomings don’t mean Claude Code is hard to use - it’s still one of the most advanced AI coding assistants today. But understanding boundaries makes using it smoother:
- Know what it’s good at - daily coding, standard refactoring, code review
- Know what it’s not good at - complex algorithms, very large projects, zero-error scenarios
- Know when to intervene - key decisions, boundary conditions, test verification
This is like: understanding your car’s top speed, fuel consumption, off-road capability - not limitations, but what makes you drive more safely and smoothly.
This concludes the main text of the Harness Engineering series. Next: Appendix - File Index, Environment Variables Reference, Glossary, and Feature Flag Checklist.
