Shortcomings - What Claude Code Still Does Imperfectly

After discussing so many of Claude Code’s strengths, you might think it’s flawless. But the truth is: no system is perfect, only systems that keep improving.
Today, let’s talk about Claude Code’s shortcomings - not to pour cold water, but to objectively understand its boundaries so you can use it better.
Shortcoming 1: Context Window Ceiling
Have you run into this: the project is so large that Claude Code can’t read all the code at once, only piece by piece, leaving it with a limited global view?
Is 200K Tokens Really Enough?
Claude Code’s context window is 200K tokens. That sounds like a lot, but actual consumption runs faster than you might expect:
- System prompts: 15-20K
- Skill list (100 skills): ~8K
- One file read (2000 lines): 5-20K
- Code search results (10 results): 10-30K
- After several rounds of tool calls: already half used
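The budget arithmetic above can be sketched as a quick check. The numbers below are the illustrative estimates from the list, not measurements of Claude Code’s internals:

```python
# Rough illustration of how a 200K-token window fills up.
# All figures are assumptions drawn from the estimates above.

WINDOW = 200_000

consumers = {
    "system prompt": 18_000,
    "skill list (100 skills)": 8_000,
    "file read (2000 lines)": 12_000,
    "search results (10 hits)": 20_000,
}

def remaining(window: int, used: dict[str, int]) -> int:
    """Tokens left after the fixed overheads above."""
    return window - sum(used.values())

left = remaining(WINDOW, consumers)
print(f"free after one round of tools: {left} tokens "
      f"({left / WINDOW:.0%} of the window)")
```

With these numbers, a single round of tool use already eats roughly 30% of the window before any real conversation happens.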
For large codebases (like Linux kernel, Chromium browser), 200K can’t even fit the “directory tree,” let alone detailed content.
Limitations of Existing Solutions
Claude Code uses various tricks to mitigate this problem:
- Compression: turns old conversations into summaries, freeing space
- Paginated reading: large files read partially only
- Selective restoration: after compression, only restore recently used files
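A threshold-triggered compaction of the kind described might look roughly like this. The 0.8 threshold, the keep-half policy, and the summary stub are all assumptions for illustration, not Claude Code’s actual strategy:

```python
# Sketch of threshold-triggered compaction: once context use crosses a
# ratio of the window, older turns are replaced with a summary stub.

WINDOW = 200_000
THRESHOLD = 0.8   # assumed trigger ratio

def summarize(turns: list[str]) -> str:
    # Stand-in for an LLM-generated summary of the dropped turns.
    return f"[summary of {len(turns)} earlier turns]"

def maybe_compact(turns: list[str], sizes: list[int]) -> list[str]:
    if sum(sizes) / WINDOW < THRESHOLD:
        return turns                    # plenty of room, keep everything
    keep = len(turns) // 2              # keep the recent half verbatim
    return [summarize(turns[:-keep])] + turns[-keep:]
```

The key property: compaction frees space but is lossy, which is exactly why it “makes do” rather than expands capacity.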
But these are all ways of “making do within limited space,” not true capacity expansion. Once a project exceeds a certain scale, Claude Code can only peer at it through a keyhole, never seeing the whole picture.
This is like: giving you a 200-page capacity folder to organize a library - no matter how skilled the technique, what doesn’t fit simply doesn’t fit.
Shortcoming 2: Tool Latency Accumulation
Have you felt that Claude Code sometimes “thinks” for quite a while? On complex tasks especially, a single round of conversation can take dozens of seconds.
Where Does Latency Come From
Every tool call has latency:
- API round-trip: send request → model generates → return result (2-10 seconds)
- Tool execution: read file, search code, execute command (0.1-5 seconds)
- Multi-round iteration: complex tasks require multiple tool calls (10-50 rounds)
A task like “help me refactor this module” might need:
- Search related files (3-5 rounds)
- Read critical code (5-10 rounds)
- Edit multiple files (5-10 rounds)
- Run tests to verify (2-5 rounds)
At 2-10 seconds per round, the total adds up to several minutes. And that’s the smooth case - if errors occur midway and retries are needed, it takes even longer.
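A back-of-envelope estimate for the refactor task above. The round counts and per-round seconds are midpoints of the ranges given in the text, purely for illustration:

```python
# Back-of-envelope latency for a "refactor this module" task.
# (rounds, seconds per round) — illustrative midpoints, not measurements.

phases = {
    "search files": (4, 4),
    "read code":    (8, 4),
    "edit files":   (8, 5),
    "run tests":    (3, 8),
}

total = sum(rounds * secs for rounds, secs in phases.values())
print(f"~{total} s ≈ {total / 60:.1f} min, assuming no retries")
```

Even with optimistic numbers the wall-clock time lands in the minutes, which matches everyday experience.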
Limitations of Parallelization
Claude Code supports parallel tool calls (send multiple tools at once), but this only solves “width” not “depth.” If tasks have dependencies (must find file before editing), parallelization doesn’t help.
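The width-versus-depth distinction can be made concrete with a small `asyncio` sketch. The tool names and delays are hypothetical; the point is only that independent calls overlap while dependent ones serialize:

```python
# Parallel tool calls help independent work ("width") but not chained
# work ("depth"). Tool names and delays are illustrative.
import asyncio

async def call_tool(name: str, delay: float = 0.1) -> str:
    await asyncio.sleep(delay)   # stands in for the API round-trip
    return f"{name}: done"

async def independent() -> list[str]:
    # Three reads with no dependencies: total time ≈ max(delays), not the sum.
    return await asyncio.gather(
        call_tool("read a.py"), call_tool("read b.py"), call_tool("read c.py"))

async def dependent() -> str:
    # Must find the file before editing it: latencies add up serially.
    found = await call_tool("grep for symbol")
    return await call_tool(f"edit ({found})")
```

`independent()` finishes in one round-trip’s time regardless of fan-out; `dependent()` always pays for both steps in sequence.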
This is like: you have a super-smart consultant, but each question takes a few seconds to answer, and they can only process one step at a time. A smart brain is dragged down by slow “hands and feet.”
Shortcoming 3: Cost-Quality Tradeoff
Have you thought about how much it costs to write code with Claude Code?
Real Cost of Token Consumption
Claude Code API calls aren’t free:
- Input tokens: dominate the bill - every request resends a large context, so input volume is huge
- Output tokens: pricier per token, but far smaller in volume
- Cache hit: 90% cheaper
- Cache miss: full price
One moderately complex task might consume:
- Input: 500K-2M tokens
- Output: 50K-200K tokens
- At current prices: a few jiao to a few yuan (roughly a few cents to a dollar)
Doesn’t sound like much? But if you use it writing code for 8 hours every day:
- Daily: dozens to hundreds of tasks
- Monthly: hundreds to thousands of yuan (roughly tens to hundreds of dollars)
Complexity from Optimization
To control costs, Claude Code does lots of optimizations:
- Prompt caching (cached prefix reads are ~90% cheaper, though creating the cache costs a bit extra)
- Smart compression (reduces input tokens)
- Token budget (limits tool result size)
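A per-tool token budget like the one mentioned might be sketched as follows. The 4-characters-per-token heuristic and the budget number are illustrative assumptions, not Claude Code’s actual logic:

```python
# Sketch of a per-tool token budget: oversized tool results are clipped
# before entering the context. Numbers are illustrative assumptions.

def estimate_tokens(text: str) -> int:
    return len(text) // 4   # crude heuristic, not a real tokenizer

def clip_result(text: str, budget_tokens: int = 2_000) -> str:
    if estimate_tokens(text) <= budget_tokens:
        return text
    keep_chars = budget_tokens * 4
    return text[:keep_chars] + "\n[... truncated to fit token budget ...]"
```

This is exactly the kind of optimization that saves tokens but adds complexity: the clipping marker, the heuristic, and the budget all need tuning and maintenance.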
But these optimizations also increase system complexity. Cache break detection, compression strategy tuning, budget allocation - none of these are “free,” requiring engineering investment and maintenance costs.
This is like: driving a high-performance car, but fuel consumption isn’t low. You can use various tricks to save gas, but either sacrifice speed or increase complexity.
Shortcoming 4: Lack of Offline Capability
Have you ever lost your network connection and found Claude Code completely unusable?
Limitations of Complete Cloud Dependency
Claude Code is “cloud-native” - all model inference happens on Anthropic’s servers. This means:
- No network means no work: can’t use offline locally
- API failure means stoppage: if Anthropic servers have issues, you suffer too
- Data must be uploaded: code must be sent to the cloud to process
For certain scenarios, this is a hard limitation:
- On a plane with no WiFi, you can’t code even if you want to
- On an isolated company intranet, you can’t reach external services
- With sensitive code, you may not want to upload it to a third party
Gap with Local Models
Someone might say: why not run an open-source model locally?
Theoretically possible, but practically the gap is large:
- Capability gap: local models’ code ability usually weaker than Claude
- Tool integration: no tool system as complete as Claude Code
- Context length: local models usually can’t support 200K context
This is like: you have a super-smart remote assistant, but they can only work remotely. Once the network cuts out, you’re on your own.
Shortcoming 5: Complex Logic Limitations
Have you noticed that Claude Code handles simple tasks smoothly, but “flops” when encountering complex algorithms?
Model Capability Boundaries
Claude Code is powered by large language models, which have inherent limitations:
- Weak symbolic reasoning: complex math proofs, algorithm derivations often error-prone
- Poor long-range dependencies: complex logic relationships spanning multiple files easily “forgotten”
- Boundary condition blind spots: easily miss exceptional cases, boundary conditions
For example:
- “Implement a red-black tree” - might write the basic structure, but balance operations often error-prone
- “Optimize this SQL query” - might give suggestions, but complex query plan analysis not necessarily accurate
- “Refactor this concurrency module” - might introduce race conditions
Necessity of Verification Mechanisms
Because of these limitations, Claude Code needs:
- Verification Agent: specifically verifies implementation correctness
- Test requirements: run tests to verify modifications
- YOLO Classifier: requests confirmation for high-risk operations
These aren’t “icing on the cake,” but “necessary safety nets.”
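A safeguard in the spirit of the list above can be sketched as a simple risk gate: high-risk commands are flagged for explicit confirmation. The pattern list is illustrative and not the actual classifier’s logic:

```python
# Sketch of a risk gate: flag commands that should require explicit
# user confirmation. Patterns are illustrative examples, not the real list.
import re

HIGH_RISK = [
    r"\brm\s+-rf\b",           # recursive force delete
    r"\bgit\s+push\s+--force\b",  # history rewrite on a remote
    r"\bdrop\s+table\b",       # destructive SQL
]

def needs_confirmation(command: str) -> bool:
    """True if the command matches any high-risk pattern."""
    return any(re.search(p, command, re.IGNORECASE) for p in HIGH_RISK)
```

The design choice: pattern matching is cheap and predictable, but a real classifier must also weigh context (a `rm -rf` inside a scratch directory is very different from one at the repo root).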
This is like: you hired a smart but somewhat careless assistant. Handles daily affairs well, but for important documents you must review them yourself.
Shortcoming 6: Memory System Boundaries
Have you noticed that Claude Code’s “memory” is sometimes unreliable? After switching sessions, some details are forgotten.
Limitations of Cross-Session Memory
Claude Code has a cross-session memory system (Memdir, Extract Memories, Auto-Dream), but it has boundaries:
- Granularity issue: memory is coarse-grained topic files, not fine-grained conversation records
- Latency issue: Auto-Dream is overnight consolidation, new information doesn’t take effect immediately
- Accuracy issue: automatic extraction might miss critical information or extract incorrectly
- Privacy issue: sensitive information might be written to memory files
Gap with “Real Memory”
Human memory is:
- Immediate: what was just learned is remembered right away
- Fine-grained: specific details can be recalled
- Richly associative: related content connects automatically
Claude Code’s memory is:
- Delayed: takes effect only after the next session or overnight consolidation
- Coarse-grained: summaries only, no details
- Passively retrieved: no automatic association
This is like: your assistant has a notebook recording important matters, but after shift change, the new assistant can only see summaries from the notebook, not the detailed discussions from before.
How to View These Shortcomings
Shortcomings Are Results of Design Tradeoffs
These limitations aren’t “bugs,” but results of design tradeoffs:
| Limitation | Design Choice | What If Flipped |
|---|---|---|
| Limited context | Controllable cost and latency | Unlimited context = unlimited cost + latency |
| Cloud dependent | Use strongest models | Local running = greatly reduced capability |
| Non-zero cost | High-quality service | Free = unsustainable service quality |
| Complex logic weak | Strong general capability | Specialized symbolic reasoning = weaker natural language |
Each “shortcoming” is the price of being sufficient along another dimension.
Using the Right Scenario Matters Most
Claude Code is suitable for:
- Medium-sized projects (main code fits in context)
- Iterative development (can accept multi-round latency)
- Cost-sensitive but controllable (willing to pay for efficiency)
- Has network environment (cloud dependency acceptable)
- Assisted not replaced (humans still review)
Claude Code is not suitable for:
- Very large projects (need global understanding)
- Extremely high real-time requirements (latency unacceptable)
- Extremely cost-sensitive (free is the only option)
- Completely offline environments (can’t connect to network)
- Zero-error scenarios (cannot have any errors)
Future Improvement Directions
Short-Term Achievable
- Larger context windows: as models upgrade, context might expand to 500K or even 1M
- Faster inference speed: optimize model architecture and inference infrastructure
- Better local model support: maybe someday can run near-cloud-quality models locally
- Smarter memory systems: more precise extraction, more timely consolidation
Long-Term Potentially Achievable
- True persistent state: like humans maintaining complete context across sessions
- Zero-latency tool calls: local execution + cloud inference hybrid architecture
- Enhanced symbolic reasoning: combining neural networks and symbolic systems
- Cost approaching zero: marginal cost reduction from technological progress
Summary
Claude Code’s six major shortcomings:
| Shortcoming | Core Manifestation | Coping Strategy |
|---|---|---|
| Context ceiling | 200K tokens can’t hold large projects | Modular development, batch processing |
| Tool latency accumulation | Complex tasks need multiple rounds, time accumulates | Parallelization, task splitting |
| Cost-quality tradeoff | High quality = high cost | Cache optimization, budget control |
| Lack of offline capability | No network means no work | Plan ahead, offline backup plan |
| Complex logic limitations | Algorithms, boundary conditions easily err | Verification mechanisms, human review |
| Memory system boundaries | Cross-session memory coarse-grained, delayed | Proactive memory management, CLAUDE.md supplement |
These shortcomings don’t mean Claude Code is hard to use - it’s still one of the most advanced AI coding assistants today. But understanding boundaries makes using it smoother:
- Know what it’s good at - daily coding, standard refactoring, code review
- Know what it’s not good at - complex algorithms, very large projects, zero-error scenarios
- Know when to intervene - key decisions, boundary conditions, test verification
This is like: understanding your car’s top speed, fuel consumption, off-road capability - not limitations, but what makes you drive more safely and smoothly.
This concludes the main text of the Harness Engineering series. Next: Appendix - File Index, Environment Variables Reference, Glossary, and Feature Flag Checklist.
