Agent Loop: How Does an AI's "Brain" Work?

Table of Contents
- A Complete Lifecycle
- query.ts: Three States of the State Machine
- Why “Continue” is Needed
- Streaming Output: Letting Users See the “Thinking Process”
- Tool Execution Orchestration: Parallel vs Sequential
- How is “Continue” Decided?
- Interruption and Cancellation: Users Can Stop Anytime
- Nested Agents: Loop within Loop
- Timeout and Error Handling
- Performance Optimization: Making the Loop Run Faster
- What This Means for You
Have you noticed this detail: when you ask Claude Code to do something complex, like "help me refactor this module," it doesn't do everything at once. It first analyzes the code, then executes some tools (such as reading files or searching), pauses to look at the results, then decides the next step.
This process repeats several times until it thinks “I’m done.”
This "think → do → look → think again" cycle is the Agent Loop, Claude Code's most fundamental mechanism. Today we're going to take this "brain" apart and see how it works.
The diagram: Agent Loop is like a restaurant kitchen: take the order → prep the ingredients → cook → serve → ask whether the customer wants anything else
A Complete Lifecycle
Processing a complete user request through Agent Loop follows this flow:
User Input
↓
Assemble Context (system prompt + history + tool definitions)
↓
Call Model API
↓
Parse Model Response (may have multiple tool_use blocks)
↓
Execute Tools in Parallel
↓
Wait for All Tools to Complete
↓
Assemble Tool Results
↓
Decision: Continue Loop or Return to User
This loop may execute once (simple Q&A) or multiple times (complex refactoring tasks). The key is the “continue decision” at the end—how does the model know “I’m not done yet”?
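The lifecycle above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the actual query.ts implementation; the types and function names (`callModel`, `runTool`, `ModelResponse`) are assumptions made for the sketch:

```typescript
// Hypothetical shapes standing in for the real API and tool types.
type ToolUse = { name: string; input: unknown };
type ModelResponse = { text: string; toolUses: ToolUse[] };

async function agentLoop(
  callModel: (messages: string[]) => Promise<ModelResponse>,
  runTool: (t: ToolUse) => Promise<string>,
  userInput: string,
): Promise<string> {
  const messages = [userInput];
  while (true) {
    const res = await callModel(messages);
    // No tool_use blocks means the model considers itself done: return to user.
    if (res.toolUses.length === 0) return res.text;
    // Otherwise execute all requested tools in parallel, then loop with the results.
    const results = await Promise.all(res.toolUses.map(runTool));
    messages.push(res.text, ...results);
  }
}
```

The whole "continue decision" lives in that one `if`: the loop keeps going exactly as long as the model keeps requesting tools.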
query.ts: Three States of the State Machine
The core implementation of Agent Loop is in query.ts, which is a state machine with three main states:
IDLE: Waiting for user input. This is the initial state and also the state after each loop ends.
AWAITING_TOOL_RESULTS: The model returned tool_use requests and is executing tools. In this state, the system executes all requested tools in parallel and collects their results.
CONTINUING: Received tool results, preparing to assemble into new context and call the model again. This state is the loop’s “connector.”
The diagram: Agent Loop’s three-state machine and its transitions
State transition triggers:
- IDLE → AWAITING_TOOL_RESULTS: After user input, the model returned tool_use requests
- AWAITING_TOOL_RESULTS → CONTINUING: All tool executions complete
- CONTINUING → IDLE: The model didn’t return tool_use, directly gave text response
- CONTINUING → AWAITING_TOOL_RESULTS: The model returned new tool_use requests
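The transition table above is small enough to encode directly. A minimal sketch, with state and event names taken from this article rather than from query.ts itself:

```typescript
type State = "IDLE" | "AWAITING_TOOL_RESULTS" | "CONTINUING";
type Event = "tool_use_returned" | "tools_complete" | "text_only_response";

// Each case mirrors one arrow in the transition list above.
function transition(state: State, event: Event): State {
  switch (state) {
    case "IDLE":
      if (event === "tool_use_returned") return "AWAITING_TOOL_RESULTS";
      break;
    case "AWAITING_TOOL_RESULTS":
      if (event === "tools_complete") return "CONTINUING";
      break;
    case "CONTINUING":
      if (event === "tool_use_returned") return "AWAITING_TOOL_RESULTS";
      if (event === "text_only_response") return "IDLE";
      break;
  }
  return state; // events that don't apply in the current state are ignored
}
```

Note that CONTINUING is the only state with two outgoing edges; it is where the loop either goes around again or hands control back to the user.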
Why “Continue” is Needed
You might ask: why can’t all tools be executed at once? Why the loop?
Because the model needs to see tool results before deciding what to do next.
For example: you say “help me find this bug.” Claude Code might first search related code (GrepTool), see the results and realize it needs to read a specific file (FileReadTool), read it and realize it needs to understand the function call chain (AgentTool’s find_references), and only then locate the problem.
Each step’s results influence the next step’s decision. This can’t be pre-planned—it needs dynamic adjustment based on intermediate results.
It’s like looking for something: you open a drawer first, nothing there, then open a cabinet, still nothing, and only then realize it was on the sofa. You couldn’t know at the start “open drawer → open cabinet → check sofa”—you have to go step by step.
Streaming Output: Letting Users See the “Thinking Process”
An important feature of Claude Code is streaming output—the model’s response isn’t displayed after complete generation, but shown as it’s being generated.
How is this implemented technically?
The Anthropic API supports streaming responses (SSE, Server-Sent Events), with model-generated content sent in chunks. Claude Code’s frontend (React Ink) receives these chunks in real-time and updates the UI.
But streaming output isn’t just “displaying text.” In the Agent Loop context, streaming output includes:
Text content streaming: The model’s natural language response, appearing character by character.
Tool parameter streaming: When the model generates tool_use blocks, parameters are in JSON format, also stream-generated. This means the UI can start rendering tool calls before parameters are fully generated—users see “Calling GrepTool to search…” without waiting for the entire JSON to parse.
Tool result streaming: Long-running tools (like BashTool executing time-consuming commands) report progress in real-time via onProgress callbacks, with the UI showing stdout output live.
This progressive information display is crucial for user experience. Users don’t stare at a blank screen waiting—they can see the AI “moving” and “thinking.”
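The consumer side of streaming can be sketched as follows. The event shapes here are simplified stand-ins for the Anthropic streaming event types, and the fake generator simulates what the SSE connection would deliver:

```typescript
// Simplified stand-ins for streaming events from the API.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_use_start"; name: string };

// Simulates chunks arriving over SSE.
async function* fakeStream(): AsyncGenerator<StreamEvent> {
  yield { type: "text_delta", text: "Let me search" };
  yield { type: "text_delta", text: " the codebase." };
  yield { type: "tool_use_start", name: "GrepTool" };
}

// The UI appends each chunk as it arrives instead of waiting for completion.
async function render(stream: AsyncGenerator<StreamEvent>, out: string[]) {
  for await (const ev of stream) {
    if (ev.type === "text_delta") out.push(ev.text);
    else out.push(`[Calling ${ev.name}...]`); // shown before parameters finish parsing
  }
}
```

The key property is that `render` never blocks on the full response: a `tool_use_start` event is enough to show "Calling GrepTool..." while the JSON parameters are still streaming in.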
Tool Execution Orchestration: Parallel vs Sequential
When the model requests multiple tools in one turn, these tools execute in parallel. But execution order is constrained by two factors:
isConcurrencySafe: Only tools marked as concurrency-safe can execute in parallel with others. BashTool is concurrency-safe only when its command is read-only; GrepTool is always concurrency-safe.
Dependencies: If tool B’s parameters depend on tool A’s results, they must execute sequentially. But this is relatively rare in Claude Code—the model usually waits for results before making new tool requests.
Actual execution flow:
Model Returns [toolA, toolB, toolC]
↓
Classify:
- toolA: isConcurrencySafe=true → Execute immediately
- toolB: isConcurrencySafe=false, queue empty → Execute immediately
- toolC: isConcurrencySafe=false, queue has B → Wait for B to complete
↓
Execute A and B in parallel, C waits for B
↓
All tools complete → Assemble results → Continue loop
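The classification rule above can be sketched as a small scheduler: safe tools fan out in parallel, unsafe tools form a sequential queue, and results come back in the model's original request order. This is an illustration of the rule, not Claude Code's actual executor:

```typescript
type Tool = {
  name: string;
  isConcurrencySafe: boolean;
  run: () => Promise<string>;
};

async function executeAll(tools: Tool[]): Promise<string[]> {
  const results = new Array<string>(tools.length);
  const parallel: Promise<void>[] = [];
  let serialChain: Promise<unknown> = Promise.resolve();
  tools.forEach((t, i) => {
    if (t.isConcurrencySafe) {
      // Safe tools start immediately, all at once.
      parallel.push(t.run().then((r) => { results[i] = r; }));
    } else {
      // Unsafe tools wait for the previous unsafe tool to finish.
      serialChain = serialChain.then(() => t.run()).then((r) => { results[i] = r; });
    }
  });
  await Promise.all([...parallel, serialChain]);
  return results; // in the model's original request order
}
```

With this shape, tool A runs alongside the B-then-C queue, exactly matching the diagram above.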
How is “Continue” Decided?
This is the central question: how does the model know "I need to continue" versus "I'm done"?
The answer: the model decides itself.
In Claude Code’s system prompt, there’s a section specifically guiding the model on when to stop and when to continue. The general logic is:
- If you’ve completed the user’s request, answer directly without calling tools
- If you need more information to complete the task, call tools
- If you called tools, wait for results, then decide the next step
These aren't hardcoded rules; they guide model behavior through prompts. This also explains why the model sometimes seems "overly enthusiastic": it thinks it isn't done yet, while the user thinks it has already done enough.
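Mechanically, then, the system's "continue" check reduces to inspecting the response content. A minimal sketch, with block shapes simplified from the API's content-block format:

```typescript
// Simplified content blocks: a response is a mix of text and tool_use blocks.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: unknown };

// The loop continues if and only if the model asked for at least one tool.
function shouldContinue(content: ContentBlock[]): boolean {
  return content.some((b) => b.type === "tool_use");
}
```

The intelligence is all on the model's side; the harness just asks "did it request a tool or not?"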
Interruption and Cancellation: Users Can Stop Anytime
The Agent Loop supports user interruption. When you press Ctrl+C or click cancel in the UI:
In-progress tool executions are cancelled. BashTool sends SIGTERM to child processes, AgentTool notifies sub-agents to stop.
The state machine returns to IDLE. Completed tool results are discarded, incomplete tool calls are cancelled.
The model receives notification. The system sends a message telling the model “the user cancelled the operation,” and the model can handle this in its next response.
This interruption mechanism keeps users in control—the AI can act autonomously, but users can take over anytime.
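One standard way to wire up this kind of cooperative cancellation is AbortController (a standard web and Node API; whether query.ts uses exactly this mechanism is an assumption). A sketch of a cancellable long-running tool:

```typescript
// A tool that runs for `ms` milliseconds unless the signal aborts it first.
function cancellableTool(signal: AbortSignal, ms: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => resolve("done"), ms);
    signal.addEventListener("abort", () => {
      clearTimeout(timer); // real tools would e.g. send SIGTERM to a child process here
      reject(new Error("cancelled"));
    });
  });
}
```

The caller holds the `AbortController` and calls `controller.abort()` when the user presses Ctrl+C; every in-flight tool sharing that signal rejects, and the state machine can fall back to IDLE.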
Nested Agents: Loop within Loop
Claude Code supports a special tool called AgentTool that can launch sub-agents to execute subtasks. This forms nested Agent Loops:
Main Agent Loop
↓
User Request: Refactor this module
↓
Model Decides to Use AgentTool
↓
Launch Sub-agent (new Loop)
↓
Sub-agent Completes Refactoring → Return Results
↓
Main Agent Continues (may verify results, do other things)
Sub-agents have their own context, their own tool permissions, their own loops. To the outside world, they’re like ordinary tools, but internally they’re complete Agents.
This design lets complex tasks be divided and conquered: the main agent handles high-level planning, while sub-agents handle specific execution.
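The "ordinary tool on the outside, full agent on the inside" idea can be sketched like this (the `SimpleTool` shape and `makeAgentTool` helper are illustrative, not the real AgentTool interface):

```typescript
type SimpleTool = { name: string; run: (input: string) => Promise<string> };

// Wraps a complete agent loop so that, to the parent loop, it looks like
// any other tool: a function from input string to result string.
function makeAgentTool(
  loop: (task: string) => Promise<string>, // a full agent loop with its own context
): SimpleTool {
  return {
    name: "AgentTool",
    run: async (task) => loop(task), // the nested loop runs to completion internally
  };
}
```

Because the sub-agent is hidden behind the ordinary tool interface, the parent loop's scheduling, cancellation, and result-assembly logic need no special cases for it.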
Timeout and Error Handling
The Agent Loop needs to handle various exceptional situations:
Tool execution timeout. Each tool has a timeout limit (e.g., BashTool defaults to 30 seconds), after which it returns an error result, and the model decides whether to retry or try a different approach.
Model API errors. Network issues, API rate limits, model temporarily unavailable, etc. Claude Code has retry logic, but after consecutive failures it reports to the user.
Tool execution errors. Non-zero exit codes from commands, file not found, etc. These errors are returned to the model as part of tool_result, and the model needs to handle them (retry, change approach, or explain to the user).
The principle of error handling: let the model decide. The system doesn’t treat errors as “termination” but as information, letting the model decide what to do next. This aligns with the “on distribution” philosophy—the model participates in decisions rather than passively executing.
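The "errors are information" principle maps naturally onto the API's tool_result format, where failures are flagged rather than thrown. A sketch, with the wrapper function being an illustration rather than Claude Code's actual code:

```typescript
// tool_use_id and is_error mirror the Anthropic tool_result fields;
// the content here is simplified to a plain string.
type ToolResult = { tool_use_id: string; content: string; is_error: boolean };

async function safeRun(id: string, run: () => Promise<string>): Promise<ToolResult> {
  try {
    return { tool_use_id: id, content: await run(), is_error: false };
  } catch (e) {
    // The error text goes back to the model, which decides whether to
    // retry, change approach, or explain the failure to the user.
    return { tool_use_id: id, content: String(e), is_error: true };
  }
}
```

Nothing in this path terminates the loop: a failed tool produces a result block like any other, and the next model call sees it.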
Performance Optimization: Making the Loop Run Faster
Agent Loop performance directly affects user experience. Claude Code has optimizations in several areas:
Parallel tool execution. Whatever can run in parallel does, reducing wait time.
Prompt caching. Covered in detail in article 8, but the core is caching invariant context (like system prompts, tool definitions) to avoid resending them on every loop iteration.
Incremental updates. Between loops, only new messages need to be sent to the model; historical message cache keys can be reused.
Streaming responses. Start parsing and processing before the model completely generates its response, reducing perceived latency.
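The prompt-caching and incremental-update points combine into one request-shaping idea: keep the invariant prefix byte-identical across iterations and mark it cacheable. The sketch below uses the Anthropic API's cache_control field; the exact request shape Claude Code builds, and the system-prompt text, are assumptions:

```typescript
// Invariant prefix (system prompt, tool definitions) is marked cacheable;
// only the messages array grows between loop iterations.
const request = {
  system: [
    {
      type: "text",
      text: "(illustrative system prompt text)",
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [
    // history is appended to, never rewritten, so cached prefixes stay valid
  ] as { role: string; content: string }[],
};
```

Because cache hits require an identical prefix, the loop is careful to append new turns rather than rewrite history; that is what makes the "incremental updates" optimization possible.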
What This Means for You
Understanding the Agent Loop helps you collaborate better with Claude Code:
Understand “pauses.” When Claude Code pauses to “think,” it’s actually waiting for the model API response. It’s not frozen—it’s normal workflow.
Use sub-agents wisely. For complex tasks, explicitly telling Claude Code “first use a sub-agent to analyze, then decide” can make it work more effectively.
Control loop depth. If you notice Claude Code circling around the same issue, tell it directly “that’s enough, stop here” to avoid infinite loops.
Understand the value of streaming output. Seeing the AI “thinking” builds more trust than seeing just the final result—you know it’s working, not crashed.
If you want to build your own AI Agent, the Agent Loop design gives you these insights:
- Separate “thinking” from “acting,” let the model decide when to continue
- Support parallel execution, but consider safety and dependencies
- Streaming output improves user experience
- Keep users in control, allow interruption anytime
The Agent Loop is an AI Agent’s “heartbeat.” Each loop is a complete “sense → decide → act” cycle. Claude Code’s heartbeat is designed to be fast, steady, and flexible, allowing it to operate autonomously in complex coding tasks while maintaining the user’s ultimate control.
In the next article, we’ll dive deeper into tool execution orchestration details—how permission, concurrency, streaming, and interruption are implemented.
