From a Single String to a Structured System Prompt

In Part 1, the system prompt was a single short string literal:

You are Eugene, an assistant that answers questions about a Rust project. If you don’t know the answer, call read_file to read project files.

That works fine when there is exactly one tool, one user, and one output format. But add a second tool and the model needs to know when to call list_files vs read_file. Add a third and you must teach ordering. Add an output format requirement like “cite the file you read” and you need a dedicated place for that rule. Add regression testing and you need to detect accidental edits. Add prompt caching and you suddenly care which parts of the prompt stay the same across turns.

The deeper problem is cost. A single string that mixes the stable persona with per-request details cannot be split at a cache boundary, so every turn pays full price for the entire prompt and Anthropic’s Prompt Caching never gets a hit.

This second installment introduces a minimal but survivable structure: five layers, a typed builder, a cache boundary, and a regression fingerprint. The Eugene v0.2 loop itself is unchanged; the change is entirely inside the system field.


Why Structure Matters

A prompt is code. It controls model behavior. Sentences compete for the model’s attention. An undifferentiated wall of text is skimmed and partially ignored; a short, labeled set of sections is read carefully and respected.

This sounds like superstition, but it has a plausible basis. Production models like Claude are trained on examples where system prompts have sections, user requests are clear, and assistant replies are explicit. When you mirror that shape, the model has less ambiguity to resolve about what each section means, which in practice tends to show up as fewer hallucinations, tighter outputs, and better tool selection.

The five layers below are not the only valid structure. They are the smallest structure that survives real-world agent evolution. The Claude Code CLI assembles its own system prompt in almost the same shape. This is not a coincidence: agents that survive contact with real users converge on roughly the same prompt anatomy.

Layered prompt architecture for a Rust AI Agent: Identity, Instructions, Output, Examples, Context


The Five Layers: Identity, Instructions, Output, Examples, Context

1. Identity — Voice and Posture

Identity is one paragraph that says who the agent is. It establishes voice, expertise, and posture.

You are Eugene, a careful research assistant who answers questions about a Rust project. You prefer reading the source over guessing.

Identity is not a resume. Writing “You are an expert Rust engineer with twenty years at major tech companies” does not make the model an expert. Identity controls voice:

  • A “careful research assistant” hedges before speaking.
  • A “fast no-nonsense engineer” gives terse answers.
  • A “patient teacher” explains in detail.

It also controls posture toward uncertainty. The phrase “prefer reading the source over guessing” nudges the model toward calling read_file, because doing so is in character.

2. Instructions — Behavioral Rules

Instructions are individually testable rules. Use a list, not prose:

- Use list_files to discover what is in the project before reading.
- Use read_file to inspect a specific file. Do not call it on paths you have not seen listed.
- If a tool returns an error, do not retry the same call.

Rules must be testable. “Use list_files before reading” is testable: did the model call list_files first? “Be helpful” is not — drop it.
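
As a rough illustration of what “testable” can mean, the check below walks a recorded trace of tool calls and reports the first read_file path that was never returned by list_files. The ToolCall enum and the trace shape are hypothetical; the article’s loop does not define them.

```rust
use std::collections::HashSet;

/// Hypothetical record of one tool call made during a turn.
/// The shape is assumed for illustration only.
pub enum ToolCall {
    /// `list_files` was called and returned these paths.
    ListFiles { returned: Vec<String> },
    /// `read_file` was called on this path.
    ReadFile { path: String },
}

/// Returns the first `read_file` path that was not listed earlier in the
/// trace, or `None` if the trace obeys the rule.
pub fn first_unlisted_read(trace: &[ToolCall]) -> Option<&str> {
    let mut seen: HashSet<&str> = HashSet::new();
    for call in trace {
        match call {
            ToolCall::ListFiles { returned } => {
                // Remember every path the model has been shown so far.
                seen.extend(returned.iter().map(String::as_str));
            }
            ToolCall::ReadFile { path } => {
                if !seen.contains(path.as_str()) {
                    return Some(path.as_str());
                }
            }
        }
    }
    None
}
```

A check like this can run over saved transcripts, so a rule that stops being followed shows up as a failing test rather than as a vague impression.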

Rule order matters. The model reads top to bottom: earlier rules anchor, later rules clarify. If two rules genuinely conflict, the prompt is wrong, and which rule wins is unpredictable. Resolve the conflict in the prompt, not at inference time.

Three to four rules are enough for a small surface. Ten rules is a smell. Twenty rules is a failure mode: the model will silently drop some of them. Long instruction lists usually mean a missing output constraint or context fact.

3. Output Constraints — Format

Output constraints describe the shape of the answer, not its content:

- Answer in plain prose, no markdown.
- Cite the file you read in parentheses, e.g. (src/main.rs).
- Keep replies under 200 words.

Decoupling format from behavior means you can change the citation format by editing one line in the Output layer, without re-reading the rest of the prompt. If downstream needs JSON, the Output layer provides the schema. If the user asks for “step by step”, the Output layer says “number every step” while the behavior layer keeps doing its job.
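
Because the Output rules describe shape only, they can be checked mechanically. The sketch below tests a reply against the three example rules; the function names and the heuristics (which markdown markers count, what a citation looks like) are assumptions made for illustration, not part of Eugene.

```rust
/// Checks a reply against the three example Output rules.
/// Returns a short description of each violated rule.
/// The heuristics are deliberately crude and are assumptions of this sketch.
pub fn output_violations(reply: &str) -> Vec<&'static str> {
    let mut violations = Vec::new();

    // Rule 1: plain prose, no markdown. Only a few obvious markers are checked.
    let has_markdown = reply.contains("```")
        || reply.contains("**")
        || reply.lines().any(|l| l.trim_start().starts_with('#'));
    if has_markdown {
        violations.push("contains markdown");
    }

    // Rule 2: a parenthesized citation that looks like a file path.
    if !has_file_citation(reply) {
        violations.push("missing file citation");
    }

    // Rule 3: under 200 words.
    if reply.split_whitespace().count() >= 200 {
        violations.push("200 words or more");
    }

    violations
}

/// True if the text contains "(name.ext)" with no spaces inside the parentheses.
fn has_file_citation(text: &str) -> bool {
    let mut rest = text;
    while let Some(open) = rest.find('(') {
        let after = &rest[open + 1..];
        match after.find(')') {
            Some(close) => {
                let inner = &after[..close];
                if !inner.is_empty() && inner.contains('.') && !inner.contains(' ') {
                    return true;
                }
                // Continue scanning from just past this opening parenthesis.
                rest = after;
            }
            None => return false,
        }
    }
    false
}
```

If the citation format changes, only the Output line in the prompt and has_file_citation need to change together.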

4. Examples — Few-Shot Demonstration

One example is often worth several instructions. It demonstrates:

  • The citation format.
  • The expected answer length.
  • The relationship between a tool call and a citation.
  • That brief acknowledgment is fine.

Example 1:
User: What edition does Cargo.toml use?
Assistant: I'll check Cargo.toml directly. (Cargo.toml) The project uses Rust edition 2024.

The danger is overfitting. If every example involves Cargo.toml, the model becomes weirdly fixated on it. Use one or two examples that span the actual input space.

5. Context — Fresh Runtime Data

Context is the layer everyone forgets at first. The model has a knowledge cutoff and does not know today’s date, the project name, or which files exist. It will guess, and guesses are often plausibly wrong.

The fix is mechanical: pass the facts in every request.

<env>
today: 2026-05-22
project_root: /Users/me/code/eugene
</env>

Real context grows quickly: user name, project name, available skills, current page, recent errors, latest commit hash. These are not opinions or instructions; they are data scoped to one request, supplied by code.
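
Since context is data supplied by code, it helps to render it from a list of facts rather than by hand. A minimal sketch, assuming a hypothetical render_env helper that takes key/value pairs from the caller (the date is passed in rather than read from the clock, to keep the example dependency-free):

```rust
/// Renders runtime facts as an `<env>` block, one `key: value` per line.
/// Keys are emitted in the order given, so the output is deterministic.
/// `render_env` is a hypothetical helper, not part of the article's code.
pub fn render_env(facts: &[(&str, &str)]) -> String {
    let mut out = String::from("<env>\n");
    for (key, value) in facts {
        // Collapse newlines so one fact cannot spill into another line.
        let value = value.replace('\n', " ");
        out.push_str(key);
        out.push_str(": ");
        out.push_str(&value);
        out.push('\n');
    }
    out.push_str("</env>");
    out
}
```

Calling it with ("today", "2026-05-22") and ("project_root", "/Users/me/code/eugene") reproduces the block shown above.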


Rust Implementation: Typed Builder

In Rust, this structure becomes a builder. Each method corresponds to a layer, and the builder remembers which layers are static and which are dynamic, rendering the cache boundary in between.

let prompt = SystemPromptBuilder::new()
    .identity(
        "You are Eugene, a careful research assistant who answers \
         questions about a Rust project. You prefer reading the source \
         over guessing.",
    )
    .instruction("Use `list_files` to discover what is in the project before reading.")
    .instruction("Use `read_file` to inspect a specific file. Do not call it on paths you have not seen listed.")
    .instruction("If a tool returns an error, do not retry the same call.")
    .output_constraints(
        "Answer in plain prose. Cite the file you read in parentheses, \
         for example: (src/main.rs).",
    )
    .example(
        "What edition does Cargo.toml use?",
        "I'll check Cargo.toml directly. (Cargo.toml) The project uses Rust edition 2024.",
    )
    .context(format!("<env>\ntoday: {today}\nproject_root: {sandbox}\n</env>"))
    .build();

The rendered static prefix looks like this:

## Identity

You are Eugene, a careful research assistant ...

## Instructions

- Use `list_files` to discover what is in the project before reading.
- Use `read_file` to inspect a specific file. Do not call it on paths you have not seen listed.
- If a tool returns an error, do not retry the same call.

## Output

Answer in plain prose. Cite the file you read in parentheses ...

## Examples

Example 1:
User: What edition does Cargo.toml use?
Assistant: I'll check Cargo.toml directly. (Cargo.toml) ...

The dynamic suffix is shorter:

## Context

<env>
today: 2026-05-22
project_root: /Users/me/code/eugene
</env>

Headers and XML tags matter. ## Instructions tells the model the bullets are rules. ## Context tells it the lines are facts. <env> is a visual marker the model has seen thousands of times in training; it learns to read the inside as runtime state rather than guidance. Without this structure, facts and instructions blur, and the model treats facts as instructions or vice versa.
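
The builder itself is not shown in this installment, so here is one way it could look. This is a sketch under assumptions: the field names, the exact header text, and the static_prefix and dynamic_suffix accessors are invented for illustration and may differ from the real Eugene source.

```rust
/// Minimal sketch of a layered prompt builder (names are assumptions).
#[derive(Default)]
pub struct SystemPromptBuilder {
    identity: String,
    instructions: Vec<String>,
    output: String,
    examples: Vec<(String, String)>,
    context: String,
}

pub struct SystemPrompt {
    prefix: String, // Identity, Instructions, Output, Examples: stable across turns
    suffix: String, // Context: changes per request
}

impl SystemPromptBuilder {
    pub fn new() -> Self {
        Self::default()
    }
    pub fn identity(mut self, text: impl Into<String>) -> Self {
        self.identity = text.into();
        self
    }
    pub fn instruction(mut self, rule: impl Into<String>) -> Self {
        self.instructions.push(rule.into());
        self
    }
    pub fn output_constraints(mut self, text: impl Into<String>) -> Self {
        self.output = text.into();
        self
    }
    pub fn example(mut self, user: impl Into<String>, assistant: impl Into<String>) -> Self {
        self.examples.push((user.into(), assistant.into()));
        self
    }
    pub fn context(mut self, text: impl Into<String>) -> Self {
        self.context = text.into();
        self
    }

    /// Renders the static layers into the prefix and Context into the suffix.
    pub fn build(self) -> SystemPrompt {
        let rules: Vec<String> = self.instructions.iter().map(|r| format!("- {r}")).collect();
        let examples: Vec<String> = self
            .examples
            .iter()
            .enumerate()
            .map(|(i, (user, assistant))| {
                format!("Example {}:\nUser: {user}\nAssistant: {assistant}", i + 1)
            })
            .collect();
        let prefix = format!(
            "## Identity\n\n{}\n\n## Instructions\n\n{}\n\n## Output\n\n{}\n\n## Examples\n\n{}",
            self.identity,
            rules.join("\n"),
            self.output,
            examples.join("\n\n"),
        );
        let suffix = format!("## Context\n\n{}", self.context);
        SystemPrompt { prefix, suffix }
    }
}

impl SystemPrompt {
    pub fn static_prefix(&self) -> &str {
        &self.prefix
    }
    pub fn dynamic_suffix(&self) -> &str {
        &self.suffix
    }
}
```

Keeping the two rendered halves as separate strings is the design choice that matters: nothing dynamic can leak into the prefix unless someone passes it to a static-layer method.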


The Cache Boundary: Turning Structure Into Money

This is what separates a hobby agent from one you can afford to run.

Anthropic’s API supports Prompt Caching: send the system field as an array of text blocks, mark one block with cache_control: { type: "ephemeral" }, and the API caches that block and everything before it for five minutes by default, with the timer refreshed on each hit. The next request with the same prefix hits the cache. Cached input tokens are billed at roughly 10% of the normal rate; writing the cache costs roughly 25% more than a normal input token.

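For a six-turn loop with a 1,250-token static prefix, the prefix costs about 2,190 token-equivalents with caching (one cache write at about 1.25x the base rate, then five cache reads at about 0.1x) versus 7,500 without. That is roughly a 3.4x reduction across the loop, approaching 10x per turn once the cache is warm, plus lower latency. The savings scale with how much of the prompt is stable. The persona is almost always stable; today’s date is not.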
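
To see how the savings depend on prefix size and turn count, the sketch below computes the billed cost of the static prefix with and without caching. The multipliers (cache write at 1.25x, cache read at 0.1x) follow Anthropic’s published five-minute-cache pricing as far as I know; treat them as assumptions and check the current price list. The function names are hypothetical, and every turn is assumed to arrive before the cache expires.

```rust
/// Billing multipliers in per-mille of the base input rate (assumed values).
pub const CACHE_WRITE_PERMILLE: u64 = 1250; // cache write billed at 1.25x
pub const CACHE_READ_PERMILLE: u64 = 100; // cache read billed at 0.1x

/// Cost of resending the prefix at full price on every turn,
/// in thousandths of a base-rate input token.
pub fn prefix_cost_uncached(prefix_tokens: u64, turns: u64) -> u64 {
    prefix_tokens * turns * 1000
}

/// Cost of the prefix when turn 0 writes the cache and every later turn
/// reads it, in thousandths of a base-rate input token.
pub fn prefix_cost_cached(prefix_tokens: u64, turns: u64) -> u64 {
    if turns == 0 {
        return 0;
    }
    let write = prefix_tokens * CACHE_WRITE_PERMILLE;
    let reads = prefix_tokens * CACHE_READ_PERMILLE * (turns - 1);
    write + reads
}
```

For a 1,250-token prefix over six turns this gives 7,500,000 uncached against 2,187,500 cached, a ratio of about 3.4. Integer per-mille arithmetic is used so the figures are exact.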

let blocks = prompt.into_system_blocks();
// blocks[0] = { type: "text", text: <static prefix>, cache_control: { type: "ephemeral" } }
// blocks[1] = { type: "text", text: <dynamic suffix> } // no cache_control

The builder partitions naturally: Identity, Instructions, Output, and Examples are static; Context is dynamic. The static prefix carries the cache marker; the dynamic suffix does not, so only the suffix is processed and billed at the full input rate each turn.
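
For readers who want to see the wire shape, here is a dependency-free sketch of turning the two halves into the system array. SystemBlock, system_blocks, and to_json are hypothetical names; real code would more likely derive serde::Serialize and let serde_json handle escaping, and the hand-written encoder below exists only to keep the example self-contained.

```rust
/// One entry of the `system` array sent to the API.
/// The struct and function names here are hypothetical.
pub struct SystemBlock {
    pub text: String,
    /// True for the block that closes the cacheable prefix.
    pub cached: bool,
}

/// Splits the prompt into a cached static block and an uncached dynamic block.
pub fn system_blocks(static_prefix: &str, dynamic_suffix: &str) -> Vec<SystemBlock> {
    vec![
        SystemBlock { text: static_prefix.to_string(), cached: true },
        SystemBlock { text: dynamic_suffix.to_string(), cached: false },
    ]
}

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serializes the blocks; only cached blocks carry `cache_control`.
pub fn to_json(blocks: &[SystemBlock]) -> String {
    let items: Vec<String> = blocks
        .iter()
        .map(|b| {
            let cache = if b.cached {
                ",\"cache_control\":{\"type\":\"ephemeral\"}"
            } else {
                ""
            };
            format!("{{\"type\":\"text\",\"text\":\"{}\"{}}}", escape_json(&b.text), cache)
        })
        .collect();
    format!("[{}]", items.join(","))
}
```

The order of the blocks is part of the contract: the cached block must come first, because the cache covers that block and everything before it.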

Claude Code’s source even has a literal SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker; its cache logic splits on that line. Anything that drifts between turns (mid-session MCP connects, model overrides, language preferences) sits below the boundary. The static persona above it is cached globally where the build allows.

Two rules keep the cache hot:

  1. The prefix must be byte-identical from request to request. An extra space, a reordered instruction, or a different date format inside Identity changes the prefix and forces the cache to be created again.
  2. The prefix must meet the model’s minimum cacheable length: 1,024 tokens for most Claude models, higher for some. Shorter prefixes are silently not cached, so small personas do not benefit; large ones benefit a lot.

The API usage response tells you the story:

[turn 0] in=4 cache_read=0 cache_create=1247 out=89
[turn 1] in=3 cache_read=1247 cache_create=0 out=42

Turn 0 creates the cache; every later turn reads from it. The cache is invisible until you measure it.
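
Measuring it takes very little code. The sketch below mirrors the token fields of the usage object in a Messages API response (input_tokens, cache_read_input_tokens, cache_creation_input_tokens, output_tokens) and derives the log line above plus a hit percentage. The Usage struct and its methods are assumptions for illustration; in a real client the struct would be deserialized from the response body.

```rust
/// Token counts from the `usage` object of a response.
/// Field names follow the API response; the struct itself is a sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Usage {
    pub input_tokens: u64,
    pub cache_read_input_tokens: u64,
    pub cache_creation_input_tokens: u64,
    pub output_tokens: u64,
}

impl Usage {
    /// Share of all input tokens that were served from cache, in whole
    /// percent (rounded down). Returns 0 when there was no input at all.
    pub fn cache_hit_percent(&self) -> u64 {
        let total = self.input_tokens
            + self.cache_read_input_tokens
            + self.cache_creation_input_tokens;
        if total == 0 {
            0
        } else {
            self.cache_read_input_tokens * 100 / total
        }
    }

    /// Formats one line in the same shape as the log shown above.
    pub fn log_line(&self, turn: usize) -> String {
        format!(
            "[turn {turn}] in={} cache_read={} cache_create={} out={}",
            self.input_tokens,
            self.cache_read_input_tokens,
            self.cache_creation_input_tokens,
            self.output_tokens,
        )
    }
}
```

A hit percentage that drops to zero in the middle of a session is usually the first visible sign that something dynamic has leaked into the static prefix.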


Section Memoization: In-Process Cache Boundaries

Prompt Caching handles token cost, but some dynamic context is expensive to compute: scanning the filesystem, loading memory snapshots, calling remote settings. Recomputing these every turn wastes wall time and clutters tracing.

The fix is to memoize each dynamic section by name and only recompute on explicit invalidation (a new user request, /clear, /compact). Claude Code’s source uses systemPromptSection for cacheable sections and DANGEROUS_uncachedSystemPromptSection for volatile ones. The latter requires a stated reason, because breaking the cache invalidates the whole downstream cache for that request.

In Rust, this can be a Section { name, compute_fn, cache_break } registry resolved at prompt build time, with results cached against a turn id. If a section is expensive, wrap it in your own cache and pass the cached value into the builder. The builder’s job is layout, not memoization.
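
A minimal sketch of such a registry entry and its cache follows. It simplifies the idea in one respect: results are kept until invalidate_all is called rather than being keyed by a turn id. Section, SectionCache, and their methods are hypothetical names, and the compute closure is kept single-threaded for brevity.

```rust
use std::collections::HashMap;

/// A named dynamic section. `compute` may be expensive
/// (filesystem scan, memory snapshot, remote settings call).
pub struct Section {
    pub name: &'static str,
    pub compute: Box<dyn Fn() -> String>,
    /// When true the section is recomputed on every resolve and never stored.
    pub cache_break: bool,
}

/// In-process memo of rendered sections, keyed by section name.
#[derive(Default)]
pub struct SectionCache {
    values: HashMap<&'static str, String>,
}

impl SectionCache {
    /// Returns the section text, computing it only when it is not cached
    /// or when the section has opted out of caching.
    pub fn resolve(&mut self, section: &Section) -> String {
        if section.cache_break {
            return (section.compute)();
        }
        self.values
            .entry(section.name)
            .or_insert_with(|| (section.compute)())
            .clone()
    }

    /// Explicit invalidation, e.g. on a new user request, /clear, or /compact.
    pub fn invalidate_all(&mut self) {
        self.values.clear();
    }
}
```

The resolved strings are then passed into the builder’s context method, which keeps the builder responsible for layout only.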


Regression Fingerprinting: Preventing Silent Prompt Drift

The hardest bug to debug is an unintended prompt edit that breaks a use case nobody was testing. A fingerprint is the cheap defense.

The builder hashes the rendered text and exposes:

  • prefix_fingerprint: the static prefix, which is the cache key.
  • fingerprint: the full prompt.

const EXPECTED_PROMPT_FINGERPRINT: u64 = 0; // set after first run

if EXPECTED_PROMPT_FINGERPRINT != 0 && prompt.fingerprint() != EXPECTED_PROMPT_FINGERPRINT {
    eprintln!(
        "warning: system prompt fingerprint drifted (was {EXPECTED_PROMPT_FINGERPRINT}, is {}). \
         Update the constant if the change was intentional.",
        prompt.fingerprint()
    );
}

In CI, check that the prefix fingerprint matches a constant. When it diverges, either accept the change (update the constant, tell the team why) or reject it (revert the edit). The same fingerprint also tells you exactly when your next request will miss the prompt cache.
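
One detail matters when the expected value lives in source control: the hash must not change between toolchain versions. The standard library’s DefaultHasher documents that its algorithm is unspecified and should not be relied upon across releases, so the sketch below uses 64-bit FNV-1a instead. Which hash the real builder uses is not stated in this article, so treat this as one possible choice rather than the actual implementation.

```rust
/// 64-bit FNV-1a over the UTF-8 bytes of the rendered prompt.
/// Chosen for this sketch because the result is fixed by the algorithm
/// itself, so a constant checked into the repo stays valid across
/// compiler upgrades.
pub fn fingerprint(text: &str) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut hash = OFFSET_BASIS;
    for byte in text.as_bytes() {
        // FNV-1a: XOR the byte in first, then multiply by the prime.
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}
```

FNV-1a is not collision-resistant in any cryptographic sense, which is fine here: the goal is to notice accidental edits, not to defend against an adversary.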

A fingerprint catches textual drift, which covers most accidental edits. It cannot tell you whether an intentional edit changed behavior; that requires behavioral evaluation, which is a Part 4 topic.


Eugene v0.2 in Practice

The Part 1 loop is unchanged. The difference is the structured prompt and how it is sent:

let prompt = system_prompt(&today, &sandbox);
let system_blocks = prompt.into_system_blocks();

let response = send(&http, &api_key, &system_blocks, &tools, &messages).await?;

The agent now has two tools, list_files and read_file, and it knows from instructions and examples how to chain them. Ask it “What’s in the src folder?” and it will list the directory, pick a file, read it, and answer with a citation.

Run it on “What edition does Cargo.toml use?” and the response is closer to:

I’ll check Cargo.toml directly. (Cargo.toml) The project uses Rust edition 2024.

The requirement to cite comes from the Output layer; the decision to read the file comes from Identity and Instructions; the exact citation format is reinforced by the Example; the cost is a fraction of what it would be without the cache boundary. Each layer pulls its weight.


What This Reveals

A const string works until it doesn’t. Once you have more than one tool, more than one output format, more than one engineer touching the prompt, or any cost expectation, you need structure.

The five layers are not arbitrary:

  • Identity = who
  • Instructions = what
  • Output = how
  • Examples = like this
  • Context = now

When something does not fit cleanly into one of these buckets, it usually means the agent’s requirements are unclear, not that the structure is wrong.

The cache boundary turns the structure into money. The typed builder is the shape Rust gives you for free. The fingerprint is the cheap form of regression testing. Together, they let you edit a system prompt on a Friday afternoon without a sinking feeling.


Next: Part 3 — The Skill Trait

Part 2’s two tools are still ad-hoc functions wired into a match statement. There is no shared interface, no schema generation, no retries, and no way for third parties to ship new tools. Part 3 introduces the Skill trait, a registry that owns the dispatch table, and schemars for deriving JSON Schema straight from Rust structs. The agent stops growing by accretion and starts growing by composition.



For Developers in China

The original examples use the Anthropic API, which may not be directly accessible from mainland China. You can:

  1. Use a domestic model with tool-calling support such as Tongyi Qianwen or Zhipu GLM-5, adjusting the endpoint, auth headers, and request body format in the send function.
  2. Run local inference plus a tool layer with Ollama and open models like Qwen, implementing the same tool-calling loop and prompt structure yourself.
  3. Route through a proxy service that forwards Anthropic-compatible requests to an available backend model.
