Prompt Chaining: Make AI Check Its Own Work and Double Accuracy

Writers often say: good writing is rewritten, not written. The first version is basically a draft, and it takes two or three revisions to make it decent. Surprisingly, this applies to AI too — if you let Claude answer first, then have it look back and check if what it said was correct, the accuracy often improves significantly.

This is what we’re talking about today: prompt chaining. The name sounds fancy, but it’s really just “making AI think through multiple rounds.” Each round takes the previous round’s results and continues from there. You write code the same way, right? First get the functionality working, then clean up the structure. “Working” and “maintainable” are two different things. Chaining just makes AI follow this same path.

First Example: Let AI Find Its Own Mistakes

Let’s jump straight into an example. Ask Claude to list 10 English words ending in “ab”:

first_user = "List 10 words that all end with the letters 'ab'."

messages = [{"role": "user", "content": first_user}]
first_response = get_completion(messages)
print(first_response)

Claude might give you something like:

1. Cab 2. Dab 3. Grab 4. Gab 5. Jab 6. Lab 7. Nab 8. Slab 9. Tab 10. Blab

Looks fine, right? But Claude sometimes “hallucinates” — especially when you ask it to generate lots of examples at once, it might slip in a word or two that doesn’t actually exist. This is where chaining comes in — you follow up with:

second_user = "Please replace all 'words' that aren't real words."

messages = [
    {"role": "user", "content": first_user},
    {"role": "assistant", "content": first_response},
    {"role": "user", "content": second_user}
]
print(get_completion(messages))

Claude will re-examine its answer and swap out those made-up words for real ones. The principle is simple: don’t expect perfection on the first try. Give AI a chance to look back and check, and the error rate drops.
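If you find yourself repeating this two-turn pattern, it can be wrapped in a small helper. Here is a minimal sketch, assuming `complete` is any callable that takes a messages list and returns the reply text (the `get_completion` used in this article would fit). The names `ask_then_review` and `fake_complete` are hypothetical, and the fake stands in for a real model call so the sketch runs offline.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def ask_then_review(
    complete: Callable[[List[Message]], str],
    first_user: str,
    review_user: str,
) -> Tuple[str, str]:
    """Hypothetical helper: ask once, then send the answer back for review."""
    first_messages = [{"role": "user", "content": first_user}]
    first_response = complete(first_messages)

    # The second call replays the whole exchange so the model sees its own answer.
    review_messages = first_messages + [
        {"role": "assistant", "content": first_response},
        {"role": "user", "content": review_user},
    ]
    return first_response, complete(review_messages)


def fake_complete(messages: List[Message]) -> str:
    """Stand-in for a real model call: reports how many turns it received."""
    return f"reply to turn {len(messages)}"


draft, revised = ask_then_review(
    fake_complete,
    "List 10 words that all end with the letters 'ab'.",
    "Please replace all 'words' that aren't real words.",
)
print(draft)    # reply to turn 1
print(revised)  # reply to turn 3
```

The second call receives three turns (question, answer, follow-up), which is exactly the message list built by hand above.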

Don’t Make AI “Force-Edit” Correct Answers

But there’s a trap here. If you ask Claude to check an already-correct answer, it might get “overly cautious” — it’ll insist on changing things that are already right, as if not changing something makes it look like it didn’t do its job.

The solution is simple: give it an out. Add a line to your check instruction: “if everything is correct, don’t change anything”:

second_user = """Please replace all 'words' that aren't real words.
If all words are correct, return the original list."""

messages = [
    {"role": "user", "content": first_user},
    {"role": "assistant", "content": first_response},
    {"role": "user", "content": second_user}
]

This is just like talking to people. You tell a colleague “check if there are any problems with this proposal,” and if you don’t add “if there are no problems, let’s go with this,” they might force some issues just to show they carefully reviewed it. We covered this technique in Chapter 8 “Preventing AI Guessing” — it’s called giving AI an exit option.
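One way to make the exit option a habit is to append it automatically and then check whether the review actually changed anything. This is only a sketch under my own assumptions: `add_exit_option` and `is_unchanged` are hypothetical helpers, and the comparison simply ignores case and whitespace, which may be too loose or too strict for your data.

```python
EXIT_CLAUSE = "If all words are correct, return the original list."


def add_exit_option(instruction: str, exit_clause: str = EXIT_CLAUSE) -> str:
    """Append an explicit 'no change needed' path to a review instruction."""
    return f"{instruction.rstrip()}\n{exit_clause}"


def is_unchanged(original: str, revised: str) -> bool:
    """True when both texts match after collapsing whitespace and lowercasing."""
    def normalize(text: str) -> str:
        return " ".join(text.split()).lower()

    return normalize(original) == normalize(revised)


second_user = add_exit_option("Please replace all 'words' that aren't real words.")
print(second_user)
print(is_unchanged("1. Cab\n2. Dab", "1. cab 2. dab"))  # True
print(is_unchanged("1. Cab", "1. Tab"))                 # False
```

If `is_unchanged` returns True, the model took the exit and you can keep the first answer as is.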

Use Chaining to Polish Content Quality

Chaining isn’t just for catching errors — it can also polish. Say you ask Claude to write a three-sentence short story:

first_user = "Write a three-sentence short story about a girl who loves running."
first_response = get_completion([{"role": "user", "content": first_user}])

The first version will probably be pretty plain. Follow up with:

second_user = "Make this story better."

messages = [
    {"role": "user", "content": first_user},
    {"role": "assistant", "content": first_response},
    {"role": "user", "content": second_user}
]
print(get_completion(messages))

The second version is often noticeably better. When asked to revise, Claude tends to spot where the first draft cut corners. It’s just like when you write something yourself: put it aside for a bit, come back, and you’ll spot problems immediately.

Feed Previous Results Into the Next Step

Chaining has an even more practical use: directly use the first step’s output as the second step’s input.

For example, you want to extract names from a conversation, then sort them alphabetically. Split it into two steps:

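# Step 1: Extract names
# The closing triple quote sits on its own line so the quotation mark that
# ends the sample text does not collide with it.
first_user = """Find all person names in the following text:

"Hey Jesse. It's Erin. I'm calling about the party Joey's hosting tomorrow. Keisha said she'd be coming, and I think Mel will be there too."
"""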

prefill = "<names>"  # Prefill format

messages = [
    {"role": "user", "content": first_user},
    {"role": "assistant", "content": prefill}
]
first_response = get_completion(messages)
print(first_response)

Claude will spit out:

Jesse, Erin, Joey, Keisha, Mel

Then feed this list into step two:

second_user = "Sort this list alphabetically."

messages = [
    {"role": "user", "content": first_user},
    {"role": "assistant", "content": prefill + "\n" + first_response},
    {"role": "user", "content": second_user}
]
print(get_completion(messages))

Result: Erin, Jesse, Joey, Keisha, Mel.

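This is the pipeline idea at the heart of prompt chaining, and it works like an assembly line. Break a task into several steps, let each step handle its own part, and the final result is usually more reliable than having AI do everything at once. “Function calling,” where the model’s output is handed to an external tool and the tool’s result is fed back in, builds on the same substitution pattern.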
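The step-by-step hand-off can be written as a small loop that carries the conversation forward. This is a sketch, not a definitive implementation: `run_chain` is a hypothetical helper, `complete` is assumed to behave like this article's `get_completion`, and `fake_complete` is a scripted stand-in so the example runs without an API call.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def run_chain(
    complete: Callable[[List[Message]], str],
    instructions: List[str],
) -> List[str]:
    """Run instructions in order, keeping every earlier turn in the history."""
    messages: List[Message] = []
    outputs: List[str] = []
    for instruction in instructions:
        messages.append({"role": "user", "content": instruction})
        reply = complete(list(messages))  # pass a copy of the history so far
        messages.append({"role": "assistant", "content": reply})
        outputs.append(reply)
    return outputs


def fake_complete(messages: List[Message]) -> str:
    """Stand-in model: extracts a fixed list, or sorts the previous reply."""
    if messages[-1]["content"].startswith("Sort"):
        previous_reply = messages[-2]["content"]
        names = [name.strip() for name in previous_reply.split(",")]
        return ", ".join(sorted(names))
    return "Jesse, Erin, Joey, Keisha, Mel"


results = run_chain(
    fake_complete,
    ["Find all person names in the text.", "Sort this list alphabetically."],
)
print(results[0])  # Jesse, Erin, Joey, Keisha, Mel
print(results[1])  # Erin, Jesse, Joey, Keisha, Mel
```

Each step only needs to do one job, because the previous step's output is already sitting in the history.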

When to Use It, When Not To

So when do you actually need chaining? My criterion is pretty crude: if the task is complex or can’t tolerate errors, split it into multiple steps.

Like having AI write code. You ask it to write a complete feature in one go, and what you get runs but might be rough. Try another approach: first have it write a working version, then have it review for edge cases, finally have it optimize naming and structure. Do it in rounds, each round does one thing, and the final code quality will be much better.

Or data analysis. Just saying “help me analyze this data” might get you a pile of generic fluff. But if you have it first extract key metrics, then find patterns based on those metrics, doing it in two steps gives much more concrete results.

But chaining isn’t appropriate everywhere. Each extra call costs more tokens. Something like “help me translate this passage” — one time is enough, splitting it just wastes money.
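To see why the cost adds up, remember that every follow-up call resends the whole history. The sketch below uses character counts as a rough proxy for tokens (an assumption on my part, since real token counts depend on the model's tokenizer), and `chars_sent_per_call` is a hypothetical helper.

```python
from typing import List, Tuple


def chars_sent_per_call(turns: List[Tuple[str, str]]) -> List[int]:
    """Given (user_text, assistant_text) pairs, return input size per call.

    Each call sends all earlier turns plus the new user message.
    """
    sizes: List[int] = []
    history = 0
    for user_text, assistant_text in turns:
        history += len(user_text)
        sizes.append(history)           # what this call sends as input
        history += len(assistant_text)  # the reply joins the history
    return sizes


# Three rounds: a 100-char prompt, then two 50-char follow-ups, 400-char replies.
turns = [("a" * 100, "b" * 400), ("c" * 50, "d" * 400), ("e" * 50, "f" * 400)]
print(chars_sent_per_call(turns))  # [100, 550, 1000]
```

The follow-up prompts are short, but the third call still sends ten times as much input as the first.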

How many rounds is appropriate? My own experience is two or three rounds is about right. First round produces a draft, second round corrects or supplements, third round does final polish if necessary. Beyond that, improvement isn’t obvious, but you burn money faster.
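A round limit is easy to enforce in code, and you can stop early when a review comes back unchanged. This is a minimal sketch under stated assumptions: `refine` and `scripted` are hypothetical helpers, `complete` is assumed to work like `get_completion`, and "unchanged" here means an exact match after trimming whitespace.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]


def refine(
    complete: Callable[[List[Message]], str],
    task: str,
    review_instruction: str,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Draft once, then review until the answer stops changing or rounds run out."""
    messages: List[Message] = [{"role": "user", "content": task}]
    answer = complete(messages)
    rounds_used = 1
    while rounds_used < max_rounds:
        messages += [
            {"role": "assistant", "content": answer},
            {"role": "user", "content": review_instruction},
        ]
        revised = complete(messages)
        rounds_used += 1
        if revised.strip() == answer.strip():
            break  # the model took the exit option, so stop paying for more calls
        answer = revised
    return answer, rounds_used


def scripted(replies: List[str]) -> Callable[[List[Message]], str]:
    """Stand-in model that returns canned replies in order."""
    remaining = iter(replies)
    return lambda messages: next(remaining)


answer, rounds = refine(
    scripted(["draft", "better draft", "better draft", "never reached"]),
    "Write a three-sentence short story about a girl who loves running.",
    "Improve this story. If it is already good, return it unchanged.",
    max_rounds=5,
)
print(answer, rounds)  # better draft 3
```

The loop stops after three calls even though five were allowed, because the third reply matched the second.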

Summary

The core of chaining is one thing: make AI iterate through multiple rounds. First round gets the work out, later rounds fill in gaps. Remember to give AI an “it’s okay not to change anything” escape route, so it doesn’t change just for the sake of changing. If the task is simple, don’t overcomplicate it. Only split complex tasks into steps. Two or three rounds is plenty.

AI makes mistakes, but it can also catch its own mistakes. Your job is to give it that checking opportunity.

Cheatsheet

# Basic chaining structure
messages = [
    {"role": "user", "content": "First question"},
    {"role": "assistant", "content": "AI's first response"},
    {"role": "user", "content": "Follow-up question based on first response"}
]
response = get_completion(messages)

# Check instruction with exit option
check_prompt = """Check and correct errors. If no errors, return original answer."""

# Multi-task chaining
step1_result = get_completion([{"role": "user", "content": "Task 1"}])
step2_result = get_completion([
    {"role": "user", "content": "Task 1"},
    {"role": "assistant", "content": step1_result},
    {"role": "user", "content": "Task 2 based on Task 1 result"}
])

When you use AI and get an unreliable response, how do you handle it? Do you just ask again, or have it revise based on its original answer? Share your approach in the comments.

This series will continue with future updates. Next time we’ll cover advanced “function calling” techniques — how to make AI call external tools to help you work. Follow Dream Beast Programming so you don’t miss future content. See you next time.
