Prompt Chaining: Multi-Step AI Workflows
Prompt Chaining
Prompt chaining decomposes complex tasks into sequential steps where each output feeds the next. Instead of one massive prompt hoping the model gets everything right, you break the work into focused subtasks — each with its own prompt, its own quality check, and a clear handoff to the next step.
It's the simplest multi-step pattern to implement: wire two LLM calls together, pass the output of call 1 as input to call 2. No planning agents, no tool loop, no autonomous decisions. Just structured decomposition.
Note:
If you're new to multi-step prompting, start here. Chaining is easier to build and debug than agentic prompting or RAG pipelines. Graduate to those when you need dynamic decision-making or external retrieval.
Sequential Chaining
Break a task into ordered steps. Each step processes the previous output and hands it forward. If step 3 fails, you retry step 3 — not steps 1 and 2.
Basic Pattern
┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐
│ Step 1  │───→│ Step 2  │───→│ Step 3  │───→│  Final  │
│  Plan   │    │  Draft  │    │ Polish  │    │ Output  │
└─────────┘    └─────────┘    └─────────┘    └─────────┘
Step 1 — Plan
Create a detailed outline for a blog post about Kubernetes for beginners.
Include 4-5 H2 sections with 2-3 bullet points of key topics under each.
Output format:
## Section Title
- Key point 1
- Key point 2
Step 2 — Draft
Expand each section of this outline into 2-3 paragraphs. Write in a clear,
conversational tone suitable for beginners. Include concrete examples.
Outline:
{outline_from_step_1}
Previous decisions: The post targets complete beginners with no DevOps
experience. Tone is friendly, not academic.
Step 3 — Polish
Refine this draft for clarity, grammar, and accuracy. Add one concrete
example or analogy per section. Remove any jargon without explanation.
Draft:
{draft_from_step_2}
Original goal: Beginner-friendly Kubernetes guide.
Sections already covered: {already_covered}
Wiring It in Code
from openai import OpenAI

client = OpenAI()


def chain(question: str) -> str:
    # Step 1: Plan
    plan = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.3,
        messages=[{"role": "user", "content": f"Create a detailed outline for: {question}"}],
    )
    outline = plan.choices[0].message.content

    # Gate check: fail fast if the outline is empty or too short
    if not outline or len(outline) < 50:
        raise ValueError("Step 1 produced insufficient output")

    # Step 2: Draft
    draft = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.7,
        messages=[{"role": "user", "content": (
            "Expand this outline into full paragraphs.\n\n"
            f"Outline:\n{outline}\n\n"
            "Write in a clear, conversational tone."
        )}],
    )
    text = draft.choices[0].message.content

    # Step 3: Polish
    final = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.3,
        messages=[{"role": "user", "content": (
            "Refine this draft. Fix grammar, improve clarity, add "
            "one concrete example per section.\n\n"
            f"Draft:\n{text}"
        )}],
    )
    return final.choices[0].message.content
Key details in the code:
- Gate checks between steps catch failures early. If step 1 produces garbage, don't waste tokens on steps 2 and 3.
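- Temperature varies by step. Plan and polish use 0.3 for more consistent output. Draft uses 0.7 for more varied, creative phrasing.
- Context is carried forward. Each step receives the previous step's output. The prompt templates above go further and add a summary of earlier decisions (audience, tone, original goal), which this code omits for brevity.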
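The function above raises on the first bad output. To retry only the failing step, as described under Sequential Chaining, each step can be wrapped in a small runner. This is a minimal sketch, not a definitive implementation: it assumes the model call is passed in as a plain callable (`complete`, a hypothetical name) so the control flow can run without an API key, and `run_step`, `StepFailed`, and the gate thresholds are illustrative choices.

```python
from typing import Callable


class StepFailed(Exception):
    """Raised when a step keeps failing its gate check."""


def run_step(
    name: str,
    prompt: str,
    complete: Callable[[str], str],  # hypothetical: wraps your LLM call
    gate: Callable[[str], bool],     # returns True when the output is usable
    max_attempts: int = 3,           # cap attempts so a bad step cannot loop forever
) -> str:
    """Call the model, validate the output, and retry only this step."""
    for _ in range(max_attempts):
        output = complete(prompt)
        if gate(output):
            return output
    raise StepFailed(f"{name} failed its gate check after {max_attempts} attempts")


def chain(topic: str, complete: Callable[[str], str]) -> str:
    """Plan -> draft -> polish, with a gate after every step."""
    outline = run_step(
        "plan", f"Create a detailed outline for: {topic}",
        complete, gate=lambda out: len(out) >= 50,
    )
    draft = run_step(
        "draft", f"Expand this outline into full paragraphs.\n\nOutline:\n{outline}",
        complete, gate=lambda out: len(out) >= 200,
    )
    return run_step(
        "polish", f"Refine this draft. Fix grammar, improve clarity.\n\nDraft:\n{draft}",
        complete, gate=lambda out: bool(out.strip()),
    )
```

In a real pipeline, `complete` would be a thin wrapper around the chat completion call shown earlier. Because earlier outputs are kept in local variables, a failure in the draft step never re-runs the plan step.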
Routing
Classify the input into a category, then dispatch to a specialized handler prompt optimized for that category.
Classify the user query into exactly one category. Output only the category
name and a confidence score from 0.0 to 1.0.
Categories:
- pricing: Questions about cost, plans, billing
- technical: Setup, configuration, troubleshooting
- account: Login, permissions, profile management
- feedback: Feature requests, complaints, suggestions
- general: Everything not covered above
Query: {user_query}
Category:
Confidence:
If confidence is below 0.7, route to a disambiguation prompt before the handler:
I want to make sure I understand your question correctly. Did you mean:
A) How to set up {topic} from scratch
B) How to fix an existing {topic} configuration
C) How to compare {topic} with alternatives
D) Something else — please clarify
Reply with just the letter.
Each handler has its own prompt:
You are a technical support specialist for {product_name}.
The user needs help with setup or configuration.
Rules:
- Give step-by-step instructions
- Include copy-pasteable commands where relevant
- Ask clarifying questions if the OS or version is not specified
- Point to official docs for advanced options
User question: {user_query}
Wiring routing in code:
def route_and_respond(query: str) -> str:
    # Step 1: Classify
    category, confidence = classify_query(client, query)

    # Step 2: Disambiguate if unsure
    if confidence < 0.7:
        query = disambiguate(client, query, category)

    # Step 3: Dispatch to handler
    handler = HANDLERS.get(category, HANDLERS["general"])
    return handler(client, query)


HANDLERS = {
    "pricing": respond_pricing,
    "technical": respond_technical,
    "account": respond_account,
    "feedback": respond_feedback,
    "general": respond_general,
}
A catch-all general handler prevents routing dead ends. If no category matches, the system still responds.
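The routing code calls `classify_query` without showing it. One piece it would need is a parser for the classifier's two-line reply. The sketch below is one possible way to do that, assuming the model follows the `Category:` / `Confidence:` format from the classification prompt; `parse_classification` and `CATEGORIES` are hypothetical names, and the choice to fall back to `("general", 0.0)` on anything malformed is an assumption, not something the article specifies.

```python
import re

CATEGORIES = {"pricing", "technical", "account", "feedback", "general"}


def parse_classification(reply: str) -> tuple[str, float]:
    """Extract (category, confidence) from a 'Category: / Confidence:' reply.

    Falls back to ("general", 0.0) when the reply is malformed or names
    an unknown category, so the caller routes to disambiguation.
    """
    cat_match = re.search(r"category:\s*([a-z_]+)", reply, re.IGNORECASE)
    conf_match = re.search(r"confidence:\s*([0-9]*\.?[0-9]+)", reply, re.IGNORECASE)

    if not cat_match:
        return "general", 0.0
    category = cat_match.group(1).lower()
    if category not in CATEGORIES:
        return "general", 0.0

    confidence = float(conf_match.group(1)) if conf_match else 0.0
    # Clamp in case the model returns something like 1.2
    confidence = min(max(confidence, 0.0), 1.0)
    return category, confidence
```

Returning a confidence of 0.0 on a parse failure means the `confidence < 0.7` branch handles bad classifier output the same way it handles an uncertain one.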
Parallelization
Run independent subtasks concurrently, then combine results through an aggregator step.
You are researching {topic} from {n} different angles. Your assigned
perspective is: {perspective_name}
Rules:
- Research only your assigned angle
- Provide specific data points with sources
- Do not reference other perspectives — the aggregator handles that
- Format findings as structured bullet points
Research focus: {focus_area}
Run this prompt N times simultaneously — one per perspective. Then aggregate:
Synthesize these {n} research reports into a unified analysis.
{p1}: {report_1}
{p2}: {report_2}
{p3}: {report_3}
Output:
1. Comparison table showing key differences across perspectives
2. Areas of agreement (what all perspectives concur on)
3. Contradictions or conflicting findings (flag explicitly, don't smooth over)
4. Key takeaways as a prioritized list
Wiring parallel execution:
import asyncio


async def parallel_research(topic: str, perspectives: list[tuple[str, str]]) -> str:
    # research_perspective must be a coroutine (for example, built on AsyncOpenAI)
    tasks = [
        asyncio.create_task(
            research_perspective(client, topic, name, focus)
        )
        for name, focus in perspectives
    ]
    results = await asyncio.gather(*tasks, return_exceptions=True)

    # Filter out failed subtasks: don't block on one failure
    completed = [(p[0], r) for (p, r) in zip(perspectives, results)
                 if not isinstance(r, Exception)]
    return aggregate_findings(client, topic, completed)
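return_exceptions=True makes asyncio.gather return exceptions as ordinary results instead of raising the first one, so a single failed subtask does not kill the entire pipeline. The aggregator receives only the perspectives that completed; comparing that list against the input shows which are missing.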
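To see the failure handling end to end without an API key, here is a runnable sketch under stated assumptions: `research_perspective` is replaced by a stub that fails for one perspective, and `build_aggregator_prompt` is a hypothetical helper that renders the synthesis prompt for however many reports came back (the template above hardcodes three) and lists the missing perspectives explicitly.

```python
import asyncio


async def research_perspective(topic: str, name: str, focus: str) -> str:
    """Stand-in for one LLM call; a real version would await an async client."""
    await asyncio.sleep(0)  # yield control, as a network call would
    if name == "legal":
        raise RuntimeError("simulated API failure")
    return f"- {topic} / {focus}: findings from the {name} perspective"


def build_aggregator_prompt(completed: list[tuple[str, str]], missing: list[str]) -> str:
    """Render the synthesis prompt for the reports that actually arrived."""
    reports = "\n\n".join(f"{name}: {report}" for name, report in completed)
    note = ""
    if missing:
        note = f"\n\nMissing perspectives (subtask failed): {', '.join(missing)}"
    return (
        f"Synthesize these {len(completed)} research reports into a unified analysis."
        f"\n\n{reports}{note}"
    )


async def parallel_research(topic: str, perspectives: list[tuple[str, str]]) -> str:
    results = await asyncio.gather(
        *(research_perspective(topic, name, focus) for name, focus in perspectives),
        return_exceptions=True,
    )
    completed, missing = [], []
    for (name, _focus), result in zip(perspectives, results):
        if isinstance(result, BaseException):
            missing.append(name)  # record the gap instead of dropping it silently
        else:
            completed.append((name, result))
    return build_aggregator_prompt(completed, missing)


prompt = asyncio.run(parallel_research(
    "remote work",
    [("economic", "costs"), ("legal", "regulation"), ("social", "wellbeing")],
))
```

Passing the missing names into the prompt lets the aggregator state that the legal angle is absent rather than presenting a two-perspective synthesis as complete.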
Chaining vs Agentic Prompting
Both handle multi-step tasks. Here's when to pick each:
| Pattern | Best For | Decision Logic | Recovery |
|---|---|---|---|
| Chaining | Fixed workflow, known steps ahead of time | Hardcoded — step 1 always calls step 2 | Retry the failing step |
| Agentic | Dynamic tasks, uncertain number of steps | LLM decides next action at runtime | Re-plan from current state |
Use chaining when you know the pipeline structure in advance. Use agentic prompting when the model needs to decide what to do next.
Chaining is simpler, faster, and cheaper. Agentic is more flexible but costs more tokens and adds latency.
Combining with Other Techniques
Chaining composes well with other prompting patterns:
Chain + CoT. Each step in the chain uses chain-of-thought internally:
Step 2: Draft each section.
[outline from Step 1]
For each section, think through:
- What's the one concept the reader needs to understand?
- What analogy would make this click?
- What order should the paragraphs go in?
Then write the section.
Chain + RAG. Retrieve documents before a chain step that needs external knowledge:
# Before Step 2 (draft), retrieve relevant docs
docs = retrieve(question=outline, top_k=3)

draft = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Use these sources:\n{docs}\n\nExpand outline:\n{outline}"
    }]
)
Chain + Tool Use. A step calls an external tool, then the next step processes the result:
# Step 1: Generate a SQL query
response = client.chat.completions.create(...)
sql = response.choices[0].message.content  # extract the query text from the response

# Tool: Execute the query (validate model-generated SQL before running it)
results = db.execute(sql)

# Step 2: Format results into plain English
explanation = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": f"Results: {results}\n\nSummarize these findings for a non-technical audience."
    }]
)
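Model-generated SQL is untrusted input, so the tool step is a natural place for a gate. The sketch below uses the standard library's `sqlite3` purely for illustration; `is_single_select` and `run_generated_sql` are hypothetical helpers. The check is deliberately conservative (it rejects any statement containing a semicolon and anything not starting with `SELECT`, including valid `WITH` queries) and is not a complete security boundary. A read-only database role is the stronger control.

```python
import sqlite3


def is_single_select(sql: str) -> bool:
    """Conservative check: exactly one statement, and it starts with SELECT."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:  # more than one statement, or a ';' inside a literal
        return False
    return statement.lower().startswith("select")


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list[tuple]:
    """Execute model-generated SQL only if it passes the guard."""
    if not is_single_select(sql):
        raise ValueError(f"Refusing to run non-SELECT SQL: {sql!r}")
    return conn.execute(sql).fetchall()


# Usage with an in-memory database standing in for the real one
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
rows = run_generated_sql(conn, "SELECT COUNT(*), SUM(total) FROM orders;")
```

A rejected query is a gate failure like any other: the chain can retry the SQL-generation step with the error message appended to the prompt.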
Common Failure Modes
Information loss between steps. Each LLM call may drop or distort details from the previous step. Mitigation: include a summary field alongside the raw output in each handoff.
[Full output from Step 1: {output}]
[Summary: The outline has 5 sections covering intro, core concepts,
hands-on tutorial, common pitfalls, and next steps.]
[Current step: Draft section 3 — "Hands-on tutorial"]
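One way to keep this handoff format consistent across steps is to build it from a small data structure rather than by hand. This is a sketch only: `Handoff` and its field names are hypothetical, and the extra `goal` field is an assumption that folds in the goal restatement recommended for context drift.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    output: str        # full, unmodified output of the previous step
    summary: str       # short recap of what that output contains
    current_step: str  # what the next call is being asked to do
    goal: str          # original task, restated to limit context drift

    def to_prompt(self) -> str:
        """Render the handoff as the bracketed header for the next prompt."""
        return (
            f"[Full output from previous step: {self.output}]\n"
            f"[Summary: {self.summary}]\n"
            f"[Current step: {self.current_step}]\n"
            f"[Original goal: {self.goal}]"
        )


handoff = Handoff(
    output="## Intro\n- What is Kubernetes",
    summary="One section so far.",
    current_step="Draft section 1",
    goal="Beginner-friendly Kubernetes guide",
)
header = handoff.to_prompt()
```

Because every field is required, a step cannot forget the summary or the goal without the constructor raising an error.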
Compounding errors. An error in step 2 gets magnified in steps 3 and 4. Mitigation: add validation gates between steps.
if not draft or len(draft) < 200:
    raise ValueError(f"Step 2 produced insufficient output: {len(draft)} chars")
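A length check catches empty output but not malformed output. For a step with a defined output format, the gate can check structure as well. The sketch below validates the outline format requested in Step 1 (4-5 `##` sections with 2-3 bullets each); `validate_outline` is a hypothetical helper, and returning a list of problems rather than raising is a design assumption that makes the result easy to feed back into a retry prompt.

```python
import re


def validate_outline(
    outline: str,
    min_sections: int = 4,
    max_sections: int = 5,
    min_bullets: int = 2,
    max_bullets: int = 3,
) -> list[str]:
    """Return a list of problems; an empty list means the outline passes."""
    problems: list[str] = []
    # Split on H2 headings; drop any text before the first heading.
    chunks = re.split(r"^## +", outline, flags=re.MULTILINE)[1:]
    if not min_sections <= len(chunks) <= max_sections:
        problems.append(
            f"expected {min_sections}-{max_sections} sections, found {len(chunks)}"
        )
    for chunk in chunks:
        lines = chunk.splitlines()
        title = lines[0].strip() if lines else ""
        bullets = [ln for ln in lines[1:] if ln.lstrip().startswith("- ")]
        if not min_bullets <= len(bullets) <= max_bullets:
            problems.append(
                f"section {title!r}: expected {min_bullets}-{max_bullets} "
                f"bullet points, found {len(bullets)}"
            )
    return problems
```

An empty list lets the chain proceed. A non-empty list can be appended to the Step 1 prompt on retry so the model knows what to fix.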
Context drift. Long chains lose focus as each step subtly shifts the task. Mitigation: re-state the original goal in each step's prompt:
Remember, the end goal is a beginner-friendly Kubernetes guide.
Keep everything at that level. Don't assume prior DevOps knowledge.
Routing dead ends. No category matches and the system has no fallback. Mitigation: always include a general catch-all handler.
Parallel task conflicts. Two subtasks produce contradictory findings. Mitigation: instruct the aggregator to flag contradictions explicitly rather than smoothing them over.
Troubleshooting
| Symptom | Likely Cause | Fix |
|---|---|---|
| Chain output too short | Step instructions too vague | Add length requirement in that step's prompt |
| Chain output too long | No length constraint | Specify max words or section count per step |
| Router picks wrong category | Categories too broad or overlapping | Make categories mutually exclusive; test with edge cases |
| Parallel results conflict | Subtasks share hidden dependencies | Check subtask boundaries; reduce overlap in research focus |
| Quality degrades after step 3 | Context drift from original goal | Re-inject the original task description in every step's prompt, not only the first |
| Chain fails silently on step N | No gate check before step N+1 | Add validation between every step |
| High latency from unnecessary steps | Every query runs the full chain | Route simple queries to a single-step handler |
Best Practices
- Validate at each step. Check output meets minimum requirements before passing forward.
- Keep steps focused. Each step does one transformation. If a step does two things, split it.
- Set max iterations. For chains with retry logic, cap at 3 retries per step to avoid loops.
- Carry context forward. Include the output, a summary, and what was already decided.
- Handle failures gracefully. A failed step shouldn't crash the pipeline — log it, retry, or skip.
- Design for testability. Each step is an independent function you can test with mock data.
- Start with a 2-step chain. Don't build a 7-step pipeline on day one. Get a 2-step chain working, then add steps.
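The testability point can be made concrete with the standard library's `unittest.mock`. In this sketch, `plan_step` takes the client as a parameter so a test can pass in a fake. `make_fake_client` is a hypothetical helper that builds an object shaped like the response used throughout this article (`response.choices[0].message.content`). No network call is made.

```python
from unittest.mock import MagicMock


def plan_step(client, topic: str) -> str:
    """One chain step as a plain function: topic in, outline text out."""
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.3,
        messages=[{"role": "user", "content": f"Create a detailed outline for: {topic}"}],
    )
    return response.choices[0].message.content


def make_fake_client(reply: str) -> MagicMock:
    """Build a stand-in whose create() returns a chat-completion-shaped object."""
    client = MagicMock()
    client.chat.completions.create.return_value.choices = [
        MagicMock(message=MagicMock(content=reply))
    ]
    return client


def test_plan_step_sends_topic_and_returns_text():
    client = make_fake_client("## Intro\n- What is Kubernetes?")
    # The step returns exactly what the (fake) model replied
    assert plan_step(client, "Kubernetes") == "## Intro\n- What is Kubernetes?"
    # The topic made it into the prompt that was sent
    sent = client.chat.completions.create.call_args.kwargs["messages"][0]["content"]
    assert "Kubernetes" in sent
```

The same fake can return deliberately bad output (an empty string, a truncated outline) to test that the gate checks between steps fire as expected.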
Note:
Pro tip: The simplest chain that solves the problem is the best chain. A 2-step pipeline that works reliably beats a 7-step pipeline that breaks on edge cases.