5 Prompt Chaining Patterns With Real Code
Build 5 production-ready prompt chains: research pipeline, code review, content creation, data extraction, and debugging. Complete Python implementations with gate checks and error handling.
The prompt chaining technique page covers the concepts. This tutorial puts them into practice — five real chains you can copy, run, and adapt. Each is a complete Python implementation with gate checks, error handling, and logging so you can trace exactly what happens at every step.
Note:
All examples use the OpenAI Python SDK. Install with pip install openai. Set your API key: export OPENAI_API_KEY=sk-...
Pattern 1: Research Pipeline
A 3-step chain that researches a topic, cross-references sources, and produces a synthesized report. Useful for competitive analysis, market research, or technical deep dives.
Step 1 — Gather Sources
Research {topic}. Return a structured list of 5-7 credible sources with
key findings from each.
Output as JSON:
{
  "sources": [
    {
      "title": "...",
      "url": "...",
      "key_findings": ["...", "..."]
    }
  ]
}
Step 2 — Cross-Reference
These sources cover {topic}. Identify:
1. Claims that appear in 3+ sources (strong consensus)
2. Claims that appear in only 1 source (needs verification)
3. Direct contradictions between sources
4. Gaps — what important angle is missing?
Flag contradictions explicitly. Don't smooth them over.
Sources:
{sources_json_from_step_1}
Step 3 — Synthesize
Write a 500-word research brief on {topic} using these findings.
Structure:
- Executive summary (3 sentences)
- Key findings (numbered, with source count per finding)
- Contradictions and open questions
- Recommended next steps
Use specific data points. Cite sources inline as [1], [2], etc.
Cross-reference results:
{cross_ref_from_step_2}
Original sources:
{sources_json_from_step_1}
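The three prompts above use {name}-style placeholders, and the Step 1 prompt also contains literal JSON braces, which plain str.format would trip over. A minimal sketch of one way to fill them, assuming a hypothetical helper named fill and an abbreviated copy of the Step 2 prompt (neither is part of the full implementation that follows):

```python
import json
import re

# Hypothetical template, abbreviated from the Step 2 prompt above.
CROSS_REF_TEMPLATE = """These sources cover {topic}. Identify contradictions.
Sources:
{sources_json_from_step_1}"""


def fill(template: str, **values: str) -> str:
    """Replace {name} placeholders that have a value; leave other braces alone."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown names and literal JSON braces are returned unchanged.
        return values.get(name, match.group(0))

    return re.sub(r"\{(\w+)\}", substitute, template)


# Made-up Step 1 output, used only to show the substitution.
sources = {"sources": [{"title": "Example", "url": "https://example.com", "key_findings": ["a"]}]}
prompt = fill(
    CROSS_REF_TEMPLATE,
    topic="WebAssembly adoption",
    sources_json_from_step_1=json.dumps(sources, indent=2),
)
```

The full implementation below sidesteps the problem by building prompts with f-strings, where literal braces are doubled instead.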
Full Implementation
import json
import logging

from openai import OpenAI

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
client = OpenAI()


def research_pipeline(topic: str) -> dict:
    # Step 1: Gather sources
    logger.info(f"[Step 1] Gathering sources for: {topic}")
    sources_raw = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.3,
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f"""Research {topic}. Return structured JSON with
'sources' array. Each source: title, url, key_findings list.
Include 5-7 credible sources."""
        }]
    )
    sources = json.loads(sources_raw.choices[0].message.content)
    if not sources.get("sources") or len(sources["sources"]) < 3:
        raise ValueError(f"Step 1 failed: only {len(sources.get('sources', []))} sources found")
    logger.info(f"[Step 1] Found {len(sources['sources'])} sources")

    # Step 2: Cross-reference
    logger.info("[Step 2] Cross-referencing sources")
    cross_ref_raw = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        messages=[{
            "role": "user",
            "content": f"""Cross-reference these sources on '{topic}'.
Identify consensus claims, unverified claims, contradictions, and gaps.
Sources: {json.dumps(sources, indent=2)}"""
        }]
    )
    cross_ref = cross_ref_raw.choices[0].message.content
    if len(cross_ref) < 100:
        raise ValueError("Step 2 produced insufficient cross-reference")
    logger.info(f"[Step 2] Cross-reference complete ({len(cross_ref)} chars)")

    # Step 3: Synthesize
    logger.info("[Step 3] Synthesizing research brief")
    brief_raw = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.4,
        messages=[{
            "role": "user",
            "content": f"""Write a 500-word research brief on '{topic}'.
Cross-reference findings:
{cross_ref}
Original sources:
{json.dumps(sources, indent=2)}
Include executive summary, key findings with citations,
contradictions, and next steps."""
        }]
    )
    brief = brief_raw.choices[0].message.content
    logger.info(f"[Step 3] Brief complete ({len(brief)} chars)")

    return {
        "topic": topic,
        "sources": sources,
        "cross_reference": cross_ref,
        "brief": brief,
    }
What makes this production-ready:
- Step 1 gate check: fails fast if fewer than 3 sources — don't waste tokens on steps 2 and 3
- Step 2 gate check: ensures cross-reference is substantive before synthesis
- Structured output: Step 1 uses JSON mode for reliable parsing
- Logging: every step logs its status so you can trace failures
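The Step 1 gate check can also be pulled into its own function, which makes it testable without calling the API. A minimal sketch, assuming a hypothetical helper named parse_sources; it also turns malformed JSON into the same ValueError the other gates raise. Note that it only checks the shape and count of the sources, not whether they actually exist.

```python
import json


def parse_sources(raw: str, minimum: int = 3) -> dict:
    """Parse Step 1 output and raise ValueError if the gate check fails."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Step 1 failed: output is not valid JSON ({exc})") from exc

    # The model may return a list or omit the key, so check types explicitly.
    sources = payload.get("sources") if isinstance(payload, dict) else None
    if not isinstance(sources, list) or len(sources) < minimum:
        found = len(sources) if isinstance(sources, list) else 0
        raise ValueError(f"Step 1 failed: only {found} sources found")
    return payload
```

Inside research_pipeline this would replace the json.loads call and the if block that follows it.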
Pattern 2: Code Review Chain
A 3-step chain that reviews a pull request: first summarizes the diff, then analyzes risks, then generates specific review comments. Each step narrows from general to specific.
Note:
Patterns 2-5 assume the same setup as Pattern 1: the json, logging, and OpenAI imports, the logging configuration, the logger, and client = OpenAI(). Copy the setup lines that precede research_pipeline in Pattern 1 before running any of the following.
def code_review_chain(diff: str, pr_context: str = "") -> dict:
    logger.info("[Step 1] Summarizing changes")
    summary = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        messages=[{
            "role": "user",
            "content": f"""Summarize this pull request diff in plain English.
Focus on: what changed, why (if inferable), and files affected.
PR context: {pr_context}
Diff:
{diff}"""
        }]
    ).choices[0].message.content
    if len(summary) < 50:
        raise ValueError("Step 1: summary too short")

    logger.info("[Step 2] Analyzing risks")
    risks = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.1,
        messages=[{
            "role": "user",
            "content": f"""Analyze this PR for risks. Classify each finding:
- SECURITY: auth, injection, data exposure
- CORRECTNESS: logic errors, edge cases, race conditions
- PERFORMANCE: N+1 queries, memory leaks, blocking I/O
- MAINTAINABILITY: unclear naming, missing tests, tight coupling
Summary: {summary}
Diff:
{diff}
Return each finding as: [SEVERITY] [CATEGORY] File:line — description"""
        }]
    ).choices[0].message.content
    if not risks:
        raise ValueError("Step 2: no risk analysis produced")

    logger.info("[Step 3] Generating review comments")
    comments = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        messages=[{
            "role": "user",
            "content": f"""Generate specific, actionable review comments for this PR.
Each comment must include:
- The exact file and line range
- What the issue is
- Why it matters
- A suggested fix with code
Focus on the high-severity risks identified below.
Risks:
{risks}
Diff:
{diff}"""
        }]
    ).choices[0].message.content

    return {
        "summary": summary,
        "risks": risks,
        "comments": comments,
    }
Key design decisions:
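- Temperature stays low throughout and is lowest where precision matters most: 0.2 (summary) → 0.1 (risk analysis) → 0.2 (comments)
- Step 1 context limits: the summary feeds the risk analysis in step 2, and step 3 is told to focus on the risks step 2 reports, which keeps the review from drifting in scope
- Structured risk format: [SEVERITY] [CATEGORY] File:line makes the output parseable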
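To show what "parseable" can mean in practice, here is a minimal sketch that turns the step 2 output into records. The Finding type, the parse_findings helper, and the regular expression are illustrative assumptions; the chain does not define which severity labels the model will use, so the sample values in the usage note are made up. Lines that don't match the requested shape are skipped rather than treated as errors.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    severity: str
    category: str
    file: str
    line: int
    description: str


# Accepts either an em dash (\u2014) or a hyphen before the description.
FINDING_RE = re.compile(
    r"^\[(?P<severity>[A-Za-z]+)\]\s+\[(?P<category>[A-Za-z]+)\]\s+"
    r"(?P<file>[^\s:]+):(?P<line>\d+)\s+[\u2014-]\s+(?P<description>.+)$"
)


def parse_findings(risks: str) -> list[Finding]:
    """Extract structured findings from the free-text risk analysis."""
    findings = []
    for raw_line in risks.splitlines():
        match = FINDING_RE.match(raw_line.strip())
        if match is None:
            continue  # Prose, blank lines, or a finding in another shape.
        findings.append(Finding(
            severity=match["severity"].upper(),
            category=match["category"].upper(),
            file=match["file"],
            line=int(match["line"]),
            description=match["description"].strip(),
        ))
    return findings
```

For example, parse_findings("[HIGH] [SECURITY] app/auth.py:42 - token compared with ==") yields one Finding with line 42. If the count of parsed findings is zero while the raw text is long, that is a reasonable signal to tighten the step 2 prompt.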
Pattern 3: Content Creation Pipeline
A 4-step chain that goes from topic → strategy → outline → draft → polish. Unlike the simple 3-step example in the techniques page, this adds strategy research before outlining and an editor persona in the polish step.
def content_pipeline(topic: str, audience: str, tone: str, word_count: int = 1000) -> dict:
    logger.info(f"[Step 1] Researching strategy for: {topic}")
    strategy = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.4,
        messages=[{
            "role": "user",
            "content": f"""You're a content strategist. For a {tone} article
about '{topic}' targeting {audience}:
1. What 3 angles would perform best?
2. What's the single most contrarian take we could include?
3. What specific data or examples would make this stand out?
4. What common advice should we deliberately contradict?
Be specific. Name studies, stats, or case studies we should reference."""
        }]
    ).choices[0].message.content

    logger.info("[Step 2] Building outline")
    outline = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.5,
        messages=[{
            "role": "user",
            "content": f"""Create a detailed outline for a {word_count}-word
{tone} article about '{topic}' for {audience}.
Use these strategic insights to shape the angle:
{strategy}
Include:
- A hook that challenges a common assumption
- 5-7 H2 sections with 2-3 bullet points each
- A counter-argument section
- A concrete next-steps section"""
        }]
    ).choices[0].message.content
    if len(outline) < 200:
        raise ValueError(f"Step 2: outline too short ({len(outline)} chars)")

    logger.info("[Step 3] Writing draft")
    draft = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.7,
        messages=[{
            "role": "user",
            "content": f"""Write a full draft following this outline. Target
{word_count} words. {tone} tone for {audience}.
Include specific data points, examples, and the contrarian angle
identified in the strategy phase.
Outline:
{outline}
Strategy context:
{strategy}"""
        }]
    ).choices[0].message.content

    logger.info("[Step 4] Polishing with editor persona")
    polished = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        messages=[{
            "role": "system",
            "content": """You are a senior editor at a major publication.
Your job is to make writing tighter, clearer, and more compelling.
Cut fluff. Strengthen weak claims. Make the hook sharper.
Never rewrite just to rewrite — only change what needs changing."""
        }, {
            "role": "user",
            "content": f"""Edit this draft for clarity, flow, and impact.
Rules:
- Cut any sentence that doesn't earn its place
- Replace vague claims with specifics
- Ensure the hook challenges something
- The counter-argument section must feel fair, not strawman
- Add [EDITOR'S NOTE: ...] where you made substantive changes
Draft:
{draft}"""
        }]
    ).choices[0].message.content

    return {
        "strategy": strategy,
        "outline": outline,
        "draft": draft,
        "polished": polished,
    }
What's different from the basic chain:
- Strategy research step: prevents "generic outline syndrome" — the outline is shaped by real angles
- System prompt in step 4: uses an editor persona with specific instructions, not just "polish this"
- Editor's notes: the polish step annotates substantive changes so you can review them
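Since the polish step embeds its annotations in the text, you will probably want to separate them before publishing. A minimal sketch, assuming a hypothetical helper named split_editor_notes and assuming the model follows the [EDITOR'S NOTE: ...] shape from the prompt; notes that themselves contain a closing bracket would be cut short by this pattern.

```python
import re

# Matches a straight or curly apostrophe, since the model may emit either.
NOTE_RE = re.compile(r"\[EDITOR['\u2019]S NOTE:\s*(.*?)\]", re.DOTALL | re.IGNORECASE)


def split_editor_notes(polished: str) -> tuple[str, list[str]]:
    """Return (text without notes, list of note bodies)."""
    notes = [note.strip() for note in NOTE_RE.findall(polished)]
    clean = NOTE_RE.sub("", polished)
    # Removing a note can leave a doubled space behind; collapse it.
    clean = re.sub(r"[ \t]{2,}", " ", clean).strip()
    return clean, notes
```

Called on the "polished" value returned by content_pipeline, this gives you a publishable text plus a review checklist of what the editor step changed.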
Pattern 4: Data Extraction Pipeline
A 3-step chain that extracts structured data from unstructured text: first extracts entities, then resolves missing or ambiguous fields, then validates the result against the source text. Designed for processing legal documents, research papers, or meeting transcripts.
def extraction_pipeline(document: str, schema: dict) -> dict:
    # Step 1: Extract explicit entities
    logger.info("[Step 1] Extracting entities")
    entities_raw = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.0,
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f"""Extract entities from this document according to
the schema. If a field is not found, set it to null. Do not
hallucinate — only extract what's explicitly stated.
Schema: {json.dumps(schema, indent=2)}
Document:
{document[:8000]}"""
        }]
    )
    entities = json.loads(entities_raw.choices[0].message.content)
    # Count against the schema keys so fields the model omitted entirely
    # are treated as missing too, not only fields it set to null.
    missing = [k for k in schema if entities.get(k) is None]
    if len(missing) == len(schema):
        raise ValueError("Step 1: no entities extracted")
    explicit_count = len(schema) - len(missing)
    logger.info(f"[Step 1] Extracted {explicit_count}/{len(schema)} fields. Missing: {missing}")

    # Step 2: Resolve ambiguities
    inferred = []
    if missing:
        logger.info(f"[Step 2] Resolving {len(missing)} missing fields")
        resolutions_raw = client.chat.completions.create(
            model="gpt-4o",
            temperature=0.0,
            response_format={"type": "json_object"},
            messages=[{
                "role": "user",
                "content": f"""Some fields were not found in the initial extraction.
For each missing field, determine if:
- "inferable": can be reasonably inferred from context (provide value)
- "not_present": genuinely absent from the document (set to null)
- "ambiguous": multiple possible values exist (list them)
Missing fields: {json.dumps(missing)}
Already extracted: {json.dumps(entities, indent=2)}
Document:
{document[:8000]}
Return JSON: {{"resolutions": {{field: {{status, value, candidates}}}}}}"""
            }]
        )
        resolutions = json.loads(resolutions_raw.choices[0].message.content)
        for field, resolution in resolutions.get("resolutions", {}).items():
            # Only accept inferences for fields that were actually missing.
            if (field in missing and resolution.get("status") == "inferable"
                    and resolution.get("value") is not None):
                entities[field] = resolution["value"]
                entities[f"{field}_confidence"] = "inferred"
                inferred.append(field)
        logger.info(f"[Step 2] Resolved {len(inferred)} fields")

    # Step 3: Validate and format
    # Note: only the first 4000 characters are checked here, while steps 1-2 read 8000.
    logger.info("[Step 3] Validating extracted data")
    validation = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.0,
        messages=[{
            "role": "user",
            "content": f"""Validate this extracted data against the original document.
Flag any values that contradict the source text.
Extracted data: {json.dumps(entities, indent=2)}
Document: {document[:4000]}
Return: "VALID" if all values are supported, or list specific issues."""
        }]
    ).choices[0].message.content

    return {
        "entities": entities,
        "validation": validation,
        # Counted before step 2, so the added _confidence keys don't skew the number.
        "fields_extracted": explicit_count,
        "fields_inferred": len(inferred),
    }
Design highlights:
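- Temperature 0.0: extraction should be as repeatable as possible, with no creativity (temperature 0 reduces run-to-run variation but does not strictly guarantee identical output)
- Two-pass strategy: the first pass gets what's explicit; the second pass revisits only the missing fields and classifies each as inferable, not present, or ambiguous
- Inference is labeled: each field inferred from context gets a sibling key, {field}_confidence: "inferred", so downstream systems can decide whether to trust it
- Validation step: the final pass checks the extracted values for contradictions with the source text, rather than just checking format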
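One drawback of writing labels into the same dictionary as the data is that the extra keys no longer match the schema. A minimal sketch of an alternative that keeps provenance in a separate map; the apply_resolutions helper and the "explicit" / "inferred" / "ambiguous" labels are assumptions for illustration, not part of the pipeline above.

```python
def apply_resolutions(entities: dict, resolutions: dict) -> tuple[dict, dict]:
    """Merge step 2 resolutions into the extracted values.

    Returns (values, provenance) so that values keeps exactly the schema keys.
    """
    values = dict(entities)  # Copy, so the caller's dict is not mutated.
    provenance = {k: "explicit" for k, v in entities.items() if v is not None}

    for field, resolution in resolutions.get("resolutions", {}).items():
        status = resolution.get("status")
        if status == "inferable" and resolution.get("value") is not None:
            values[field] = resolution["value"]
            provenance[field] = "inferred"
        elif status == "ambiguous":
            # Leave the value as null but record why it is unresolved.
            provenance[field] = "ambiguous"
    return values, provenance
```

A downstream consumer can then filter on provenance, for example accepting only fields marked "explicit" for anything legally sensitive.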
Pattern 5: Debugging Assistant Chain
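A 4-step chain that takes an error message and stack trace, parses them into structured data, generates ranked hypotheses, designs a test for the top hypothesis, and suggests a fix. The chain writes the test plan but does not run it. Designed for runtime errors, not compile-time.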
def debugging_chain(error: str, stacktrace: str, code_context: str = "") -> dict:
    # Step 1: Parse the error into structured data
    logger.info("[Step 1] Parsing error")
    parsed = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.1,
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f"""Parse this error into structured data:
Error message: {error}
Stack trace: {stacktrace}
Return JSON:
{{
"error_type": "...",
"error_message": "...",
"file": "...",
"line": ...,
"function": "...",
"likely_category": "logic|config|dependency|type|runtime|unknown"
}}"""
        }]
    )
    parsed_error = json.loads(parsed.choices[0].message.content)
    # .get() keeps logging from crashing if the model omits a key
    logger.info(
        f"[Step 1] Parsed: {parsed_error.get('error_type')} in "
        f"{parsed_error.get('file')}:{parsed_error.get('line')}"
    )

    # Step 2: Generate hypotheses
    logger.info("[Step 2] Generating hypotheses")
    hypotheses_raw = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.3,
        response_format={"type": "json_object"},
        messages=[{
            "role": "user",
            "content": f"""Given this error, generate 3-5 ranked hypotheses
about the root cause. For each hypothesis:
- What specifically could cause this
- Why it's likely (or unlikely)
- What evidence would confirm or rule it out
Parsed error: {json.dumps(parsed_error, indent=2)}
Error message: {error}
Stack trace: {stacktrace}
{f'Code context: {code_context}' if code_context else ''}
Return JSON: {{"hypotheses": [{{"rank": 1, "cause": "...", "likelihood": "high|medium|low", "evidence_to_check": "...", "fix_if_confirmed": "..."}}]}}"""
        }]
    )
    hypotheses = json.loads(hypotheses_raw.choices[0].message.content)
    # Gate check: step 3 indexes the first hypothesis, so fail fast if there is none
    if not hypotheses.get("hypotheses"):
        raise ValueError("Step 2: no hypotheses generated")
    logger.info(f"[Step 2] Generated {len(hypotheses['hypotheses'])} hypotheses")

    # Step 3: Test top hypothesis (assumes the model returned them in rank order)
    top = hypotheses["hypotheses"][0]
    logger.info(f"[Step 3] Testing: {top['cause']}")
    test_plan = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.1,
        messages=[{
            "role": "user",
            "content": f"""Design a minimal test to confirm or rule out this hypothesis.
Hypothesis: {top['cause']}
Evidence to check: {top['evidence_to_check']}
Error: {error}
Provide:
1. A specific command or code snippet to run
2. What output confirms the hypothesis
3. What output rules it out
4. If ruled out, which hypothesis to test next"""
        }]
    ).choices[0].message.content

    # Step 4: Suggest fix
    logger.info("[Step 4] Generating fix")
    fix = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.2,
        messages=[{
            "role": "user",
            "content": f"""Provide a specific fix for this error.
Error: {error}
Most likely cause: {top['cause']}
Test plan: {test_plan}
Include:
1. The exact code change (diff format)
2. Why this fix addresses the root cause (not just the symptom)
3. A test to verify the fix works
4. How to prevent this class of error in the future"""
        }]
    ).choices[0].message.content

    return {
        "parsed_error": parsed_error,
        "hypotheses": hypotheses,
        "test_plan": test_plan,
        "fix": fix,
    }
Why this works:
- Error parsing step: normalizes the error into structured data so subsequent steps don't re-parse
- Ranked hypotheses: generates multiple causes and ranks them, rather than guessing one
- Evidence-based testing: each hypothesis includes what evidence would confirm or rule it out
- Prevention advice: the fix step includes how to prevent recurrence, not just a patch
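The chain takes the first hypothesis in the list as the top one, which relies on the model returning them already sorted. A minimal sketch of making that explicit, assuming a hypothetical helper named ordered_hypotheses that sorts on the "rank" field requested in the step 2 prompt:

```python
def ordered_hypotheses(payload: dict) -> list[dict]:
    """Return hypotheses sorted by rank, failing fast if there are none."""
    items = payload.get("hypotheses") or []
    if not items:
        raise ValueError("Step 2: no hypotheses generated")

    def rank_key(item: dict) -> float:
        rank = item.get("rank")
        # Missing or non-integer ranks sort last instead of raising.
        return rank if isinstance(rank, int) else float("inf")

    # sorted() is stable, so ties keep the order the model produced.
    return sorted(items, key=rank_key)
```

With this in place, the top hypothesis is ordered_hypotheses(hypotheses)[0], and the remaining entries give you the order in which to try alternatives if the first test rules it out.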
Running Any Chain
All five patterns follow the same structure. Wrap them in a runner:
def run_chain(name: str, fn, **kwargs):
    logger.info(f"=== Starting {name} ===")
    try:
        result = fn(**kwargs)
        logger.info(f"=== {name} completed successfully ===")
        return result
    except ValueError as e:
        logger.error(f"=== {name} failed at gate check: {e} ===")
        raise
    except Exception as e:
        logger.error(f"=== {name} failed unexpectedly: {e} ===")
        raise


# Usage
brief = run_chain(
    "Research Pipeline",
    research_pipeline,
    topic="WebAssembly adoption in 2026"
)
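Because the chains sample with non-zero temperature in most steps, a gate check that fails once may pass on a second run. A minimal sketch of a retrying variant of the runner; run_with_retries and its attempts parameter are hypothetical, and it retries only on ValueError (the gate-check signal used throughout this tutorial), letting everything else propagate.

```python
import logging

logger = logging.getLogger(__name__)


def run_with_retries(name: str, fn, attempts: int = 2, **kwargs):
    """Run fn(**kwargs), retrying when a gate check raises ValueError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fn(**kwargs)
        except ValueError as exc:
            last_error = exc
            logger.warning(
                "%s: gate check failed on attempt %d/%d: %s",
                name, attempt, attempts, exc,
            )
    # Every attempt failed at a gate; surface the last failure.
    raise last_error
```

Each retry reruns the whole chain and pays for every step again, so keep attempts small.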
Patterns at a Glance
| Pattern | Steps | Best For | Key Technique |
|---|---|---|---|
| Research Pipeline | 3 | Competitive analysis, market research | Two-pass source validation |
| Code Review | 3 | PR review automation | Low temperature for precision |
| Content Creation | 4 | Blog posts, reports, documentation | Strategy research before outlining |
| Data Extraction | 3 | Structured data from documents | Two-pass with labeled inference |
| Debugging Assistant | 4 | Runtime error diagnosis | Ranked hypotheses + test plans |
Note:
Pro tip: Start with a 2-step version of any pattern. Make it reliable. Then add the third step. A working 2-step chain beats a broken 5-step chain.
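As an illustration of that starting point, here is a minimal 2-step sketch with a single gate check. The two_step_chain function, its prompts, and the complete parameter are assumptions for illustration: complete is any callable that takes a prompt string and returns the model's text, which lets you test the chain logic with a stub before wiring in client.chat.completions.create.

```python
from typing import Callable


def two_step_chain(text: str, complete: Callable[[str], str], min_chars: int = 50) -> dict:
    """Summarize, gate-check the summary, then critique it."""
    # Step 1: produce the intermediate result.
    summary = complete(f"Summarize the following text in plain English:\n{text}")

    # Gate check: stop before step 2 if step 1 produced too little to work with.
    if len(summary) < min_chars:
        raise ValueError(f"Step 1: summary too short ({len(summary)} chars)")

    # Step 2: consume only the output of step 1.
    critique = complete(f"List the weakest claims in this summary:\n{summary}")
    return {"summary": summary, "critique": critique}
```

Once this is reliable on your real inputs, adding a third step is a matter of passing critique into one more complete call, with its own gate if the step after it depends on the result.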