Plan-and-Solve: Two-Stage Decomposition
Plan first, then execute. Address CoT's missing-step and calculation errors with a structured two-phase approach that outperforms zero-shot chain-of-thought across 10 datasets.
The Core Idea
Plan-and-Solve (Wang et al., 2023) targets a weakness of zero-shot CoT: the model skips steps and makes calculation errors because it plans and executes at the same time. The fix is separation: plan the whole solution first, then execute each step with the plan as a guide.
Zero-Shot CoT:   "Step 1, Step 2, Step 3..."
                 → Missing steps, calculation errors, semantic confusion
Plan-and-Solve:  Phase 1: "Here's my plan: ..."
                 Phase 2: "Step 1: ... Step 2: ... (following the plan)"
                 → Fewer missing steps, better organization
The Three CoT Failure Modes
Plan-and-Solve starts from three failure patterns observed in zero-shot CoT:
| Failure Mode | What Happens | Example |
|---|---|---|
| Missing-step errors | Model jumps from problem to answer, skipping intermediate reasoning | "A train leaves at 3pm at 60 mph and makes one 30-minute stop. It's 5pm now. Distance = 120 miles." (wrong: the stop was never subtracted; 1.5 h × 60 mph = 90 miles) |
| Calculation errors | Model does arithmetic wrong during free-form reasoning | "48 × 37 = 1,764" (the correct product is 1,776, so the result is off by 12) |
| Semantic misunderstanding | Model misinterprets the problem structure | Treating a comparison problem as a counting problem |
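PS targets missing-step errors. PS+ (with more detailed instructions) targets calculation errors and aims to improve the quality of the reasoning steps. Semantic misunderstanding is the least well covered of the three: the extra instructions can help, but the original paper lists it as a remaining limitation.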
Prompt Templates
PS Prompt (Plan-and-Solve)
Q: {problem}
Let's first understand the problem and devise a plan to solve the problem.
Then, let's carry out the plan to solve the problem step by step.
PS+ Prompt (Plan-and-Solve with Extra Guidance)
Q: {problem}
Let's first understand the problem, extract relevant variables
and their corresponding numerals, and devise a plan.
Then, let's carry out the plan, calculate intermediate variables
(pay attention to correct numeral calculation and commonsense),
solve the problem step by step, and show the answer.
The PS+ variant adds three guardrails:
- Extract variables explicitly — prevents overlooking numbers in the problem
- Calculate intermediate variables — forces showing work, reducing mental arithmetic errors
- Pay attention to commonsense — reminds the model to sanity-check results
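The two templates above can be assembled programmatically. A minimal sketch, assuming a hypothetical helper named `build_ps_prompt` (the name and the constants are illustrative, not from the paper) that returns the single-prompt form of either variant:

```python
# Trigger sentences copied from the PS and PS+ templates above.
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem.\n"
    "Then, let's carry out the plan to solve the problem step by step."
)
PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables\n"
    "and their corresponding numerals, and devise a plan.\n"
    "Then, let's carry out the plan, calculate intermediate variables\n"
    "(pay attention to correct numeral calculation and commonsense),\n"
    "solve the problem step by step, and show the answer."
)


def build_ps_prompt(problem: str, use_ps_plus: bool = False) -> str:
    """Return a single zero-shot prompt: the question followed by the trigger."""
    trigger = PS_PLUS_TRIGGER if use_ps_plus else PS_TRIGGER
    return f"Q: {problem.strip()}\n{trigger}"


if __name__ == "__main__":
    print(build_ps_prompt("John has 5 boxes of 12 apples. How many apples?", True))
```

Keeping the trigger text in constants makes it easy to A/B the two variants without touching the calling code.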
PS vs PS+ vs CoT: An Example
Problem: John has 5 boxes. Each box contains 12 apples.
He gives away 3 boxes. How many apples does he have left?
┌─────────────────────────────────────────────────────────────┐
│ Zero-Shot CoT │
│ "Let's think step by step. John had 5 boxes × 12 = 60 │
│ apples. He gave away 3 boxes, so he has 2 × 12 = 24." │
│ → Answer: 24 ✓ (but the 5 - 3 = 2 remaining-boxes step is never stated) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Plan-and-Solve (PS) │
│ "Let's first understand the problem and devise a plan... │
│ │
│ Plan: │
│ 1. Calculate total apples John starts with │
│ 2. Calculate boxes remaining after giving away │
│ 3. Calculate apples in remaining boxes │
│ │
│ Step 1: 5 boxes × 12 apples per box = 60 total apples │
│ Step 2: 5 boxes - 3 boxes = 2 boxes remaining │
│ Step 3: 2 boxes × 12 apples = 24 apples │
│ Answer: 24 ✓" │
└─────────────────────────────────────────────────────────────┘
The plan makes the reasoning auditable. You can verify each step independently.
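When steps are written as simple two-operand arithmetic, as in the example above, part of that audit can be automated. A rough sketch, assuming each step contains one expression of the form `a × b = c`, `a + b = c`, or `a - b = c` with optional unit words in between; `check_steps` is a hypothetical helper, and the regex will skip anything more complex:

```python
import re
from operator import add, mul, sub

# <number> [words] <operator> <number> [words] = <number>
# e.g. "5 boxes × 12 apples per box = 60 total apples"
STEP_RE = re.compile(
    r"(\d[\d,]*)[^\d×*+\-=]*([×*+\-])\s*(\d[\d,]*)[^\d=]*=\s*(\d[\d,]*)"
)
OPS = {"×": mul, "*": mul, "+": add, "-": sub}


def to_int(text: str) -> int:
    """Parse an integer that may contain thousands separators."""
    return int(text.replace(",", ""))


def check_steps(solution: str) -> list[tuple[str, bool]]:
    """Return (line, is_correct) for every line with a checkable expression."""
    results = []
    for line in solution.splitlines():
        match = STEP_RE.search(line)
        if match is None:
            continue  # no simple arithmetic on this line
        a, op, b, claimed = match.groups()
        results.append((line.strip(), OPS[op](to_int(a), to_int(b)) == to_int(claimed)))
    return results


if __name__ == "__main__":
    for line, ok in check_steps("Step 2: 5 boxes - 3 boxes = 2 boxes remaining"):
        print("OK " if ok else "BAD", line)
```

This only checks the arithmetic inside each step. Whether the steps are the right ones for the problem still needs a human or a second model pass.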
Implementation
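def plan_and_solve(llm, problem: str, use_ps_plus: bool = False) -> dict:
    """Two-stage: plan first, then execute.

    `llm` is assumed to be any client exposing generate(prompt: str) -> str.
    """
    if use_ps_plus:
        plan_instruction = (
            "Let's first understand the problem, extract relevant variables "
            "and their corresponding numerals, and devise a plan."
        )
        # PS+ keeps its calculation and commonsense guidance in phase 2
        solve_instruction = (
            "Then, let's carry out the plan, calculate intermediate variables "
            "(pay attention to correct numeral calculation and commonsense), "
            "solve the problem step by step, and show the answer."
        )
    else:
        plan_instruction = (
            "Let's first understand the problem and devise a plan "
            "to solve the problem."
        )
        solve_instruction = (
            "Then, let's carry out the plan, solve the problem step by step, "
            "and show the answer."
        )

    # Phase 1: ask only for the plan
    plan = llm.generate(f"Q: {problem}\n{plan_instruction}")

    # Phase 2: execute, with the generated plan included as context
    solution = llm.generate(f"Q: {problem}\n{plan}\n{solve_instruction}")

    return {
        "plan": plan,
        "solution": solution,
        "variant": "PS+" if use_ps_plus else "PS",
    }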
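The solution returned above is free-form text, so the caller usually still has to pull out the final answer. One rough option is sketched below. It assumes the answer is numeric and that the last number in the text is the answer; both are assumptions of this sketch rather than something the source specifies, and `extract_final_number` is a hypothetical helper:

```python
import re
from typing import Optional

# Integers or decimals, with optional sign and thousands separators.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(solution: str) -> Optional[float]:
    """Return the last number mentioned in the solution, or None if there is none."""
    matches = NUMBER_RE.findall(solution)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


if __name__ == "__main__":
    text = "Step 3: 2 boxes × 12 apples = 24 apples\nAnswer: 24"
    print(extract_final_number(text))  # 24.0
```

A second LLM call that asks for the answer in a fixed format is a sturdier alternative when the extra call is affordable.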
Handling Plan Failures
Plans aren't perfect. When the generated plan is wrong, you need a fallback.
Replanning Trigger
def plan_with_fallback(llm, problem: str):
    """Plan, execute, and replan once if execution reveals plan flaws."""
    # plan_and_solve returns a dict, so read the fields by key
    result = plan_and_solve(llm, problem)
    plan, solution = result["plan"], result["solution"]

    # Heuristic: the model explicitly rejects the plan, or backtracks
    # ("actually ...") more than once while executing it
    text = solution.lower()
    if "i realize the plan is wrong" in text or text.count("actually") > 1:
        replan_prompt = (
            "The original plan for this problem had issues.\n"
            f"Problem: {problem}\n"
            f"Original plan: {plan}\n"
            "Issue found: The plan doesn't account for all constraints.\n"
            "Create a corrected plan and solve again:"
        )
        result = plan_and_solve(llm, replan_prompt)
        plan, solution = result["plan"], result["solution"]

    return plan, solution
Common Plan Failures
| Failure | Symptom | Fix |
|---|---|---|
| Missing constraint | Plan has N steps but problem has N+1 requirements | PS+ with explicit variable extraction |
| Wrong order | Plan puts dependent steps in wrong sequence | Ask "Does step K depend on step J? If so, reorder." |
| Overly vague | "Solve the problem" as a plan step | Request "specific, numbered steps with sub-goals" |
| Circular plan | Plan references outputs that haven't been computed | Add "Verify each step's prerequisites are met" |
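Some of these symptoms can be caught with cheap string checks before spending a second call. A heuristic sketch, assuming plans come back as numbered lists (`1.` or `1)`); `lint_plan` and the vague-phrase list are illustrative assumptions, not part of the method:

```python
import re

STEP_LINE = re.compile(r"^\s*(\d+)[.)]\s+(.*\S)\s*$")
# Hypothetical examples of steps that restate the task instead of decomposing it.
VAGUE_PHRASES = ("solve the problem", "find the answer", "do the calculation")


def lint_plan(plan: str) -> list[str]:
    """Return human-readable warnings for vague or circular plan steps."""
    steps = []
    for line in plan.splitlines():
        match = STEP_LINE.match(line)
        if match:
            steps.append((int(match.group(1)), match.group(2)))
    if not steps:
        return ["no numbered steps found"]

    warnings = []
    for number, text in steps:
        lowered = text.lower()
        if lowered.rstrip(".") in VAGUE_PHRASES:
            warnings.append(f"step {number} is vague: {text!r}")
        # A step that cites itself or a later step cannot have its inputs yet.
        for ref in re.findall(r"step (\d+)", lowered):
            if int(ref) >= number:
                warnings.append(f"step {number} depends on step {ref}, which has not run yet")
    return warnings


if __name__ == "__main__":
    print(lint_plan("1. Use the result of step 2\n2. Solve the problem"))
```

An empty list means only that none of these specific patterns fired. A missing constraint, for instance, cannot be detected without comparing the plan against the problem itself.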
When Plan-and-Solve Wins
Strongest on:
- Multi-step math word problems (GSM8K, SVAMP, MultiArith)
- Symbolic reasoning with many sequential operations
- Long-form generation where structure prevents rambling
- Tasks where CoT consistently misses intermediate steps
No advantage on:
- Single-step problems (classification, factual lookup)
- Tasks where the reasoning is trivial and decomposition adds overhead
- Creative generation where rigid planning kills fluidity
Plan-and-Solve vs. Other Techniques
| Technique | Structure | Zero-Shot? | Key Strength | Key Weakness |
|---|---|---|---|---|
| Standard CoT | Linear chain | Yes (zero-shot) / No (few-shot) | Simple, universal | Missing steps, calc errors |
| Plan-and-Solve | Plan → Execute | Yes | Structured, auditable | Plan rigidity |
| Least-to-Most | Decompose → Solve sequentially | No (needs exemplars) | Harder-than-exemplar generalization | Decomposition can fail |
| Tree-of-Thought | Branch → Evaluate → Prune | No (needs evaluator) | Explores alternatives | High cost, needs good scorer |
Plan-and-Solve vs. Least-to-Most
Both decompose problems, but the decomposition strategy differs:
| Aspect | Plan-and-Solve | Least-to-Most |
|---|---|---|
| Decomposition style | Top-down plan, then execute | Bottom-up: easiest subproblem first |
| Exemplar dependency | Zero-shot works | Few-shot (needs decomposition examples) |
| Problem scope | Fixed-complexity problems | Problems harder than the prompt exemplars |
| Cost | 2 LLM calls per problem | N calls (one per subproblem) |
| Best for | Math reasoning, structured tasks | SCAN, compositional generalization |
Production Integration
LangChain's Plan-and-Execute agents were inspired in part by Plan-and-Solve. In practice, you can use the function above as a near drop-in replacement for zero-shot CoT:
# Replace this:
response = llm.generate(f"{problem}\nLet's think step by step.")
# With this (plan_and_solve returns a dict, so take the solution text):
response = plan_and_solve(llm, problem, use_ps_plus=True)["solution"]
The tradeoff: the two-call version above sends the problem twice and feeds the plan back in as input, so expect roughly 2x the tokens of zero-shot CoT. On multi-step problems the accuracy gain often justifies the cost.
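The 2x figure can be sanity-checked with back-of-envelope arithmetic. The sketch below counts input plus output tokens for each approach. The function names are hypothetical and every token count in the example is a made-up placeholder, not a measurement:

```python
def cot_tokens(problem: int, trigger: int, reasoning: int) -> int:
    """One call: problem + trigger in, reasoning out."""
    return problem + trigger + reasoning


def two_call_ps_tokens(
    problem: int, plan_trigger: int, plan: int, solve_trigger: int, reasoning: int
) -> int:
    """Two calls: the problem is sent twice and the plan is re-sent as input."""
    call_1 = problem + plan_trigger + plan  # plan is generated here
    call_2 = problem + plan + solve_trigger + reasoning  # plan is input here
    return call_1 + call_2


if __name__ == "__main__":
    # Placeholder sizes: 60-token problem, 120-token reasoning, 50-token plan.
    cot = cot_tokens(problem=60, trigger=6, reasoning=120)
    ps = two_call_ps_tokens(
        problem=60, plan_trigger=20, plan=50, solve_trigger=20, reasoning=120
    )
    print(cot, ps, round(ps / cot, 2))  # 186 380 2.04
```

The ratio grows with plan length and shrinks when the reasoning dominates, so it is worth measuring on real traffic before budgeting.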