The Core Idea

Plan-and-Solve (Wang et al., 2023) targets a weakness of zero-shot CoT: the model tends to skip steps and make calculation errors because it works out what to do while it is doing it. The fix is separation: plan the whole solution first, then execute each step with the plan as a guide.

Zero-Shot CoT: "Step 1, Step 2, Step 3..."
               → Missing steps, calculation errors, semantic confusion

Plan-and-Solve: Phase 1: "Here's my plan: ..."
                Phase 2: "Step 1: ... Step 2: ... (following the plan)"
               → Fewer missing steps, better organization

The Three CoT Failure Modes

Plan-and-Solve was designed to address three specific failure patterns in zero-shot CoT:

Failure Mode	What Happens	Example
Missing-step errors	Model jumps from problem to answer, skipping intermediate reasoning	"A train leaves at 3pm traveling 60mph. It's 5:30pm now. Distance = 60 miles." (wrong: the elapsed time of 2.5 hours was never computed, so the distance should be 150 miles)
Calculation errors	Model does arithmetic wrong during free-form reasoning	"48 × 37 = 1,764" (off by 12; the correct product is 1,776)
Semantic misunderstanding	Model misinterprets the problem structure	Treating a comparison problem as a counting problem

PS mainly targets missing-step errors. PS+ adds detailed instructions aimed at calculation errors and at the quality of the reasoning steps; neither prompt is a reliable cure for semantic misunderstanding.

Prompt Templates

PS Prompt (Plan-and-Solve)

Q: {problem}

Let's first understand the problem and devise a plan to solve the problem.
Then, let's carry out the plan to solve the problem step by step.

PS+ Prompt (Plan-and-Solve with Extra Guidance)

Q: {problem}

Let's first understand the problem, extract relevant variables
and their corresponding numerals, and devise a plan.
Then, let's carry out the plan, calculate intermediate variables
(pay attention to correct numeral calculation and commonsense),
solve the problem step by step, and show the answer.

The PS+ variant adds three guardrails:

Extract variables explicitly — prevents overlooking numbers in the problem
Calculate intermediate variables — forces showing work, reducing mental arithmetic errors
Pay attention to commonsense — reminds the model to sanity-check results
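
The two templates above can be kept as constants and selected with a flag. This is a minimal sketch of the single-prompt form; the names `PS_TRIGGER`, `PS_PLUS_TRIGGER`, and `build_prompt` are illustrative and do not come from the paper or any library.

```python
# Trigger sentences copied from the PS and PS+ templates above.
PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem.\n"
    "Then, let's carry out the plan to solve the problem step by step."
)

PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables "
    "and their corresponding numerals, and devise a plan.\n"
    "Then, let's carry out the plan, calculate intermediate variables "
    "(pay attention to correct numeral calculation and commonsense), "
    "solve the problem step by step, and show the answer."
)


def build_prompt(problem: str, use_ps_plus: bool = False) -> str:
    """Return a single-call PS or PS+ prompt for the given problem (hypothetical helper)."""
    trigger = PS_PLUS_TRIGGER if use_ps_plus else PS_TRIGGER
    return f"Q: {problem.strip()}\n\n{trigger}"


if __name__ == "__main__":
    print(build_prompt("John has 5 boxes of 12 apples. He gives away 3 boxes. "
                       "How many apples are left?", use_ps_plus=True))
```

Keeping the trigger text in one place makes it easier to compare PS and PS+ on the same problems without retyping the prompt.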

PS vs PS+ vs CoT: An Example

Problem: John has 5 boxes. Each box contains 12 apples.
         He gives away 3 boxes. How many apples does he have left?

┌─────────────────────────────────────────────────────────────┐
│ Zero-Shot CoT                                                │
│ "Let's think step by step. John had 5 boxes × 12 = 60       │
│  apples. He gave away 3 boxes, so he has 2 × 12 = 24."      │
│ → Answer: 24 ✓ (but the 5 - 3 = 2 boxes step stays implicit) │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ Plan-and-Solve (PS)                                          │
│ "Let's first understand the problem and devise a plan...     │
│                                                              │
│ Plan:                                                        │
│ 1. Calculate total apples John starts with                   │
│ 2. Calculate boxes remaining after giving away               │
│ 3. Calculate apples in remaining boxes                       │
│                                                              │
│ Step 1: 5 boxes × 12 apples per box = 60 total apples       │
│ Step 2: 5 boxes - 3 boxes = 2 boxes remaining               │
│ Step 3: 2 boxes × 12 apples = 24 apples                     │
│ Answer: 24 ✓"                                                │
└─────────────────────────────────────────────────────────────┘

The plan makes the reasoning auditable. You can verify each step independently.
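
Because each step is a separate line, the arithmetic can also be checked mechanically. The sketch below assumes every step looks like "Step k: a op b = c" with optional unit words, as in the PS example above; real model output is messier, and `check_step` is a hypothetical helper, not part of the method.

```python
import math
import operator
import re
from typing import Optional

OPS = {"×": operator.mul, "*": operator.mul, "+": operator.add,
       "-": operator.sub, "/": operator.truediv}
NUM = r"(\d+(?:\.\d+)?)"
# Assumed step format once unit words are removed: "<a> <op> <b> = <c>"
EXPR_RE = re.compile(rf"\s*{NUM}\s*([×*+\-/])\s*{NUM}\s*=\s*{NUM}\s*")


def check_step(line: str) -> Optional[bool]:
    """Return True/False if the step's arithmetic can be checked, else None."""
    _, _, body = line.partition(":")          # drop the "Step k" label
    body = body.replace(",", "")              # "1,764" -> "1764"
    expr = re.sub(r"[A-Za-z]+", " ", body)    # drop unit words such as "boxes"
    match = EXPR_RE.fullmatch(expr)
    if match is None:
        return None                           # not in the assumed format
    left, op, right, claimed = match.groups()
    if op == "/" and float(right) == 0:
        return False                          # division by zero cannot be right
    return math.isclose(OPS[op](float(left), float(right)), float(claimed))


if __name__ == "__main__":
    for step in [
        "Step 1: 5 boxes × 12 apples per box = 60 total apples",
        "Step 2: 5 boxes - 3 boxes = 2 boxes remaining",
        "Step 3: 2 boxes × 12 apples = 24 apples",
    ]:
        print(step, "->", check_step(step))
```

A `None` result means the line could not be parsed, which is different from a failed check and should be handled separately.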

Implementation

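def plan_and_solve(llm, problem: str, use_ps_plus: bool = False) -> dict:
    """Two-stage Plan-and-Solve: generate a plan, then execute it.

    `llm` is any object with a `generate(prompt: str) -> str` method.
    """

    # Phase 1: Plan
    if use_ps_plus:
        plan_prompt = f"""Q: {problem}

Let's first understand the problem, extract relevant variables
and their corresponding numerals, and devise a plan."""
        # PS+ carries its extra guidance into the execution phase as well.
        solve_instruction = """Then, let's carry out the plan, calculate intermediate variables
(pay attention to correct numeral calculation and commonsense),
solve the problem step by step, and show the answer."""
    else:
        plan_prompt = f"""Q: {problem}

Let's first understand the problem and devise a plan
to solve the problem."""
        solve_instruction = """Then, let's carry out the plan to solve the problem step by step."""

    plan = llm.generate(plan_prompt)

    # Phase 2: Execute the plan (the generated plan is fed back as context)
    solve_prompt = f"""Q: {problem}

{plan}

{solve_instruction}"""
    solution = llm.generate(solve_prompt)

    return {
        "plan": plan,
        "solution": solution,
        "variant": "PS+" if use_ps_plus else "PS"
    }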
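
The `solution` value is free-form text, so downstream code usually needs to pull out the final number. This is a rough sketch, assuming the answer is the last number on a line mentioning "answer", with the last number anywhere as a fallback. `extract_final_number` is a hypothetical helper and can be fooled by unusual phrasing.

```python
import re
from typing import Optional

# Integers or decimals, optionally negative, with optional thousands separators.
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_final_number(solution: str) -> Optional[float]:
    """Best-effort extraction of the final numeric answer from model output."""
    answer_lines = [ln for ln in solution.splitlines() if "answer" in ln.lower()]
    preferred = answer_lines[-1] if answer_lines else ""
    # Try the last "answer" line first, then fall back to the whole text.
    for text in (preferred, solution):
        matches = NUMBER_RE.findall(text)
        if matches:
            return float(matches[-1].replace(",", ""))
    return None


if __name__ == "__main__":
    text = "Step 3: 2 boxes × 12 apples = 24 apples\nAnswer: 24"
    print(extract_final_number(text))
```

When accuracy matters, a second model call that asks only for the final answer is a common alternative to regex parsing.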

Handling Plan Failures

Plans aren't perfect. When the generated plan is wrong, you need a fallback.

Replanning Trigger

def plan_with_fallback(llm, problem: str):
    """Plan, execute, and replan once if execution reveals plan flaws."""
    # plan_and_solve returns a dict, so read the fields by key
    result = plan_and_solve(llm, problem)
    plan, solution = result["plan"], result["solution"]

    # Heuristic check: did the execution contradict the plan?
    # String matching is brittle; treat these markers as a rough signal only.
    lowered = solution.lower()
    plan_rejected = "i realize the plan is wrong" in lowered
    repeated_backtracking = lowered.count("actually") > 1
    if plan_rejected or repeated_backtracking:
        replan_prompt = f"""The original plan for this problem had issues.
Problem: {problem}
Original plan: {plan}
Issue found: The plan doesn't account for all constraints.

Create a corrected plan and solve again:"""
        result = plan_and_solve(llm, replan_prompt)
        plan, solution = result["plan"], result["solution"]

    return plan, solution

Common Plan Failures

Failure	Symptom	Fix
Missing constraint	Plan has N steps but problem has N+1 requirements	PS+ with explicit variable extraction
Wrong order	Plan puts dependent steps in wrong sequence	Ask "Does step K depend on step J? If so, reorder."
Overly vague	"Solve the problem" as a plan step	Request "specific, numbered steps with sub-goals"
Circular plan	Plan references outputs that haven't been computed	Add "Verify each step's prerequisites are met"
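
The "overly vague" row can be caught with a cheap check before execution. The sketch below flags numbered plan steps that are very short or match a few generic phrases. The pattern list, the `min_words` threshold, and the name `find_vague_steps` are all illustrative assumptions that would need tuning on real plans.

```python
import re
from typing import List

# Illustrative examples of steps that say nothing specific.
VAGUE_PATTERNS = [
    r"^solve (the|this) problem\.?$",
    r"^(find|get|compute|calculate) the answer\.?$",
    r"^do the (math|calculation)s?\.?$",
]

# Matches "1. ...", "2) ...", or "Step 3: ..." at the start of a line.
STEP_RE = re.compile(r"^\s*(?:step\s*)?\d+[.):]\s*(.+)$", re.IGNORECASE)


def find_vague_steps(plan: str, min_words: int = 3) -> List[str]:
    """Return the plan steps that look too generic to guide execution."""
    flagged = []
    for line in plan.splitlines():
        match = STEP_RE.match(line)
        if match is None:
            continue                      # not a numbered step
        step = match.group(1).strip()
        too_short = len(step.split()) < min_words
        generic = any(re.match(p, step, re.IGNORECASE) for p in VAGUE_PATTERNS)
        if too_short or generic:
            flagged.append(step)
    return flagged


if __name__ == "__main__":
    plan = ("Plan:\n"
            "1. Calculate total apples John starts with\n"
            "2. Solve the problem\n"
            "3. Calculate apples in remaining boxes")
    print(find_vague_steps(plan))
```

A non-empty result is a reasonable trigger for re-requesting the plan with the "specific, numbered steps with sub-goals" wording from the table.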

When Plan-and-Solve Wins

Strongest on:

Multi-step math word problems (GSM8K, SVAMP, MultiArith)
Symbolic reasoning with many sequential operations
Long-form generation where structure prevents rambling
Tasks where CoT consistently misses intermediate steps

No advantage on:

Single-step problems (classification, factual lookup)
Tasks where the reasoning is trivial and decomposition adds overhead
Creative generation where rigid planning kills fluidity

Plan-and-Solve vs. Other Techniques

Technique	Structure	Zero-Shot?	Key Strength	Key Weakness
Standard CoT	Linear chain	Yes (zero-shot) / No (few-shot)	Simple, universal	Missing steps, calc errors
Plan-and-Solve	Plan → Execute	Yes	Structured, auditable	Plan rigidity
Least-to-Most	Decompose → Solve sequentially	No (needs exemplars)	Harder-than-exemplar generalization	Decomposition can fail
Tree-of-Thought	Branch → Evaluate → Prune	No (needs evaluator)	Explores alternatives	High cost, needs good scorer

Plan-and-Solve vs. Least-to-Most

Both decompose problems, but the decomposition strategy differs:

Aspect	Plan-and-Solve	Least-to-Most
Decomposition style	Top-down plan, then execute	Bottom-up: easiest subproblem first
Exemplar dependency	Zero-shot works	Few-shot (needs decomposition examples)
Problem scope	Fixed-complexity problems	Problems harder than the prompt exemplars
Cost	2 LLM calls per problem (in the two-stage form above)	1 decomposition call plus one call per subproblem
Best for	Math reasoning, structured tasks	SCAN, compositional generalization

Production Integration

LangChain's Plan-and-Execute agents draw on the Plan-and-Solve idea. In practice, you can use the function above as a near drop-in replacement for zero-shot CoT:

# Replace this:
response = llm.generate(f"{problem}\nLet's think step by step.")

# With this:
response = plan_and_solve(llm, problem, use_ps_plus=True)["solution"]

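The tradeoff: the two-call version uses roughly 2x the tokens of CoT, because the plan is generated once and then sent again as input for execution. That extra cost is usually worth paying only on multi-step problems, where the accuracy gain is largest.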
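
The 2x figure can be sanity-checked with a back-of-the-envelope count. The sketch below assumes the two-call implementation from earlier, the same instruction length in every call, and a CoT answer about as long as the PS execution. The token counts in the usage example are made-up placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class TokenCounts:
    """Hypothetical per-piece token counts for one problem."""
    problem: int      # the question text
    instruction: int  # the trigger sentence(s)
    plan: int         # the generated plan
    solution: int     # the generated step-by-step solution


def cot_tokens(t: TokenCounts) -> int:
    """One call: problem + trigger in, solution out."""
    return t.problem + t.instruction + t.solution


def two_call_ps_tokens(t: TokenCounts) -> int:
    """Two calls: the plan is generated, then re-sent as input to the second call."""
    call_1 = t.problem + t.instruction + t.plan
    call_2 = t.problem + t.plan + t.instruction + t.solution
    return call_1 + call_2


if __name__ == "__main__":
    counts = TokenCounts(problem=60, instruction=30, plan=80, solution=150)
    print(cot_tokens(counts))          # 240
    print(two_call_ps_tokens(counts))  # 490
```

Under these assumptions the overhead grows with the length of the plan and the problem, since both are paid for twice. A single-prompt PS call avoids the resend at the cost of not being able to inspect the plan before execution.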

Plan-and-Solve: Two-Stage Decomposition