The Core Idea

Least-to-Most prompting (Zhou et al. 2022, ICLR 2023) solves the hardest limitation of few-shot prompting: standard CoT can't solve problems harder than the examples it was shown. Least-to-Most breaks hard problems into a sequence of easier subproblems. By solving the easiest first and feeding answers forward, it generalizes to problems far beyond the exemplars.

Standard Few-Shot CoT:  See easy examples → Can only solve easy problems
Least-to-Most:          See easy examples → Decompose hard problem → Solve in sequence
                        → Can solve MUCH harder problems

The SCAN Breakthrough

SCAN is a compositional generalization benchmark: given commands like "jump twice" and "walk left," can the model understand "jump twice after walking left twice"?

Method	SCAN Accuracy (length split)	Training Data
Standard CoT (few-shot)	16%	14 examples
Least-to-Most	99%	14 examples
Neural-symbolic models (specialized)	~99%	15,000+ examples

Least-to-Most matched specialized models trained on 1,000x more data, using only 14 few-shot examples. That's the power of decomposition.

How Least-to-Most Works

Two distinct phases:

Phase 1: Decomposition

The prompt teaches the model how to break problems down. You provide few-shot examples showing a hard problem and its decomposition into easier subproblems.

Example:
Complex problem: "walk left twice and jump twice after walk"
Decomposition:
  1. "walk left twice"
  2. "walk left twice and jump twice after walk"  (combining)

Simple problem: "jump twice after walk left twice"
Decomposition:
  1. "walk left twice"
  2. "walk left twice and jump twice"  (combining)

Phase 2: Subproblem Solving

Each subproblem is solved in order. The answer to subproblem N is fed into the prompt for subproblem N+1.

Problem: "jump twice and walk left after walk right"

Decomposition:
  1. "jump twice"                          → Easy
  2. "walk left after walk right"          → Medium
  3. "jump twice and walk left after walk right" → Full (using answers from 1 and 2)

The model never faces the hard problem directly. It climbs the difficulty ladder one rung at a time.

Prompt Structure

Decomposition Prompt (Few-Shot)

You will break complex problems into simpler subproblems.

Example:
Complex: "look opposite right twice and walk opposite left"
Decomposition:
1. "look opposite right twice"
2. "look opposite right twice and walk opposite left"


Complex: "run left twice after run right"
Decomposition:
1. "run left twice"
2. "run left twice after run right" (keeping original structure)


Complex: "look twice after run"
Decomposition:
1. "look twice"
2. "look twice after run" (addressing the harder part after)


Now decompose this:
Complex: "{problem}"
Decomposition:

Solving Prompt (Sequential)

Now solve each subproblem one at a time. Use the solution of the
previous subproblem to help solve the next one.

Step 1: {subproblem_1}
Solution 1: {solution_1}

Step 2: {subproblem_2}, using Solution 1
Solution 2: {solution_2}

Step 3: {subproblem_3}, using Solutions 1 and 2
Solution 3: {solution_3}

Implementation

def least_to_most(llm, problem: str, decomposition_examples: list,
                  solve_examples: list):
    """Least-to-Most: decompose, then solve sequentially."""

    # Phase 1: Decompose into subproblems
    decomp_prompt = build_decomposition_prompt(
        decomposition_examples, problem
    )
    decomposition = llm.generate(decomp_prompt)
    subproblems = parse_subproblems(decomposition)  # ["sub1", "sub2", ...]

    # Phase 2: Solve each subproblem, feeding answers forward
    solutions = []
    for i, subproblem in enumerate(subproblems):
        solve_prompt = build_solve_prompt(
            solve_examples, subproblems, solutions, current_idx=i
        )
        answer = llm.generate(solve_prompt)
        solutions.append(answer)

    return {
        "decomposition": subproblems,
        "solutions": solutions,
        "final_answer": solutions[-1]
    }


def parse_subproblems(decomposition_text: str) -> list:
    """Extract numbered subproblems from decomposition output."""
    import re
    matches = re.findall(r'\d+\.\s+(.+)', decomposition_text)
    return matches if matches else [decomposition_text]


def build_decomposition_prompt(examples: list, problem: str) -> str:
    """Build the few-shot decomposition prompt (see Prompt Structure above)."""
    prompt = "You will break complex problems into simpler subproblems.\n\n"
    for ex in examples:
        prompt += f"Complex: \"{ex['complex']}\"\n"
        prompt += "Decomposition:\n"
        for i, step in enumerate(ex['decomposition'], 1):
            prompt += f"{i}. \"{step}\"\n"
        prompt += "\n"
    prompt += f"Now decompose this:\nComplex: \"{problem}\"\nDecomposition:\n"
    return prompt


def build_solve_prompt(examples: list, subproblems: list,
                       solutions: list, current_idx: int) -> str:
    """Build the sequential solving prompt, feeding prior answers forward."""
    prompt = "Now solve each subproblem one at a time.\n\n"
    for i in range(current_idx):
        prompt += f"Step {i+1}: {subproblems[i]}\nSolution {i+1}: {solutions[i]}\n\n"
    prompt += f"Step {current_idx+1}: {subproblems[current_idx]}"
    if current_idx > 0:
        prompt += f", using Solutions 1-{current_idx}"
    prompt += "\nSolution:"
    return prompt

When Least-to-Most Wins

Strongest on:

Compositional generalization (SCAN, COGS, CFQ)
Symbolic manipulation (last-letter concatenation, coin flip tracking)
Multi-step math with increasing difficulty
Problems where subproblems have clear dependency chains

No advantage on:

Single-step problems — decomposition is overhead with no benefit
Problems where subproblems are interdependent (circular dependencies)
Tasks where the "hard" part can't be separated from the "easy" parts
Creative tasks where decomposition destroys coherence

When Decomposition Fails

Not every problem decomposes cleanly:

Failure Mode	Example	Why It Fails
Circular dependencies	Subproblem A needs B's answer, B needs A's answer	No valid solving order
Non-compositional reasoning	"Is this argument valid?" — can't separate into independent steps	Reasoning is holistic
Over-decomposition	Breaking "write a haiku" into "write line 1," "write line 2," "write line 3"	Context needed across subproblems
Lost context	Each subproblem's answer loses nuance needed by later subproblems	Information compression

Detecting Decomposition Failures

# pseudo-code: check for circular dependencies via topological sort.
# In practice, use a library like networkx or a simple DFS cycle detector.
def is_decomposition_valid(subproblems: list, dependencies: dict) -> bool:
    """Check for circular dependencies via topological sort."""
    from collections import deque
    in_degree = {s: 0 for s in subproblems}
    for s in subproblems:
        for dep in dependencies.get(s, []):
            in_degree[s] += 1
    queue = deque([s for s in subproblems if in_degree[s] == 0])
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for s in subproblems:
            if node in dependencies.get(s, []):
                in_degree[s] -= 1
                if in_degree[s] == 0:
                    queue.append(s)
    return visited == len(subproblems)

In practice, if the model's decomposition output is nonsensical or the subproblems can't be ordered, fall back to standard CoT.

Least-to-Most vs. Other Strategies

Technique	Decomposition	Order	Exemplar Requirement	Generalization
Standard CoT	None (linear)	Sequential	Few-shot	Up to exemplar difficulty
Plan-and-Solve	Top-down plan	Plan-guided	Zero-shot	Same complexity
Least-to-Most	Bottom-up	Easiest first	Few-shot (decomp. examples)	Harder than exemplars
Tree-of-Thought	Branching	BFS/DFS with evaluation	Few-shot	Same complexity
Decomposed Prompting	Modular sub-prompts	Arbitrary	Few-shot	Broader coverage

The Difficulty Ladder

Least-to-Most works by constructing a difficulty ladder. Each step should be only slightly harder than the previous:

Subproblem 1: One operation     (difficulty: 1/10)
Subproblem 2: Two operations    (difficulty: 3/10)
Subproblem 3: Three operations  (difficulty: 6/10)
Subproblem 4: Full problem      (difficulty: 9/10)

If the jump between subproblems is too large, the model fails at the transition. The decomposition prompt should enforce gradual difficulty increase.

Cost Analysis

Phase	Calls	Cost
Decomposition	1	1x
Solving (N subproblems)	N	N × 1x
Total	N + 1	(N + 1)x

For a problem decomposed into 3 subproblems: 4 total LLM calls (1 decomposition + 3 solving). This is more expensive than single-call CoT, but the accuracy gain on hard problems makes it worthwhile.

When to pay the cost:

The problem is harder than your few-shot examples
Standard CoT gets it wrong reliably
Wrong answers have high cost (medical, financial, legal)
You can batch the decomposition phase across similar problems

Real-World Use Cases

Beyond the SCAN benchmark, Least-to-Most applies to:

Multi-hop QA: "Who is the CEO of the company that acquired Instagram?" → Decompose: "What company acquired Instagram?" → "Who is the CEO of that company?"
Code generation: "Write a function that sorts a list, removes duplicates, and returns the top 3" → Decompose into sort, deduplicate, top-k
Data pipeline design: Decompose complex ETL into sequential transformation steps
Legal reasoning: Break a statute interpretation into definition lookup → condition check → application

Least-to-Most Prompting: Easy-to-Hard Generalization