The Core Idea

Self-consistency (Wang et al. 2022) replaces greedy decoding with diverse sampling. Instead of taking a single reasoning path, it samples several paths and selects the answer they most often agree on. The intuition: wrong reasoning paths tend to scatter across different wrong answers, while correct paths tend to converge on the same answer.

Standard CoT:  Model → One reasoning path → One answer (greedy, risk of error)
Self-Consistency: Model → 5-10 reasoning paths → Majority vote → Most reliable answer

When Single-Path Reasoning Fails

Question: When I was 6 my sister was half my age. Now I'm 70,
          how old is my sister?

Single CoT output:
"When I was 6, my sister was half my age = 3.
Now I'm 70, so she's 70 / 2 = 35."
→ WRONG (correct answer: 67)

With self-consistency, you generate multiple paths:

Path 1: "Sister was 3 when I was 6, age difference is 3 years.
        At 70, sister is 70 - 3 = 67." → Answer: 67 ✓

Path 2: "Sister was half my age at 6 = 3 years old.
        70 - 6 = 64 years passed. She's 3 + 64 = 67." → Answer: 67 ✓

Path 3: "Half of 70 is 35." → Answer: 35 ✗

Result: 67 appears twice, 35 appears once → Final answer: 67 ✓
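The vote in this example is just a frequency count over the extracted answers. A minimal sketch using Python's `collections.Counter` on the three answers from the paths above:

```python
from collections import Counter

# Final answers extracted from Paths 1-3 above
answers = ["67", "67", "35"]

counts = Counter(answers)
final_answer, votes = counts.most_common(1)[0]
print(final_answer, votes, len(answers))  # → 67 2 3
```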

Implementation

import asyncio
import re
from collections import Counter

async def self_consistency(model, prompt, n_samples=5, temperature=0.7):
    """Generate N reasoning paths concurrently and return the majority answer.

    `model` is assumed to be an async client whose
    `generate(prompt, temperature=..., max_tokens=...)` returns the text.
    """
    # Sample all paths in parallel; temperature > 0 makes them differ
    responses = await asyncio.gather(*[
        model.generate(prompt, temperature=temperature, max_tokens=500)
        for _ in range(n_samples)
    ])

    # Extract a normalized final answer from each reasoning path
    answers = [extract_final_answer(r) for r in responses]

    # Majority vote: the most frequent answer wins
    counts = Counter(answers)
    best_answer, votes = counts.most_common(1)[0]
    confidence = votes / n_samples  # fraction of paths that agree

    return {
        "answer": best_answer,
        "confidence": confidence,
        "all_answers": answers,
        "reasoning_paths": responses,
    }

def extract_final_answer(text: str) -> str:
    """Extract the final answer from a reasoning chain.

    Looks for phrases like 'The answer is X' or 'Therefore, X', falls back
    to a bare number on the last line, and finally to the last line itself.
    """
    lowered = text.lower()
    patterns = [
        r'(?:answer is|therefore|conclusion)[:,]?\s*(.+)',
        r'(?:^|\n)\s*(\d+)\s*$',  # last line is just a number
    ]
    for pattern in patterns:
        matches = re.findall(pattern, lowered)
        if matches:
            candidate = matches[-1]
            break
    else:
        candidate = lowered.strip().split('\n')[-1]

    # Normalize so that '67.' and '67' count as the same vote
    return candidate.strip().rstrip('.!?').strip()
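To exercise the sampling-and-voting loop without calling a real API, you can substitute a stub model. This sketch assumes, as the implementation above does, an async `generate(prompt, temperature=..., max_tokens=...)` method that returns a string; `StubModel`, its canned responses, and the simplified `final_answer` extractor are hypothetical stand-ins, and the voting is condensed inline so the block runs on its own.

```python
import asyncio
import itertools
import re
from collections import Counter

class StubModel:
    """Hypothetical stand-in for an LLM client: cycles through canned outputs."""

    def __init__(self, outputs):
        self._outputs = itertools.cycle(outputs)

    async def generate(self, prompt, temperature=0.7, max_tokens=500):
        await asyncio.sleep(0)  # yield control, as a real network call would
        return next(self._outputs)

def final_answer(text: str) -> str:
    """Take whatever follows the last 'answer is', minus trailing periods."""
    matches = re.findall(r"answer is\s*(.+)", text.lower())
    return (matches[-1] if matches else text).strip().rstrip(".")

async def main():
    model = StubModel([
        "Age difference is 3 years, so the answer is 67.",
        "64 years passed: 3 + 64. The answer is 67.",
        "Half of 70 is 35. The answer is 35.",
    ])
    # Five concurrent "samples" drawn from the stub
    responses = await asyncio.gather(*[
        model.generate("How old is my sister?", temperature=0.7)
        for _ in range(5)
    ])
    answers = [final_answer(r) for r in responses]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / len(answers), answers

result = asyncio.run(main())
print(result)
```

Swapping the stub for a real client leaves the voting logic untouched, which makes it easy to unit-test answer extraction and tie behavior offline.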

Aggregation Strategies

Method	How It Works	Best For
Majority vote	Count exact answer matches	Discrete answers (numbers, categories)
Weighted vote	Weight by reasoning chain quality score	When you have a confidence evaluator
Span extraction	Find overlapping answer spans across responses	Free-text answers
LLM aggregator	Ask another LLM call to synthesize all paths	Complex multi-faceted answers
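The weighted-vote row can be sketched as summing a per-path score instead of counting one vote per path. Where the scores come from is left open: the values below are hypothetical outputs of whatever confidence evaluator you use (for example, a separate verifier model).

```python
from collections import defaultdict

def weighted_vote(scored_answers):
    """Return the answer with the highest total score.

    scored_answers: iterable of (answer, score) pairs, where score is a
    non-negative quality estimate for the path that produced the answer.
    """
    totals = defaultdict(float)
    for answer, score in scored_answers:
        totals[answer] += score
    return max(totals, key=totals.get)

# Hypothetical scores: two weak paths agree on 35, one strong path says 67
paths = [("35", 0.2), ("35", 0.3), ("67", 0.9)]
print(weighted_vote(paths))  # → 67
```

Plain majority voting would pick 35 here; the weighted version lets one well-scored path outvote two poorly scored ones, so it is only as good as the evaluator behind the scores.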

Temperature and Sampling

Temperature controls diversity. Higher = more diverse paths, but also more noise.

Temperature	Diversity	Accuracy Impact	Best For
0.0	Deterministic	No gain (same path each time)	Never use for self-consistency
0.3-0.5	Low diversity	Small gains	Simple arithmetic
0.5-0.7	Moderate diversity	Best balance	Most reasoning tasks
0.7-1.0	High diversity	Risk of noise overwhelming signal	Complex open-ended reasoning
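Temperature works by dividing the model's logits by T before the softmax. The sketch below uses made-up logits for three candidate next tokens to show how a low T concentrates probability on the top token (approaching greedy decoding as T → 0) while a higher T flattens the distribution, which is where path diversity comes from.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaling by 1/temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (T = 0 means greedy argmax)")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
for t in (0.3, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```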

When Self-Consistency Helps

Strong gains on:

Arithmetic reasoning (GSM8K, MATH datasets)
Commonsense reasoning (StrategyQA, CommonsenseQA)
Symbolic reasoning (date arithmetic, logical deduction)

Weak or no gains on:

Factual recall (the model either knows it or doesn't)
Simple classification (paths all converge to same answer)
Tasks where the model is consistently wrong (voting amplifies a shared error rather than correcting it)
Creative writing (no single "correct" answer)

Cost Analysis

Self-consistency scales token cost linearly with the number of samples: every sample generates a full reasoning chain.

Samples	Relative Cost	Typical Accuracy Gain
1 (baseline)	1x	-
3	3x	+10-15%
5	5x	+15-20%
10	10x	+20-25% (diminishing returns beyond 10)
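The diminishing returns can be illustrated with an idealized model: assume each path is independently correct with probability p, and estimate accuracy as the chance that correct paths form a strict majority of n. The single-path accuracy of 0.6 below is a hypothetical value, not a measured one.

```python
from math import comb

def majority_accuracy(p: float, n: int) -> float:
    """P(more than half of n independent paths are correct), each with prob p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# Hypothetical single-path accuracy of 60%; odd n avoids exact ties
for n in (1, 3, 5, 9, 15):
    print(n, round(majority_accuracy(0.6, n), 3))
```

Going from 1 to 3 samples adds more than going from 3 to 5, mirroring the table above. Real paths from the same model share errors, so treat this as an idealization rather than a prediction.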

When the cost is worth it:

High-stakes decisions where accuracy matters more than cost
Automated pipelines where you can batch process
One-time analysis tasks (research, legal review)

Combining With Other Techniques

Self-consistency wraps around other prompting strategies — it's not a replacement.

CoT + Self-Consistency: The standard combination. Generate CoT chains, vote on answers.
ToT + Self-Consistency: Run several tree searches, then vote on the final answers (leaf nodes) they reach.
Few-Shot + Self-Consistency: Use few-shot examples to improve individual path quality, then vote.
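Because self-consistency only needs a way to draw one sampled answer, it can wrap any of these strategies. A minimal sketch: `with_self_consistency` is a hypothetical helper that takes a zero-argument `sample_fn` (one CoT, few-shot, or ToT attempt returning its final answer) and returns a voted version of it.

```python
from collections import Counter
from collections.abc import Callable, Hashable

def with_self_consistency(sample_fn: Callable[[], Hashable], n_samples: int = 5):
    """Wrap a single-sample strategy so that it samples n times and votes."""
    def voted() -> Hashable:
        answers = [sample_fn() for _ in range(n_samples)]
        return Counter(answers).most_common(1)[0][0]
    return voted

# Hypothetical strategy: a fixed sequence standing in for sampled CoT answers
fake_answers = iter(["67", "35", "67", "67", "35"])
cot_with_sc = with_self_consistency(lambda: next(fake_answers), n_samples=5)
result = cot_with_sc()
print(result)  # → 67
```

The wrapped strategy decides how good each individual path is; the wrapper only decides how paths are combined.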

Self-Consistency: Improving Reasoning Through Majority Voting