DeepSeek's reasoning_effort parameter is the only dial you have for controlling thinking depth — temperature and top_p are disabled in thinking mode. The two effective levels are high (default) and max. But "max" isn't always better. It costs more tokens, adds latency, and on some tasks, overthinking degrades quality.

Effort Level Comparison

Aspect	high (default)	max
Reasoning depth	Thorough, multi-step	Exhaustive, explores alternatives
Token consumption	Baseline	+50-200% more reasoning tokens
Latency	Baseline	+30-100% more time
Best for	Most reasoning tasks	Complex proofs, architecture, strategy
Overkill on	Simple analysis, creative tasks	Most non-reasoning tasks
Agent contexts	Default for API	Auto-set by Claude Code, OpenCode

When to Use `effort=high`

High is the correct default for most tasks:

Standard reasoning tasks: Logic problems, analysis, explanation
Code generation: Function-level coding, bug fixes, refactoring
Document Q&A: Answering questions about provided documents
Classification & extraction: Structured output tasks
Creative writing: Where overthinking kills creativity
Cost-sensitive workflows: Where token budget matters

Use high when:
- The task benefits from reasoning but doesn't require exhaustive exploration
- You're optimizing for speed and cost
- The problem has a clear solution path
- You're not sure — high is the safer default

When to Use `effort=max`

Max is justified for problems where errors cascade:

Mathematical proofs: Multi-step derivations where each step must be verified
Architecture decisions: System design with competing constraints and tradeoffs
Complex debugging: Multi-hop error diagnosis across system boundaries
Strategic analysis: Exploring scenarios, identifying hidden assumptions
Competitive programming: Algorithmic problems with edge cases
Legal/regulatory reasoning: Where missing a clause has real consequences
Agentic coding: Auto-set by Claude Code and OpenCode for complex agent tasks

Use max when:
- The cost of a wrong answer exceeds the cost of extra tokens
- The problem has multiple valid approaches you want the model to consider
- You need the model to catch its own edge cases
- You're debugging and need to see exhaustive reasoning

The Diminishing Returns Curve

For most tasks, quality follows an S-curve but with fewer steps than Claude's token-budget approach:

Effort	Quality Gain	ROI
Non-thinking → high	Large jump (reasoning enabled)	Excellent
high → max	Marginal improvement for most tasks	Fair to poor
Max (on the right task)	Significant for complex reasoning	Excellent

The jump from no-thinking to high is the largest quality gain. high → max provides meaningful gains only on the specific task categories listed above. On typical Q&A, summarization, or coding tasks, max spends more tokens for indistinguishable output quality.

Cost Implications

Reasoning tokens are billed at output token rates:

Model	Cost per 1M thinking tokens	Cost for 4K thinking tokens
Pro (high)	$0.87	~$0.0035
Pro (max)	$0.87	~$0.007-0.014 (2-4x more tokens)
Flash (high)	$0.28	~$0.0011
Flash (max)	$0.28	~$0.0022-0.0045

At DeepSeek's pricing, even max on Pro is cheaper than Claude's base API call. But the relative difference matters at scale: 1M requests at max vs high on Pro costs $3,500-$10,500 more.

When Thinking Mode Hurts

Some tasks are better without thinking mode:

Task	Why thinking degrades quality
Creative writing	Over-rationalizing kills spontaneity and voice
Simple translation	Adds latency, no quality gain
Conversational chat	Thinking tokens are wasted on social conventions
Routine classification	Deterministic task, reasoning is overhead
JSON extraction (known schema)	JSON mode alone is sufficient

Note:

Pro Move: For Claude Code integration, set CLAUDE_CODE_EFFORT_LEVEL=max for the main agent and leave sub-agents at default (high). The main agent benefits from exhaustive reasoning during planning; sub-agents executing specific instructions don't.

Note:

Don't confuse effort with correctness: In thinking mode, low and medium effort values are silently mapped to high. If you're testing different levels, only high and max are real. xhigh maps to max for compatibility.

Thinking Mode Guide — Foundation: how to enable thinking mode and read reasoning_content tokens.
Multi-Turn Reasoning — Effort levels interact with multi-turn behavior — tool-call chains auto-set effort to max.

DeepSeek Reasoning Effort Control: High vs Max

Effort Level Comparison

When to Use `effort=high`

When to Use `effort=max`

The Diminishing Returns Curve

Cost Implications

When Thinking Mode Hurts

Related Articles

Parallel vs Sequential Chains: When to Use Each

Literature Review Guide

Essay Structure

On this page

DeepSeek Reasoning Effort Control: High vs Max

Effort Level Comparison

When to Use effort=high

When to Use effort=max

The Diminishing Returns Curve

Cost Implications

When Thinking Mode Hurts

Related Pages

Related Articles

Parallel vs Sequential Chains: When to Use Each

Literature Review Guide

Essay Structure

On this page

When to Use `effort=high`

When to Use `effort=max`