DeepSeek Reasoning Effort Control: High vs Max

Master DeepSeek's reasoning_effort parameter. When high vs max effort, cost implications, diminishing returns curve, and which task categories benefit most from maximum reasoning.

June 11, 2026
DeepSeekReasoning EffortThinking ModeCostPrompt Engineering

DeepSeek's reasoning_effort parameter is the only dial you have for controlling thinking depth — temperature and top_p are disabled in thinking mode. The two effective levels are high (default) and max. But "max" isn't always better. It costs more tokens, adds latency, and on some tasks, overthinking degrades quality.

Effort Level Comparison

Aspecthigh (default)max
Reasoning depthThorough, multi-stepExhaustive, explores alternatives
Token consumptionBaseline+50-200% more reasoning tokens
LatencyBaseline+30-100% more time
Best forMost reasoning tasksComplex proofs, architecture, strategy
Overkill onSimple analysis, creative tasksMost non-reasoning tasks
Agent contextsDefault for APIAuto-set by Claude Code, OpenCode

When to Use effort=high

High is the correct default for most tasks:

  • Standard reasoning tasks: Logic problems, analysis, explanation
  • Code generation: Function-level coding, bug fixes, refactoring
  • Document Q&A: Answering questions about provided documents
  • Classification & extraction: Structured output tasks
  • Creative writing: Where overthinking kills creativity
  • Cost-sensitive workflows: Where token budget matters
Use high when:
- The task benefits from reasoning but doesn't require exhaustive exploration
- You're optimizing for speed and cost
- The problem has a clear solution path
- You're not sure — high is the safer default

When to Use effort=max

Max is justified for problems where errors cascade:

  • Mathematical proofs: Multi-step derivations where each step must be verified
  • Architecture decisions: System design with competing constraints and tradeoffs
  • Complex debugging: Multi-hop error diagnosis across system boundaries
  • Strategic analysis: Exploring scenarios, identifying hidden assumptions
  • Competitive programming: Algorithmic problems with edge cases
  • Legal/regulatory reasoning: Where missing a clause has real consequences
  • Agentic coding: Auto-set by Claude Code and OpenCode for complex agent tasks
Use max when:
- The cost of a wrong answer exceeds the cost of extra tokens
- The problem has multiple valid approaches you want the model to consider
- You need the model to catch its own edge cases
- You're debugging and need to see exhaustive reasoning

The Diminishing Returns Curve

For most tasks, quality follows an S-curve but with fewer steps than Claude's token-budget approach:

EffortQuality GainROI
Non-thinking → highLarge jump (reasoning enabled)Excellent
high → maxMarginal improvement for most tasksFair to poor
Max (on the right task)Significant for complex reasoningExcellent

The jump from no-thinking to high is the largest quality gain. highmax provides meaningful gains only on the specific task categories listed above. On typical Q&A, summarization, or coding tasks, max spends more tokens for indistinguishable output quality.

Cost Implications

Reasoning tokens are billed at output token rates:

ModelCost per 1M thinking tokensCost for 4K thinking tokens
Pro (high)$0.87~$0.0035
Pro (max)$0.87~$0.007-0.014 (2-4x more tokens)
Flash (high)$0.28~$0.0011
Flash (max)$0.28~$0.0022-0.0045

At DeepSeek's pricing, even max on Pro is cheaper than Claude's base API call. But the relative difference matters at scale: 1M requests at max vs high on Pro costs $3,500-$10,500 more.

When Thinking Mode Hurts

Some tasks are better without thinking mode:

TaskWhy thinking degrades quality
Creative writingOver-rationalizing kills spontaneity and voice
Simple translationAdds latency, no quality gain
Conversational chatThinking tokens are wasted on social conventions
Routine classificationDeterministic task, reasoning is overhead
JSON extraction (known schema)JSON mode alone is sufficient

Note:

Pro Move: For Claude Code integration, set CLAUDE_CODE_EFFORT_LEVEL=max for the main agent and leave sub-agents at default (high). The main agent benefits from exhaustive reasoning during planning; sub-agents executing specific instructions don't.

Note:

Don't confuse effort with correctness: In thinking mode, low and medium effort values are silently mapped to high. If you're testing different levels, only high and max are real. xhigh maps to max for compatibility.

  • Thinking Mode Guide — Foundation: how to enable thinking mode and read reasoning_content tokens.
  • Multi-Turn Reasoning — Effort levels interact with multi-turn behavior — tool-call chains auto-set effort to max.