DeepSeek V4 made 1M token context the default — it's not a premium tier or an experimental feature you need to beg for. At 5x Claude's 200K, this changes what's possible in a single prompt. You can load an entire monorepo, 10 full novels, a year of Slack messages, or 1,000+ pages of documentation into a single request.

But 1M context creates new challenges. The U-shape attention curve is wider, cache economics are different at this scale, and the marginal value of "just load everything" has diminishing returns. The strategies below are specific to operating at this scale.

Enabling 1M Context

Use the [1m] suffix on model names:

# With 1M context
client.chat.completions.create(
    model="deepseek-v4-pro[1m]",
    messages=messages
)

# Without the suffix — defaults to smaller context window
client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages
)

In Claude Code:

export ANTHROPIC_MODEL=deepseek-v4-pro[1m]

The 1M U-Shape Attention Curve

Like all long-context models, DeepSeek's attention follows a U-shape — strongest at the beginning and end of the context window, weakest in the middle. But at 1M tokens, the "weak middle" is 500K tokens wide.

The 1M Sandwich Pattern

[0-50K tokens: BEGINNING — High attention]
- System prompt (static, cache-friendly)
- Task instructions
- Format specifications
- Output requirements

[50K-950K tokens: MIDDLE — Lower attention]
- Primary content (documents, codebase, data)
- Structure with clear section headers for navigation
- Place the MOST IMPORTANT content closest to edges

[950K-1M tokens: END — High attention]
- Specific task question
- Reference markers to middle content
- Repeat critical constraints
- Output format reminder

Progressive Disclosure at 1M Scale

Don't dump 1M tokens at once. Build context progressively:

Turn 1 (50K tokens):
"Here's the project README, architecture doc, and directory tree.
Which files are relevant to implementing [feature]?"

Turn 2 (200K tokens):
"Good. Now here are the files you identified: [paste relevant files].
Propose an implementation approach."

Turn 3 (500K tokens):
"Here are the test files for those modules: [paste tests].
Identify edge cases your approach misses."

Turn 4 (1M tokens):
"Final round. Here are the deployment configs, CI pipeline, and monitoring setup.
What production concerns should we address?"

When 1M Context Wins vs RAG

Scenario	Recommended	Why
Cross-document reasoning (compare 50 contracts)	Full 1M context	RAG misses cross-document relationships
Unknown retrieval target ("find anything unusual")	Full 1M context	Cannot build RAG query for "anything unusual"
One-off analysis of a large document	Full 1M context	Engineering cost of RAG > compute cost
Repeated Q&A against same document set	RAG + context caching	Cache hits on static documents are cheaper
High-volume fact retrieval	RAG	Lower latency, lower cost per query
First-pass document screening	1M context (Flash)	Scan 500 pages for relevance at $0.14/M

Cost at 1M Scale

Loading 1M input tokens with Flash costs roughly $0.14 per request. With Pro, $0.435. Compare:

Scenario	DeepSeek Flash (1M)	Claude Sonnet (200K)
1M token document analysis	$0.14 input	Not possible (5x 200K requests: $15)
500K token codebase review	$0.07	$7.50 (3x 200K requests)
Cross-document search (10 docs × 100K)	$0.14 (single request)	$15 (50x 200K with overlap)

For tasks that genuinely need the entire context, DeepSeek enables analysis that's either impossible or cost-prohibitive with any other model.

Attention Management for 1M

DOCUMENT SET (900K tokens):

=== SECTION 1: Requirements Specification ===
[paste requirements]

=== SECTION 2: Technical Architecture ===
[paste architecture doc]

=== SECTION 3: Test Plans ===
[paste test plans]

...

END ANCHOR:
"You've read the full specification. Focus your analysis on:
- Authentication flow: Section 2.3
- Database schema: Section 2.7
- API contracts: Section 2.12
If you find conflicting information between sections, flag it explicitly."

Structured Section Markers

Good markers (model navigates well):

=== SECTION 2.3: Authentication Flow ===

Bad markers (model struggles):

so for auth we basically use JWTs and the client sends stuff

Verification at Scale

"After analyzing this 800K token document set, verify your answer:

1. Quote the EXACT passage you based your conclusion on
2. State whether there are conflicting passages elsewhere
3. Indicate your confidence: HIGH (explicitly stated) / MEDIUM (inferred) / LOW (extrapolated)
4. List the sections you DID NOT find relevant — absence is important information"

Note:

Pro Move: For codebase analysis at 1M scale, use a file-tree-first approach. Send the directory tree (5-10K tokens) with effort=high and ask the model to identify which files are relevant. Then load only those files. Claude identifies relevant files with surprising accuracy, and you save 900K+ tokens per analysis.

Note:

The "more is better" trap: Loading 1M tokens when 50K would suffice wastes money and degrades retrieval accuracy. The model's attention is a finite resource — every irrelevant token you add dilutes focus on the relevant ones. Always ask: "Could I achieve this with focused retrieval?"

Context Caching — Make 1M context cost-effective with cache-aware prompt design. 50x cost reduction on cache hits.
Needle-in-Megahaystack — Retrieval patterns for finding specific information in 1M-token documents.

DeepSeek 1M Context Strategies: Prompt Structuring at Scale

Enabling 1M Context

The 1M U-Shape Attention Curve

The 1M Sandwich Pattern

Progressive Disclosure at 1M Scale

When 1M Context Wins vs RAG

Cost at 1M Scale

Attention Management for 1M

Explicit Navigation Anchors

Structured Section Markers

Verification at Scale

Related Articles

Algorithm Design Prompts for ChatGPT | Problem Solving Guide

Essay Structure

Master ChatGPT Prompts: Complete Strategy Guide

On this page

DeepSeek 1M Context Strategies: Prompt Structuring at Scale

Enabling 1M Context

The 1M U-Shape Attention Curve

The 1M Sandwich Pattern

Progressive Disclosure at 1M Scale

When 1M Context Wins vs RAG

Cost at 1M Scale

Attention Management for 1M

Explicit Navigation Anchors

Structured Section Markers

Verification at Scale

Related Pages

Related Articles

Algorithm Design Prompts for ChatGPT | Problem Solving Guide

Essay Structure

Master ChatGPT Prompts: Complete Strategy Guide

On this page