Claude Code Cost Optimization
Cut Claude Code costs by 70-95% using shell scripts, provider switching, and task routing. When to use extended thinking vs cheap execution.
Claude Code provides access to some of the strongest reasoning models available. It's also expensive: Sonnet costs $3 per million input tokens and $15 per million output tokens, so a single long session can cost several dollars. This tutorial covers the patterns that keep costs under control without sacrificing output quality.
The Cost Problem
A typical Claude Code session:
| Activity | Tokens (in/out) | Cost (Sonnet) |
|---|---|---|
| Codebase indexing | 50,000 / 0 | $0.15 |
| Exploring a bug | 5,000 / 3,000 | $0.06 |
| Generating a fix | 2,000 / 500 | $0.01 |
| Generating tests | 1,000 / 4,000 | $0.06 |
| Code review of fix | 3,000 / 1,000 | $0.02 |
| Documentation update | 1,000 / 2,000 | $0.03 |
| Total | 62,000 / 10,500 | $0.34 |
The indexing and exploration phases are unavoidable — Claude needs context. But test generation, bulk code output, and documentation are high-token, low-reasoning tasks. Those are the optimization targets.
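The per-activity costs in the table can be reproduced with a small helper. This is a minimal sketch, assuming Sonnet rates of $3 per million input tokens and $15 per million output tokens; `session_cost`, `INPUT_RATE`, and `OUTPUT_RATE` are illustrative names, not part of Claude Code.

```shell
# Sketch: estimate the cost of one activity at assumed Sonnet rates.
# Rates are USD per million tokens; override them if pricing changes.
INPUT_RATE="${INPUT_RATE:-3}"
OUTPUT_RATE="${OUTPUT_RATE:-15}"

# session_cost <input_tokens> <output_tokens> -> cost in USD, 4 decimals
session_cost() {
  awk -v in_tok="$1" -v out_tok="$2" \
      -v in_rate="$INPUT_RATE" -v out_rate="$OUTPUT_RATE" \
      'BEGIN { printf "%.4f\n", (in_tok * in_rate + out_tok * out_rate) / 1000000 }'
}

session_cost 50000 0       # codebase indexing
session_cost 1000 4000     # generating tests
session_cost 62000 10500   # sum of all rows in the table
```

Output tokens dominate quickly: the 4,000 output tokens of test code cost more than twenty times the 1,000 input tokens that prompted them.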
Strategy 1: Task Routing
Not every prompt needs Claude-level reasoning. Route tasks by complexity:
Use Claude for:
- Architecture decisions
- Bug root cause analysis
- Code review for correctness
- Security audit
- Multi-step refactoring plans
Route elsewhere for:
- Test generation (DeepSeek: 95% cheaper)
- Boilerplate code (DeepSeek)
- Documentation generation (MiniMax or DeepSeek)
- Commit message generation (DeepSeek)
- Changelog generation (DeepSeek)
The routing decision is simple: does this task require reasoning, or generation? If generation, use a cheaper model.
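The two lists above can be written down as a lookup. The sketch below is one possible encoding; the category names and the `claude` / `cheap` labels are made up for illustration and do not correspond to any real CLI option.

```shell
# Sketch: map a task category to a model tier, following the lists above.
# Category names and tier labels are hypothetical.
route_task() {
  case "$1" in
    architecture|root-cause|code-review|security-audit|refactor-plan)
      echo "claude" ;;   # needs reasoning
    tests|boilerplate|docs|commit-message|changelog)
      echo "cheap" ;;    # bulk generation
    *)
      echo "claude" ;;   # unknown tasks default to the stronger model
  esac
}

route_task tests
route_task security-audit
```

Defaulting unknown categories to the stronger model trades a little cost for safety: a misrouted reasoning task is usually more expensive to fix than an overpaid generation task.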
Strategy 2: Shell Scripts for Bulk Generation
Claude Code can invoke shell scripts via the bash tool. The pattern: Claude designs the approach, a script calls a cheaper API for bulk generation, Claude reviews the output.
The helper script
Create ~/.claude/hooks/cheap-gen.sh:
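#!/bin/bash
# Routes generation requests to the DeepSeek API.
# Pipe a prompt to stdin, get generated content on stdout.
set -euo pipefail

# Fail early with a clear message if the API key is not set.
: "${DEEPSEEK_API_KEY:?DEEPSEEK_API_KEY is not set}"

PROMPT=$(cat)
MODEL="${DEEPSEEK_MODEL:-deepseek-chat}"

# Build the JSON body with jq so the prompt is escaped correctly.
BODY=$(jq -n \
  --arg model "$MODEL" \
  --arg prompt "$PROMPT" \
  '{
    model: $model,
    messages: [{role: "user", content: $prompt}],
    temperature: 0.3,
    max_tokens: 4096
  }')

# -f makes curl fail on HTTP errors; jq -e fails if no content comes back.
curl -sf https://api.deepseek.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPSEEK_API_KEY" \
  -d "$BODY" | jq -er '.choices[0].message.content'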
chmod +x ~/.claude/hooks/cheap-gen.sh
Using the script
In a Claude Code session, pipe generation tasks through the script:
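# Generate tests for a module. The cheap model cannot read your repo,
# so the source file is included in the prompt.
{
  echo "Generate Jest unit tests for the module below (src/auth.ts) covering login,"
  echo "logout, token refresh, and rate limiting. Include edge cases for expired"
  echo "tokens and invalid credentials. Output only the test file contents."
  echo
  cat src/auth.ts
} | ~/.claude/hooks/cheap-gen.sh > __tests__/auth.test.ts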
Claude designs the test strategy and reviews the output. DeepSeek generates 4,000 tokens of test code for $0.001 instead of $0.06.
When to use the script
| Task | Script? | Reason |
|---|---|---|
| Test generation | Yes | High volume, formulaic output |
| API endpoint stubs | Yes | Pattern is consistent across endpoints |
| Documentation | Yes | Volume task, Claude reviews for accuracy |
| Changelog generation | Yes | Templates from git log |
| Bug fixes | No | Needs reasoning, not generation |
| Architecture design | No | The whole point of using Claude |
| Security audit | No | Cannot trust cheap model for security |
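For the changelog row, the prompt can be assembled directly from commit subjects. This is a sketch only: `build_changelog_prompt` is a hypothetical helper, and the `v1.0.0` tag and `CHANGELOG.draft.md` file in the usage comment are placeholders.

```shell
# Sketch: build a changelog prompt from commit subjects read on stdin.
build_changelog_prompt() {
  echo "Write a changelog grouped into Added, Changed, and Fixed."
  echo "Use only the commits listed below; do not invent entries."
  echo
  cat   # pass the commit list through unchanged
}

# Usage (assumes a git repo with a v1.0.0 tag and the cheap-gen.sh helper):
#   git log --pretty=format:'- %s' v1.0.0..HEAD | build_changelog_prompt \
#     | ~/.claude/hooks/cheap-gen.sh > CHANGELOG.draft.md
```

Writing to a draft file rather than the real changelog leaves room for Claude, or you, to review the output before it is committed.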
Strategy 3: Minimize Context Bloat
Claude charges for input tokens too. Every message in the conversation history counts. Longer sessions = more tokens = higher cost.
Use CLAUDE.md aggressively
Put the conventions Claude needs in CLAUDE.md. Claude Code loads it automatically at the start of each session, so you don't have to restate them in chat:
## Project Conventions
- TypeScript strict mode
- Use Zod for validation
- Tests use Vitest
- Commit messages follow conventional commits
## Common Commands
- Build: npm run build
- Lint: npm run lint
- Test: npm test
- Type check: npm run typecheck
## Frequently Referenced Files
- src/types.ts — Core type definitions
- src/config.ts — App configuration
- src/lib/api.ts — API client
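Every time you restate a convention in chat, or Claude has to rediscover it by reading files, you add tokens to the conversation history. Put it in CLAUDE.md instead: the file is loaded once per session. It still occupies context on every turn, so keep it concise. This can save 500-2,000 tokens per turn.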
Compact long sessions
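After 20+ turns, the conversation history alone can reach 40,000 input tokens. At $3 per million input tokens, every new message then costs about $0.12 just for context, before any prompt-caching discount. Use /compact to summarize the history and continue with a much smaller context.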
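The arithmetic behind that figure, and how it compounds over a session, can be sketched as follows. The helper names are illustrative, the $3 per million input rate is an assumption, and prompt-caching discounts are ignored, so treat the numbers as an upper bound.

```shell
# Sketch: input cost of resending conversation history on each turn.
# Assumes $3 per million input tokens and no prompt-caching discount.
INPUT_RATE="${INPUT_RATE:-3}"

# context_cost <history_tokens> -> USD to send that history once
context_cost() {
  awk -v tok="$1" -v rate="$INPUT_RATE" \
      'BEGIN { printf "%.2f\n", tok * rate / 1000000 }'
}

# cumulative_cost <tokens_added_per_turn> <turns> -> USD across all turns,
# where turn k resends k * tokens_added_per_turn tokens of history
cumulative_cost() {
  awk -v per="$1" -v n="$2" -v rate="$INPUT_RATE" \
      'BEGIN { printf "%.2f\n", per * n * (n + 1) / 2 * rate / 1000000 }'
}

context_cost 40000        # one message on top of a 40,000-token history
cumulative_cost 2000 20   # 20 turns, history growing by 2,000 tokens each
```

Because every turn resends everything before it, total input cost grows with the square of the number of turns, which is why compacting or restarting pays off more the longer a session runs.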
Avoid reading unnecessary files
Claude reads the files you reference. If you say "check src/components/" and there are 30 files, Claude may read many or all of them. Be specific: "check src/components/Button.tsx".
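Before pointing Claude at a path, it can help to estimate how much context the path would consume. The sketch below assumes roughly 4 bytes per token, a common rule of thumb rather than an exact tokenizer count; `estimate_tokens` is a hypothetical helper.

```shell
# Sketch: rough token estimate for a file or directory.
# Assumes ~4 bytes per token; real tokenizer counts will differ.
estimate_tokens() {
  find "$1" -type f -exec cat {} + | wc -c | awk '{ printf "%d\n", $1 / 4 }'
}

# Usage:
#   estimate_tokens src/components/            # whole directory
#   estimate_tokens src/components/Button.tsx  # single file
```

If the directory estimate is many times larger than the single-file estimate, that difference is roughly what a vague reference costs in input tokens.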
Strategy 4: Model Selection
Extended thinking — use sparingly
Extended thinking can double or triple the cost per response, because thinking tokens are billed as output tokens. Enable it only for:
- Complex multi-step architecture decisions
- Bug investigations where standard reasoning failed
- Security audits requiring deep analysis
For everything else, standard mode is sufficient.
Haiku for quick tasks
Claude Haiku costs a fraction of Sonnet's price; the exact discount depends on the Haiku version, so check current pricing. For simple questions, code explanations, and quick lookups:
claude --model haiku
Use Haiku for the "quick question" turns that don't need reasoning depth. Save Sonnet for the actual work.
Opus — almost never worth it
Opus costs about 5x as much as Sonnet. The quality difference is marginal for most coding tasks, so default to Sonnet. Assuming 10,000 output tokens per turn, the cost difference adds up fast:
| Model | Cost per turn (10,000 output tokens) | Per day (50 turns) | Per month (20 days) |
|---|---|---|---|
| Sonnet | $0.15 | $7.50 | $150 |
| Opus | $0.75 | $37.50 | $750 |
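The table's projection is a straight multiplication, sketched below. The output rates ($15 and $75 per million tokens) are inferred from the per-turn figures in the table, and `monthly_cost` is an illustrative helper.

```shell
# Sketch: project monthly output-token spend.
# monthly_cost <usd_per_million_output> <output_tokens_per_turn> <turns_per_day> <days>
monthly_cost() {
  awk -v rate="$1" -v tok="$2" -v turns="$3" -v days="$4" \
      'BEGIN { printf "%.2f\n", rate * tok * turns * days / 1000000 }'
}

monthly_cost 15 10000 50 20   # Sonnet
monthly_cost 75 10000 50 20   # Opus
```

Plugging in your own turn count and typical response size gives a quick sanity check before switching models for a whole project.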
Strategy 5: Session Discipline
One session, one task
Don't keep a session alive across tasks. Context accumulates. Start fresh for each feature or bug. Use CLAUDE.md to preserve project conventions across sessions.
Close sessions when done
Idle sessions still hold context in memory. Claude Code doesn't charge for idle time, but if you come back to an old session, you're paying for the entire stale context on the next message. Close it.
Use the right tool for the job
Claude Code is a coding agent. For non-coding questions, use the Claude web interface or API directly — you don't need the codebase context or tool overhead.
# Instead of asking Claude Code "what's the GitLab API rate limit",
# search the docs directly or ask in the web interface
curl -s https://docs.gitlab.com/api/ | grep -i "rate limit"
Cost Tracking
Check session cost
# During a session, type /cost to see token usage and spend so far
# For account-wide usage, check the Anthropic Console:
# console.anthropic.com
Set budget alerts
In the Anthropic console, set usage alerts at $5, $10, and $25. You'll get an email before costs surprise you.
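The same thresholds can be mirrored locally if you track spend yourself. This is a sketch under that assumption: `budget_status` is a hypothetical helper, and the spend figure has to come from your own records, since nothing here queries Anthropic.

```shell
# Sketch: report the highest alert threshold a spend figure has crossed.
# budget_status <spent_usd> -> "alert-25", "alert-10", "alert-5", or "ok"
budget_status() {
  awk -v spent="$1" 'BEGIN {
    s = spent + 0                      # force numeric comparison
    if (s >= 25)      print "alert-25"
    else if (s >= 10) print "alert-10"
    else if (s >= 5)  print "alert-5"
    else              print "ok"
  }'
}

budget_status 12.40
```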
Monthly cost benchmark
| Usage Level | Typical Cost | Optimization Target |
|---|---|---|
| Light (1-2 hrs/day) | $20-50/mo | Haiku for quick tasks |
| Medium (4-6 hrs/day) | $80-150/mo | Shell scripts, task routing |
| Heavy (8+ hrs/day) | $200-400/mo | All strategies, consider OpenCode |
Quick Reference: Cost by Task
| Task | Recommended Model | Approx. Cost |
|---|---|---|
| Architecture design | Sonnet, extended thinking | $0.05-0.15 |
| Bug investigation | Sonnet | $0.03-0.08 |
| Feature implementation | Sonnet | $0.05-0.20 |
| Test generation | DeepSeek via script | $0.001-0.005 |
| Documentation | DeepSeek via script | $0.002-0.01 |
| Code review | Sonnet | $0.02-0.05 |
| Quick question | Haiku | $0.001-0.005 |
| Refactoring plan | Sonnet, extended thinking | $0.05-0.10 |
| Boilerplate/CRUD | DeepSeek via script | $0.001-0.003 |
Related Content
- Claude Code Patterns — Claude driving Gemini with Ralph loops
- Offload Bulk Generation to DeepSeek — Detailed setup for the DeepSeek helper script
- Multi-Model Workflows — Provider switching with OpenCode
- Claude Code Getting Started — Installation and first session