The Example Spectrum

Every prompt exists on a spectrum: zero-shot (no examples), few-shot (some examples), many-shot (many examples). The right choice depends on task complexity, model capability, and your token budget.

Zero-Shot Prompting

The model performs a task with no examples — only instructions. This works because modern LLMs are instruction-tuned on massive datasets.

Classify the text as positive, negative, or neutral.

Text: "The delivery was late but the product works fine."
Sentiment:

When zero-shot works:

Classification with clear labels
Summarization and rewriting
Simple extraction (dates, names, emails)
Translation between common languages
Instruction-following models (GPT-4, Claude 3.5, Gemini 1.5 Pro)

When zero-shot fails:

Tasks requiring specific output formatting
Domain-specific terminology the model hasn't seen
Nuanced categorization with subtle distinctions
Complex reasoning with multiple steps

Few-Shot Prompting

Provide 1-10 demonstrations in the prompt. The model learns the task pattern from examples — this is in-context learning.

Classify the text as positive, negative, or neutral.

Text: "Absolutely loved it, best purchase ever."
Sentiment: positive

Text: "Complete waste of money, broke after two days."
Sentiment: negative

Text: "It arrived on time, haven't tried it yet though."
Sentiment: neutral

Text: "Not what I expected but it's growing on me."
Sentiment:

Key findings from research (Min et al. 2022):

The format matters more than label accuracy. Random labels in the right format outperform no examples.
Input distribution matching matters — use examples from the same domain as your target.
Label distribution matching helps too — if your real data is 70% positive, make your examples ~70% positive.

How many examples?

Shots	Best For	Token Cost
1-shot	Simple format tasks, when model already knows the domain	Minimal
3-shot	Moderate classification, structured extraction	Low
5-shot	Complex labeling, edge case coverage	Medium
10-shot	Nuanced reasoning, multi-step tasks	High

Many-Shot Prompting

With context windows now reaching 200K+ tokens, you can include 50-100+ examples. This was impractical a year ago but is now viable with prompt caching.

When many-shot beats few-shot:

Highly specialized classification with many edge cases
Low-resource languages where the model needs extensive grounding
Complex multi-step reasoning where demonstrations build on each other
When the model consistently fails on specific edge cases

The diminishing returns curve:

0 → 1 example: largest accuracy jump
1 → 5 examples: strong improvement
5 → 20 examples: moderate improvement
20 → 100+ examples: marginal improvement, high token cost

Making many-shot affordable:

Prompt caching (Anthropic) gives 90% discount on cache hits. Put static examples first, dynamic input last.
Use shorter examples — strip verbose descriptions, keep only input/output pairs.
Use GPT-4o-mini or Claude Haiku for classification tasks where many-shot helps.

Decision Framework

Is this a classification/extraction task with clear labels?
  ├─ Yes → Try zero-shot first. Add 3-shot if accuracy < 90%.
  └─ No → Is this complex reasoning?
           ├─ Yes → Use CoT or ToT instead of example-based prompting.
           └─ No → Is this a format-dependent output task?
                    ├─ Yes → Use 1-3 examples showing exact format.
                    └─ No → Is this a niche domain?
                             ├─ Yes → Use 5-10 domain-specific examples.
                             └─ No → Try zero-shot with detailed instructions.

Provider Behavior

Model	Zero-Shot Strength	Few-Shot Notes
GPT-4o	Excellent	Needs format consistency in examples
Claude 3.5 Sonnet	Very good	Excels with structured formatting examples
Gemini 1.5 Pro	Good	Prefers more examples for nuanced tasks
GPT-4o-mini	Moderate	Often needs 3+ examples for reliability
Claude Haiku	Moderate	Benefits from clear format demonstrations

Token Cost Comparison

Typical cost for a classification task (100 input tokens, 10 output tokens):

Strategy	Tokens In	Cost per Call (GPT-4o)	Relative
Zero-shot	100	$0.00025	1x
3-shot	400	$0.001	4x
10-shot	1,100	$0.00275	11x
50-shot	5,100	$0.01275	51x

With prompt caching (cached examples), the 3-shot cost drops to ~$0.0004 — nearly zero-shot pricing.

Few-shot vs Zero-shot: Choosing the Right Strategy

The Example Spectrum

Zero-Shot Prompting

Few-Shot Prompting

Many-Shot Prompting

Decision Framework

Provider Behavior

Token Cost Comparison

Related Articles

RAG Patterns: Retrieval-Augmented Generation

Claude Long Document Strategies: Structuring 100K+ Token Prompts

DeepSeek 1M Context Window: Strategies & Caching

On this page