Zero-Shot vs Few-Shot Prompting Explained (2026)
How you prompt an LLM dramatically affects output quality. Zero-shot, few-shot, and chain-of-thought are the three foundational techniques. Here's when and how to use each.
The Three Techniques at a Glance
Zero-shot: "Classify this review as positive or negative: 'Great product!'"
→ Just ask. No examples. LLM uses its training knowledge.
Few-shot: "Here are examples:
'Love it!' → positive
'Terrible' → negative
'Waste of money' → negative
Now classify: 'Great product!'"
→ Show examples first. LLM follows the pattern.
Chain-of-thought: "Classify this review. Think step by step:
1. Identify key sentiment words
2. Consider context and tone
3. Make your classification"
→ Ask the LLM to reason through it.
Zero-Shot Prompting
Ask the model to perform a task with no examples:
Prompt: "Translate the following English text to French:
'The weather is beautiful today.'"
Output: "Le temps est magnifique aujourd'hui."
When Zero-Shot Works Well
- Common tasks: Translation, summarization, simple Q&A
- Clear instructions: The task is unambiguous
- Standard formats: Well-known output structures
- Modern models: GPT-4o and Claude handle most tasks zero-shot
When Zero-Shot Fails
❌ Ambiguous format:
"Extract entities from this text"
→ Model might return a list, a table, JSON, or prose
❌ Domain-specific:
"Classify this medical note"
→ Model doesn't know your classification categories
❌ Unusual patterns:
"Format this data for our CRM import"
→ Model doesn't know your CRM's format
Zero-Shot Tips
✅ Be specific about format:
"Extract entities and return as JSON with keys: name, type, confidence"
✅ Define categories explicitly:
"Classify as one of: [bug, feature, question, documentation]"
✅ Specify constraints:
"Summarize in exactly 3 bullet points, each under 15 words"
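The tips above (explicit categories, pinned-down format) can be rolled into a small prompt-template helper. A minimal sketch — the function name and category list are illustrative, not a library API:

```python
def zero_shot_prompt(text: str, categories: list[str]) -> str:
    """Build a zero-shot classification prompt that names the valid
    categories and constrains the output format."""
    return (
        "Classify the following text as exactly one of: "
        f"[{', '.join(categories)}].\n"
        "Respond with only the category name.\n\n"
        f"Text: {text}"
    )

print(zero_shot_prompt("The app crashes when I upload photos",
                       ["bug", "feature", "question", "documentation"]))
```

Templating the constraints this way keeps every request consistent instead of hand-writing them per call.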
Few-Shot Prompting
Provide examples before asking the model to perform the task:
Prompt:
"Classify customer support tickets into categories.
Examples:
'My payment was charged twice' → billing
'The app crashes when I upload photos' → bug
'Can you add dark mode?' → feature-request
'How do I export my data?' → question
'The API returns 500 on large payloads' → bug
Now classify:
'I was charged $99 instead of $49' →"
Output: "billing"
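A few-shot prompt like the one above is just labeled examples followed by the query, so it's worth assembling programmatically. A minimal sketch (names and formatting are one reasonable choice, not a standard):

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot classification prompt: task description,
    then labeled examples, then the unlabeled query."""
    lines = ["Classify customer support tickets into categories.",
             "Examples:"]
    lines += [f"'{text}' -> {label}" for text, label in examples]
    lines += ["Now classify:", f"'{query}' ->"]
    return "\n".join(lines)

examples = [
    ("My payment was charged twice", "billing"),
    ("The app crashes when I upload photos", "bug"),
    ("Can you add dark mode?", "feature-request"),
]
print(few_shot_prompt(examples, "I was charged $99 instead of $49"))
```

Keeping examples as data (a list of pairs) also makes it easy to swap or extend them without touching the prompt text.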
Why Few-Shot Works Better
Zero-shot:
Model: "I'll classify based on what I think the categories should be"
Risk: Invents categories you don't want
Few-shot:
Model: "I see the pattern — these are the valid categories,
and this is how to distinguish between them"
Result: Consistent with your system
How Many Examples?
2-3 examples: Good for simple tasks (classification, formatting)
5-7 examples: Better for nuanced tasks (tone matching, edge cases)
10+ examples: Diminishing returns — consider fine-tuning instead
Rule: Cover each category at least once.
Include edge cases if they matter.
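The coverage rule above is mechanical enough to check in code. A small sketch (helper name is made up for illustration):

```python
from collections import Counter

def uncovered_categories(examples: list[tuple[str, str]],
                         categories: list[str]) -> list[str]:
    """Return categories with no example yet -- per the rule that
    every category should appear at least once."""
    counts = Counter(label for _, label in examples)
    return [c for c in categories if counts[c] == 0]

missing = uncovered_categories(
    [("Love it!", "positive"), ("It's fine.", "neutral")],
    ["positive", "negative", "neutral"],
)
# missing == ["negative"]
```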
Few-Shot Best Practices
✅ Diverse examples (cover different categories):
positive, negative, neutral — not 3 positive examples
✅ Edge cases:
Include the tricky ones: sarcasm, mixed sentiment, ambiguous
✅ Consistent format:
Every example follows the exact same structure
✅ Representative examples:
Match the complexity of real inputs
❌ Don't use only easy examples:
"I love this!" → positive (too obvious)
"The product arrived late but works great" → ? (this is useful)
Few-Shot Example: Data Extraction
Prompt:
"Extract product information from descriptions.
Example 1:
Input: 'The Sony WH-1000XM5 wireless headphones offer 30-hour battery
life and retail for $349.99'
Output: {"product": "Sony WH-1000XM5", "category": "headphones",
"price": 349.99, "feature": "30-hour battery"}
Example 2:
Input: 'Apple's M3 MacBook Air starts at $1,099 with 8GB RAM
and 256GB storage'
Output: {"product": "MacBook Air M3", "category": "laptop",
"price": 1099, "feature": "8GB RAM, 256GB storage"}
Now extract:
Input: 'The Kindle Paperwhite Signature Edition with 32GB storage
is priced at $189.99 and features wireless charging'"
Output: {"product": "Kindle Paperwhite Signature Edition",
"category": "e-reader", "price": 189.99,
"feature": "32GB storage, wireless charging"}
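Because the few-shot examples define a schema, the model's reply can be validated against it before use. A minimal sketch, assuming the four field names from the examples above:

```python
import json

# Field names follow the few-shot examples above.
REQUIRED_KEYS = {"product", "category", "price", "feature"}

def parse_extraction(raw: str) -> dict:
    """Parse the model's JSON reply and verify it carries every key
    the examples established; raise if the schema wasn't followed."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

item = parse_extraction(
    '{"product": "Kindle Paperwhite Signature Edition", '
    '"category": "e-reader", "price": 189.99, '
    '"feature": "32GB storage, wireless charging"}'
)
```

Validating up front turns a malformed reply into a retryable error instead of bad data downstream.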
Chain-of-Thought (CoT) Prompting
Ask the model to show its reasoning before answering:
Prompt:
"A store sells apples for $2 each and oranges for $3 each.
If someone buys 4 apples and 3 oranges, and pays with a $20 bill,
how much change do they receive?
Think step by step."
Output:
"Step 1: Calculate apple cost: 4 × $2 = $8
Step 2: Calculate orange cost: 3 × $3 = $9
Step 3: Calculate total: $8 + $9 = $17
Step 4: Calculate change: $20 - $17 = $3
The customer receives $3 in change."
When CoT Is Essential
✅ Math and logic problems
✅ Multi-step reasoning
✅ Complex analysis
✅ Tasks where accuracy > speed
✅ Problems the model often gets wrong zero-shot
❌ Not needed for:
Simple classification
Translation
Basic formatting
Tasks that don't require reasoning
CoT Variants
Basic CoT:
"Think step by step."
Structured CoT:
"Reason through this:
1. Identify the key variables
2. Set up the relationship
3. Solve
4. Verify your answer"
Self-consistency CoT:
Sample several CoT paths (e.g., 3-5 completions at temperature > 0) → take the majority final answer
(more reliable for complex problems, at the cost of extra tokens)
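The voting step of self-consistency reduces to a majority count over the sampled final answers. A minimal sketch (the answer strings are illustrative):

```python
from collections import Counter

def majority_answer(answers: list[str]) -> str:
    """Self-consistency: given the final answers extracted from
    several sampled CoT completions, keep the most common one."""
    return Counter(answers).most_common(1)[0][0]

# Three sampled reasoning paths for the change-making problem above:
majority_answer(["$3", "$3", "$4"])  # -> "$3"
```

In practice you'd generate the completions with temperature > 0 so the paths differ, then extract and vote on the final answers.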
Combining Techniques
Few-Shot + Chain-of-Thought
The most powerful combination:
Prompt:
"Determine if these contracts have auto-renewal clauses.
Example 1:
Contract: 'This agreement shall continue for successive one-year
periods unless either party provides 30 days written notice.'
Reasoning: The phrase 'continue for successive periods' indicates
automatic renewal. The 'unless' clause provides an opt-out mechanism.
Answer: Yes, auto-renewal with 30-day notice to cancel.
Example 2:
Contract: 'This agreement terminates on December 31, 2026.
Any renewal requires a new signed agreement.'
Reasoning: 'Terminates on' with a fixed date indicates a clear
end. 'Requires a new signed agreement' explicitly prevents auto-renewal.
Answer: No auto-renewal.
Now analyze:
Contract: 'The initial term is 12 months. Upon expiration, this
agreement will renew for additional 12-month terms at the
then-current rate unless cancelled 60 days prior to renewal.'"
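The combined prompt above has a regular shape — each example pairs an input with its reasoning and answer — so it can be assembled from data. A minimal sketch (function and parameter names are made up for illustration):

```python
def cot_few_shot_prompt(task: str,
                        examples: list[tuple[str, str, str]],
                        query: str) -> str:
    """Few-shot + CoT: each example pairs a contract with its reasoning
    and answer, so the model imitates the reasoning before deciding."""
    parts = [task, ""]
    for i, (contract, reasoning, answer) in enumerate(examples, 1):
        parts += [f"Example {i}:",
                  f"Contract: {contract}",
                  f"Reasoning: {reasoning}",
                  f"Answer: {answer}",
                  ""]
    parts += ["Now analyze:", f"Contract: {query}"]
    return "\n".join(parts)
```

This keeps the reasoning traces next to their answers in one structure, which makes it easy to add or revise examples without breaking the format-consistency rule from earlier.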
Choosing the Right Technique
| Situation | Technique | Why |
|---|---|---|
| Simple, common task | Zero-shot | Model already knows how |
| Custom format/categories | Few-shot | Model needs your pattern |
| Math or logic | Chain-of-thought | Reduces errors |
| Complex + custom | Few-shot + CoT | Maximum accuracy |
| Tone matching | Few-shot | Show desired voice |
| Data extraction | Few-shot | Define your schema |
| Analysis/reasoning | CoT | Forces structured thinking |
FAQ
Does few-shot use more tokens (cost more)?
Yes — examples add to input tokens. But the accuracy improvement usually saves money by reducing retries and errors. Use the cheapest model that works with your technique.
How do I know which technique to use?
Start with zero-shot. If the output isn't consistent or accurate, add examples (few-shot). If reasoning is wrong, add chain-of-thought. Iterate from simple to complex.
Do modern models still need few-shot?
Less than before — GPT-4o and Claude handle many tasks zero-shot. But for custom formats, unusual categories, or consistent output structure, few-shot still significantly improves results.
What about system prompts vs few-shot examples?
Use system prompts for role/behavior/constraints. Use few-shot for format/pattern/classification examples. They complement each other.
Bottom Line
Zero-shot for standard tasks the model handles well. Few-shot when you need consistent, custom-formatted output. Chain-of-thought for reasoning-heavy tasks. Combine few-shot + CoT for maximum accuracy on complex, custom tasks.
The best prompt engineers in 2026 don't use one technique — they match the technique to the task.