Zero-Shot vs Few-Shot Prompting Explained (2026)
How you prompt an LLM dramatically affects output quality. Zero-shot, few-shot, and chain-of-thought are the three foundational techniques. Here's when and how to use each.
The Three Techniques at a Glance
Zero-shot: "Classify this review as positive or negative: 'Great product!'"
→ Just ask. No examples. LLM uses its training knowledge.
Few-shot: "Here are examples:
'Love it!' → positive
'Terrible' → negative
'Waste of money' → negative
Now classify: 'Great product!'"
→ Show examples first. LLM follows the pattern.
Chain-of-thought: "Classify this review. Think step by step:
1. Identify key sentiment words
2. Consider context and tone
3. Make your classification"
→ Ask the LLM to reason through it.
Zero-Shot Prompting
Ask the model to perform a task with no examples:
Prompt: "Translate the following English text to French:
'The weather is beautiful today.'"
Output: "Le temps est magnifique aujourd'hui."
When Zero-Shot Works Well
- Common tasks: Translation, summarization, simple Q&A
- Clear instructions: The task is unambiguous
- Standard formats: Well-known output structures
- Modern models: GPT-4o and Claude handle most tasks zero-shot
When Zero-Shot Fails
❌ Ambiguous format:
"Extract entities from this text"
→ Model might return a list, a table, JSON, or prose
❌ Domain-specific:
"Classify this medical note"
→ Model doesn't know your classification categories
❌ Unusual patterns:
"Format this data for our CRM import"
→ Model doesn't know your CRM's format
Zero-Shot Tips
✅ Be specific about format:
"Extract entities and return as JSON with keys: name, type, confidence"
✅ Define categories explicitly:
"Classify as one of: [bug, feature, question, documentation]"
✅ Specify constraints:
"Summarize in exactly 3 bullet points, each under 15 words"
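The tips above (explicit categories, pinned-down format) can be rolled into a small prompt-template helper. A minimal sketch — the function name and category list are illustrative, not a library API:

```python
def zero_shot_prompt(text: str, categories: list[str]) -> str:
    """Build a zero-shot classification prompt that names the valid
    categories and constrains the output format."""
    return (
        "Classify the following text as exactly one of: "
        f"[{', '.join(categories)}].\n"
        "Respond with only the category name.\n\n"
        f"Text: {text}"
    )

print(zero_shot_prompt("The app crashes when I upload photos",
                       ["bug", "feature", "question", "documentation"]))
```

Templating the constraints this way keeps every request consistent instead of hand-writing them per call.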
Few-Shot Prompting
Provide examples before asking the model to perform the task:
Prompt:
"Classify customer support tickets into categories.
Examples:
'My payment was charged twice' → billing
'The app crashes when I upload photos' → bug
'Can you add dark mode?' → feature-request
'How do I export my data?' → question
'The API returns 500 on large payloads' → bug
Now classify:
'I was charged $99 instead of $49' →"
Output: "billing"
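A few-shot prompt like the one above is just labeled examples followed by the query, so it's worth assembling programmatically. A minimal sketch (names and formatting are one reasonable choice, not a standard):

```python
def few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot classification prompt: task description,
    then labeled examples, then the unlabeled query."""
    lines = ["Classify customer support tickets into categories.",
             "Examples:"]
    lines += [f"'{text}' -> {label}" for text, label in examples]
    lines += ["Now classify:", f"'{query}' ->"]
    return "\n".join(lines)

examples = [
    ("My payment was charged twice", "billing"),
    ("The app crashes when I upload photos", "bug"),
    ("Can you add dark mode?", "feature-request"),
]
print(few_shot_prompt(examples, "I was charged $99 instead of $49"))
```

Keeping examples as data (a list of pairs) also makes it easy to swap or extend them without touching the prompt text.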
Why Few-Shot Works Better
Zero-shot:
Model: "I'll classify based on what I think the categories should be"
Risk: Invents categories you don't want
Few-shot:
Model: "I see the pattern — these are the valid categories,
and this is how to distinguish between them"
Result: Consistent with your system
How Many Examples?
2-3 examples: Good for simple tasks (classification, formatting)
5-7 examples: Better for nuanced tasks (tone matching, edge cases)
10+ examples: Diminishing returns — consider fine-tuning instead
Rule: Cover each category at least once.
Include edge cases if they matter.
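The coverage rule above is mechanical enough to check in code. A small sketch (helper name is made up for illustration):

```python
from collections import Counter

def uncovered_categories(examples: list[tuple[str, str]],
                         categories: list[str]) -> list[str]:
    """Return categories with no example yet -- per the rule that
    every category should appear at least once."""
    counts = Counter(label for _, label in examples)
    return [c for c in categories if counts[c] == 0]

missing = uncovered_categories(
    [("Love it!", "positive"), ("It's fine.", "neutral")],
    ["positive", "negative", "neutral"],
)
# missing == ["negative"]
```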
Few-Shot Best Practices
✅ Diverse examples (cover different categories):
positive, negative, neutral — not 3 positive examples
✅ Edge cases:
Include the tricky ones: sarcasm, mixed sentiment, ambiguous
✅ Consistent format:
Every example follows the exact same structure
✅ Representative examples:
Match the complexity of real inputs
❌ Don't use only easy examples:
"I love this!" → positive (too obvious)
"The product arrived late but works great" → ? (this is useful)
Few-Shot Example: Data Extraction
Prompt:
"Extract product information from descriptions.
Example 1:
Input: 'The Sony WH-1000XM5 wireless headphones offer 30-hour battery
life and retail for $349.99'
Output: {"product": "Sony WH-1000XM5", "category": "headphones",
"price": 349.99, "feature": "30-hour battery"}
Example 2:
Input: 'Apple's M3 MacBook Air starts at $1,099 with 8GB RAM
and 256GB storage'
Output: {"product": "MacBook Air M3", "category": "laptop",
"price": 1099, "feature": "8GB RAM, 256GB storage"}
Now extract:
Input: 'The Kindle Paperwhite Signature Edition with 32GB storage
is priced at $189.99 and features wireless charging'"
Output: {"product": "Kindle Paperwhite Signature Edition",
"category": "e-reader", "price": 189.99,
"feature": "32GB storage, wireless charging"}
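Because the few-shot examples define a schema, the model's reply can be validated against it before use. A minimal sketch, assuming the four field names from the examples above:

```python
import json

# Field names follow the few-shot examples above.
REQUIRED_KEYS = {"product", "category", "price", "feature"}

def parse_extraction(raw: str) -> dict:
    """Parse the model's JSON reply and verify it carries every key
    the examples established; raise if the schema wasn't followed."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data

item = parse_extraction(
    '{"product": "Kindle Paperwhite Signature Edition", '
    '"category": "e-reader", "price": 189.99, '
    '"feature": "32GB storage, wireless charging"}'
)
```

Validating up front turns a malformed reply into a retryable error instead of bad data downstream.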
Chain-of-Thought (CoT) Prompting
Ask the model to show its reasoning before answering:
Prompt:
"A store sells apples for $2 each and oranges for $3 each.
If someone buys 4 apples and 3 oranges, and pays with a $20 bill,
how much change do they receive?
Think step by step."
Output:
"Step 1: Calculate apple cost: 4 × $2 = $8
Step 2: Calculate orange cost: 3 × $3 = $9
Step 3: Calculate total: $8 + $9 = $17
Step 4: Calculate change: $20 - $17 = $3
The customer receives $3 in change."
When CoT Is Essential
✅ Math and logic problems
✅ Multi-step reasoning
✅ Complex analysis
✅ Tasks where accuracy > speed
✅ Problems the model often gets wrong zero-shot
❌ Not needed for:
Simple classification
Translation
Basic formatting
Tasks that don't require reasoning
CoT Variants
Basic CoT:
"Think step by step."
Structured CoT:
"Reason through this:
1. Identify the key variables
2. Set up the relationship
3. Solve
4. Verify your answer"
Self-consistency CoT:
Sample several CoT paths (e.g., 3-5 completions at temperature > 0) → take the majority final answer
(more reliable for complex problems, at the cost of extra tokens)
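The voting step of self-consistency reduces to a majority count over the sampled final answers. A minimal sketch (the answer strings are illustrative):

```python
from collections import Counter

def majority_answer(answers: list[str]) -> str:
    """Self-consistency: given the final answers extracted from
    several sampled CoT completions, keep the most common one."""
    return Counter(answers).most_common(1)[0][0]

# Three sampled reasoning paths for the change-making problem above:
majority_answer(["$3", "$3", "$4"])  # -> "$3"
```

In practice you'd generate the completions with temperature > 0 so the paths differ, then extract and vote on the final answers.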
Combining Techniques
Few-Shot + Chain-of-Thought
The most powerful combination:
Prompt:
"Determine if these contracts have auto-renewal clauses.
Example 1:
Contract: 'This agreement shall continue for successive one-year
periods unless either party provides 30 days written notice.'
Reasoning: The phrase 'continue for successive periods' indicates
automatic renewal. The 'unless' clause provides an opt-out mechanism.
Answer: Yes, auto-renewal with 30-day notice to cancel.
Example 2:
Contract: 'This agreement terminates on December 31, 2026.
Any renewal requires a new signed agreement.'
Reasoning: 'Terminates on' with a fixed date indicates a clear
end. 'Requires a new signed agreement' explicitly prevents auto-renewal.
Answer: No auto-renewal.
Now analyze:
Contract: 'The initial term is 12 months. Upon expiration, this
agreement will renew for additional 12-month terms at the
then-current rate unless cancelled 60 days prior to renewal.'"
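The combined prompt above has a regular shape — each example pairs an input with its reasoning and answer — so it can be assembled from data. A minimal sketch (function and parameter names are made up for illustration):

```python
def cot_few_shot_prompt(task: str,
                        examples: list[tuple[str, str, str]],
                        query: str) -> str:
    """Few-shot + CoT: each example pairs a contract with its reasoning
    and answer, so the model imitates the reasoning before deciding."""
    parts = [task, ""]
    for i, (contract, reasoning, answer) in enumerate(examples, 1):
        parts += [f"Example {i}:",
                  f"Contract: {contract}",
                  f"Reasoning: {reasoning}",
                  f"Answer: {answer}",
                  ""]
    parts += ["Now analyze:", f"Contract: {query}"]
    return "\n".join(parts)
```

This keeps the reasoning traces next to their answers in one structure, which makes it easy to add or revise examples without breaking the format-consistency rule from earlier.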
Choosing the Right Technique
| Situation | Technique | Why |
|---|---|---|
| Simple, common task | Zero-shot | Model already knows how |
| Custom format/categories | Few-shot | Model needs your pattern |
| Math or logic | Chain-of-thought | Reduces errors |
| Complex + custom | Few-shot + CoT | Maximum accuracy |
| Tone matching | Few-shot | Show desired voice |
| Data extraction | Few-shot | Define your schema |
| Analysis/reasoning | CoT | Forces structured thinking |
FAQ
Does few-shot use more tokens (cost more)?
Yes — examples add to input tokens. But the accuracy improvement usually saves money by reducing retries and errors. Use the cheapest model that works with your technique.
How do I know which technique to use?
Start with zero-shot. If the output isn't consistent or accurate, add examples (few-shot). If reasoning is wrong, add chain-of-thought. Iterate from simple to complex.
Do modern models still need few-shot?
Less than before — GPT-4o and Claude handle many tasks zero-shot. But for custom formats, unusual categories, or consistent output structure, few-shot still significantly improves results.
What about system prompts vs few-shot examples?
Use system prompts for role/behavior/constraints. Use few-shot for format/pattern/classification examples. They complement each other.
Bottom Line
Zero-shot for standard tasks the model handles well. Few-shot when you need consistent, custom-formatted output. Chain-of-thought for reasoning-heavy tasks. Combine few-shot + CoT for maximum accuracy on complex, custom tasks.
The best prompt engineers in 2026 don't use one technique — they match the technique to the task.