How to Build an AI-Powered Recommendation Engine (2026)
Recommendation engines reportedly drive 35% of Amazon's revenue and 80% of Netflix viewing. In 2026, you can build one in a weekend using embeddings and vector search — no ML PhD required. Here's how.
The Three Approaches
1. Collaborative Filtering: "Users like you also liked..."
Based on: User behavior patterns
Example: Spotify's "Discover Weekly"
2. Content-Based: "Because you liked X, here's similar Y..."
Based on: Item attributes/features
Example: Netflix genre matching
3. Embedding-Based (2026 approach): "These items are close in meaning..."
Based on: AI-generated semantic understanding
Example: "You bought a camping tent" → recommends hiking boots (not because others bought both, but because the AI understands camping and hiking are related activities)
The Modern Stack
2020 approach:
TensorFlow → train model for weeks → deploy on GPU servers → $$$
2026 approach:
OpenAI/Voyage embeddings → store in vector DB → query similar items
Build in a weekend. Deploy on serverless. Costs $5-50/month.
Architecture
┌─────────────┐ ┌──────────────┐ ┌────────────┐
│ Your App │────▶│ Embeddings │────▶│ Vector DB │
│ │ │ (OpenAI) │ │ (Pinecone) │
│ User views │ │ │ │ │
│ product │ │ Convert to │ │ Find │
│ │◀────│ vectors │◀────│ nearest │
│ Show recs │ │ │ │ neighbors │
└─────────────┘ └──────────────┘ └────────────┘
Step 1: Generate Embeddings
Convert your items into vectors (numbers that capture meaning):
import OpenAI from 'openai';
const openai = new OpenAI();
async function getEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return response.data[0].embedding;
}
// Create embedding for a product
const product = {
  name: 'Lightweight Hiking Backpack',
  description:
    '40L ultralight backpack for multi-day hikes. Waterproof, ventilated back panel, trekking pole attachments.',
  category: 'Outdoor Gear',
  tags: ['hiking', 'backpacking', 'ultralight'],
};
const embedding = await getEmbedding(
  `${product.name}. ${product.description}. ${product.tags.join(', ')}`
);
// Returns: [0.023, -0.041, 0.087, ...] (1536 numbers)
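Under the hood, "similar" means the vectors point in nearly the same direction. Here is a minimal sketch of the cosine-similarity math the vector database runs for you, using toy 3-dimensional vectors (real embeddings have 1536 dimensions):

```typescript
// Cosine similarity: dot product divided by the product of the vector
// lengths. 1 = same direction, 0 = unrelated, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, hand-picked for illustration only
const tent = [0.9, 0.1, 0.2];
const boots = [0.8, 0.2, 0.1];
const laptop = [0.1, 0.9, 0.7];

console.log(cosineSimilarity(tent, boots) > cosineSimilarity(tent, laptop)); // true
```

You never compute this yourself in production — the vector database does it across millions of vectors with approximate nearest-neighbor indexes — but it's the entire "magic" behind the `score` field you'll see below.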
Step 2: Store in Vector Database
import { Pinecone } from '@pinecone-database/pinecone';
const pinecone = new Pinecone();
const index = pinecone.index('products');
// Store all products with their embeddings
async function indexProducts(products: Product[]) {
  const vectors = await Promise.all(
    products.map(async (product) => ({
      id: product.id,
      values: await getEmbedding(
        `${product.name}. ${product.description}. ${product.tags.join(', ')}`
      ),
      metadata: {
        name: product.name,
        category: product.category,
        price: product.price,
      },
    }))
  );
  await index.upsert(vectors);
}
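One practical caveat: embedding and upserting an entire catalog in a single `Promise.all` and a single `upsert` call can hit OpenAI rate limits and Pinecone's request-size cap. A hedged sketch of a generic batching helper (the batch size of 100 is an assumption; check your plan's limits):

```typescript
// Split an array into fixed-size batches
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage sketch, reusing the indexProducts function above:
// for (const batch of chunk(products, 100)) {
//   await indexProducts(batch);
// }
```

Batching also gives you a natural place to add retry logic and progress logging when indexing a large catalog.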
Vector DB Options
| Database | Type | Free Tier | Price |
|---|---|---|---|
| Pinecone | Managed | 100K vectors | $25/mo+ |
| Qdrant | Self-host/cloud | Self-host free | $25/mo+ |
| Weaviate | Self-host/cloud | Self-host free | $25/mo+ |
| Turso | SQLite + vectors | 9GB free | $29/mo |
| Supabase pgvector | Postgres extension | 500MB free | $25/mo+ |
Step 3: Query for Recommendations
// "Similar items" recommendations
async function getRecommendations(productId: string, limit = 5) {
  // Get the product's embedding
  const product = await getProduct(productId);
  const embedding = await getEmbedding(
    `${product.name}. ${product.description}`
  );
  // Find similar products. Fetch one extra match so we can drop the
  // product itself (Pinecone metadata filters can't match on the vector ID)
  const results = await index.query({
    vector: embedding,
    topK: limit + 1,
    includeMetadata: true,
  });
  return results.matches
    .filter((match) => match.id !== productId) // Exclude the product itself
    .slice(0, limit)
    .map((match) => ({
      id: match.id,
      name: match.metadata.name,
      score: match.score, // Higher = more similar (cosine similarity)
    }));
}
// Usage
const recs = await getRecommendations('hiking-backpack-001');
// Returns:
// [
// { name: "Trail Running Vest", score: 0.92 },
// { name: "Trekking Poles", score: 0.89 },
// { name: "Hiking Boots", score: 0.87 },
// { name: "Water Filter", score: 0.85 },
// { name: "Camping Hammock", score: 0.82 },
// ]
Step 4: Add User Personalization
User Profile Embeddings
// Build a user preference vector from their behavior
async function getUserPreferenceEmbedding(userId: string) {
const history = await getUserHistory(userId);
// Get items they've viewed, purchased, or liked
// Combine recent items into a preference description
const preferenceText = history
.slice(-10) // Last 10 interactions
.map((item) => `${item.name}: ${item.description}`)
.join('. ');
return getEmbedding(
`User interested in: ${preferenceText}`
);
}
// Personalized recommendations
async function getPersonalizedRecs(userId: string, limit = 10) {
  const userEmbedding = await getUserPreferenceEmbedding(userId);
  const purchasedIds = new Set(await getUserPurchasedIds(userId));
  // Over-fetch so enough results survive after excluding purchases
  // (the vector ID isn't a metadata field, so it can't be filtered server-side)
  const results = await index.query({
    vector: userEmbedding,
    topK: limit + purchasedIds.size,
    includeMetadata: true,
  });
  return results.matches
    .filter((match) => !purchasedIds.has(match.id)) // Exclude already purchased
    .slice(0, limit);
}
Step 5: Hybrid Approach
Combine content similarity with collaborative signals:
async function getHybridRecommendations(userId: string, productId: string) {
  // Content-based: similar to this product
  const contentRecs = await getRecommendations(productId, 10);
  // Personalized: based on user history
  const personalRecs = await getPersonalizedRecs(userId, 10);
  // Collaborative: what similar users bought
  // (getCollaborativeRecs is your own purchase co-occurrence query, not shown here)
  const collabRecs = await getCollaborativeRecs(userId, 10);
  // Merge and rank (weighted scoring)
  const scores = new Map<string, number>();
  contentRecs.forEach((r) => {
    scores.set(r.id, (scores.get(r.id) || 0) + r.score * 0.4);
  });
  personalRecs.forEach((r) => {
    scores.set(r.id, (scores.get(r.id) || 0) + r.score * 0.35);
  });
  collabRecs.forEach((r) => {
    scores.set(r.id, (scores.get(r.id) || 0) + r.score * 0.25);
  });
  return [...scores.entries()]
    .sort(([, a], [, b]) => b - a)
    .slice(0, 8);
}
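One caveat with this merge: the three sources may score on different scales (cosine similarity vs. whatever your collaborative query returns), so it's safer to normalize each list to the 0-1 range before applying the weights. A minimal min-max sketch:

```typescript
interface Scored {
  id: string;
  score: number;
}

// Rescale a result list so its scores span 0-1 (min-max normalization)
function normalizeScores(recs: Scored[]): Scored[] {
  if (recs.length === 0) return recs;
  const scores = recs.map((r) => r.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const range = max - min || 1; // avoid dividing by zero when all scores tie
  return recs.map((r) => ({ ...r, score: (r.score - min) / range }));
}
```

Call `normalizeScores` on each of the three lists before the weighted `forEach` loops, and the 0.4 / 0.35 / 0.25 weights become meaningful relative to each other.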
Cost Estimate
For a store with 10,000 products and 5,000 daily users:
Embeddings (one-time for catalog):
10,000 products × $0.00002/embedding = $0.20
Embeddings (user queries):
5,000 queries/day × $0.00002 = $0.10/day = $3/mo
Vector database:
Pinecone starter: $0 (under 100K vectors)
Or Supabase: $25/mo
Total: $3-28/month for a production-quality recommendation engine
FAQ
How is this different from "customers also bought"?
"Customers also bought" uses purchase correlation (collaborative filtering). Embedding-based recommendations understand semantic meaning — they can recommend hiking boots for a tent buyer even if no one has bought both yet.
How many items do I need for this to work?
Embedding-based recommendations work with as few as 50 items. Collaborative filtering needs thousands of user interactions. That's why embeddings are better for smaller catalogs.
How do I handle cold start (new users)?
Start with content-based recommendations (similar to what they're viewing). As they interact more, blend in personalization. After 5-10 interactions, personalized recommendations become useful.
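That ramp can be made concrete with a simple blend weight: 0 for a brand-new user, rising to 1 as interactions accumulate. The cutoff of 10 interactions is an assumption to tune against your own data:

```typescript
// How much to trust personalization: 0 for a new user, 1 at 10+ interactions
function personalizationWeight(interactionCount: number): number {
  return Math.min(interactionCount / 10, 1);
}

// Blend content-based and personalized scores for one candidate item
function blendScore(
  contentScore: number,
  personalScore: number,
  interactionCount: number
): number {
  const w = personalizationWeight(interactionCount);
  return (1 - w) * contentScore + w * personalScore;
}
```

A linear ramp is the simplest choice; some teams prefer a step function (pure content-based until N interactions, then hybrid) because it's easier to A/B test.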
Should I use OpenAI or train my own model?
Use OpenAI/Voyage embeddings to start — they work surprisingly well for most use cases. Only train custom models if you have domain-specific needs and 100K+ training examples.
Bottom Line
Use OpenAI embeddings + Pinecone or Supabase pgvector for a recommendation engine that works in a weekend. Start with content-based similarity, add user personalization as you collect data, then blend with collaborative filtering for the best results.
The recommendation engines of 2026 aren't about complex ML pipelines — they're about embeddings, vector search, and smart scoring.