Building AI-Powered Features in Web Apps: LLMs, Embeddings, and Fine-Tuning
Complete guide to integrating AI into web applications — LLMs, RAG, embeddings, fine-tuning, and production considerations.
AI is no longer experimental. It's production-ready, accessible, and economical. Companies are shipping AI features to millions of users. The question isn't whether to use AI, but how to use it effectively.
The challenge: AI systems are probabilistic, not deterministic. They can be expensive and slow, they hallucinate, and they can leak private data. Building production-grade AI features requires understanding these constraints and designing systems around them.
I've shipped multiple AI features across different applications. Let me share the patterns that work.
## Architecture Patterns

### Pattern 1: Basic Chat with Context Injection
```typescript
// app/api/chat/route.ts
import { Anthropic } from '@anthropic-ai/sdk';

const client = new Anthropic();

export async function POST(request: Request) {
  const { messages, context } = await request.json();

  // System prompt with context (RAG-like)
  const systemPrompt = `You are a helpful assistant specialized in React development.

User Profile Context:
- Experience Level: ${context.userLevel}
- Focus Areas: ${context.focusAreas.join(', ')}

Answer questions concisely with code examples when relevant.`;

  try {
    const response = await client.messages.create({
      model: 'claude-3-5-sonnet-20241022',
      max_tokens: 1024,
      system: systemPrompt,
      messages: messages.map((msg: { role: string; content: string }) => ({
        role: msg.role as 'user' | 'assistant',
        content: msg.content,
      })),
    });

    return Response.json({
      content: response.content[0].type === 'text' ? response.content[0].text : '',
      stopReason: response.stop_reason,
    });
  } catch (error) {
    console.error('Chat error:', error);
    return Response.json({ error: 'Failed to generate response' }, { status: 500 });
  }
}
```
### Pattern 2: Retrieval-Augmented Generation (RAG)
RAG combines LLMs with custom knowledge bases for accurate, relevant answers:
```typescript
// lib/embeddings.ts
import { Pinecone } from '@pinecone-database/pinecone';
import { OpenAIEmbeddings } from '@langchain/openai';

const pc = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY!,
});

const embeddings = new OpenAIEmbeddings({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function searchDocuments(query: string) {
  // 1. Embed the query
  const queryEmbedding = await embeddings.embedQuery(query);

  // 2. Search the knowledge base
  const index = pc.index('documentation');
  const results = await index.query({
    vector: queryEmbedding,
    topK: 5,
    includeMetadata: true,
  });

  // 3. Return matching documents
  return results.matches.map((match) => ({
    id: match.id,
    score: match.score ?? 0,
    content: (match.metadata?.content as string) ?? '',
    source: (match.metadata?.source as string) ?? '',
  }));
}
```
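`searchDocuments` assumes the index was already populated. A common preprocessing step is splitting long documents into overlapping chunks before embedding and upserting them; here's a minimal sketch (`chunkText` is a hypothetical helper, and the chunk size and overlap values are arbitrary assumptions you should tune for your embedding model):

```typescript
// Split a long document into overlapping chunks so each one fits the
// embedding model's context window and retrieval stays granular.
export function chunkText(text: string, chunkSize = 500, overlap = 100): string[] {
  if (chunkSize <= overlap) {
    throw new Error('chunkSize must be larger than overlap');
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    // Step forward, keeping `overlap` characters of shared context between chunks
    start += chunkSize - overlap;
  }
  return chunks;
}
```

Each chunk would then be embedded and upserted into the Pinecone index along with its source metadata, so `searchDocuments` can return it later.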
```typescript
// app/api/rag-chat/route.ts
import { searchDocuments } from '@/lib/embeddings';
import { Anthropic } from '@anthropic-ai/sdk';

export async function POST(request: Request) {
  const { query } = await request.json();

  // 1. Search documents
  const relevantDocs = await searchDocuments(query);

  // 2. Format context
  const context = relevantDocs
    .map(
      (doc) => `Source: ${doc.source} (relevance: ${(doc.score * 100).toFixed(0)}%)
${doc.content}`
    )
    .join('\n\n---\n\n');

  // 3. Ask the LLM with context
  const client = new Anthropic();
  const response = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    system: `You are a helpful assistant. Answer based on the provided context.
If the context doesn't contain relevant information, say so.

Context:
${context}`,
    messages: [{ role: 'user', content: query }],
  });

  return Response.json({
    answer: response.content[0].type === 'text' ? response.content[0].text : '',
    sources: relevantDocs.map((doc) => doc.source),
  });
}
```
### Pattern 3: Vision/Image Analysis
```typescript
// app/api/analyze-image/route.ts
import { Anthropic } from '@anthropic-ai/sdk';

export async function POST(request: Request) {
  const formData = await request.formData();
  const file = formData.get('image') as File;

  if (!file) {
    return Response.json({ error: 'No image provided' }, { status: 400 });
  }

  // Convert the image to base64
  const buffer = await file.arrayBuffer();
  const base64 = Buffer.from(buffer).toString('base64');

  const client = new Anthropic();
  const response = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [
      {
        role: 'user',
        content: [
          {
            type: 'image',
            source: {
              type: 'base64',
              // Use the uploaded file's actual MIME type, defaulting to JPEG
              media_type: (file.type || 'image/jpeg') as
                | 'image/jpeg'
                | 'image/png'
                | 'image/gif'
                | 'image/webp',
              data: base64,
            },
          },
          {
            type: 'text',
            text: 'Analyze this image and describe what you see. Focus on: 1) Main objects, 2) Colors and composition, 3) Any text visible, 4) Overall mood/context',
          },
        ],
      },
    ],
  });

  return Response.json({
    analysis: response.content[0].type === 'text' ? response.content[0].text : '',
  });
}
```
## Advanced: Fine-Tuning Custom Models
For specialized tasks, fine-tune a smaller model on your data:
```typescript
// scripts/prepare-training-data.ts
import { OpenAI } from 'openai';

interface TrainingExample {
  input: string;
  output: string;
}

const trainingData: TrainingExample[] = [
  { input: 'how to use useState?', output: 'useState is a React hook that lets you add state to functional components...' },
  { input: 'what is context API?', output: 'Context API allows you to manage global state without prop drilling...' },
  // ... 100+ examples
];

// Format for OpenAI fine-tuning: one JSON object per line (JSONL)
const jsonlContent = trainingData
  .map((example) =>
    JSON.stringify({
      messages: [
        { role: 'user', content: example.input },
        { role: 'assistant', content: example.output },
      ],
    })
  )
  .join('\n');

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Save the JSONL file locally for inspection
await Bun.write('training-data.jsonl', jsonlContent);

// Upload the training data (note: pass the file *content*, not the filename)
const uploadedFile = await openai.files.create({
  file: new File([jsonlContent], 'training.jsonl'),
  purpose: 'fine-tune',
});

// Start the fine-tuning job
const fineTuneJob = await openai.fineTuning.jobs.create({
  training_file: uploadedFile.id,
  model: 'gpt-3.5-turbo',
  suffix: 'react-expert', // The resulting model ID will include this suffix, e.g. ft:gpt-3.5-turbo:org:react-expert:...
});

console.log('Fine-tuning started:', fineTuneJob.id);
```
## Cost Optimization

AI can get expensive fast. A few ways to optimize:
```typescript
// 1. Cache responses for identical queries
// (generateResponse is your app's LLM call; this Map has no eviction, so
// use a bounded or TTL cache in production)
const responseCache = new Map<string, string>();

export async function getChatResponse(query: string) {
  if (responseCache.has(query)) {
    return responseCache.get(query);
  }
  const response = await generateResponse(query);
  responseCache.set(query, response);
  return response;
}

// 2. Use cheaper models when possible (complexQuery is an app-specific flag)
const model = complexQuery
  ? 'claude-3-5-sonnet-20241022' // More expensive, more capable
  : 'claude-3-haiku-20240307'; // Cheaper, faster

// 3. Batch requests (chunk and processBatch are app-specific helpers)
async function batchProcess(items: string[]) {
  // Process in groups to reduce API calls
  const batches = chunk(items, 10);
  const results = await Promise.all(batches.map(processBatch));
  return results.flat();
}

// 4. Implement rate limiting
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '1 h'), // 10 requests/hour per IP
});

export async function POST(request: Request) {
  const ip = request.headers.get('x-forwarded-for') || 'unknown';
  const { success } = await ratelimit.limit(ip);

  if (!success) {
    return Response.json({ error: 'Rate limited' }, { status: 429 });
  }
  // Process request
}
```
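To know whether these optimizations are actually paying off, it helps to estimate spend per request from the token counts the API returns. A back-of-the-envelope helper; the per-million-token prices below are illustrative assumptions, not current list prices, so check your provider's pricing page:

```typescript
// Illustrative USD prices per million tokens (assumptions, verify before use)
const PRICES: Record<string, { input: number; output: number }> = {
  'claude-3-5-sonnet-20241022': { input: 3.0, output: 15.0 },
  'claude-3-haiku-20240307': { input: 0.25, output: 1.25 },
};

// Estimate the cost of a single request from its token usage
export function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) throw new Error(`Unknown model: ${model}`);
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}
```

Logging this per request alongside the model choice makes the sonnet-versus-haiku tradeoff concrete in your dashboards.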
## Handling Hallucinations and Failures

LLMs can confidently say wrong things. Mitigations:
```typescript
export async function generateWithValidation(prompt: string) {
  const response = await generateResponse(prompt); // your app's LLM call

  // 1. Check for confidence signals
  if (response.includes("I don't know") || response.includes('uncertain')) {
    return { answer: response, confidence: 'low' };
  }

  // 2. Validate against known facts (validateFacts is an app-specific checker)
  const factCheck = await validateFacts(response);
  if (!factCheck.isValid) {
    return {
      answer: response,
      warning: `Potential inaccuracy detected: ${factCheck.issues.join(', ')}`,
    };
  }

  // 3. Ask for sources
  const followUp = await client.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 1024,
    messages: [
      { role: 'user', content: prompt },
      { role: 'assistant', content: response },
      { role: 'user', content: 'Where did you get this information? Provide sources.' },
    ],
  });

  return {
    answer: response,
    sources: followUp.content[0].type === 'text' ? followUp.content[0].text : null,
  };
}
```
## Privacy and Security
```typescript
// 1. Never send PII to external APIs
function sanitizeInput(input: string): string {
  // Redact email addresses and US-style phone numbers (extend for your data)
  return input
    .replace(/[\w.-]+@[\w.-]+\.\w+/g, '[EMAIL]')
    .replace(/\(\d{3}\)\s?\d{3}-\d{4}/g, '[PHONE]');
}

// 2. Use local models for sensitive data
import { Ollama } from 'ollama';

const ollama = new Ollama({ host: 'http://localhost:11434' });

async function processConfidentialData(data: string) {
  // Runs locally, so no data leaves your server
  const response = await ollama.generate({
    model: 'mistral', // Open-source model
    prompt: data,
    stream: false,
  });
  return response.response;
}

// 3. Implement audit logging
async function logAIRequest(userId: string, input: string, output: string, model: string) {
  await db.logs.create({
    userId,
    inputHash: hashValue(input), // Don't store raw input on disk
    outputHash: hashValue(output),
    model,
    timestamp: new Date(),
  });
}
```
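The `hashValue` helper above is left undefined; one way to implement it with Node's built-in `crypto` module is a SHA-256 hex digest:

```typescript
import { createHash } from 'node:crypto';

// One-way hash so audit logs can correlate requests without storing raw content
export function hashValue(value: string): string {
  return createHash('sha256').update(value).digest('hex');
}
```

Because the hash is deterministic, identical inputs can still be matched across log entries, which is exactly what you want for deduplication and abuse investigation without retaining the plaintext.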
## Production Monitoring
```typescript
// app/api/insights/route.ts
import { recordMetric } from '@/lib/monitoring';

export async function POST(request: Request) {
  const startTime = Date.now();

  try {
    const response = await generateResponse(); // your app's LLM call
    const duration = Date.now() - startTime;

    await recordMetric('ai_request_success', {
      duration,
      model: 'claude-3',
      tokens: response.usage.output_tokens,
    });

    return Response.json({ answer: response });
  } catch (error) {
    const duration = Date.now() - startTime;
    const message = error instanceof Error ? error.message : 'Unknown error';

    await recordMetric('ai_request_error', {
      duration,
      error: message,
      model: 'claude-3',
    });

    return Response.json({ error: message }, { status: 500 });
  }
}
```
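`recordMetric` is imported from a hypothetical `@/lib/monitoring` module. If you don't have a metrics backend yet, an in-memory aggregator is enough to get started; this is a sketch, not production code (a real setup would ship to StatsD, Prometheus, or similar):

```typescript
// lib/monitoring.ts (sketch): aggregate counts and total duration per metric name
type MetricData = { duration?: number; [key: string]: unknown };

const metrics = new Map<string, { count: number; totalDuration: number }>();

export async function recordMetric(name: string, data: MetricData): Promise<void> {
  const entry = metrics.get(name) ?? { count: 0, totalDuration: 0 };
  entry.count += 1;
  entry.totalDuration += data.duration ?? 0;
  metrics.set(name, entry);
}

// Read back an aggregate, e.g. for a /metrics endpoint
export function getMetric(name: string) {
  return metrics.get(name);
}
```

Even this minimal version lets you compute average latency (`totalDuration / count`) and error rates per model before investing in real observability tooling.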
## Common Pitfalls
| Problem | Solution |
|---|---|
| Unreliable outputs | Use few-shot prompting + validation |
| Long latency | Cache responses, use streaming |
| High costs | Use cheaper models, implement caching, batch requests |
| Hallucinations | Validate against known facts, ask for sources |
| Privacy breaches | Use local models or sanitize input |
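The table recommends few-shot prompting, which hasn't been shown yet: you prepend worked input/output pairs to the conversation so the model imitates their format and style. A sketch, with illustrative types and a hypothetical helper name:

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string };

// Prepend worked examples so the model imitates their style and format
export function buildFewShotMessages(
  examples: Array<{ input: string; output: string }>,
  userQuery: string
): ChatMessage[] {
  const shots: ChatMessage[] = examples.flatMap((ex) => [
    { role: 'user' as const, content: ex.input },
    { role: 'assistant' as const, content: ex.output },
  ]);
  return [...shots, { role: 'user', content: userQuery }];
}
```

The resulting array drops straight into the `messages` parameter of the chat route from Pattern 1; two or three well-chosen examples often stabilize output format more cheaply than fine-tuning.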
## Conclusion
AI features are now a core part of modern web applications. The key is understanding the tradeoffs: cost vs quality, speed vs accuracy, complexity vs maintainability.
Start simple—add a chatbot or content summarizer. Monitor costs and latency. Then gradually add more sophisticated AI features as you learn your users' needs.
The companies that win won't be those with the best AI, but those that integrate AI most thoughtfully into their products.