Streaming AI Responses in Next.js: Building Real-Time User Experiences with LLMs
Master streaming AI responses in Next.js using Server-Sent Events and Vercel AI SDK. Build responsive, real-time chat interfaces with TypeScript.
The era of waiting for complete AI responses is over. Users expect real-time, streaming responses that appear word-by-word as the model generates them. In production applications—from chatbots to code generation tools—streaming isn't just a nice feature; it's table stakes.
I've implemented streaming AI interfaces across multiple projects, and the performance difference is night-and-day. Users perceive the application as faster, more responsive, and more "intelligent." In this guide, I'll show you exactly how to build production-grade streaming AI experiences in Next.js.
Why Streaming Matters
Let's quantify the difference. A typical LLM takes 3-5 seconds to generate a complete response. Without streaming:
- ⏳ User waits 5 seconds staring at a loading spinner
- 😤 Perceived latency is 5 seconds
- 🚫 No feedback that work is happening
- ❌ Poor mobile experience (battery drain, network timeouts)
With streaming:
- ⚡ First tokens appear in 200-400ms
- 😊 User sees immediate progress
- 🎯 Perceived latency is <1 second
- ✅ Graceful degradation on slow networks
The psychological impact is significant: the same API response feels dramatically faster when streamed, even though total generation time is unchanged.
Architecture: Server-Sent Events (SSE) vs WebSockets
For AI response streaming, Server-Sent Events (SSE) is the superior choice for most applications:
| Feature | SSE | WebSocket |
|---|---|---|
| Implementation | Simple HTTP streams | Complex bidirectional protocol |
| Browser Support | Native, no polyfills | Native, no polyfills |
| Scalability | Multiplexes over HTTP/2 | Dedicated connection per client |
| Reconnection | Automatic | Manual |
| Overhead | Minimal | Higher |
| Use Case | AI responses, notifications | Gaming, collaborative tools |
Unless you need true bidirectional real-time communication, SSE is the better choice.
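To make the comparison concrete: an SSE response is just an HTTP body made of `data:` frames, each terminated by a blank line, with multi-line payloads split into one `data:` line apiece. A minimal sketch of a frame encoder (the `formatSSEFrame` helper is illustrative, not part of any library):

```typescript
// Encode a payload as a Server-Sent Events frame.
// Multi-line payloads become one `data:` line each;
// every frame ends with a blank line so the browser can delimit events.
function formatSSEFrame(data: string, event?: string): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  for (const part of data.split('\n')) {
    lines.push(`data: ${part}`);
  }
  return lines.join('\n') + '\n\n';
}
```

This framing is what `EventSource` parses natively on the client, which is why SSE needs so little protocol machinery compared to WebSockets.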
Building a Streaming Chat API
Step 1: Create the Route Handler
```typescript
// app/api/chat/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { google } from '@ai-sdk/google';
import { streamText } from 'ai';

export async function POST(request: NextRequest) {
  const { messages } = await request.json();

  // Validate input
  if (!messages || !Array.isArray(messages)) {
    return NextResponse.json(
      { error: 'Invalid messages format' },
      { status: 400 }
    );
  }

  const systemPrompt = `You are an expert frontend developer assistant specializing in React, Next.js, and TypeScript.
You provide clear, practical advice with code examples. Keep responses concise and focused.`;

  try {
    const result = await streamText({
      model: google('gemini-2.0-flash'),
      system: systemPrompt,
      messages,
      temperature: 0.7,
      maxTokens: 1024,
    });

    return result.toTextStreamResponse();
  } catch (error) {
    console.error('Chat API error:', error);
    return NextResponse.json(
      { error: 'Failed to generate response' },
      { status: 500 }
    );
  }
}
```
Key points:
- `streamText` from the Vercel AI SDK handles the complexity of streaming
- `toTextStreamResponse()` converts the stream to an HTTP Response
- Error handling ensures graceful failure
- System prompt defines AI behavior consistently
Step 2: Implement the Client Hook
```typescript
// hooks/useStreamingChat.ts
import { useState, useCallback } from 'react';

interface Message {
  id: string;
  role: 'user' | 'assistant';
  content: string;
  createdAt: Date;
}

export function useStreamingChat() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [isLoading, setIsLoading] = useState(false);
  const [error, setError] = useState<string | null>(null);

  const sendMessage = useCallback(
    async (userMessage: string) => {
      // Add optimistic user message
      const userMsg: Message = {
        id: crypto.randomUUID(),
        role: 'user',
        content: userMessage,
        createdAt: new Date(),
      };

      setMessages((prev) => [...prev, userMsg]);
      setIsLoading(true);
      setError(null);

      // Declared outside try so the catch block can remove the placeholder
      const assistantId = crypto.randomUUID();

      try {
        // Make API request; include userMsg explicitly, because the
        // `messages` value captured by this closure is still the old list
        const response = await fetch('/api/chat', {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            messages: [...messages, userMsg].map((m) => ({
              role: m.role,
              content: m.content,
            })),
          }),
        });

        if (!response.ok) {
          throw new Error(`API error: ${response.statusText}`);
        }

        if (!response.body) {
          throw new Error('No response body');
        }

        // Process the stream
        const reader = response.body.getReader();
        const decoder = new TextDecoder();
        let assistantMessage = '';

        // Create placeholder for assistant message
        setMessages((prev) => [
          ...prev,
          {
            id: assistantId,
            role: 'assistant',
            content: '',
            createdAt: new Date(),
          },
        ]);

        while (true) {
          const { done, value } = await reader.read();
          if (done) break;

          // `stream: true` keeps multi-byte characters intact across chunks
          const chunk = decoder.decode(value, { stream: true });
          assistantMessage += chunk;

          // Update message in real-time
          setMessages((prev) =>
            prev.map((msg) =>
              msg.id === assistantId
                ? { ...msg, content: assistantMessage }
                : msg
            )
          );
        }
      } catch (err) {
        const errorMessage =
          err instanceof Error ? err.message : 'Unknown error occurred';
        setError(errorMessage);

        // Remove the assistant placeholder on error
        setMessages((prev) => prev.filter((m) => m.id !== assistantId));
      } finally {
        setIsLoading(false);
      }
    },
    [messages]
  );

  return {
    messages,
    isLoading,
    error,
    sendMessage,
  };
}
```
Advanced features:
- Optimistic UI updates (user message appears immediately)
- Streaming text updates in real-time
- Proper error handling with cleanup on failure
- Memory-efficient stream processing
Step 3: Build the UI Component
```tsx
// components/ChatInterface.tsx
'use client';

import React, { useState, useRef, useEffect } from 'react';
import { motion } from 'framer-motion';
import { useStreamingChat } from '@/hooks/useStreamingChat';

export const ChatInterface = () => {
  const { messages, isLoading, error, sendMessage } = useStreamingChat();
  const [input, setInput] = useState('');
  const messagesEndRef = useRef<HTMLDivElement>(null);
  const inputRef = useRef<HTMLInputElement>(null);

  // Auto-scroll to latest message
  useEffect(() => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' });
  }, [messages]);

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault();
    if (!input.trim() || isLoading) return;

    const userInput = input;
    setInput('');

    // Focus input for next message
    inputRef.current?.focus();

    // Send message
    await sendMessage(userInput);
  };

  return (
    <div className="flex flex-col h-screen bg-gradient-to-br from-slate-900 to-slate-800">
      {/* Header */}
      <div className="border-b border-slate-700 bg-slate-800/50 backdrop-blur-lg p-4">
        <h1 className="text-2xl font-bold text-white">Frontend Expert Chat</h1>
        <p className="text-slate-400 text-sm mt-1">
          Ask anything about React, Next.js, or TypeScript
        </p>
      </div>

      {/* Messages Container */}
      <div className="flex-1 overflow-y-auto p-6 space-y-4">
        {messages.length === 0 && (
          <div className="h-full flex items-center justify-center">
            <div className="text-center">
              <div className="text-5xl mb-4">💬</div>
              <p className="text-slate-400">
                Start a conversation to get expert advice
              </p>
            </div>
          </div>
        )}

        {messages.map((message) => (
          <motion.div
            key={message.id}
            initial={{ opacity: 0, y: 10 }}
            animate={{ opacity: 1, y: 0 }}
            transition={{ duration: 0.3 }}
            className={`flex ${
              message.role === 'user' ? 'justify-end' : 'justify-start'
            }`}
          >
            <div
              className={`max-w-xs lg:max-w-md xl:max-w-lg px-4 py-3 rounded-lg ${
                message.role === 'user'
                  ? 'bg-blue-600 text-white rounded-br-none'
                  : 'bg-slate-700 text-slate-100 rounded-bl-none'
              }`}
            >
              <p className="text-sm leading-relaxed whitespace-pre-wrap">
                {message.content}
              </p>
              <time className="text-xs mt-2 block opacity-70">
                {message.createdAt.toLocaleTimeString()}
              </time>
            </div>
          </motion.div>
        ))}

        {isLoading && (
          <motion.div
            initial={{ opacity: 0 }}
            animate={{ opacity: 1 }}
            className="flex justify-start"
          >
            <div className="bg-slate-700 px-4 py-3 rounded-lg rounded-bl-none">
              <div className="flex space-x-2">
                <div className="w-2 h-2 bg-slate-400 rounded-full animate-bounce" />
                <div className="w-2 h-2 bg-slate-400 rounded-full animate-bounce delay-100" />
                <div className="w-2 h-2 bg-slate-400 rounded-full animate-bounce delay-200" />
              </div>
            </div>
          </motion.div>
        )}

        {error && (
          <div className="bg-red-900/20 border border-red-700 text-red-200 px-4 py-3 rounded-lg">
            <p className="text-sm">⚠️ {error}</p>
            <button
              onClick={() => inputRef.current?.focus()}
              className="text-xs mt-2 text-red-400 hover:text-red-300"
            >
              Try again
            </button>
          </div>
        )}

        <div ref={messagesEndRef} />
      </div>

      {/* Input Form */}
      <div className="border-t border-slate-700 bg-slate-800/50 backdrop-blur-lg p-4">
        <form onSubmit={handleSubmit} className="flex gap-3">
          <input
            ref={inputRef}
            type="text"
            value={input}
            onChange={(e) => setInput(e.target.value)}
            placeholder="Ask me anything..."
            disabled={isLoading}
            className="flex-1 px-4 py-3 bg-slate-700 text-white rounded-lg placeholder-slate-400 focus:outline-none focus:ring-2 focus:ring-blue-500 disabled:opacity-50"
            autoFocus
          />
          <button
            type="submit"
            disabled={isLoading || !input.trim()}
            className="px-6 py-3 bg-blue-600 text-white rounded-lg hover:bg-blue-700 disabled:bg-slate-600 disabled:cursor-not-allowed transition-colors font-medium"
          >
            Send
          </button>
        </form>
      </div>
    </div>
  );
};
```
Performance Optimization Strategies
1. Request Deduplication
Prevent duplicate requests from being sent:
```typescript
// hooks/useStreamingChat.ts (useRef imported from 'react')
const pendingRequestRef = useRef<Promise<void> | null>(null);

const sendMessage = useCallback(
  async (userMessage: string) => {
    // If a request is already in flight, wait for it and drop this call
    if (pendingRequestRef.current) {
      await pendingRequestRef.current;
      return;
    }

    const request = (async () => {
      // ... send message logic
    })();

    pendingRequestRef.current = request;

    try {
      await request;
    } finally {
      pendingRequestRef.current = null;
    }
  },
  [messages]
);
```
2. Token Counting for Cost Optimization
```typescript
// lib/token-counter.ts
export function estimateTokens(text: string): number {
  // Rough approximation: 1 token ≈ 4 characters
  // For precise counting, use js-tiktoken
  return Math.ceil(text.length / 4);
}
```

```typescript
// app/api/chat/route.ts
const inputTokens = estimateTokens(
  messages.map((m) => m.content).join(' ')
);

if (inputTokens > 8000) {
  return NextResponse.json(
    { error: 'Context window exceeded. Please start a new conversation.' },
    { status: 400 }
  );
}
```
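Rejecting over-budget requests outright is one option; a friendlier alternative is trimming the oldest turns until the conversation fits. A sketch using the same 4-characters-per-token heuristic (the `trimToBudget` helper is my own illustration, not an AI SDK utility):

```typescript
interface ChatMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

// Same rough heuristic as above: 1 token ≈ 4 characters
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drop the oldest messages until the estimated total fits the budget.
// Always keeps at least the most recent message, even if it alone
// exceeds the budget.
function trimToBudget(messages: ChatMessage[], budget: number): ChatMessage[] {
  const result = [...messages];
  let total = result.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  while (result.length > 1 && total > budget) {
    const removed = result.shift()!;
    total -= estimateTokens(removed.content);
  }
  return result;
}
```

Note that naive trimming discards context the model may still need; pinning the system prompt and summarizing dropped turns are common refinements.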
3. Connection Timeout Handling
```typescript
// hooks/useStreamingChat.ts
const sendMessage = useCallback(async (userMessage: string) => {
  const controller = new AbortController();
  const timeoutId = setTimeout(() => controller.abort(), 30000); // 30s timeout

  try {
    const response = await fetch('/api/chat', {
      method: 'POST',
      signal: controller.signal,
      // ... rest of options
    });
    // ... handle response
  } catch (err) {
    if (err instanceof Error && err.name === 'AbortError') {
      setError('Request timed out. Please try again.');
    } else {
      setError('Failed to connect. Check your internet.');
    }
  } finally {
    clearTimeout(timeoutId);
  }
}, [messages]);
```
Production Considerations
Monitoring and Logging
```typescript
// lib/analytics.ts
export async function logChatEvent(
  event: 'message_sent' | 'stream_started' | 'stream_completed' | 'error',
  metadata: Record<string, any>
) {
  await fetch('/api/analytics', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      event,
      timestamp: new Date().toISOString(),
      ...metadata,
    }),
  });
}
```
Rate Limiting
```typescript
// app/api/chat/route.ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';

const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(10, '1 h'), // 10 requests per hour
});

export async function POST(request: NextRequest) {
  // x-forwarded-for may hold a comma-separated chain; use the first hop
  const ip =
    request.headers.get('x-forwarded-for')?.split(',')[0]?.trim() ?? 'unknown';
  const { success } = await ratelimit.limit(ip);

  if (!success) {
    return NextResponse.json(
      { error: 'Too many requests. Try again later.' },
      { status: 429 }
    );
  }

  // ... rest of handler
}
```
Common Pitfalls and Solutions
| Problem | Cause | Solution |
|---|---|---|
| Slow first token | Model startup overhead | Use lighter models or caching |
| Connection drops | Network timeout | Implement automatic reconnection |
| Memory leaks | Unclosed streams | Use `finally` blocks to clean up |
| Duplicate messages | Race conditions | Implement request deduplication |
| Poor UX on slow networks | No buffering | Show temporary UI states |
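For the dropped-connection row, automatic reconnection usually means retrying the request with exponential backoff. A hedged sketch of a generic retry wrapper (the `withRetry` name and its defaults are my own, not from any SDK):

```typescript
// Retry an async operation with exponential backoff plus jitter.
// Delays grow as baseDelayMs, 2x, 4x, ... with up to 100ms of random
// jitter to avoid synchronized retry storms across clients.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      const delay = baseDelayMs * 2 ** attempt + Math.random() * 100;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

In the chat hook, you would wrap the `fetch` call (not the whole stream read) in `withRetry`, since a stream that fails midway generally needs to restart the request from scratch.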
Conclusion
Streaming AI responses transforms user experience from "waiting" to "experiencing." The implementation is straightforward with modern Next.js tooling, and the benefits compound—faster perceived performance, better engagement metrics, and happier users.
The architecture I've shared scales to production workloads. I've used these patterns across chatbots, code generation tools, and data analysis interfaces, and they consistently make responses feel near-instant.
Start implementing streaming today, and you'll immediately notice the difference in how users interact with your AI-powered application.