Intermediate · 15 min read · Module 5, Lesson 4

🌊 Streaming Responses

Stream responses token by token for real-time UIs — with TypeScript and Python examples


When you use the Claude API normally, you wait for the entire response to complete before displaying it. With streaming, you get the response word by word as it's generated — exactly like you see in ChatGPT or Claude.ai.

Why Streaming?

| Without Streaming | With Streaming |
| --- | --- |
| Wait 5-15 seconds for the full response | See the first word in under a second |
| Poor user experience | Smooth, interactive experience |
| Blank screen while waiting | Text appears progressively |
| Suitable for background tasks only | Perfect for user interfaces |

Streaming in TypeScript/JavaScript

JavaScript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function streamResponse() {
  const stream = client.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [
      { role: "user", content: "Write a short story about a programmer" },
    ],
  });

  // Receive each chunk as it's generated
  for await (const event of stream) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "text_delta"
    ) {
      process.stdout.write(event.delta.text);
    }
  }

  // After stream completes
  const finalMessage = await stream.finalMessage();
  console.log("\n\nInput tokens:", finalMessage.usage.input_tokens);
  console.log("Output tokens:", finalMessage.usage.output_tokens);
}

streamResponse();

Streaming in Python

Python
import anthropic

client = anthropic.Anthropic()

# Stream with the context manager and print text as it arrives
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short story about a programmer"}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # newline at the end

Stream Events

During streaming, you receive different types of events:

| Event | Description |
| --- | --- |
| message_start | Message begins — includes model info |
| content_block_start | New content block starts |
| content_block_delta | New text chunk (this is what you display) |
| content_block_stop | Content block ends |
| message_delta | Message update (e.g., stop reason) |
| message_stop | Message complete |
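To make the event order concrete, here is a small Python sketch that walks a mocked event sequence and accumulates only the text deltas. The dicts below are illustrative stand-ins for the SDK's event objects, not real API output:

```python
# Mocked stream events in the order the API emits them.
# These dicts are stand-ins for the SDK's event objects.
mock_events = [
    {"type": "message_start"},
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": "Hello"}},
    {"type": "content_block_delta", "delta": {"type": "text_delta", "text": ", world"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]


def collect_text(events):
    """Accumulate text from content_block_delta events, ignoring the rest."""
    parts = []
    for event in events:
        if event["type"] == "content_block_delta" and event["delta"]["type"] == "text_delta":
            parts.append(event["delta"]["text"])
    return "".join(parts)


print(collect_text(mock_events))  # Hello, world
```

A UI would render each delta as it arrives; the other event types are mostly useful for bookkeeping (knowing when a block starts and why the message stopped).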

Building a Real-Time UI

With React

JavaScript
import { useState } from "react";

function ChatComponent() {
  const [response, setResponse] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);

  async function handleSend(message) {
    setIsStreaming(true);
    setResponse("");

    const res = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message }),
    });

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // The backend sends SSE frames: "data: {...}\n\n"
      buffer += decoder.decode(value, { stream: true });
      const frames = buffer.split("\n\n");
      buffer = frames.pop(); // keep any incomplete frame for the next read
      for (const frame of frames) {
        if (!frame.startsWith("data: ")) continue;
        const data = frame.slice("data: ".length);
        if (data === "[DONE]") continue;
        const { text } = JSON.parse(data);
        setResponse((prev) => prev + text);
      }
    }

    setIsStreaming(false);
  }

  return (
    <div>
      <div className="response">
        {response}
        {isStreaming && <span className="cursor blinking">|</span>}
      </div>
    </div>
  );
}

Server-Sent Events (SSE) Backend

JavaScript
// Express.js endpoint
app.post("/api/chat", async (req, res) => {
  // Set SSE headers
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  const stream = client.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: req.body.message }],
  });

  for await (const event of stream) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "text_delta"
    ) {
      res.write(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`);
    }
  }

  res.write("data: [DONE]\n\n");
  res.end();
});

Python FastAPI SSE

Python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import anthropic
import json

app = FastAPI()
client = anthropic.Anthropic()


@app.post("/api/chat")
async def chat(request: dict):
    # A sync generator: FastAPI runs it in a threadpool, so the
    # blocking SDK stream doesn't stall the event loop
    def generate():
        with client.messages.stream(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": request["message"]}],
        ) as stream:
            for text in stream.text_stream:
                yield f"data: {json.dumps({'text': text})}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
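On the client side, both backends above emit frames of the form `data: <json>` followed by a blank line, and a frame can arrive split across network chunks. As a sketch of how a consumer might reassemble the text (a hand-rolled parser for illustration; a real client would typically use an SSE library or `EventSource`):

```python
import json


def parse_sse_chunks(chunks):
    """Reassemble text from SSE data frames that may arrive split
    across arbitrary network chunk boundaries."""
    buffer = ""
    parts = []
    for chunk in chunks:
        buffer += chunk
        # A complete frame ends with a blank line
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            if not frame.startswith("data: "):
                continue
            data = frame[len("data: "):]
            if data == "[DONE]":
                return "".join(parts)
            parts.append(json.loads(data)["text"])
    return "".join(parts)


# Frames split mid-JSON to mimic real network chunking
chunks = ['data: {"te', 'xt": "Hi"}\n\ndata: {"text": " there"}\n\n', "data: [DONE]\n\n"]
print(parse_sse_chunks(chunks))  # Hi there
```

The buffering step matters: TCP gives no guarantee that one `res.write()` on the server arrives as one chunk on the client.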

Error Handling in Streams

JavaScript
try {
  const stream = client.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Your question" }],
  });

  for await (const event of stream) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "text_delta"
    ) {
      process.stdout.write(event.delta.text);
    }
  }
} catch (error) {
  if (error.status === 429) {
    console.error("Rate limited — wait and retry");
  } else if (error.status === 529) {
    console.error("Server overloaded — try later");
  } else {
    console.error("Error:", error.message);
  }
}

When to Use Streaming vs Regular Requests

Use streaming when:

  • Building an interactive UI (chatbot, writing assistant)
  • You want a smooth user experience
  • The response is long and you want to display it immediately

Don't use streaming when:

  • Processing in the background (batch processing)
  • You need the complete response at once to process it
  • Using structured outputs (JSON mode)
  • Building pipelines where one step feeds the next

Performance Considerations

  • Streaming adds minimal overhead
  • Time to first token (TTFT) is typically under 1 second
  • Total generation time is the same as non-streaming
  • Network connection must stay open for the duration

Summary

  • Streaming dramatically improves user experience
  • Use client.messages.stream() instead of client.messages.create()
  • Handle content_block_delta events to get text
  • Combine with SSE for building real-time web interfaces
  • Always handle errors and connection drops gracefully

Next: We'll learn error handling best practices for production applications.