Intermediate · 15 min read · Module 5, Lesson 4
🌊 Streaming Responses
Stream responses token by token for real-time UIs — with TypeScript and Python examples
When you use the Claude API normally, you wait for the entire response to complete before displaying it. With streaming, you receive the response token by token (roughly word by word) as it's generated — exactly like you see in ChatGPT or Claude.ai.
Why Streaming?
| Without Streaming | With Streaming |
|---|---|
| Wait 5-15 seconds for full response | See first word in under a second |
| Poor user experience | Smooth, interactive experience |
| Blank screen while waiting | Text appears progressively |
| Suitable for background tasks only | Perfect for user interfaces |
Streaming in TypeScript/JavaScript
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function streamResponse() {
  const stream = await client.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [
      { role: "user", content: "Write a short story about a programmer" },
    ],
  });

  // Receive each chunk as it's generated
  for await (const event of stream) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "text_delta"
    ) {
      process.stdout.write(event.delta.text);
    }
  }

  // After the stream completes
  const finalMessage = await stream.finalMessage();
  console.log("\n\nInput tokens:", finalMessage.usage.input_tokens);
  console.log("Output tokens:", finalMessage.usage.output_tokens);
}

streamResponse();
```

Streaming in Python
```python
import anthropic

client = anthropic.Anthropic()

# Stream using the context manager and the text_stream helper
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short story about a programmer"}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # newline at the end
```

Stream Events
During streaming, you receive different types of events:
| Event | Description |
|---|---|
| `message_start` | Message begins — includes model info |
| `content_block_start` | New content block starts |
| `content_block_delta` | New text chunk (this is what you display) |
| `content_block_stop` | Content block ends |
| `message_delta` | Message update (e.g., stop reason) |
| `message_stop` | Message complete |
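To make the event flow concrete, here's a minimal sketch that accumulates display text from a sequence of events. The event shapes below are simplified stand-ins for illustration; the real SDK yields typed event objects, but with the same `type` fields:

```typescript
// Simplified stand-ins for the SDK's stream events (illustration only).
type StreamEvent =
  | { type: "message_start" }
  | { type: "content_block_start" }
  | { type: "content_block_delta"; delta: { type: "text_delta"; text: string } }
  | { type: "content_block_stop" }
  | { type: "message_stop" };

// Keep only the text carried by content_block_delta events — the same
// filter used in the streaming examples in this lesson.
function accumulateText(events: StreamEvent[]): string {
  let text = "";
  for (const event of events) {
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      text += event.delta.text;
    }
  }
  return text;
}

const mockEvents: StreamEvent[] = [
  { type: "message_start" },
  { type: "content_block_start" },
  { type: "content_block_delta", delta: { type: "text_delta", text: "Hello, " } },
  { type: "content_block_delta", delta: { type: "text_delta", text: "world!" } },
  { type: "content_block_stop" },
  { type: "message_stop" },
];

console.log(accumulateText(mockEvents)); // → Hello, world!
```

Only `content_block_delta` carries displayable text; the other events are lifecycle markers you can use for bookkeeping (e.g., reading the stop reason from `message_delta`).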
Building a Real-Time UI
With React
```typescript
import { useState } from "react";

function ChatComponent() {
  const [response, setResponse] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);

  async function handleSend(message: string) {
    setIsStreaming(true);
    setResponse("");

    const res = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message }),
    });
    if (!res.body) throw new Error("Response has no body to stream");

    const reader = res.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const text = decoder.decode(value);
      setResponse((prev) => prev + text);
    }
    setIsStreaming(false);
  }

  return (
    <div>
      <div className="response">
        {response}
        {isStreaming && <span className="cursor blinking">|</span>}
      </div>
    </div>
  );
}
```

Server-Sent Events (SSE) Backend
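The React component above appends raw chunks, but the backend below frames each chunk as an SSE `data:` line. A client consuming that framing needs to unwrap it before displaying text. Here's a minimal sketch of such a parser (`parseSseChunk` is an illustrative name, not an SDK function; it assumes each chunk arrives on whole lines — production code should buffer partial lines across chunks):

```typescript
// Unwrap SSE framing of the form `data: {"text":"..."}\n\n`,
// stopping at the `data: [DONE]` terminator.
function parseSseChunk(chunk: string): string[] {
  const texts: string[] = [];
  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue; // skip blank separator lines
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break; // end-of-stream marker
    texts.push(JSON.parse(payload).text);
  }
  return texts;
}

// Example: two SSE events followed by the terminator.
const chunk = 'data: {"text":"Hel"}\n\ndata: {"text":"lo"}\n\ndata: [DONE]\n\n';
console.log(parseSseChunk(chunk).join("")); // → Hello
```

In the browser you could instead let `EventSource` handle the framing, but `EventSource` only supports GET, which is why POST-based chats typically parse `fetch` streams by hand.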
```typescript
// Express.js endpoint
app.post("/api/chat", async (req, res) => {
  // Set SSE headers
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  const stream = await client.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: req.body.message }],
  });

  for await (const event of stream) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "text_delta"
    ) {
      res.write(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`);
    }
  }

  res.write("data: [DONE]\n\n");
  res.end();
});
```

Python FastAPI SSE
```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import anthropic
import json

app = FastAPI()
# Use the async client so streaming doesn't block the event loop
client = anthropic.AsyncAnthropic()

@app.post("/api/chat")
async def chat(request: dict):
    async def generate():
        async with client.messages.stream(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": request["message"]}],
        ) as stream:
            async for text in stream.text_stream:
                yield f"data: {json.dumps({'text': text})}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
```

Error Handling in Streams
```typescript
try {
  const stream = await client.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Your question" }],
  });

  for await (const event of stream) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "text_delta"
    ) {
      process.stdout.write(event.delta.text);
    }
  }
} catch (error) {
  if (error.status === 429) {
    console.error("Rate limited — wait and retry");
  } else if (error.status === 529) {
    console.error("Server overloaded — try later");
  } else {
    console.error("Error:", error.message);
  }
}
```

When to Use Streaming vs Regular Requests
Use streaming when:
- Building an interactive UI (chatbot, writing assistant)
- You want a smooth user experience
- The response is long and you want to display it immediately
Don't use streaming when:
- Processing in the background (batch processing)
- You need the complete response at once to process it
- Using structured outputs (JSON mode)
- Building pipelines where one step feeds the next
Performance Considerations
- Streaming adds minimal overhead
- Time to first token (TTFT) is typically under 1 second
- Total generation time is the same as non-streaming
- Network connection must stay open for the duration
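Since the connection must stay open, transient failures such as rate limits (429) and overload (529) from the error-handling section are usually paired with retries. A hedged sketch of exponential backoff — `withRetries` and the flaky request below are illustrative stand-ins, not SDK features:

```typescript
// Retry a request on transient errors (429 rate limit, 529 overloaded)
// with exponential backoff: 1s, 2s, 4s, ... between attempts.
async function withRetries<T>(
  request: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (error: any) {
      const retryable = error?.status === 429 || error?.status === 529;
      if (!retryable || attempt >= maxAttempts - 1) throw error;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}

// Demo with a stand-in request that fails twice with 529, then succeeds.
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) throw Object.assign(new Error("overloaded"), { status: 529 });
  return "ok";
};
withRetries(flaky, 3, 1).then((result) => console.log(result)); // → ok
```

Note that retrying a stream restarts generation from the beginning, so retry before any text has been shown to the user, or be prepared to replace what was already displayed.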
Summary
- Streaming dramatically improves user experience
- Use `client.messages.stream()` instead of `client.messages.create()`
- Handle `content_block_delta` events to get text
- Combine with SSE for building real-time web interfaces
- Always handle errors and connection drops gracefully
Next: We'll learn error handling best practices for production applications.