Intermediate · 15 min read · Module 5, Lesson 4
🌊 Streaming Responses
Stream responses token by token for real-time UIs — with TypeScript and Python examples
When you use the Claude API normally, you wait for the entire response to complete before displaying it. With streaming, you receive the response token by token (roughly word by word) as it's generated — exactly like you see in ChatGPT or Claude.ai.
Why Streaming?
| Without Streaming | With Streaming |
|---|---|
| Wait 5-15 seconds for full response | See first word in under a second |
| Poor user experience | Smooth, interactive experience |
| Blank screen while waiting | Text appears progressively |
| Suitable for background tasks only | Perfect for user interfaces |
Streaming in TypeScript/JavaScript
```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function streamResponse() {
  const stream = await client.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [
      { role: "user", content: "Write a short story about a programmer" },
    ],
  });

  // Receive each chunk as it's generated
  for await (const event of stream) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "text_delta"
    ) {
      process.stdout.write(event.delta.text);
    }
  }

  // After the stream completes
  const finalMessage = await stream.finalMessage();
  console.log("\n\nInput tokens:", finalMessage.usage.input_tokens);
  console.log("Output tokens:", finalMessage.usage.output_tokens);
}

streamResponse();
```

Streaming in Python
```python
import anthropic

client = anthropic.Anthropic()

# Stream using the context manager and the text_stream helper
with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Write a short story about a programmer"}
    ],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # newline at the end
```

Stream Events
During streaming, you receive different types of events:
| Event | Description |
|---|---|
| `message_start` | Message begins — includes model info |
| `content_block_start` | New content block starts |
| `content_block_delta` | New text chunk (this is what you display) |
| `content_block_stop` | Content block ends |
| `message_delta` | Message update (e.g., stop reason) |
| `message_stop` | Message complete |
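To make the event flow concrete, here's a minimal sketch that accumulates display text from a sequence of events. The event shapes below are simplified stand-ins for illustration; the real SDK yields typed event objects, but with the same `type` fields:

```typescript
// Simplified stand-ins for the SDK's stream events (illustration only).
type StreamEvent =
  | { type: "message_start" }
  | { type: "content_block_start" }
  | { type: "content_block_delta"; delta: { type: "text_delta"; text: string } }
  | { type: "content_block_stop" }
  | { type: "message_stop" };

// Keep only the text carried by content_block_delta events — the same
// filter used in the streaming examples in this lesson.
function accumulateText(events: StreamEvent[]): string {
  let text = "";
  for (const event of events) {
    if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
      text += event.delta.text;
    }
  }
  return text;
}

const mockEvents: StreamEvent[] = [
  { type: "message_start" },
  { type: "content_block_start" },
  { type: "content_block_delta", delta: { type: "text_delta", text: "Hello, " } },
  { type: "content_block_delta", delta: { type: "text_delta", text: "world!" } },
  { type: "content_block_stop" },
  { type: "message_stop" },
];

console.log(accumulateText(mockEvents)); // → Hello, world!
```

Only `content_block_delta` carries displayable text; the other events are lifecycle markers you can use for bookkeeping (e.g., reading the stop reason from `message_delta`).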
Building a Real-Time UI
With React
```typescript
import { useState } from "react";

function ChatComponent() {
  const [response, setResponse] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);

  async function handleSend(message: string) {
    setIsStreaming(true);
    setResponse("");

    const res = await fetch("/api/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message }),
    });
    if (!res.body) throw new Error("Response has no body to stream");

    const reader = res.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const text = decoder.decode(value);
      setResponse((prev) => prev + text);
    }
    setIsStreaming(false);
  }

  return (
    <div>
      <div className="response">
        {response}
        {isStreaming && <span className="cursor blinking">|</span>}
      </div>
    </div>
  );
}
```

Server-Sent Events (SSE) Backend
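The React component above appends raw chunks, but the backend below frames each chunk as an SSE `data:` line. A client consuming that framing needs to unwrap it before displaying text. Here's a minimal sketch of such a parser (`parseSseChunk` is an illustrative name, not an SDK function; it assumes each chunk arrives on whole lines — production code should buffer partial lines across chunks):

```typescript
// Unwrap SSE framing of the form `data: {"text":"..."}\n\n`,
// stopping at the `data: [DONE]` terminator.
function parseSseChunk(chunk: string): string[] {
  const texts: string[] = [];
  for (const line of chunk.split("\n")) {
    if (!line.startsWith("data: ")) continue; // skip blank separator lines
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break; // end-of-stream marker
    texts.push(JSON.parse(payload).text);
  }
  return texts;
}

// Example: two SSE events followed by the terminator.
const chunk = 'data: {"text":"Hel"}\n\ndata: {"text":"lo"}\n\ndata: [DONE]\n\n';
console.log(parseSseChunk(chunk).join("")); // → Hello
```

In the browser you could instead let `EventSource` handle the framing, but `EventSource` only supports GET, which is why POST-based chats typically parse `fetch` streams by hand.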
```typescript
// Express.js endpoint
app.post("/api/chat", async (req, res) => {
  // Set SSE headers
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  const stream = await client.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: req.body.message }],
  });

  for await (const event of stream) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "text_delta"
    ) {
      res.write(`data: ${JSON.stringify({ text: event.delta.text })}\n\n`);
    }
  }

  res.write("data: [DONE]\n\n");
  res.end();
});
```

Python FastAPI SSE
```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
import anthropic
import json

app = FastAPI()
# Use the async client so streaming doesn't block the event loop
client = anthropic.AsyncAnthropic()

@app.post("/api/chat")
async def chat(request: dict):
    async def generate():
        async with client.messages.stream(
            model="claude-sonnet-4-20250514",
            max_tokens=1024,
            messages=[{"role": "user", "content": request["message"]}],
        ) as stream:
            async for text in stream.text_stream:
                yield f"data: {json.dumps({'text': text})}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(generate(), media_type="text/event-stream")
```

Error Handling in Streams
```typescript
try {
  const stream = await client.messages.stream({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Your question" }],
  });

  for await (const event of stream) {
    if (
      event.type === "content_block_delta" &&
      event.delta.type === "text_delta"
    ) {
      process.stdout.write(event.delta.text);
    }
  }
} catch (error) {
  if (error.status === 429) {
    console.error("Rate limited — wait and retry");
  } else if (error.status === 529) {
    console.error("Server overloaded — try later");
  } else {
    console.error("Error:", error.message);
  }
}
```

When to Use Streaming vs Regular Requests
Use streaming when:
- Building an interactive UI (chatbot, writing assistant)
- You want a smooth user experience
- The response is long and you want to display it immediately
Don't use streaming when:
- Processing in the background (batch processing)
- You need the complete response at once to process it
- Using structured outputs (JSON mode)
- Building pipelines where one step feeds the next
Performance Considerations
- Streaming adds minimal overhead
- Time to first token (TTFT) is typically under 1 second
- Total generation time is the same as non-streaming
- Network connection must stay open for the duration
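Since the connection must stay open, transient failures such as rate limits (429) and overload (529) from the error-handling section are usually paired with retries. A hedged sketch of exponential backoff — `withRetries` and the flaky request below are illustrative stand-ins, not SDK features:

```typescript
// Retry a request on transient errors (429 rate limit, 529 overloaded)
// with exponential backoff: 1s, 2s, 4s, ... between attempts.
async function withRetries<T>(
  request: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (error: any) {
      const retryable = error?.status === 429 || error?.status === 529;
      if (!retryable || attempt >= maxAttempts - 1) throw error;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}

// Demo with a stand-in request that fails twice with 529, then succeeds.
let calls = 0;
const flaky = async () => {
  calls++;
  if (calls < 3) throw Object.assign(new Error("overloaded"), { status: 529 });
  return "ok";
};
withRetries(flaky, 3, 1).then((result) => console.log(result)); // → ok
```

Note that retrying a stream restarts generation from the beginning, so retry before any text has been shown to the user, or be prepared to replace what was already displayed.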
Summary
- Streaming dramatically improves user experience
- Use `client.messages.stream()` instead of `client.messages.create()`
- Handle `content_block_delta` events to get text
- Combine with SSE for building real-time web interfaces
- Always handle errors and connection drops gracefully
Next: We'll learn error handling best practices for production applications.