Advanced · 12 min read · Module 6, Lesson 6

💾 Prompt Caching — Save Money

Cache repeated prompts and save up to 90% on input token costs


Prompt caching lets you save up to 90% on input token costs when you repeat the same context across multiple requests.

How It Works

Normally, every API request processes all input tokens from scratch. With caching:

  1. First request: You send content marked with cache_control — it is processed and cached, with a 25% surcharge on those input tokens
  2. Subsequent requests: The cached prefix is read back instead of being reprocessed, at a 90% discount

When Caching Helps

  • Same system instructions in every request (e.g., chatbot)
  • Large document you ask multiple questions about
  • Fixed context + varying questions
  • Few-shot examples shared between requests

Practical Example

JavaScript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const systemPrompt =
  "You are a customer support assistant for TechStore. Rules: ..."; // Long instructions

async function chat(userMessage) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: systemPrompt,
        cache_control: { type: "ephemeral" }, // Cache this content
      },
    ],
    messages: [{ role: "user", content: userMessage }],
  });

  console.log("Cache created:", response.usage.cache_creation_input_tokens || 0);
  console.log("Cache read:", response.usage.cache_read_input_tokens || 0);

  return response.content[0].text;
}

// First request — creates cache (25% surcharge)
await chat("How much are the AirPods?");

// Subsequent requests — reads from cache (90% discount!)
await chat("What's the return policy?");

Python Example

Python
import anthropic

client = anthropic.Anthropic()

system_prompt = "You are a customer support assistant..."

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "What's the return policy?"}],
)

print(f"Cache creation: {response.usage.cache_creation_input_tokens}")
print(f"Cache read: {response.usage.cache_read_input_tokens}")

Calculating Savings

Without cache:

  • System instructions: 2,000 tokens x $3.00/million = $0.006 per request
  • 1,000 requests/day = $6.00/day

With cache:

  • First request: 2,000 tokens x $3.75/million = $0.0075 (25% surcharge)
  • Remaining requests: 2,000 tokens x $0.30/million = $0.0006 each
  • 999 requests x $0.0006 = $0.60
  • Total: $0.61/day (90% savings!)
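The arithmetic generalizes to any prompt size and request volume. Here is a minimal sketch that reproduces the numbers above, assuming Sonnet 4 pricing and the example's 2,000-token prompt at 1,000 requests/day:

JavaScript
// Sonnet 4 input pricing, in dollars per million tokens
const BASE = 3.0;
const CACHE_WRITE = 3.75; // +25% surcharge
const CACHE_READ = 0.3;   // -90% discount

const tokens = 2000;         // cached system instructions
const requestsPerDay = 1000;

const withoutCache = (tokens / 1e6) * BASE * requestsPerDay;
const withCache =
  (tokens / 1e6) * CACHE_WRITE +                      // first request writes the cache
  (tokens / 1e6) * CACHE_READ * (requestsPerDay - 1); // the rest read it

console.log(withoutCache.toFixed(2)); // 6.00
console.log(withCache.toFixed(2));    // 0.61
console.log(`${(100 * (1 - withCache / withoutCache)).toFixed(0)}% saved`); // 90% saved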

Cache Pricing Summary

Model       Regular     Cache Write        Cache Read
Opus 4      $15.00/M    $18.75/M (+25%)    $1.50/M (-90%)
Sonnet 4    $3.00/M     $3.75/M (+25%)     $0.30/M (-90%)
Haiku 3.5   $0.80/M     $1.00/M (+25%)     $0.08/M (-90%)
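Because the write surcharge (+25%) and read discount (-90%) are the same percentages on every model, the break-even point is model-independent. A quick sketch of the reasoning, measuring cost in multiples of the regular per-request price:

JavaScript
// n requests within the cache window, in units of the regular input cost:
//   with cache:    1.25 + 0.10 * (n - 1)   (one write, then reads)
//   without cache: n
// Caching wins as soon as 1.25 + 0.10 * (n - 1) < n, i.e. from n = 2 onward.
const cachingWins = (n) => 1.25 + 0.1 * (n - 1) < n;
console.log(cachingWins(1)); // false: a single request costs 25% more
console.log(cachingWins(2)); // true: 1.35x the regular cost vs 2x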

Caching a Document for Multiple Questions

JavaScript
import fs from "node:fs";

const longDocument = fs.readFileSync("contract.txt", "utf-8");

const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "text",
          text: longDocument,
          cache_control: { type: "ephemeral" }, // Cache the document
        },
        {
          type: "text",
          text: "What are the termination clauses?",
        },
      ],
    },
  ],
});
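Follow-up questions reuse the cache as long as the document block is byte-for-byte identical and the requests arrive within the cache lifetime. A sketch of asking several questions against the same contract (askDocument is our illustrative helper; client and longDocument come from the snippet above):

JavaScript
// The cached prefix (the document block) must be identical on every call;
// only the trailing question varies.
async function askDocument(question) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: longDocument, cache_control: { type: "ephemeral" } },
          { type: "text", text: question },
        ],
      },
    ],
  });
  return response.content[0].text;
}

// The first call writes the cache; the rest read it at the 90% discount.
for (const question of [
  "What are the termination clauses?",
  "What is the notice period?",
  "Who owns the intellectual property?",
]) {
  console.log(await askDocument(question));
}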

Important Rules

  1. Minimum size: Cached content must be at least 1,024 tokens (Sonnet/Opus) or 2,048 tokens (Haiku)
  2. Exact match: The text must match character-for-character — any change invalidates the cache
  3. Duration: The cache lives for 5 minutes, and the timer resets each time the cached content is read
  4. Order matters: Put fixed (cached) content first and variable content last (see the sketch after this list)
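Rule 4 in practice: keep everything stable (instructions, few-shot examples) ahead of the cache breakpoint, and let only the user message vary. A minimal sketch with illustrative instruction and example text (a real prompt would need to meet the minimum size from rule 1 for the cache to actually be created):

JavaScript
const response = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  system: [
    // Fixed content first. The breakpoint goes on the last stable block;
    // everything up to and including it is cached as one prefix.
    { type: "text", text: "You are a sentiment classifier. Reply with one word." },
    {
      type: "text",
      text: "Examples:\n'Love it!' -> positive\n'Broken on arrival.' -> negative",
      cache_control: { type: "ephemeral" },
    },
  ],
  // Variable content last: changing it does not invalidate the cached prefix.
  messages: [{ role: "user", content: "Shipping was slow but the product is great." }],
});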

Summary

  • Prompt caching saves up to 90% on input costs
  • Use cache_control: { type: "ephemeral" } on fixed content
  • Cache lasts 5 minutes and resets with each use
  • Perfect for system prompts, large documents, and recurring examples

Next: We'll cover reusable prompt templates and patterns.