Intermediate · 12 min read · Module 5, Lesson 7

PDFs, Files & Multi-Modal Inputs

Send PDFs, images, and files to Claude for analysis and extraction


Overview

Claude can process much more than plain text. You can send PDFs, images, and other files directly to Claude for analysis, extraction, and understanding. This opens up powerful use cases like invoice processing, contract analysis, resume parsing, and more.


PDF Support in Claude

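Claude draws on two kinds of information from a PDF:

1. Text-Based PDF Processing

Claude reads the text content extracted from the PDF. This works best for digitally created PDFs with selectable text.

2. Visual PDF Processing

Claude can also "see" PDFs as rendered pages, understanding layouts, tables, charts, headers, footers, and visual formatting. This is what makes scanned documents, image-based PDFs, and complex layouts usable.

When you send a PDF as a document block through the API, both happen automatically: each page is converted to an image, and the page's extracted text is provided alongside that image.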

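| Feature | Text-based processing | Visual processing (incl. scanned PDFs) |
|---|---|---|
| Text extraction | Excellent | Good (OCR-like) |
| Table parsing | Good | Excellent |
| Layout understanding | Limited | Excellent |
| Chart/graph reading | Not possible | Good |
| Speed | Fast | Moderate |
| Cost | Lower | Higher (image tokens) |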
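The table suggests a simple routing rule for cost-sensitive pipelines: if client-side text extraction already produced plenty of text, send that text; otherwise send the PDF itself so the rendered pages can be used. Below is a minimal sketch, assuming you have already extracted one string per page (for example with pdfplumber, as shown under Cost Optimization Tips). `build_pdf_content` is a hypothetical helper, and its 200-characters-per-page threshold is an arbitrary assumption to tune on your own documents.

```python
import base64


def build_pdf_content(pdf_bytes: bytes, extracted_pages: list[str], question: str,
                      min_chars_per_page: int = 200) -> list[dict]:
    """Return Messages API content blocks for one PDF plus a question.

    Hypothetical helper: `extracted_pages` holds text already extracted
    client-side, one string per page (empty for pages with no text layer).
    """
    pages = max(len(extracted_pages), 1)
    total_chars = sum(len(p.strip()) for p in extracted_pages)

    if total_chars / pages >= min_chars_per_page:
        # Enough selectable text: send plain text only (no page-image tokens).
        text = "\n\n".join(extracted_pages)
        return [{"type": "text", "text": f"<document>\n{text}\n</document>\n\n{question}"}]

    # Little or no text layer (e.g. a scan): send the PDF itself so the
    # rendered page images can be used.
    return [
        {
            "type": "document",
            "source": {
                "type": "base64",
                "media_type": "application/pdf",
                "data": base64.standard_b64encode(pdf_bytes).decode("utf-8"),
            },
        },
        {"type": "text", "text": question},
    ]
```

The returned list goes straight into the `content` field of a user message. Keep in mind that the text-only branch gives up the layout, table, and chart understanding listed in the right-hand column.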

Sending PDFs via the API

Method 1: Base64 Encoding

The most common approach is to encode the PDF as base64 and send it in the message content.

Python:

```python
import base64

import anthropic

client = anthropic.Anthropic()

# Read and encode the PDF file
with open("invoice.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Please extract all line items from this invoice and return them as JSON.",
                },
            ],
        }
    ],
)
print(message.content[0].text)
```

TypeScript:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import * as fs from "fs";

const client = new Anthropic();

// Read and encode the PDF file
const pdfBuffer = fs.readFileSync("invoice.pdf");
const pdfData = pdfBuffer.toString("base64");

// Top-level await: run this as an ES module
const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: { type: "base64", media_type: "application/pdf", data: pdfData },
        },
        {
          type: "text",
          text: "Please extract all line items from this invoice and return them as JSON.",
        },
      ],
    },
  ],
});

// content is a union of block types; narrow to a text block before reading .text
const block = message.content[0];
if (block.type === "text") {
  console.log(block.text);
}
```

Method 2: URL Reference

You can also reference a publicly accessible PDF by URL.

Python:

```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/report.pdf",
                    },
                },
                {"type": "text", "text": "Summarize the key findings in this report."},
            ],
        }
    ],
)
print(message.content[0].text)
```

TypeScript:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: { type: "url", url: "https://example.com/report.pdf" },
        },
        { type: "text", text: "Summarize the key findings in this report." },
      ],
    },
  ],
});

const block = message.content[0];
if (block.type === "text") {
  console.log(block.text);
}
```

The Files API: Upload Once, Reference by ID

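For files you use repeatedly, the Files API lets you upload a file once and then reference it by ID, so you do not have to re-encode and resend the same bytes with every request. The Files API is in beta: the SDK methods live under `client.beta`, and requests that reference an uploaded file must include the `files-api-2025-04-14` beta flag.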

Step 1: Upload the File

Python:

```python
import anthropic

client = anthropic.Anthropic()

# Upload the file once (the Files API is in beta, so it lives under client.beta)
with open("contract.pdf", "rb") as f:
    uploaded_file = client.beta.files.upload(
        file=("contract.pdf", f, "application/pdf"),
    )

print(f"File ID: {uploaded_file.id}")  # Save this ID for future requests
```

TypeScript:

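```typescript
import Anthropic, { toFile } from "@anthropic-ai/sdk";
import * as fs from "fs";

const client = new Anthropic();

// Upload the file once (the Files API is in beta, so it lives under client.beta)
const file = await client.beta.files.upload({
  file: await toFile(fs.createReadStream("contract.pdf"), undefined, {
    type: "application/pdf",
  }),
  betas: ["files-api-2025-04-14"],
});

console.log("File ID:", file.id); // Save this ID for future requests
```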

Step 2: Reference by ID in Messages

Python:

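```python
import anthropic

client = anthropic.Anthropic()
file_id = "file-abc123"  # placeholder: use the ID returned by the upload step

# Requests that reference uploaded files also go through the beta namespace
message = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["files-api-2025-04-14"],
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "document", "source": {"type": "file", "file_id": file_id}},
                {"type": "text", "text": "What are the key terms and conditions in this contract?"},
            ],
        }
    ],
)
print(message.content[0].text)
```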

TypeScript:

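```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const fileId = "file-abc123"; // placeholder: use the ID returned by the upload step

const message = await client.beta.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  betas: ["files-api-2025-04-14"],
  messages: [
    {
      role: "user",
      content: [
        { type: "document", source: { type: "file", file_id: fileId } },
        { type: "text", text: "What are the key terms and conditions in this contract?" },
      ],
    },
  ],
});

const block = message.content[0];
if (block.type === "text") {
  console.log(block.text);
}
```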
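Since the point of the Files API is to upload each file only once, it helps to remember which files have already been uploaded. Here is a minimal sketch, assuming an in-memory dictionary keyed by the file's SHA-256 digest. `FileIdCache` and the injected `upload` callable are hypothetical names. In real code, `upload` would wrap the upload call from Step 1, and you would persist the mapping (for example, in a database) rather than keep it in memory.

```python
import hashlib
from typing import Callable


class FileIdCache:
    """Remember file IDs so identical content is uploaded only once (hypothetical helper)."""

    def __init__(self, upload: Callable[[bytes], str]):
        # `upload` takes raw file bytes and returns the file ID from the API.
        self._upload = upload
        self._ids: dict[str, str] = {}

    def file_id_for(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._ids:
            # First time this exact content is seen: upload it and store the ID.
            self._ids[digest] = self._upload(data)
        return self._ids[digest]
```

Keying on content instead of filename means a renamed copy of the same contract reuses the existing ID, while an edited version gets a fresh upload.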

Supported File Types

| Type | Media Type | Use Case |
|---|---|---|
| PDF | application/pdf | Documents, reports, invoices |
| PNG | image/png | Screenshots, diagrams, charts |
| JPEG | image/jpeg | Photos, scanned documents |
| GIF | image/gif | Simple graphics, animations (first frame) |
| WebP | image/webp | Web images |
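A pipeline that accepts arbitrary files needs to pick the `media_type` from this table and choose between a `document` block (PDFs) and an `image` block. Below is a minimal sketch keyed on the file extension. `content_block_for` is a hypothetical helper, and a production system might inspect the file's leading bytes rather than trust the extension.

```python
import base64
from pathlib import Path

# Media types from the table above, keyed by lowercase file extension.
MEDIA_TYPES = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def content_block_for(filename: str, data: bytes) -> dict:
    """Build a base64 content block for a supported file (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"Unsupported file type: {filename}")
    media_type = MEDIA_TYPES[suffix]
    # PDFs go in "document" blocks; the image formats go in "image" blocks.
    block_type = "document" if media_type == "application/pdf" else "image"
    return {
        "type": block_type,
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.standard_b64encode(data).decode("utf-8"),
        },
    }
```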

File Size Limits

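  • Images: Up to 5 MB per image when sent through the API
  • PDFs: The whole request must stay under 32 MB, with at most 100 PDF pages per request
  • Multiple files: You can send multiple files in a single request, subject to the same overall request size limit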
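Base64 encoding inflates data by about a third (every 3 raw bytes become 4 characters), and the request size limit applies to what you actually send. A pre-flight check along these lines can reject oversized requests before calling the API. This is a minimal sketch, assuming "32 MB" means 32 × 1024 × 1024 bytes and ignoring the small overhead of the surrounding JSON. `MAX_REQUEST_BYTES` and `check_request_size` are hypothetical names, so confirm the current limits in Anthropic's documentation.

```python
import math

# Assumption: "32 MB" read as 32 * 1024 * 1024 bytes; check the current docs.
MAX_REQUEST_BYTES = 32 * 1024 * 1024


def base64_size(raw_size: int) -> int:
    """Length of the standard base64 encoding of `raw_size` bytes (with padding)."""
    return 4 * math.ceil(raw_size / 3)


def check_request_size(raw_sizes: list[int], limit: int = MAX_REQUEST_BYTES) -> int:
    """Return the estimated encoded payload size, or raise if it exceeds `limit`."""
    total = sum(base64_size(n) for n in raw_sizes)
    if total > limit:
        raise ValueError(f"Encoded files total {total:,} bytes, over the {limit:,}-byte limit")
    return total
```

For example, a 25 MB PDF (26,214,400 bytes) encodes to 34,952,536 bytes (about 33.3 MB), so it fails this check even though the raw file is under 32 MB.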

Practical Examples

Example 1: Invoice Data Extraction

Extract structured data from an invoice PDF:

Python:

```python
import base64
import json

import anthropic

client = anthropic.Anthropic()

with open("invoice.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system="You are an invoice data extraction engine. "
           "Extract all data and return valid JSON only. No other text.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_data},
                },
                {
                    "type": "text",
                    "text": """Extract the following from this invoice:
{
  "invoice_number": "",
  "date": "",
  "due_date": "",
  "vendor": {"name": "", "address": ""},
  "customer": {"name": "", "address": ""},
  "line_items": [
    {"description": "", "quantity": 0, "unit_price": 0, "total": 0}
  ],
  "subtotal": 0,
  "tax": 0,
  "total": 0
}""",
                },
            ],
        }
    ],
)

# json.loads raises json.JSONDecodeError if the reply is not bare JSON
invoice_data = json.loads(message.content[0].text)
print(json.dumps(invoice_data, indent=2))
```
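Calling `json.loads` directly on the reply assumes the model returned bare JSON. Even with a strict system prompt, a reply can occasionally arrive wrapped in a Markdown code fence or with a short preamble. Below is a minimal, defensive sketch. `parse_json_reply` is a hypothetical helper that strips an optional fence and otherwise falls back to the outermost `{...}` span.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
FENCE_RE = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_json_reply(text: str) -> dict:
    """Parse a JSON object from a model reply (hypothetical helper)."""
    match = FENCE_RE.search(text)
    if match:
        # The reply was wrapped in a (possibly json-tagged) code fence.
        text = match.group(1)
    else:
        # Otherwise keep everything from the first "{" to the last "}".
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            text = text[start:end + 1]
    return json.loads(text)
```

In the example above, `invoice_data = parse_json_reply(message.content[0].text)` would replace the direct `json.loads` call. Anything still unparseable raises `json.JSONDecodeError`, which you can log and retry.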

Example 2: Contract Analysis

Analyze a legal contract and extract key clauses:

TypeScript:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import * as fs from "fs";

const client = new Anthropic();
const pdfData = fs.readFileSync("contract.pdf").toString("base64");

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 8192,
  system:
    "You are an expert legal document analyzer. " +
    "Identify key clauses, obligations, risks, and important dates.",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: { type: "base64", media_type: "application/pdf", data: pdfData },
        },
        {
          type: "text",
          text: `Analyze this contract and provide:
1. **Parties involved** and their roles
2. **Key obligations** for each party
3. **Important dates** (start, end, renewal, deadlines)
4. **Payment terms** and amounts
5. **Termination clauses** and conditions
6. **Liability and indemnification** provisions
7. **Potential risks** or unusual clauses
8. **Confidentiality** requirements`,
        },
      ],
    },
  ],
});

const block = message.content[0];
if (block.type === "text") {
  console.log(block.text);
}
```

Example 3: Resume Parsing

Parse a resume PDF into structured data:

Python:

```python
import base64
import json

import anthropic

client = anthropic.Anthropic()

with open("resume.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system="You are a resume parser. Extract all information into structured JSON. "
           "Return valid JSON only.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_data},
                },
                {
                    "type": "text",
                    "text": """Parse this resume into the following structure:
{
  "personal_info": {
    "name": "", "email": "", "phone": "",
    "location": "", "linkedin": "", "website": ""
  },
  "summary": "",
  "experience": [
    {
      "title": "", "company": "", "location": "",
      "start_date": "", "end_date": "", "highlights": []
    }
  ],
  "education": [
    {"degree": "", "institution": "", "graduation_date": "", "gpa": null}
  ],
  "skills": {"technical": [], "languages": [], "certifications": []}
}""",
                },
            ],
        }
    ],
)

resume_data = json.loads(message.content[0].text)
print(json.dumps(resume_data, indent=2))
```

Example 4: Research Paper Summary

Summarize a research paper with key findings:

TypeScript:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import * as fs from "fs";

const client = new Anthropic();
const pdfData = fs.readFileSync("research-paper.pdf").toString("base64");

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: { type: "base64", media_type: "application/pdf", data: pdfData },
        },
        {
          type: "text",
          text: `Please provide a comprehensive summary of this research paper:
1. **Title and Authors**
2. **Research Question / Hypothesis**
3. **Methodology** (brief description)
4. **Key Findings** (bullet points)
5. **Conclusions**
6. **Limitations** mentioned by the authors
7. **Future Work** suggested
8. **Significance** - why this matters

Keep the summary accessible to a non-specialist audience.`,
        },
      ],
    },
  ],
});

const block = message.content[0];
if (block.type === "text") {
  console.log(block.text);
}
```

Combining Images and PDFs in One Request

You can send multiple files of different types in a single request:

Python:

```python
import base64

import anthropic

client = anthropic.Anthropic()

# Load a PDF
with open("floor-plan.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

# Load an image
with open("photo-of-room.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_data},
                },
                {
                    "type": "image",
                    "source": {"type": "base64", "media_type": "image/jpeg", "data": image_data},
                },
                {
                    "type": "text",
                    "text": "Here is the floor plan (PDF) and a photo of the current room. "
                            "Compare the floor plan with the photo and identify any "
                            "discrepancies or areas that differ from the plan.",
                },
            ],
        }
    ],
)
print(message.content[0].text)
```

TypeScript:

```typescript
import Anthropic from "@anthropic-ai/sdk";
import * as fs from "fs";

const client = new Anthropic();
const pdfData = fs.readFileSync("floor-plan.pdf").toString("base64");
const imageData = fs.readFileSync("photo-of-room.jpg").toString("base64");

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: { type: "base64", media_type: "application/pdf", data: pdfData },
        },
        {
          type: "image",
          source: { type: "base64", media_type: "image/jpeg", data: imageData },
        },
        {
          type: "text",
          text:
            "Here is the floor plan (PDF) and a photo of the current room. " +
            "Compare the floor plan with the photo and identify any " +
            "discrepancies or areas that differ from the plan.",
        },
      ],
    },
  ],
});

const block = message.content[0];
if (block.type === "text") {
  console.log(block.text);
}
```
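With more than a couple of files, it is easier to build the content list in a loop. The sketch below places the files first and the question last, matching the ordering used throughout this lesson, and puts a short text label before each file so the question can refer to it by name. `build_multi_file_content` is a hypothetical helper, and the `(label, media_type, raw_bytes)` tuples are an assumed input format.

```python
import base64


def build_multi_file_content(files: list[tuple[str, str, bytes]], question: str) -> list[dict]:
    """Return content blocks: a label plus a file block per file, then the question.

    `files` holds (label, media_type, raw_bytes) tuples, an assumed format.
    """
    content: list[dict] = []
    for label, media_type, data in files:
        # Short text label so the question can refer to each file by name.
        content.append({"type": "text", "text": f"{label}:"})
        content.append({
            "type": "document" if media_type == "application/pdf" else "image",
            "source": {
                "type": "base64",
                "media_type": media_type,
                "data": base64.standard_b64encode(data).decode("utf-8"),
            },
        })
    # The question goes last, after all the files.
    content.append({"type": "text", "text": question})
    return content
```

For the floor-plan example, `files` would be `[("Floor plan", "application/pdf", pdf_bytes), ("Room photo", "image/jpeg", jpg_bytes)]`, and the question could then say "compare the Floor plan with the Room photo".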

Cost Considerations

Understanding how file inputs affect pricing is important for production applications:

PDF Costs

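  • Each page of a PDF is converted to an image, and the page's extracted text is sent alongside it
  • Text cost: typically 1,500-3,000 tokens per page, depending on how dense the content is
  • Image cost: each page image is billed like any other image (see Image Costs below)
  • A 10-page PDF therefore uses roughly 15,000-30,000 input tokens for the text alone, plus the page-image tokens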
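A rough budget for a PDF is the page count times the per-page tokens. The following is a minimal sketch, assuming the 1,500-3,000 text tokens per page range from Anthropic's PDF documentation. The page-image tokens are a separate parameter because they depend on how each page is rendered. `estimate_pdf_tokens` is a hypothetical helper for budgeting, not a billing rule.

```python
def estimate_pdf_tokens(pages: int,
                        text_tokens_per_page: tuple[int, int] = (1_500, 3_000),
                        image_tokens_per_page: int = 0) -> tuple[int, int]:
    """Return a (low, high) input-token estimate for a PDF (rough sketch)."""
    low, high = text_tokens_per_page
    # Every page contributes its text tokens plus (optionally) its image tokens.
    return (pages * (low + image_tokens_per_page),
            pages * (high + image_tokens_per_page))
```

`estimate_pdf_tokens(10)` returns `(15000, 30000)` for the text alone. Passing an assumed `image_tokens_per_page=1590` raises that to `(30900, 45900)`.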

Image Costs

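  • Images larger than the processing limits (a long edge over 1568 px, or more than roughly 1,600 tokens) are downscaled, preserving aspect ratio
  • Estimated cost: roughly (width × height) / 750 tokens
    • Small image (384x384): ~200 tokens
    • Medium image (768x768): ~790 tokens
    • Large image (about 1092x1092 or larger, after downscaling): ~1,590 tokens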
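The image estimates follow the approximate formula in Anthropic's vision documentation, tokens ≈ (width × height) / 750, applied after any downscaling. Below is a minimal sketch, assuming the downscaling rule is "long edge at most 1568 px and at most about 1,600 tokens, preserving aspect ratio". The API's exact resizing may differ slightly, so `estimate_image_tokens` (a hypothetical helper) is meant for budgeting, not billing.

```python
import math

MAX_LONG_EDGE = 1568  # px; images with a longer edge are downscaled
MAX_TOKENS = 1600     # approximate token cap (assumption)


def estimate_image_tokens(width: int, height: int) -> int:
    """Approximate input tokens for one image: (width * height) / 750 after downscaling."""
    scale = min(
        1.0,                                                 # never upscale
        MAX_LONG_EDGE / max(width, height),                  # long-edge limit
        math.sqrt(MAX_TOKENS * 750 / (width * height)),      # token/area limit
    )
    w, h = width * scale, height * scale
    return round(w * h / 750)
```

A 4000x3000 photo, for instance, comes out at about 1,600 tokens because it is downscaled first. Sending it at full resolution adds upload size but not understanding.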

Cost Optimization Tips

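  1. Use text extraction first: If the PDF is text-based, extract the text client-side and send it as plain text. This is much cheaper because no page images are sent, but Claude then loses the layout, chart, and other visual information.
```python
import anthropic
import pdfplumber  # third-party: pip install pdfplumber

client = anthropic.Anthropic()

# Cheaper approach for text-based PDFs: extract the text locally
with pdfplumber.open("document.pdf") as pdf:
    # extract_text() can return None for pages without a text layer
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)

# Send the extracted text instead of the PDF
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": f"Analyze this document:\n\n{text}"}
    ],
)
```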
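  2. Send only relevant pages: If you only need data from specific pages, extract those pages into a smaller PDF and send just that.

  3. Resize images: Downscale large images before sending. Claude does not need 4K resolution to understand most content, and oversized images are downscaled by the API anyway, so sending them at full size only adds upload size and latency.

  4. Use the Files API: For files you query multiple times, upload once instead of re-encoding and resending the bytes with every request. This saves bandwidth, not tokens: the file's content is still counted as input tokens in each request that references it.

  5. Batch similar documents: Process multiple similar documents in one request when possible.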
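For tip 2, the page selection is easy to get wrong because people think in 1-based page ranges while PDF libraries index from 0. Here is a minimal sketch. `parse_page_spec` (hypothetical) turns a spec like "1-3,7" into 0-based indices, and `extract_pages` copies those pages into a new PDF using the third-party pypdf library (its `PdfReader` and `PdfWriter` classes). pypdf is an assumed dependency that you would need to install.

```python
import io


def parse_page_spec(spec: str, page_count: int) -> list[int]:
    """Turn a 1-based spec like "1-3,7" into sorted, de-duplicated 0-based indices."""
    indices: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
        else:
            first = last = int(part)
        if not 1 <= first <= last <= page_count:
            raise ValueError(f"Page range {part!r} is outside 1-{page_count}")
        indices.update(range(first - 1, last))
    return sorted(indices)


def extract_pages(pdf_bytes: bytes, spec: str) -> bytes:
    """Return a new PDF containing only the requested pages (requires pypdf)."""
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    reader = PdfReader(io.BytesIO(pdf_bytes))
    writer = PdfWriter()
    for i in parse_page_spec(spec, len(reader.pages)):
        writer.add_page(reader.pages[i])
    out = io.BytesIO()
    writer.write(out)
    return out.getvalue()
```

The bytes returned by `extract_pages` can be base64-encoded and sent exactly like the full PDF in Method 1.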


Best Practices

1. Be Specific in Your Questions

Instead of "What's in this PDF?", ask "Extract the invoice number, date, and total amount from this PDF."

2. Combine System Prompts with File Inputs

Use system prompts to set the extraction format and rules:

```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system="You are a document analysis engine. "
           "Always return structured JSON. "
           "Set missing fields to null. "
           "Never fabricate information not present in the document.",
    messages=[...],  # placeholder: your document block(s) and question go here
)
```

3. Handle Errors Gracefully

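```python
import base64

import anthropic

client = anthropic.Anthropic()

with open("document.pdf", "rb") as f:  # example path
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

try:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {"type": "base64", "media_type": "application/pdf", "data": pdf_data},
                    },
                    {"type": "text", "text": "Extract data from this document."},
                ],
            }
        ],
    )
except anthropic.BadRequestError as e:
    # 400 errors: file too large, unsupported format, malformed request, etc.
    # Retrying the same request will not help; fix the input instead.
    print(f"Invalid request: {e}")
except anthropic.APIError as e:
    # Any other API failure (rate limits, server errors, connection problems).
    # Transient failures are worth retrying with backoff.
    print(f"API error: {e}")
```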
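The last `except` branch recommends retrying with backoff. The Python SDK already retries some failures on its own (configurable with the client's `max_retries` option), so explicit retries are mainly useful when you want different limits or logging. The following is a minimal, SDK-agnostic sketch. `call_with_backoff` is a hypothetical helper that retries only the exception types you list, doubling the wait each time with a little random jitter. The attempt count and base delay are assumptions.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(fn: Callable[[], T],
                      retryable: tuple[type[BaseException], ...],
                      max_attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying `retryable` errors with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller see the last error.
            # Waits of ~1s, 2s, 4s, ... each stretched by up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay * (1 + random.random() * 0.1))
    raise AssertionError("unreachable")
```

For example, `call_with_backoff(lambda: client.messages.create(...), retryable=(anthropic.RateLimitError, anthropic.InternalServerError))` would retry rate limits and server errors but let a `BadRequestError` through immediately.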

4. Validate Extracted Data

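Always validate the structure and content of extracted data before using it in your application. The model can misread a value or leave out a field, so check that required fields are present and that the numbers are internally consistent (for example, that line items add up to the subtotal).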
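Here is a minimal validation sketch for the invoice schema from Example 1. `validate_invoice` is a hypothetical helper, and the 0.01 tolerance for currency rounding is an assumption. It returns a list of problems instead of raising, so a pipeline can route failed documents to human review.

```python
REQUIRED_KEYS = {"invoice_number", "date", "line_items", "subtotal", "tax", "total"}


def _num(value) -> float:
    # Treat null or non-numeric values as 0 so the arithmetic checks still run
    # (they will then usually be flagged as a mismatch).
    return float(value) if isinstance(value, (int, float)) else 0.0


def validate_invoice(data: dict, tolerance: float = 0.01) -> list[str]:
    """Return a list of problems in extracted invoice data (empty list if none)."""
    missing = sorted(REQUIRED_KEYS - set(data))
    if missing:
        return [f"missing field: {key}" for key in missing]

    problems = []
    items = data["line_items"] or []
    for i, item in enumerate(items, start=1):
        expected = _num(item.get("quantity")) * _num(item.get("unit_price"))
        if abs(expected - _num(item.get("total"))) > tolerance:
            problems.append(f"line {i}: quantity x unit_price != total")

    if abs(sum(_num(item.get("total")) for item in items) - _num(data["subtotal"])) > tolerance:
        problems.append("line items do not add up to subtotal")
    if abs(_num(data["subtotal"]) + _num(data["tax"]) - _num(data["total"])) > tolerance:
        problems.append("subtotal + tax != total")
    return problems
```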


Key Takeaways

  • Claude processes both text-based and visual/scanned PDFs
  • Send files inline as base64, or by URL when they are publicly accessible
  • The Files API (in beta) lets you upload once and reference by ID, which is ideal for repeated use
  • Supported types include PDF, PNG, JPEG, GIF, and WebP
  • You can combine multiple file types (images + PDFs) in a single request
  • Cost optimization matters -- use text extraction for text PDFs, resize images, and send only relevant pages
  • Always use specific prompts with file inputs for best extraction results
  • System prompts combined with file inputs create powerful document processing pipelines