PDFs, Files & Multi-Modal Inputs
Send PDFs, images, and files to Claude for analysis and extraction
Overview
Claude can process much more than plain text. You can send PDFs, images, and other files directly to Claude for analysis, extraction, and understanding. This opens up powerful use cases like invoice processing, contract analysis, resume parsing, and more.
PDF Support in Claude
Claude can handle PDFs in two ways:
1. Text-Based PDF Processing
Claude extracts and reads the text content from PDFs. This works for digitally-created PDFs with selectable text.
2. Visual PDF Processing
Claude can also "see" PDFs as rendered pages -- understanding layouts, tables, charts, headers, footers, and visual formatting. This works for scanned documents, image-based PDFs, and complex layouts.
| Feature | Text PDFs | Visual/Scanned PDFs |
|---|---|---|
| Text extraction | Excellent | Good (OCR-like) |
| Table parsing | Good | Excellent |
| Layout understanding | Limited | Excellent |
| Chart/graph reading | Not possible | Good |
| Speed | Fast | Moderate |
| Cost | Lower | Higher (image tokens) |
Sending PDFs via the API
Method 1: Base64 Encoding
The most common approach is to encode the PDF as base64 and send it in the message content.
Python:
import anthropic
import base64

client = anthropic.Anthropic()
# Read and encode the PDF file
with open("invoice.pdf", "rb") as f:
pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[
{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "base64",
"media_type": "application/pdf",
"data": pdf_data
}
},
{
"type": "text",
"text": "Please extract all line items from this invoice and return them as JSON."
}
]
}
]
)
print(message.content[0].text)
TypeScript:
import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const client = new Anthropic();
// Read and encode the PDF file
const pdfBuffer = fs.readFileSync("invoice.pdf");
const pdfData = pdfBuffer.toString("base64");
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [
{
role: "user",
content: [
{
type: "document",
source: {
type: "base64",
media_type: "application/pdf",
data: pdfData
}
},
{
type: "text",
text: "Please extract all line items from this invoice and return them as JSON."
}
]
}
]
});
console.log(message.content[0].text);
Method 2: URL Reference
You can also reference a publicly accessible PDF by URL.
Python:
import anthropic

client = anthropic.Anthropic()
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[
{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "url",
"url": "https://example.com/report.pdf"
}
},
{
"type": "text",
"text": "Summarize the key findings in this report."
}
]
}
]
)
print(message.content[0].text)
TypeScript:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [
{
role: "user",
content: [
{
type: "document",
source: {
type: "url",
url: "https://example.com/report.pdf"
}
},
{
type: "text",
text: "Summarize the key findings in this report."
}
]
}
]
});
console.log(message.content[0].text);
The Files API: Upload Once, Reference by ID
For files you use repeatedly, the Files API lets you upload once and reference by ID -- saving bandwidth and improving performance.
Step 1: Upload the File
Python:
import anthropic

client = anthropic.Anthropic()
# Upload the file once (the Files API is currently a beta feature)
with open("contract.pdf", "rb") as f:
    uploaded_file = client.beta.files.upload(
        file=("contract.pdf", f, "application/pdf")
    )
print(f"File ID: {uploaded_file.id}")
# Save this ID for future use: file-abc123
TypeScript:
import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const client = new Anthropic();
// Upload the file once (the Files API is currently a beta feature)
const file = await client.beta.files.upload({
  file: fs.createReadStream("contract.pdf")
});
console.log("File ID:", file.id);
// Save this ID for future use: file-abc123
Step 2: Reference by ID in Messages
Python:
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[
{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "file",
"file_id": "file-abc123"
}
},
{
"type": "text",
"text": "What are the key terms and conditions in this contract?"
}
]
}
]
)
TypeScript:
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [
{
role: "user",
content: [
{
type: "document",
source: {
type: "file",
file_id: "file-abc123"
}
},
{
type: "text",
text: "What are the key terms and conditions in this contract?"
}
]
}
]
});
Supported File Types
| Type | Media Type | Use Case |
|---|---|---|
| PDF | application/pdf | Documents, reports, invoices |
| PNG | image/png | Screenshots, diagrams, charts |
| JPEG | image/jpeg | Photos, scanned documents |
| GIF | image/gif | Simple graphics, animations (first frame) |
| WebP | image/webp | Web images |
File Size Limits
- Images: Up to 20 MB per image
- PDFs: Up to 32 MB and 100 pages per request
- Multiple files: You can send multiple files in a single request
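It can help to validate files against these limits before encoding so oversized uploads fail fast on your side. A minimal sketch (the size constants mirror the figures above; the 100-page PDF limit is not checked here):

```python
import base64
import os

MAX_IMAGE_BYTES = 20 * 1024 * 1024  # images: up to 20 MB
MAX_PDF_BYTES = 32 * 1024 * 1024    # PDFs: up to 32 MB

def encode_for_upload(path: str) -> str:
    """Check a file against the size limits above, then base64-encode it."""
    limit = MAX_PDF_BYTES if path.lower().endswith(".pdf") else MAX_IMAGE_BYTES
    size = os.path.getsize(path)
    if size > limit:
        raise ValueError(f"{path} is {size} bytes, over the {limit}-byte limit")
    with open(path, "rb") as f:
        return base64.standard_b64encode(f.read()).decode("utf-8")
```

The returned string can be dropped directly into the `data` field of a base64 `source` block.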
Practical Examples
Example 1: Invoice Data Extraction
Extract structured data from an invoice PDF:
Python:
import anthropic
import base64
import json

client = anthropic.Anthropic()
with open("invoice.pdf", "rb") as f:
pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system="You are an invoice data extraction engine. "
"Extract all data and return valid JSON only. No other text.",
messages=[
{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "base64",
"media_type": "application/pdf",
"data": pdf_data
}
},
{
"type": "text",
"text": """Extract the following from this invoice:
{
"invoice_number": "",
"date": "",
"due_date": "",
"vendor": {
"name": "",
"address": ""
},
"customer": {
"name": "",
"address": ""
},
"line_items": [
{
"description": "",
"quantity": 0,
"unit_price": 0,
"total": 0
}
],
"subtotal": 0,
"tax": 0,
"total": 0
}"""
}
]
}
]
)
invoice_data = json.loads(message.content[0].text)
print(json.dumps(invoice_data, indent=2))
Example 2: Contract Analysis
Analyze a legal contract and extract key clauses:
TypeScript:
import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const client = new Anthropic();
const pdfData = fs.readFileSync("contract.pdf").toString("base64");
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 8192,
system: "You are an expert legal document analyzer. "
+ "Identify key clauses, obligations, risks, and important dates.",
messages: [
{
role: "user",
content: [
{
type: "document",
source: {
type: "base64",
media_type: "application/pdf",
data: pdfData
}
},
{
type: "text",
text: `Analyze this contract and provide:
1. **Parties involved** and their roles
2. **Key obligations** for each party
3. **Important dates** (start, end, renewal, deadlines)
4. **Payment terms** and amounts
5. **Termination clauses** and conditions
6. **Liability and indemnification** provisions
7. **Potential risks** or unusual clauses
8. **Confidentiality** requirements`
}
]
}
]
});
console.log(message.content[0].text);
Example 3: Resume Parsing
Parse a resume PDF into structured data:
Python:
import anthropic
import base64
import json

client = anthropic.Anthropic()
with open("resume.pdf", "rb") as f:
pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system="You are a resume parser. Extract all information into structured JSON. "
"Return valid JSON only.",
messages=[
{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "base64",
"media_type": "application/pdf",
"data": pdf_data
}
},
{
"type": "text",
"text": """Parse this resume into the following structure:
{
"personal_info": {
"name": "",
"email": "",
"phone": "",
"location": "",
"linkedin": "",
"website": ""
},
"summary": "",
"experience": [
{
"title": "",
"company": "",
"location": "",
"start_date": "",
"end_date": "",
"highlights": []
}
],
"education": [
{
"degree": "",
"institution": "",
"graduation_date": "",
"gpa": null
}
],
"skills": {
"technical": [],
"languages": [],
"certifications": []
}
}"""
}
]
}
]
)
resume_data = json.loads(message.content[0].text)
print(json.dumps(resume_data, indent=2))
Example 4: Research Paper Summary
Summarize a research paper with key findings:
TypeScript:
import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const client = new Anthropic();
const pdfData = fs.readFileSync("research-paper.pdf").toString("base64");
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [
{
role: "user",
content: [
{
type: "document",
source: {
type: "base64",
media_type: "application/pdf",
data: pdfData
}
},
{
type: "text",
text: `Please provide a comprehensive summary of this research paper:
1. **Title and Authors**
2. **Research Question / Hypothesis**
3. **Methodology** (brief description)
4. **Key Findings** (bullet points)
5. **Conclusions**
6. **Limitations** mentioned by the authors
7. **Future Work** suggested
8. **Significance** - why this matters
Keep the summary accessible to a non-specialist audience.`
}
]
}
]
});
console.log(message.content[0].text);
Combining Images and PDFs in One Request
You can send multiple files of different types in a single request:
Python:
import anthropic
import base64

client = anthropic.Anthropic()
# Load a PDF
with open("floor-plan.pdf", "rb") as f:
pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")
# Load an image
with open("photo-of-room.jpg", "rb") as f:
image_data = base64.standard_b64encode(f.read()).decode("utf-8")
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[
{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "base64",
"media_type": "application/pdf",
"data": pdf_data
}
},
{
"type": "image",
"source": {
"type": "base64",
"media_type": "image/jpeg",
"data": image_data
}
},
{
"type": "text",
"text": "Here is the floor plan (PDF) and a photo of the current room. "
"Compare the floor plan with the photo and identify any "
"discrepancies or areas that differ from the plan."
}
]
}
]
)
print(message.content[0].text)
TypeScript:
import Anthropic from "@anthropic-ai/sdk";
import fs from "fs";

const client = new Anthropic();
const pdfData = fs.readFileSync("floor-plan.pdf").toString("base64");
const imageData = fs.readFileSync("photo-of-room.jpg").toString("base64");
const message = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
messages: [
{
role: "user",
content: [
{
type: "document",
source: {
type: "base64",
media_type: "application/pdf",
data: pdfData
}
},
{
type: "image",
source: {
type: "base64",
media_type: "image/jpeg",
data: imageData
}
},
{
type: "text",
text: "Here is the floor plan (PDF) and a photo of the current room. "
+ "Compare the floor plan with the photo and identify any "
+ "discrepancies or areas that differ from the plan."
}
]
}
]
});
console.log(message.content[0].text);
Cost Considerations
Understanding how file inputs affect pricing is important for production applications:
PDF Costs
- Each page of a PDF is converted to an image for visual processing
- Estimated cost: ~1,600 tokens per page (standard page)
- A 10-page PDF costs approximately 16,000 input tokens
- Text-heavy PDFs with simple layouts cost less than image-heavy ones
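The arithmetic above can be wrapped in a small estimator. This is a rough sketch using the ~1,600 tokens/page figure; PRICE_PER_MTOK is an assumed, illustrative input price in USD per million tokens, so check current pricing before relying on it:

```python
# Rough input-cost estimator for visually processed PDFs.
TOKENS_PER_PAGE = 1600   # estimate from the figures above
PRICE_PER_MTOK = 3.00    # assumed input price, USD per million tokens

def estimate_pdf_input(pages: int) -> tuple[int, float]:
    """Return (estimated input tokens, estimated USD cost) for a PDF."""
    tokens = pages * TOKENS_PER_PAGE
    return tokens, tokens / 1_000_000 * PRICE_PER_MTOK
```

For the 10-page example above, this gives 16,000 tokens, matching the figure in the text.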
Image Costs
- Images are resized to fit within Claude's processing limits
- Estimated cost: varies by resolution
- Small image (up to 384x384): ~170 tokens
- Medium image (up to 768x768): ~680 tokens
- Large image (up to 1568x1568): ~1,590 tokens
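Anthropic documents a rule of thumb of roughly (width × height) / 750 tokens for an image that is already within the size limits; the bucket figures above are coarser estimates of the same relationship. A minimal sketch:

```python
def estimate_image_tokens(width: int, height: int) -> int:
    # Rule of thumb for an image already within the size limits:
    # tokens ≈ (width * height) / 750. Images over the limits are
    # scaled down by the API first, so treat this as an upper bound.
    return round((width * height) / 750)
```

For example, a 1092x1092 image estimates to about 1,590 tokens, in line with the "large image" figure above.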
Cost Optimization Tips
- Use text extraction first: If the PDF is text-based, extract text client-side and send as plain text -- much cheaper.
# Cheaper approach for text-based PDFs
import anthropic
import pdfplumber

client = anthropic.Anthropic()
with pdfplumber.open("document.pdf") as pdf:
    # extract_text() can return None for empty pages, so guard with "or ''"
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)
# Send extracted text instead of the PDF
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": f"Analyze this document:\n\n{text}"}
    ]
)
- Send only relevant pages: If you only need data from specific pages, extract and send just those pages.
- Resize images: Downscale large images before sending -- Claude does not need 4K resolution to understand most content.
- Use the Files API: For files you query multiple times, upload once so you are not re-encoding and re-uploading the same bytes with every request.
- Batch similar documents: Process multiple similar documents in one request when possible.
Best Practices
1. Be Specific in Your Questions
Instead of "What's in this PDF?", ask "Extract the invoice number, date, and total amount from this PDF."
2. Combine System Prompts with File Inputs
Use system prompts to set the extraction format and rules:
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system="You are a document analysis engine. "
"Always return structured JSON. "
"Set missing fields to null. "
"Never fabricate information not present in the document.",
messages=[...]
)
3. Handle Errors Gracefully
import anthropic

client = anthropic.Anthropic()
# pdf_data: base64-encoded PDF string, prepared as in the earlier examples
try:
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[
{
"role": "user",
"content": [
{
"type": "document",
"source": {
"type": "base64",
"media_type": "application/pdf",
"data": pdf_data
}
},
{
"type": "text",
"text": "Extract data from this document."
}
]
}
]
)
except anthropic.BadRequestError as e:
print(f"Invalid request: {e}")
# File too large, unsupported format, etc.
except anthropic.APIError as e:
print(f"API error: {e}")
# Server-side issue; retry with backoff
4. Validate Extracted Data
Always validate the structure and content of extracted data before using it in your application.
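Model output that should be JSON sometimes arrives wrapped in a Markdown code fence, so a small parse-and-validate step saves debugging. A minimal sketch (REQUIRED_INVOICE_KEYS is an illustrative key set matching the invoice schema shown earlier in this section):

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker

# Illustrative required keys, matching the invoice schema shown earlier.
REQUIRED_INVOICE_KEYS = {"invoice_number", "date", "line_items", "total"}

def parse_extraction(raw: str, required: set) -> dict:
    """Parse model output as JSON, tolerating an optional Markdown code
    fence, and verify the expected top-level keys are present."""
    text = raw.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1]    # drop the opening fence line
        text = text.rsplit(FENCE, 1)[0]  # drop the closing fence
    data = json.loads(text)
    missing = required - data.keys()
    if missing:
        raise ValueError(f"extraction missing keys: {sorted(missing)}")
    return data
```

Call this on `message.content[0].text` before handing the result to downstream code, and treat a `ValueError` or `json.JSONDecodeError` as a signal to retry or flag the document for review.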
Key Takeaways
- Claude processes both text-based and visual/scanned PDFs
- Use base64 encoding for direct file uploads or URLs for publicly accessible files
- The Files API lets you upload once and reference by ID -- ideal for repeated use
- Supported types include PDF, PNG, JPEG, GIF, and WebP
- You can combine multiple file types (images + PDFs) in a single request
- Cost optimization matters -- use text extraction for text PDFs, resize images, and send only relevant pages
- Always use specific prompts with file inputs for best extraction results
- System prompts combined with file inputs create powerful document processing pipelines