Intermediate · 12 min read · Module 5, Lesson 7

📄PDFs, Files & Multi-Modal Inputs

Send PDFs, images, and files to Claude for analysis and extraction

Overview

Claude can process much more than plain text. You can send PDFs, images, and other files directly to Claude for analysis, extraction, and understanding. This opens up powerful use cases like invoice processing, contract analysis, resume parsing, and more.

PDF Support in Claude

Claude can handle PDFs in two ways:

1. Text-Based PDF Processing

Claude extracts and reads the text content from PDFs. This works for digitally-created PDFs with selectable text.

2. Visual PDF Processing

Claude can also "see" PDFs as rendered pages -- understanding layouts, tables, charts, headers, footers, and visual formatting. This works for scanned documents, image-based PDFs, and complex layouts.

Feature	Text PDFs	Visual/Scanned PDFs
Text extraction	Excellent	Good (OCR-like)
Table parsing	Good	Excellent
Layout understanding	Limited	Excellent
Chart/graph reading	Not possible	Good
Speed	Fast	Moderate
Cost	Lower	Higher (image tokens)

Sending PDFs via the API

Method 1: Base64 Encoding

The most common approach is to encode the PDF as base64 and send it in the message content.

Python:

import base64

import anthropic

client = anthropic.Anthropic()

# Read and encode the PDF file
with open("invoice.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": "Please extract all line items from this invoice and return them as JSON."
                }
            ]
        }
    ]
)

print(message.content[0].text)

TypeScript:

import * as fs from "fs";

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Read and encode the PDF file
const pdfBuffer = fs.readFileSync("invoice.pdf");
const pdfData = pdfBuffer.toString("base64");

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: {
            type: "base64",
            media_type: "application/pdf",
            data: pdfData
          }
        },
        {
          type: "text",
          text: "Please extract all line items from this invoice and return them as JSON."
        }
      ]
    }
  ]
});

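// content is a union of block types; narrow to a text block before reading .text
const first = message.content[0];
if (first.type === "text") {
  console.log(first.text);
}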
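
When several PDFs pass through the same pipeline, the read-and-encode step above can be wrapped in a small helper. This is a minimal sketch: the name `pdf_document_block` and the `%PDF-` signature check are illustrative assumptions, not part of the SDK.

```python
import base64
from pathlib import Path


def pdf_document_block(path: str) -> dict:
    """Build a base64 "document" content block from a local PDF (hypothetical helper)."""
    raw = Path(path).read_bytes()
    # Simplification: PDFs normally begin with the "%PDF-" signature,
    # so anything else is rejected before it is sent.
    if not raw.startswith(b"%PDF-"):
        raise ValueError(f"{path} does not look like a PDF")
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(raw).decode("utf-8"),
        },
    }
```

The returned dictionary can be placed in the `content` list next to a `text` block, exactly as in the request shown above.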

Method 2: URL Reference

You can also reference a publicly accessible PDF by URL.

Python:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "url",
                        "url": "https://example.com/report.pdf"
                    }
                },
                {
                    "type": "text",
                    "text": "Summarize the key findings in this report."
                }
            ]
        }
    ]
)

print(message.content[0].text)

TypeScript:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: {
            type: "url",
            url: "https://example.com/report.pdf"
          }
        },
        {
          type: "text",
          text: "Summarize the key findings in this report."
        }
      ]
    }
  ]
});

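// content is a union of block types; narrow to a text block before reading .text
const first = message.content[0];
if (first.type === "text") {
  console.log(first.text);
}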
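
Because the URL has to be reachable from outside your network, it can help to reject obviously unusable values before making a request. This is a minimal sketch; the helper name `pdf_url_block` and the rule of accepting only absolute http(s) URLs are assumptions made for illustration.

```python
from urllib.parse import urlparse


def pdf_url_block(url: str) -> dict:
    """Build a URL-based "document" content block (hypothetical helper)."""
    parsed = urlparse(url)
    # Assumption for this sketch: only absolute http(s) URLs are accepted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    return {"type": "document", "source": {"type": "url", "url": url}}
```

This check does not prove the file is publicly accessible; it only catches local paths and malformed URLs early.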

The Files API: Upload Once, Reference by ID

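For files you use repeatedly, the Files API lets you upload once and reference by ID -- saving bandwidth and improving performance. At the time of writing the Files API is in beta, so calls go through the SDK's beta namespace with the `files-api-2025-04-14` beta flag.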

Step 1: Upload the File

Python:

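import anthropic

client = anthropic.Anthropic()

# Upload the file once. The Files API is in beta at the time of writing,
# so it lives under client.beta; it does not take a "purpose" parameter.
with open("contract.pdf", "rb") as f:
    uploaded_file = client.beta.files.upload(
        file=("contract.pdf", f, "application/pdf"),
    )

print(f"File ID: {uploaded_file.id}")
# Save this ID for future use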

TypeScript:

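import * as fs from "fs";

import Anthropic, { toFile } from "@anthropic-ai/sdk";

const client = new Anthropic();

// The Files API is in beta at the time of writing, so it lives under
// client.beta; it does not take a "purpose" parameter.
const file = await client.beta.files.upload({
  file: await toFile(fs.createReadStream("contract.pdf"), "contract.pdf", {
    type: "application/pdf",
  }),
  betas: ["files-api-2025-04-14"],
});

console.log("File ID:", file.id);
// Save this ID for future use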

Step 2: Reference by ID in Messages

Python:

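# Continues from Step 1: reuses client and uploaded_file
message = client.beta.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    betas=["files-api-2025-04-14"],  # needed while the Files API is in beta
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "file",
                        "file_id": uploaded_file.id  # or an ID string saved earlier
                    }
                },
                {
                    "type": "text",
                    "text": "What are the key terms and conditions in this contract?"
                }
            ]
        }
    ]
)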

TypeScript:

// Continues from Step 1: reuses client and file
const message = await client.beta.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  betas: ["files-api-2025-04-14"], // needed while the Files API is in beta
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: {
            type: "file",
            file_id: file.id // or an ID string saved earlier
          }
        },
        {
          type: "text",
          text: "What are the key terms and conditions in this contract?"
        }
      ]
    }
  ]
});
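
"Upload once" only pays off if the ID is stored somewhere. One possible approach, sketched below, keeps a small JSON cache keyed by the file's SHA-256 hash. The helper name `get_or_upload`, the cache file `file_ids.json`, and the `upload` callable (which would wrap the real upload call from Step 1) are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def get_or_upload(path: str, upload: Callable[[str], str],
                  cache_path: str = "file_ids.json") -> str:
    """Return a cached file ID for `path`, uploading only on a cache miss."""
    cache_file = Path(cache_path)
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    # Key by content hash so a renamed but identical file is not uploaded again.
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest not in cache:
        cache[digest] = upload(path)  # `upload` takes a path and returns a file ID
        cache_file.write_text(json.dumps(cache, indent=2))
    return cache[digest]
```

Keying by content hash rather than file name is a design choice: it treats an edited file as new, at the price of reading the whole file on every call.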

Supported File Types

Type	Media Type	Use Case
PDF	`application/pdf`	Documents, reports, invoices
PNG	`image/png`	Screenshots, diagrams, charts
JPEG	`image/jpeg`	Photos, scanned documents
GIF	`image/gif`	Simple graphics, animations (first frame)
WebP	`image/webp`	Web images

File Size Limits

Images: Up to 5 MB per image when sent through the API
PDFs: Up to 32 MB per request and 100 pages
Multiple files: You can send multiple files in a single request, as long as the request as a whole stays within these limits
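
Checking the size locally avoids a rejected request. The sketch below uses the 32 MB PDF figure from the list above; `check_pdf_size` and `base64_size` are hypothetical helper names. The second function is there because base64 grows data by about one third, which matters if the limit is applied to the encoded payload rather than the raw file.

```python
import os

MAX_PDF_BYTES = 32 * 1024 * 1024  # 32 MB, the PDF limit listed above


def base64_size(raw_bytes: int) -> int:
    """Length in bytes of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def check_pdf_size(path: str, max_bytes: int = MAX_PDF_BYTES) -> int:
    """Return the file size in bytes, or raise if it exceeds `max_bytes`."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; the limit is {max_bytes}")
    return size
```

Page count is not checked here; that would need a PDF library.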

Practical Examples

Example 1: Invoice Data Extraction

Extract structured data from an invoice PDF:

Python:

import base64
import json

import anthropic

client = anthropic.Anthropic()

with open("invoice.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system="You are an invoice data extraction engine. "
           "Extract all data and return valid JSON only. No other text.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": """Extract the following from this invoice:
{
  "invoice_number": "",
  "date": "",
  "due_date": "",
  "vendor": {
    "name": "",
    "address": ""
  },
  "customer": {
    "name": "",
    "address": ""
  },
  "line_items": [
    {
      "description": "",
      "quantity": 0,
      "unit_price": 0,
      "total": 0
    }
  ],
  "subtotal": 0,
  "tax": 0,
  "total": 0
}"""
                }
            ]
        }
    ]
)

invoice_data = json.loads(message.content[0].text)
print(json.dumps(invoice_data, indent=2))
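
`json.loads` raises an error if the reply contains anything besides JSON, and a model may occasionally wrap its JSON in a Markdown code fence despite the system prompt. A more tolerant parser could look like the sketch below; the name `parse_json_reply` is hypothetical, and it only handles a single fence around the entire reply.

```python
import json
from typing import Any

FENCE = "`" * 3  # a Markdown code fence: three backticks


def parse_json_reply(text: str) -> Any:
    """Parse a reply as JSON, tolerating one surrounding Markdown code fence."""
    cleaned = text.strip()
    fenced = (
        len(cleaned) >= 2 * len(FENCE)
        and cleaned.startswith(FENCE)
        and cleaned.endswith(FENCE)
    )
    if fenced:
        inner = cleaned[len(FENCE):-len(FENCE)]
        first_line, newline, rest = inner.partition("\n")
        # Drop the opening line if it is empty or just a language tag such as "json".
        if newline and (first_line.strip() == "" or first_line.strip().isalnum()):
            inner = rest
        cleaned = inner
    return json.loads(cleaned)
```

In the example above, `parse_json_reply(message.content[0].text)` would replace the direct `json.loads` call. It still raises `json.JSONDecodeError` on replies that are not JSON, so the caller should handle that case.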

Example 2: Contract Analysis

Analyze a legal contract and extract key clauses:

TypeScript:

import * as fs from "fs";

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const pdfData = fs.readFileSync("contract.pdf").toString("base64");

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 8192,
  system: "You are an expert legal document analyzer. "
    + "Identify key clauses, obligations, risks, and important dates.",
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: {
            type: "base64",
            media_type: "application/pdf",
            data: pdfData
          }
        },
        {
          type: "text",
          text: `Analyze this contract and provide:
1. **Parties involved** and their roles
2. **Key obligations** for each party
3. **Important dates** (start, end, renewal, deadlines)
4. **Payment terms** and amounts
5. **Termination clauses** and conditions
6. **Liability and indemnification** provisions
7. **Potential risks** or unusual clauses
8. **Confidentiality** requirements`
        }
      ]
    }
  ]
});

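// content is a union of block types; narrow to a text block before reading .text
const first = message.content[0];
if (first.type === "text") {
  console.log(first.text);
}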

Example 3: Resume Parsing

Parse a resume PDF into structured data:

Python:

import base64
import json

import anthropic

client = anthropic.Anthropic()

with open("resume.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system="You are a resume parser. Extract all information into structured JSON. "
           "Return valid JSON only.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "text",
                    "text": """Parse this resume into the following structure:
{
  "personal_info": {
    "name": "",
    "email": "",
    "phone": "",
    "location": "",
    "linkedin": "",
    "website": ""
  },
  "summary": "",
  "experience": [
    {
      "title": "",
      "company": "",
      "location": "",
      "start_date": "",
      "end_date": "",
      "highlights": []
    }
  ],
  "education": [
    {
      "degree": "",
      "institution": "",
      "graduation_date": "",
      "gpa": null
    }
  ],
  "skills": {
    "technical": [],
    "languages": [],
    "certifications": []
  }
}"""
                }
            ]
        }
    ]
)

resume_data = json.loads(message.content[0].text)
print(json.dumps(resume_data, indent=2))

Example 4: Research Paper Summary

Summarize a research paper with key findings:

TypeScript:

import * as fs from "fs";

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const pdfData = fs.readFileSync("research-paper.pdf").toString("base64");

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: {
            type: "base64",
            media_type: "application/pdf",
            data: pdfData
          }
        },
        {
          type: "text",
          text: `Please provide a comprehensive summary of this research paper:

1. **Title and Authors**
2. **Research Question / Hypothesis**
3. **Methodology** (brief description)
4. **Key Findings** (bullet points)
5. **Conclusions**
6. **Limitations** mentioned by the authors
7. **Future Work** suggested
8. **Significance** - why this matters

Keep the summary accessible to a non-specialist audience.`
        }
      ]
    }
  ]
});

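// content is a union of block types; narrow to a text block before reading .text
const first = message.content[0];
if (first.type === "text") {
  console.log(first.text);
}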

Combining Images and PDFs in One Request

You can send multiple files of different types in a single request:

Python:

import base64

import anthropic

client = anthropic.Anthropic()

# Load a PDF
with open("floor-plan.pdf", "rb") as f:
    pdf_data = base64.standard_b64encode(f.read()).decode("utf-8")

# Load an image
with open("photo-of-room.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": pdf_data
                    }
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Here is the floor plan (PDF) and a photo of the current room. "
                            "Compare the floor plan with the photo and identify any "
                            "discrepancies or areas that differ from the plan."
                }
            ]
        }
    ]
)

print(message.content[0].text)

TypeScript:

import * as fs from "fs";

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const pdfData = fs.readFileSync("floor-plan.pdf").toString("base64");
const imageData = fs.readFileSync("photo-of-room.jpg").toString("base64");

const message = await client.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 4096,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "document",
          source: {
            type: "base64",
            media_type: "application/pdf",
            data: pdfData
          }
        },
        {
          type: "image",
          source: {
            type: "base64",
            media_type: "image/jpeg",
            data: imageData
          }
        },
        {
          type: "text",
          text: "Here is the floor plan (PDF) and a photo of the current room. "
            + "Compare the floor plan with the photo and identify any "
            + "discrepancies or areas that differ from the plan."
        }
      ]
    }
  ]
});

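// content is a union of block types; narrow to a text block before reading .text
const first = message.content[0];
if (first.type === "text") {
  console.log(first.text);
}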

Cost Considerations

Understanding how file inputs affect pricing is important for production applications:

PDF Costs

Each page of a PDF is converted to an image for visual processing
Estimated cost: ~1,600 tokens per page (standard page)
A 10-page PDF costs approximately 16,000 input tokens
Text-heavy PDFs with simple layouts cost less than image-heavy ones
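
The per-page figure makes it easy to budget a request before sending it. A rough sketch, assuming the ~1,600 tokens per page quoted above (real pages vary with content density); `estimate_pdf_tokens` is a hypothetical helper name.

```python
TOKENS_PER_PAGE = 1_600  # rough per-page estimate quoted above; real pages vary


def estimate_pdf_tokens(page_count: int, tokens_per_page: int = TOKENS_PER_PAGE) -> int:
    """Approximate input tokens for a PDF with `page_count` pages."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * tokens_per_page


full_document = estimate_pdf_tokens(40)  # an entire 40-page report
three_pages = estimate_pdf_tokens(3)     # only the three pages that matter
print(full_document, three_pages)        # 64000 4800
```

The comparison shows why trimming a document to the relevant pages is usually the largest single saving.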

Image Costs

Images that exceed Claude's processing limits are resized before analysis
Estimated cost: varies by resolution, roughly (width x height) / 750 tokens
- Small image (384x384): ~200 tokens
- Medium image (768x768): ~790 tokens
- Large image (1092x1092 or more, after resizing): ~1,590 tokens
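
For a quick estimate in code, Anthropic's vision documentation gives the approximation tokens ≈ (width × height) / 750 and describes large images being scaled down first. The sketch below assumes that formula, a 1568-pixel long-edge limit, and a ceiling of about 1,600 tokens per image; treat the constants as assumptions to check against the current documentation. `estimate_image_tokens` is a hypothetical helper name.

```python
import math

PIXELS_PER_TOKEN = 750  # assumed approximation: tokens ~= width * height / 750
MAX_LONG_EDGE = 1568    # assumed: images with a longer edge are scaled down first
MAX_TOKENS = 1600       # assumed approximate per-image ceiling after resizing


def estimate_image_tokens(width: int, height: int) -> int:
    """Rough token estimate for one image, modelling the resize step."""
    w, h = float(width), float(height)
    long_edge = max(w, h)
    if long_edge > MAX_LONG_EDGE:
        # Scale down proportionally so the long edge fits the limit.
        scale = MAX_LONG_EDGE / long_edge
        w, h = w * scale, h * scale
    tokens = math.ceil(w * h / PIXELS_PER_TOKEN)
    # Anything still above the ceiling would be shrunk further, so cap it.
    return min(tokens, MAX_TOKENS)


print(estimate_image_tokens(200, 200))    # 54
print(estimate_image_tokens(1092, 1092))  # 1590
print(estimate_image_tokens(4000, 3000))  # 1600
```

The last line shows that a very large photo costs no more than the ceiling, so sending 4K images mainly wastes bandwidth rather than tokens.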

Cost Optimization Tips

Use text extraction first: If the PDF is text-based, extract text client-side and send as plain text -- much cheaper.

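Python:

import anthropic
import pdfplumber  # third-party package: pip install pdfplumber

client = anthropic.Anthropic()

# Cheaper approach for text-based PDFs
with pdfplumber.open("document.pdf") as pdf:
    # extract_text() can return None for pages without a text layer
    text = "\n".join(page.extract_text() or "" for page in pdf.pages)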

# Send extracted text instead of the PDF
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": f"Analyze this document:\n\n{text}"}
    ]
)

Send only relevant pages: If you only need data from specific pages, extract and send just those pages.
Resize images: Downscale large images before sending -- Claude does not need 4K resolution to understand most content.
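Use the Files API: For files you query multiple times, upload once to avoid encoding and re-sending the same bytes with every request. This saves bandwidth and request size; the file's content still counts as input tokens each time it is used.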
Batch similar documents: Process multiple similar documents in one request when possible.
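
Batching can be sketched as one `content` list that holds several labelled documents followed by a single question. The helper name `build_batch_content` and the "Document N: label" convention are assumptions for illustration; the block shapes match the base64 examples earlier in this lesson.

```python
def build_batch_content(pdfs: dict[str, str], question: str) -> list[dict]:
    """Build one `content` list holding several labelled PDFs and a question.

    `pdfs` maps a label (for example a file name) to base64-encoded PDF data.
    """
    content: list[dict] = []
    for index, (label, data) in enumerate(pdfs.items(), start=1):
        # A short text block before each document lets the answer refer to it by name.
        content.append({"type": "text", "text": f"Document {index}: {label}"})
        content.append({
            "type": "document",
            "source": {
                "type": "base64",
                "media_type": "application/pdf",
                "data": data,
            },
        })
    content.append({"type": "text", "text": question})
    return content
```

The result goes straight into a user message, for example `{"role": "user", "content": build_batch_content(pdfs, "Compare the totals.")}`. Size and page limits still apply to the request as a whole.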

Best Practices

1. Be Specific in Your Questions

Instead of "What's in this PDF?", ask "Extract the invoice number, date, and total amount from this PDF."

2. Combine System Prompts with File Inputs

Use system prompts to set the extraction format and rules:

Python
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    system="You are a document analysis engine. "
           "Always return structured JSON. "
           "Set missing fields to null. "
           "Never fabricate information not present in the document.",
    messages=[...]
)

3. Handle Errors Gracefully

Python:

import anthropic

client = anthropic.Anthropic()

# pdf_data is the base64-encoded PDF, prepared as in the earlier examples

try:
    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "document",
                        "source": {
                            "type": "base64",
                            "media_type": "application/pdf",
                            "data": pdf_data
                        }
                    },
                    {
                        "type": "text",
                        "text": "Extract data from this document."
                    }
                ]
            }
        ]
    )
except anthropic.BadRequestError as e:
    print(f"Invalid request: {e}")
    # File too large, unsupported format, etc.
except anthropic.APIError as e:
    print(f"API error: {e}")
    # Server-side issue, retry with backoff
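
The "retry with backoff" comment above can be made concrete with a small wrapper. This is a generic sketch, not an SDK feature: `call_with_backoff` and its parameters are hypothetical, and the injectable `sleep` argument exists only to make the function easy to test.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x ... the base delay, plus jitter so that
            # many clients do not retry at the same instant.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

With the SDK, `retry_on` might be `(anthropic.RateLimitError, anthropic.APIConnectionError, anthropic.InternalServerError)`, while `anthropic.BadRequestError` should not be retried because the same request will fail again. The client also has its own `max_retries` setting, so an outer wrapper like this is only needed when you want more control.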

4. Validate Extracted Data

Always validate the structure and content of extracted data before using it in your application.
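
As an illustration, the sketch below checks data shaped like the invoice schema from Example 1. The helper name `validate_invoice`, the choice of required fields, and the arithmetic rules (line totals add up to the subtotal, subtotal plus tax equals the total) are assumptions; real invoices may include discounts or shipping that break them.

```python
import math


def validate_invoice(data: dict, tolerance: float = 0.01) -> list[str]:
    """Return a list of problems found in extracted invoice data (empty if none)."""
    problems = []
    for key in ("invoice_number", "date", "line_items", "subtotal", "tax", "total"):
        # A value of 0 is allowed (e.g. zero tax); None, "" and [] count as missing.
        if data.get(key) in (None, "", []):
            problems.append(f"missing field: {key}")
    if problems:
        return problems

    try:
        items_total = sum(float(item["total"]) for item in data["line_items"])
        subtotal = float(data["subtotal"])
        tax = float(data["tax"])
        total = float(data["total"])
    except (KeyError, TypeError, ValueError):
        return ["amounts are missing or not numeric"]

    if not math.isclose(items_total, subtotal, abs_tol=tolerance):
        problems.append(f"line items sum to {items_total}, but subtotal is {subtotal}")
    if not math.isclose(subtotal + tax, total, abs_tol=tolerance):
        problems.append(f"subtotal + tax is {subtotal + tax}, but total is {total}")
    return problems
```

An empty list means the checks passed, not that the extraction is correct; a failed check is a signal to re-run the extraction or route the document to a human.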

Key Takeaways

Claude processes both text-based and visual/scanned PDFs
Use base64 encoding for direct file uploads or URLs for publicly accessible files
The Files API lets you upload once and reference by ID -- ideal for repeated use
Supported types include PDF, PNG, JPEG, GIF, and WebP
You can combine multiple file types (images + PDFs) in a single request
Cost optimization matters -- use text extraction for text PDFs, resize images, and send only relevant pages
Always use specific prompts with file inputs for best extraction results
System prompts combined with file inputs create powerful document processing pipelines
