Claude on Google Vertex AI
Run Claude through GCP Vertex AI — project setup and configuration
What Is Google Vertex AI?
Google Vertex AI is a fully managed machine-learning platform on Google Cloud Platform (GCP). It provides a unified surface for training, deploying, and consuming ML models — including third-party foundation models such as Anthropic's Claude family.
When you access Claude through Vertex AI, you are not calling the Anthropic API directly. Instead, every request is routed through GCP infrastructure, which means billing, IAM, networking, and compliance all stay inside your existing Google Cloud organization.
Why Use Claude on Vertex AI?
| Reason | Details |
|---|---|
| GCP-native billing | Claude usage appears on your consolidated GCP invoice — no separate Anthropic billing account required. |
| IAM & security | Access is governed by GCP IAM roles and policies you already manage. |
| Compliance | Data residency, VPC Service Controls, and CMEK encryption all apply automatically. |
| Private networking | Reach the API over Private Google Access or Private Service Connect — traffic never leaves the Google backbone. |
| Unified tooling | Use the same gcloud CLI, Terraform provider, and Cloud Console you use for everything else. |
| Enterprise support | A single GCP support contract covers both infrastructure and model access. |
If your organization is already invested in GCP, Vertex AI is often the path of least resistance for adopting Claude at scale.
Enabling Claude Models on Vertex AI
Before you can call Claude you must enable the model inside your GCP project.
Step 1 — Enable the Vertex AI API
```bash
gcloud services enable aiplatform.googleapis.com --project=my-project
```

Step 2 — Request Access to Claude Models
- Open Vertex AI → Model Garden in the Cloud Console.
- Search for Claude and select the model you want (e.g. Claude Sonnet 4).
- Click Enable and accept the terms of service.
You may need the `roles/aiplatform.user` or `roles/aiplatform.admin` role to complete this step.
Step 3 — Verify Availability
```bash
gcloud ai models list \
  --region=us-east5 \
  --project=my-project \
  --filter="displayName~claude"
```

If the model appears in the output you are ready to start making requests.
Service Account Setup
Vertex AI uses GCP service accounts for authentication — there are no separate API keys.
Create a Dedicated Service Account
```bash
gcloud iam service-accounts create claude-caller \
  --display-name="Claude Vertex Caller" \
  --project=my-project
```

Grant the Required Role
The minimum role is Vertex AI User (roles/aiplatform.user):
```bash
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:claude-caller@my-project.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```

Generate a Key (for Local Development)
```bash
gcloud iam service-accounts keys create key.json \
  --iam-account=claude-caller@my-project.iam.gserviceaccount.com
```

Set the environment variable so the SDK picks it up automatically:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/key.json"
```

Production tip: Prefer Workload Identity Federation over exported key files. It eliminates long-lived credentials entirely.
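If you want to avoid key files even during local development, one option is to mint short-lived tokens by impersonating the service account. A minimal sketch, assuming your identity holds roles/iam.serviceAccountTokenCreator on claude-caller and the google-auth package is installed:

```python
# Sketch: impersonate the service account to mint a short-lived access
# token instead of exporting a long-lived key file.
import google.auth
from google.auth import impersonated_credentials
from google.auth.transport.requests import Request

source_creds, _ = google.auth.default()
creds = impersonated_credentials.Credentials(
    source_credentials=source_creds,
    target_principal="claude-caller@my-project.iam.gserviceaccount.com",
    target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
creds.refresh(Request())  # creds.token now holds a short-lived OAuth token
```

If your SDK version accepts an access_token argument on the client, the minted token can be passed there; otherwise, `gcloud auth application-default login --impersonate-service-account=...` achieves the same effect at the ADC layer.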
Using the Anthropic SDK with Vertex AI
Anthropic provides a first-party Python SDK that supports Vertex AI as a backend. Install the extra dependency:
pip install "anthropic[vertex]"Client Configuration
```python
from anthropic import AnthropicVertex

client = AnthropicVertex(
    project_id="my-project",
    region="us-east5",
)
```

The client automatically reads credentials from the environment (`GOOGLE_APPLICATION_CREDENTIALS` or Application Default Credentials).
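If authentication fails, it helps to confirm that ADC resolves before constructing the client. A quick check using the google-auth package (installed alongside the vertex extra):

```python
# Confirm that Application Default Credentials resolve, and see which
# project they are associated with.
import google.auth

credentials, detected_project = google.auth.default()
print(f"ADC resolved; detected project: {detected_project}")
```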
Making a Simple Request
```python
message = client.messages.create(
    model="claude-sonnet-4@20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain Vertex AI in two sentences."}
    ],
)
print(message.content[0].text)
```

TypeScript / Node.js
The Vertex client ships as a separate package:

```bash
npm install @anthropic-ai/vertex-sdk
```

```typescript
import { AnthropicVertex } from "@anthropic-ai/vertex-sdk";

const client = new AnthropicVertex({
  projectId: "my-project",
  region: "us-east5",
});

const message = await client.messages.create({
  model: "claude-sonnet-4@20250514",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Explain Vertex AI in two sentences." },
  ],
});

console.log(message.content[0].text);
```

Model IDs on Vertex AI
Model identifiers on Vertex resemble Anthropic's naming convention, but the version date is appended with an `@` separator rather than a trailing dash:
| Model | Vertex Model ID |
|---|---|
| Claude Opus 4 | claude-opus-4@20250514 |
| Claude Sonnet 4 | claude-sonnet-4@20250514 |
| Claude Haiku 3.5 | claude-3-5-haiku@20241022 |
Check the Vertex Model Garden for the latest available IDs — Anthropic publishes new versions periodically and older snapshots may be deprecated.
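If you route requests by task tier, keeping the IDs in a single lookup table makes a version bump a one-line change. A small sketch (the tier names are illustrative, not an official taxonomy):

```python
# Central mapping from task tier to Vertex model ID; verify current IDs
# in the Model Garden before deploying.
VERTEX_MODELS = {
    "fast": "claude-3-5-haiku@20241022",      # cheapest, lowest latency
    "balanced": "claude-sonnet-4@20250514",   # general-purpose default
    "max_quality": "claude-opus-4@20250514",  # hardest reasoning tasks
}
```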
Regional and Multi-Region Endpoints
Vertex AI is a regional service. When you create a client, you must specify a region where Claude is available.
Available Regions (as of early 2025)
| Region | Location |
|---|---|
| us-east5 | Columbus, Ohio |
| us-central1 | Iowa |
| europe-west1 | Belgium |
| europe-west4 | Netherlands |
| asia-southeast1 | Singapore |
How the Endpoint Is Constructed
The SDK builds the endpoint URL automatically; non-streaming calls use the :rawPredict verb and streaming calls use :streamRawPredict:

```
https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}/locations/{REGION}/publishers/anthropic/models/{MODEL}:streamRawPredict
```
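You can also call this endpoint directly with any HTTP client, which is handy for debugging or for languages without an SDK. A minimal sketch, assuming ADC is configured and the requests and google-auth packages are installed (the anthropic_version string is the Vertex-specific API version; confirm it against the current Vertex documentation):

```python
# Sketch: call the rawPredict endpoint directly, bypassing the SDK.
import google.auth
import google.auth.transport.requests
import requests

PROJECT, REGION = "my-project", "us-east5"
MODEL = "claude-sonnet-4@20250514"

# Mint an OAuth access token from Application Default Credentials.
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
creds.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{REGION}/publishers/anthropic/models/{MODEL}:rawPredict"
)
body = {
    "anthropic_version": "vertex-2023-10-16",  # Vertex-specific field
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello."}],
}
resp = requests.post(
    url, json=body, headers={"Authorization": f"Bearer {creds.token}"}
)
print(resp.json())
```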
Multi-Region Routing
Vertex AI does not natively load-balance across regions for Claude. If you need multi-region resilience you must implement it at the application level:
```python
import random

from anthropic import AnthropicVertex

REGIONS = ["us-east5", "europe-west1", "asia-southeast1"]

def get_client():
    # Spread load by picking a region at random for each new client.
    region = random.choice(REGIONS)
    return AnthropicVertex(project_id="my-project", region=region)
```

Choose regions that are close to your users and that satisfy any data-residency requirements your organization has.
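Random selection spreads load but does not handle a regional outage. A sketch of simple sequential failover, assuming any APIError is worth retrying in the next region (production code would inspect the error type first):

```python
# Sketch: try each region in order until a request succeeds.
from anthropic import AnthropicVertex, APIError

REGIONS = ["us-east5", "europe-west1", "asia-southeast1"]

def create_with_failover(**request_kwargs):
    last_error = None
    for region in REGIONS:
        client = AnthropicVertex(project_id="my-project", region=region)
        try:
            return client.messages.create(**request_kwargs)
        except APIError as error:
            last_error = error  # quota, outage, etc.; try the next region
    raise last_error
```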
API Differences from Direct Anthropic API
Although the Anthropic SDK abstracts most differences, a few things change when you go through Vertex:
| Feature | Direct API | Vertex AI |
|---|---|---|
| Authentication | API key header | GCP OAuth 2.0 / ADC |
| Billing | Anthropic account | GCP billing account |
| Rate limits | Per-org Anthropic limits | Per-project GCP quotas |
| Endpoint | api.anthropic.com | {REGION}-aiplatform.googleapis.com |
| Model access | Enabled by default | Must enable in Model Garden |
| Networking | Public internet | Supports VPC-SC and private endpoints |
| System prompt | Supported | Supported (same SDK field) |
| Streaming | Supported | Supported |
| Tool use | Supported | Supported |
| Vision | Supported | Supported |
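In code, the table above reduces to a one-line difference: swap the client class and drop the API key. A sketch (the key value is a placeholder):

```python
# The Messages API surface is the same; only client construction differs.
from anthropic import Anthropic, AnthropicVertex

direct_client = Anthropic(api_key="sk-ant-...")  # placeholder key
vertex_client = AnthropicVertex(project_id="my-project", region="us-east5")
```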
Quota Management
Vertex AI quotas are managed through the GCP Console or gcloud:
```bash
gcloud alpha services quota list \
  --service=aiplatform.googleapis.com \
  --project=my-project \
  --filter="metric~anthropic"
```

You can request quota increases directly from the Quotas page in the console.
Pricing
When you use Claude through Vertex AI, pricing consists of two components:
- Model usage — charged per input and output token, at rates set by Anthropic.
- Vertex AI platform fee — GCP may add a small margin on top of the base price.
Pricing is published on the Vertex AI pricing page and may differ slightly from the rates listed on anthropic.com.
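A back-of-the-envelope estimator makes the token component concrete. The per-million-token rates below are placeholders, not published prices; substitute the values from the Vertex AI pricing page:

```python
# Hypothetical rates for illustration only; not actual Vertex pricing.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough request cost in USD at the placeholder rates above."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    )

# e.g. 120k input tokens and 8k output tokens:
print(f"${estimate_cost(120_000, 8_000):.2f}")
```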
Cost Optimization Tips
- Use the right model. Haiku is significantly cheaper than Sonnet; Sonnet is cheaper than Opus. Pick the smallest model that meets your quality bar.
- Cache system prompts. Prompt caching (beta) can reduce input token costs by up to 90% for repeated prefixes.
- Set `max_tokens` thoughtfully. A lower limit prevents runaway completions.
- Monitor with BigQuery export. Export Vertex AI logs to BigQuery and build dashboards to track cost per feature, per team, or per customer.
```bash
# Example: export Vertex AI audit logs to BigQuery
gcloud logging sinks create vertex-usage-sink \
  bigquery.googleapis.com/projects/my-project/datasets/vertex_logs \
  --log-filter='resource.type="aiplatform.googleapis.com/Endpoint"'
```

Full Working Example
Below is a complete Python script that authenticates, calls Claude on Vertex AI, and handles errors gracefully:
```python
import sys

from anthropic import AnthropicVertex, APIError

PROJECT_ID = "my-gcp-project"
REGION = "us-east5"
MODEL = "claude-sonnet-4@20250514"

def main():
    client = AnthropicVertex(
        project_id=PROJECT_ID,
        region=REGION,
    )
    try:
        response = client.messages.create(
            model=MODEL,
            max_tokens=2048,
            system="You are a helpful cloud-architecture assistant.",
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Compare Cloud Run and GKE for serving "
                        "a Python ML inference API. "
                        "Give pros and cons in a table."
                    ),
                }
            ],
        )
        for block in response.content:
            if block.type == "text":
                print(block.text)
        print(f"\nTokens used — input: {response.usage.input_tokens}, "
              f"output: {response.usage.output_tokens}")
    except APIError as e:
        print(f"API error: {e}", file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
```

Streaming Example
```python
from anthropic import AnthropicVertex

client = AnthropicVertex(
    project_id="my-gcp-project",
    region="us-east5",
)

with client.messages.stream(
    model="claude-sonnet-4@20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about cloud computing."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()  # newline after the stream ends
```

Troubleshooting
| Problem | Likely Cause | Fix |
|---|---|---|
| 403 Forbidden | Missing IAM role | Grant roles/aiplatform.user to the caller identity. |
| 404 Not Found | Model not enabled | Enable the model in Model Garden. |
| Region unavailable | Claude not in that region | Switch to a supported region like us-east5. |
| Quota exceeded | Per-project RPM/TPM limit hit | Request a quota increase in the console. |
| Auth error locally | ADC not configured | Run gcloud auth application-default login. |
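Programmatically, several of these rows map to typed exceptions in the Python SDK. A sketch, assuming the Vertex backend raises the same exception classes as the direct API:

```python
# Sketch: map SDK exceptions to the failure modes in the table above.
from anthropic import (
    AnthropicVertex,
    NotFoundError,
    PermissionDeniedError,
    RateLimitError,
)

client = AnthropicVertex(project_id="my-project", region="us-east5")
try:
    client.messages.create(
        model="claude-sonnet-4@20250514",
        max_tokens=64,
        messages=[{"role": "user", "content": "ping"}],
    )
except PermissionDeniedError:
    print("403: grant roles/aiplatform.user to the caller identity")
except NotFoundError:
    print("404: enable the model in Model Garden or check the region")
except RateLimitError:
    print("429: request a quota increase in the console")
```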
Summary
- Vertex AI lets you run Claude inside GCP with native IAM, billing, and networking.
- Enable models in the Model Garden, create a service account, and install the SDK.
- Use `AnthropicVertex` instead of `Anthropic` — the rest of the API is identical.
- Choose regions that match your latency and compliance needs.
- Monitor costs via BigQuery log exports and set budgets in the GCP Console.