
Claude on Google Vertex AI

Run Claude through GCP Vertex AI — project setup and configuration

What Is Google Vertex AI?

Google Vertex AI is a fully managed machine-learning platform on Google Cloud Platform (GCP). It provides a unified surface for training, deploying, and consuming ML models — including third-party foundation models such as Anthropic's Claude family.

When you access Claude through Vertex AI, you are not calling the Anthropic API directly. Instead, every request is routed through GCP infrastructure, which means billing, IAM, networking, and compliance all stay inside your existing Google Cloud organization.


Why Use Claude on Vertex AI?

| Reason | Details |
| --- | --- |
| GCP-native billing | Claude usage appears on your consolidated GCP invoice — no separate Anthropic billing account required. |
| IAM & security | Access is governed by GCP IAM roles and policies you already manage. |
| Compliance | Data residency, VPC Service Controls, and CMEK encryption all apply automatically. |
| Private networking | Reach the API over Private Google Access or Private Service Connect — traffic never leaves the Google backbone. |
| Unified tooling | Use the same gcloud CLI, Terraform provider, and Cloud Console you use for everything else. |
| Enterprise support | A single GCP support contract covers both infrastructure and model access. |

If your organization is already invested in GCP, Vertex AI is often the path of least resistance for adopting Claude at scale.


Enabling Claude Models on Vertex AI

Before you can call Claude, you must enable the model inside your GCP project.

Step 1 — Enable the Vertex AI API

Terminal
gcloud services enable aiplatform.googleapis.com --project=my-project

Step 2 — Request Access to Claude Models

  1. Open Vertex AI → Model Garden in the Cloud Console.
  2. Search for Claude and select the model you want (e.g. Claude Sonnet 4).
  3. Click Enable and accept the terms of service.

You may need the roles/aiplatform.user or roles/aiplatform.admin role to complete this step.

Step 3 — Verify Availability

Terminal
gcloud ai models list \
  --region=us-east5 \
  --project=my-project \
  --filter="displayName~claude"

If the model appears in the output, you are ready to start making requests.


Service Account Setup

Vertex AI uses GCP service accounts for authentication — there are no separate API keys.

Create a Dedicated Service Account

Terminal
gcloud iam service-accounts create claude-caller \
  --display-name="Claude Vertex Caller" \
  --project=my-project

Grant the Required Role

The minimum role is Vertex AI User (roles/aiplatform.user):

Terminal
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:claude-caller@my-project.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

Generate a Key (for Local Development)

Terminal
gcloud iam service-accounts keys create key.json \
  --iam-account=claude-caller@my-project.iam.gserviceaccount.com

Set the environment variable so the SDK picks it up automatically:

Terminal
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/key.json"

Production tip: Prefer Workload Identity Federation over exported key files. It eliminates long-lived credentials entirely.


Using the Anthropic SDK with Vertex AI

Anthropic provides a first-party Python SDK that supports Vertex AI as a backend. Install the extra dependency:

Terminal
pip install "anthropic[vertex]"

Client Configuration

Python
from anthropic import AnthropicVertex

client = AnthropicVertex(
    project_id="my-project",
    region="us-east5",
)

The client automatically reads credentials from the environment (GOOGLE_APPLICATION_CREDENTIALS or Application Default Credentials).

Making a Simple Request

Python
message = client.messages.create(
    model="claude-sonnet-4@20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain Vertex AI in two sentences."}
    ],
)
print(message.content[0].text)

TypeScript / Node.js

In Node, the Vertex backend ships as a separate package:

Terminal
npm install @anthropic-ai/vertex-sdk
TypeScript
import { AnthropicVertex } from "@anthropic-ai/vertex-sdk";

const client = new AnthropicVertex({
  projectId: "my-project",
  region: "us-east5",
});

const message = await client.messages.create({
  model: "claude-sonnet-4@20250514",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Explain Vertex AI in two sentences." },
  ],
});

// Content blocks are a union type, so narrow before reading .text
const block = message.content[0];
if (block.type === "text") {
  console.log(block.text);
}

Model IDs on Vertex AI

Model identifiers on Vertex follow the Anthropic naming convention, but the version date is attached with an @ separator rather than the dash used by the direct API:

| Model | Vertex Model ID |
| --- | --- |
| Claude Opus 4 | claude-opus-4@20250514 |
| Claude Sonnet 4 | claude-sonnet-4@20250514 |
| Claude Haiku 3.5 | claude-3-5-haiku@20241022 |

Check the Vertex Model Garden for the latest available IDs — Anthropic publishes new versions periodically and older snapshots may be deprecated.


Regional and Multi-Region Endpoints

Vertex AI is a regional service. When you create a client, you must specify a region where Claude is available.

Available Regions (as of early 2025)

| Region | Location |
| --- | --- |
| us-east5 | Columbus, Ohio |
| us-central1 | Iowa |
| europe-west1 | Belgium |
| europe-west4 | Netherlands |
| asia-southeast1 | Singapore |

How the Endpoint Is Constructed

The SDK builds the endpoint URL automatically:

https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}/locations/{REGION}/publishers/anthropic/models/{MODEL}:streamRawPredict

Multi-Region Routing

Vertex AI does not natively load-balance across regions for Claude. If you need multi-region resilience you must implement it at the application level:

Python
import random

from anthropic import AnthropicVertex

REGIONS = ["us-east5", "europe-west1", "asia-southeast1"]

def get_client():
    # Spread traffic across regions where Claude is enabled
    region = random.choice(REGIONS)
    return AnthropicVertex(project_id="my-project", region=region)

Choose regions that are close to your users and that satisfy any data-residency requirements your organization has.


API Differences from Direct Anthropic API

Although the Anthropic SDK abstracts most differences, a few things change when you go through Vertex:

| Feature | Direct API | Vertex AI |
| --- | --- | --- |
| Authentication | API key header | GCP OAuth 2.0 / ADC |
| Billing | Anthropic account | GCP billing account |
| Rate limits | Per-org Anthropic limits | Per-project GCP quotas |
| Endpoint | api.anthropic.com | {REGION}-aiplatform.googleapis.com |
| Model access | Enabled by default | Must enable in Model Garden |
| Model IDs | claude-sonnet-4-20250514 | claude-sonnet-4@20250514 |
| Networking | Public internet | Supports VPC-SC and private endpoints |
| System prompt | Supported | Supported (same SDK field) |
| Streaming | Supported | Supported |
| Tool use | Supported | Supported |
| Vision | Supported | Supported |

Quota Management

Vertex AI quotas are managed through the GCP Console or gcloud:

Terminal
gcloud alpha services quota list \
  --service=aiplatform.googleapis.com \
  --consumer=projects/my-project \
  --filter="metric~anthropic"

You can request quota increases directly from the Quotas page in the console.


Pricing

When you use Claude through Vertex AI, pricing consists of two components:

  1. Model usage — charged per input and output token, at rates set by Anthropic.
  2. Vertex AI platform fee — GCP may add a small margin on top of the base price.

Pricing is published on the Vertex AI pricing page and may differ slightly from the rates listed on anthropic.com.

Cost Optimization Tips

  • Use the right model. Haiku is significantly cheaper than Sonnet; Sonnet is cheaper than Opus. Pick the smallest model that meets your quality bar.
  • Cache system prompts. Prompt caching (beta) can reduce input token costs by up to 90% for repeated prefixes (see the sketch at the end of this section).
  • Set max_tokens thoughtfully. A lower limit prevents runaway completions.
  • Monitor with BigQuery export. Export Vertex AI logs to BigQuery and build dashboards to track cost per feature, per team, or per customer:
Terminal
# Example: export Vertex AI audit logs to BigQuery
gcloud logging sinks create vertex-usage-sink \
  bigquery.googleapis.com/projects/my-project/datasets/vertex_logs \
  --log-filter='resource.type="aiplatform.googleapis.com/Endpoint"'

Full Working Example

Below is a complete Python script that authenticates, calls Claude on Vertex AI, and handles errors gracefully:

Python
import sys

from anthropic import AnthropicVertex, APIError

PROJECT_ID = "my-gcp-project"
REGION = "us-east5"
MODEL = "claude-sonnet-4@20250514"


def main():
    client = AnthropicVertex(
        project_id=PROJECT_ID,
        region=REGION,
    )

    try:
        response = client.messages.create(
            model=MODEL,
            max_tokens=2048,
            system="You are a helpful cloud-architecture assistant.",
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Compare Cloud Run and GKE for serving "
                        "a Python ML inference API. "
                        "Give pros and cons in a table."
                    ),
                }
            ],
        )
        for block in response.content:
            if block.type == "text":
                print(block.text)
        print(
            f"\nTokens used — input: {response.usage.input_tokens}, "
            f"output: {response.usage.output_tokens}"
        )
    except APIError as e:
        print(f"API error: {e}", file=sys.stderr)
        sys.exit(1)


if __name__ == "__main__":
    main()

Streaming Example

Python
from anthropic import AnthropicVertex

client = AnthropicVertex(
    project_id="my-gcp-project",
    region="us-east5",
)

with client.messages.stream(
    model="claude-sonnet-4@20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about cloud computing."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()  # newline after the stream ends

Troubleshooting

| Problem | Likely Cause | Fix |
| --- | --- | --- |
| 403 Forbidden | Missing IAM role | Grant roles/aiplatform.user to the caller identity. |
| 404 Not Found | Model not enabled | Enable the model in Model Garden. |
| Region unavailable | Claude not in that region | Switch to a supported region like us-east5. |
| Quota exceeded | Per-project RPM/TPM limit hit | Request a quota increase in the console. |
| Auth error locally | ADC not configured | Run gcloud auth application-default login. |

Summary

  • Vertex AI lets you run Claude inside GCP with native IAM, billing, and networking.
  • Enable models in the Model Garden, create a service account, and install the SDK.
  • Use AnthropicVertex instead of Anthropic — apart from the @-versioned model IDs, the rest of the API is identical.
  • Choose regions that match your latency and compliance needs.
  • Monitor costs via BigQuery log exports and set budgets in the GCP Console.