Claude on Google Vertex AI
Run Claude through GCP Vertex AI — project setup and configuration
What Is Google Vertex AI?
Google Vertex AI is a fully managed machine-learning platform on Google Cloud Platform (GCP). It provides a unified surface for training, deploying, and consuming ML models — including third-party foundation models such as Anthropic's Claude family.
When you access Claude through Vertex AI, you are not calling the Anthropic API directly. Instead, every request is routed through GCP infrastructure, which means billing, IAM, networking, and compliance all stay inside your existing Google Cloud organization.
Why Use Claude on Vertex AI?
| Reason | Details |
|---|---|
| GCP-native billing | Claude usage appears on your consolidated GCP invoice — no separate Anthropic billing account required. |
| IAM & security | Access is governed by GCP IAM roles and policies you already manage. |
| Compliance | Data residency, VPC Service Controls, and CMEK encryption all apply automatically. |
| Private networking | Reach the API over Private Google Access or Private Service Connect — traffic never leaves the Google backbone. |
| Unified tooling | Use the same gcloud CLI, Terraform provider, and Cloud Console you use for everything else. |
| Enterprise support | A single GCP support contract covers both infrastructure and model access. |
If your organization is already invested in GCP, Vertex AI is often the path of least resistance for adopting Claude at scale.
Enabling Claude Models on Vertex AI
Before you can call Claude you must enable the model inside your GCP project.
Step 1 — Enable the Vertex AI API
```bash
gcloud services enable aiplatform.googleapis.com --project=my-project
```

Step 2 — Request Access to Claude Models
- Open Vertex AI → Model Garden in the Cloud Console.
- Search for Claude and select the model you want (e.g. Claude Sonnet 4).
- Click Enable and accept the terms of service.
You may need the `roles/aiplatform.user` or `roles/aiplatform.admin` role to complete this step.
Step 3 — Verify Availability
```bash
gcloud ai models list \
  --region=us-east5 \
  --project=my-project \
  --filter="displayName~claude"
```

If the model appears in the output you are ready to start making requests.
Service Account Setup
Vertex AI uses GCP service accounts for authentication — there are no separate API keys.
Create a Dedicated Service Account
```bash
gcloud iam service-accounts create claude-caller \
  --display-name="Claude Vertex Caller" \
  --project=my-project
```

Grant the Required Role
The minimum role is Vertex AI User (roles/aiplatform.user):
```bash
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:claude-caller@my-project.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
```

Generate a Key (for Local Development)
```bash
gcloud iam service-accounts keys create key.json \
  --iam-account=claude-caller@my-project.iam.gserviceaccount.com
```

Set the environment variable so the SDK picks it up automatically:

```bash
export GOOGLE_APPLICATION_CREDENTIALS="$(pwd)/key.json"
```

Production tip: Prefer Workload Identity Federation over exported key files. It eliminates long-lived credentials entirely.
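If you want to avoid key files even during local development, one option is to mint short-lived tokens by impersonating the service account. A minimal sketch, assuming your identity holds roles/iam.serviceAccountTokenCreator on claude-caller and the google-auth package is installed:

```python
# Sketch: impersonate the service account to mint a short-lived access
# token instead of exporting a long-lived key file.
import google.auth
from google.auth import impersonated_credentials
from google.auth.transport.requests import Request

source_creds, _ = google.auth.default()
creds = impersonated_credentials.Credentials(
    source_credentials=source_creds,
    target_principal="claude-caller@my-project.iam.gserviceaccount.com",
    target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
creds.refresh(Request())  # creds.token now holds a short-lived OAuth token
```

If your SDK version accepts an access_token argument on the client, the minted token can be passed there; otherwise, `gcloud auth application-default login --impersonate-service-account=...` achieves the same effect at the ADC layer.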
Using the Anthropic SDK with Vertex AI
Anthropic provides a first-party Python SDK that supports Vertex AI as a backend. Install the extra dependency:
pip install "anthropic[vertex]"Client Configuration
```python
from anthropic import AnthropicVertex

client = AnthropicVertex(
    project_id="my-project",
    region="us-east5",
)
```

The client automatically reads credentials from the environment (`GOOGLE_APPLICATION_CREDENTIALS` or Application Default Credentials).
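If authentication fails, it helps to confirm that ADC resolves before constructing the client. A quick check using the google-auth package (installed alongside the vertex extra):

```python
# Confirm that Application Default Credentials resolve, and see which
# project they are associated with.
import google.auth

credentials, detected_project = google.auth.default()
print(f"ADC resolved; detected project: {detected_project}")
```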
Making a Simple Request
```python
message = client.messages.create(
    model="claude-sonnet-4@20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain Vertex AI in two sentences."}
    ],
)
print(message.content[0].text)
```

TypeScript / Node.js
The Vertex client ships as a separate package:

```bash
npm install @anthropic-ai/vertex-sdk
```

```typescript
import { AnthropicVertex } from "@anthropic-ai/vertex-sdk";

const client = new AnthropicVertex({
  projectId: "my-project",
  region: "us-east5",
});

const message = await client.messages.create({
  model: "claude-sonnet-4@20250514",
  max_tokens: 1024,
  messages: [
    { role: "user", content: "Explain Vertex AI in two sentences." },
  ],
});

console.log(message.content[0].text);
```

Model IDs on Vertex AI
Model identifiers on Vertex resemble Anthropic's naming convention, but the version date is appended with an `@` separator rather than a trailing dash:
| Model | Vertex Model ID |
|---|---|
| Claude Opus 4 | claude-opus-4@20250514 |
| Claude Sonnet 4 | claude-sonnet-4@20250514 |
| Claude Haiku 3.5 | claude-3-5-haiku@20241022 |
Check the Vertex Model Garden for the latest available IDs — Anthropic publishes new versions periodically and older snapshots may be deprecated.
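If you route requests by task tier, keeping the IDs in a single lookup table makes a version bump a one-line change. A small sketch (the tier names are illustrative, not an official taxonomy):

```python
# Central mapping from task tier to Vertex model ID; verify current IDs
# in the Model Garden before deploying.
VERTEX_MODELS = {
    "fast": "claude-3-5-haiku@20241022",      # cheapest, lowest latency
    "balanced": "claude-sonnet-4@20250514",   # general-purpose default
    "max_quality": "claude-opus-4@20250514",  # hardest reasoning tasks
}
```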
Regional and Multi-Region Endpoints
Vertex AI is a regional service. When you create a client, you must specify a region where Claude is available.
Available Regions (as of early 2025)
| Region | Location |
|---|---|
| us-east5 | Columbus, Ohio |
| us-central1 | Iowa |
| europe-west1 | Belgium |
| europe-west4 | Netherlands |
| asia-southeast1 | Singapore |
How the Endpoint Is Constructed
The SDK builds the endpoint URL automatically; non-streaming calls use the :rawPredict verb and streaming calls use :streamRawPredict:

```
https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}/locations/{REGION}/publishers/anthropic/models/{MODEL}:streamRawPredict
```
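You can also call this endpoint directly with any HTTP client, which is handy for debugging or for languages without an SDK. A minimal sketch, assuming ADC is configured and the requests and google-auth packages are installed (the anthropic_version string is the Vertex-specific API version; confirm it against the current Vertex documentation):

```python
# Sketch: call the rawPredict endpoint directly, bypassing the SDK.
import google.auth
import google.auth.transport.requests
import requests

PROJECT, REGION = "my-project", "us-east5"
MODEL = "claude-sonnet-4@20250514"

# Mint an OAuth access token from Application Default Credentials.
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
creds.refresh(google.auth.transport.requests.Request())

url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT}"
    f"/locations/{REGION}/publishers/anthropic/models/{MODEL}:rawPredict"
)
body = {
    "anthropic_version": "vertex-2023-10-16",  # Vertex-specific field
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Say hello."}],
}
resp = requests.post(
    url, json=body, headers={"Authorization": f"Bearer {creds.token}"}
)
print(resp.json())
```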
Multi-Region Routing
Vertex AI does not natively load-balance across regions for Claude. If you need multi-region resilience you must implement it at the application level:
```python
import random

from anthropic import AnthropicVertex

REGIONS = ["us-east5", "europe-west1", "asia-southeast1"]

def get_client():
    # Spread load by picking a region at random for each new client.
    region = random.choice(REGIONS)
    return AnthropicVertex(project_id="my-project", region=region)
```

Choose regions that are close to your users and that satisfy any data-residency requirements your organization has.
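Random selection spreads load but does not handle a regional outage. A sketch of simple sequential failover, assuming any APIError is worth retrying in the next region (production code would inspect the error type first):

```python
# Sketch: try each region in order until a request succeeds.
from anthropic import AnthropicVertex, APIError

REGIONS = ["us-east5", "europe-west1", "asia-southeast1"]

def create_with_failover(**request_kwargs):
    last_error = None
    for region in REGIONS:
        client = AnthropicVertex(project_id="my-project", region=region)
        try:
            return client.messages.create(**request_kwargs)
        except APIError as error:
            last_error = error  # quota, outage, etc.; try the next region
    raise last_error
```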
API Differences from Direct Anthropic API
Although the Anthropic SDK abstracts most differences, a few things change when you go through Vertex:
| Feature | Direct API | Vertex AI |
|---|---|---|
| Authentication | API key header | GCP OAuth 2.0 / ADC |
| Billing | Anthropic account | GCP billing account |
| Rate limits | Per-org Anthropic limits | Per-project GCP quotas |
| Endpoint | api.anthropic.com | {REGION}-aiplatform.googleapis.com |
| Model access | Enabled by default | Must enable in Model Garden |
| Networking | Public internet | Supports VPC-SC and private endpoints |
| System prompt | Supported | Supported (same SDK field) |
| Streaming | Supported | Supported |
| Tool use | Supported | Supported |
| Vision | Supported | Supported |
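In code, the table above reduces to a one-line difference: swap the client class and drop the API key. A sketch (the key value is a placeholder):

```python
# The Messages API surface is the same; only client construction differs.
from anthropic import Anthropic, AnthropicVertex

direct_client = Anthropic(api_key="sk-ant-...")  # placeholder key
vertex_client = AnthropicVertex(project_id="my-project", region="us-east5")
```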
Quota Management
Vertex AI quotas are managed through the GCP Console or gcloud:
```bash
gcloud alpha services quota list \
  --service=aiplatform.googleapis.com \
  --project=my-project \
  --filter="metric~anthropic"
```

You can request quota increases directly from the Quotas page in the console.
Pricing
When you use Claude through Vertex AI, pricing consists of two components:
- Model usage — charged per input and output token, at rates set by Anthropic.
- Vertex AI platform fee — GCP may add a small margin on top of the base price.
Pricing is published on the Vertex AI pricing page and may differ slightly from the rates listed on anthropic.com.
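A back-of-the-envelope estimator makes the token component concrete. The per-million-token rates below are placeholders, not published prices; substitute the values from the Vertex AI pricing page:

```python
# Hypothetical rates for illustration only; not actual Vertex pricing.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough request cost in USD at the placeholder rates above."""
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    )

# e.g. 120k input tokens and 8k output tokens:
print(f"${estimate_cost(120_000, 8_000):.2f}")
```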
Cost Optimization Tips
- Use the right model. Haiku is significantly cheaper than Sonnet; Sonnet is cheaper than Opus. Pick the smallest model that meets your quality bar.
- Cache system prompts. Prompt caching (beta) can reduce input token costs by up to 90% for repeated prefixes.
- Set `max_tokens` thoughtfully. A lower limit prevents runaway completions.
- Monitor with BigQuery export. Export Vertex AI logs to BigQuery and build dashboards to track cost per feature, per team, or per customer.
```bash
# Example: export Vertex AI audit logs to BigQuery
gcloud logging sinks create vertex-usage-sink \
  bigquery.googleapis.com/projects/my-project/datasets/vertex_logs \
  --log-filter='resource.type="aiplatform.googleapis.com/Endpoint"'
```

Full Working Example
Below is a complete Python script that authenticates, calls Claude on Vertex AI, and handles errors gracefully:
```python
import sys

from anthropic import AnthropicVertex, APIError

PROJECT_ID = "my-gcp-project"
REGION = "us-east5"
MODEL = "claude-sonnet-4@20250514"

def main():
    client = AnthropicVertex(
        project_id=PROJECT_ID,
        region=REGION,
    )
    try:
        response = client.messages.create(
            model=MODEL,
            max_tokens=2048,
            system="You are a helpful cloud-architecture assistant.",
            messages=[
                {
                    "role": "user",
                    "content": (
                        "Compare Cloud Run and GKE for serving "
                        "a Python ML inference API. "
                        "Give pros and cons in a table."
                    ),
                }
            ],
        )
        for block in response.content:
            if block.type == "text":
                print(block.text)
        print(f"\nTokens used — input: {response.usage.input_tokens}, "
              f"output: {response.usage.output_tokens}")
    except APIError as e:
        print(f"API error: {e}", file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    main()
```

Streaming Example
```python
from anthropic import AnthropicVertex

client = AnthropicVertex(
    project_id="my-gcp-project",
    region="us-east5",
)

with client.messages.stream(
    model="claude-sonnet-4@20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about cloud computing."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
print()  # newline after the stream ends
```

Troubleshooting
| Problem | Likely Cause | Fix |
|---|---|---|
| 403 Forbidden | Missing IAM role | Grant roles/aiplatform.user to the caller identity. |
| 404 Not Found | Model not enabled | Enable the model in Model Garden. |
| Region unavailable | Claude not in that region | Switch to a supported region like us-east5. |
| Quota exceeded | Per-project RPM/TPM limit hit | Request a quota increase in the console. |
| Auth error locally | ADC not configured | Run gcloud auth application-default login. |
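Programmatically, several of these rows map to typed exceptions in the Python SDK. A sketch, assuming the Vertex backend raises the same exception classes as the direct API:

```python
# Sketch: map SDK exceptions to the failure modes in the table above.
from anthropic import (
    AnthropicVertex,
    NotFoundError,
    PermissionDeniedError,
    RateLimitError,
)

client = AnthropicVertex(project_id="my-project", region="us-east5")
try:
    client.messages.create(
        model="claude-sonnet-4@20250514",
        max_tokens=64,
        messages=[{"role": "user", "content": "ping"}],
    )
except PermissionDeniedError:
    print("403: grant roles/aiplatform.user to the caller identity")
except NotFoundError:
    print("404: enable the model in Model Garden or check the region")
except RateLimitError:
    print("429: request a quota increase in the console")
```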
Summary
- Vertex AI lets you run Claude inside GCP with native IAM, billing, and networking.
- Enable models in the Model Garden, create a service account, and install the SDK.
- Use `AnthropicVertex` instead of `Anthropic` — the rest of the API is identical.
- Choose regions that match your latency and compliance needs.
- Monitor costs via BigQuery log exports and set budgets in the GCP Console.