Claude on Microsoft Azure Foundry
Run Claude through Azure — setup and deployment
What Is Azure AI Foundry?
Azure AI Foundry (formerly Azure AI Studio) is Microsoft's unified platform for building, deploying, and managing AI applications on Azure. It provides a single hub where you can discover, test, and consume foundation models from multiple providers — including Anthropic's Claude family.
When you access Claude through Azure AI Foundry, you are not calling the Anthropic API directly. Instead, every request is routed through Microsoft Azure infrastructure, which means billing, identity management, networking, and compliance all stay inside your existing Azure organization.
Think of Azure AI Foundry as a "model catalog + deployment engine" within Azure. You pick the model you want, deploy it to an endpoint, and Azure handles everything else — infrastructure, scaling, and enterprise security.
Why Use Claude on Azure Foundry?
| Reason | Details |
|---|---|
| Azure-native billing | Claude usage appears on your consolidated Azure invoice — no separate Anthropic billing account required. |
| Entra ID (Azure AD) | Access is governed by Microsoft Entra ID roles and policies you already manage — no separate API keys needed. |
| Compliance | Azure compliance certifications (SOC 2, HIPAA, ISO 27001, FedRAMP) apply to model endpoints automatically. |
| Private networking | Reach the API through Azure Private Endpoints and VNet integration — traffic never leaves the Microsoft backbone. |
| Unified tooling | Use the same Azure CLI, Azure Portal, Bicep/ARM templates, and Terraform provider you use for everything else. |
| Enterprise support | A single Microsoft Unified Support contract covers both infrastructure and model access. |
| Content safety | Azure AI Content Safety filters can be layered on top of Claude responses for additional moderation. |
If your organization is already invested in Azure, Foundry is often the path of least resistance for adopting Claude at scale.
Subscription Requirements
Before you can use Claude on Azure AI Foundry, you need:
- An active Azure subscription — Pay-As-You-Go, Enterprise Agreement, or CSP all work.
- Sufficient quota — Claude models require quota allocation in your target region.
- Resource provider registration — The `Microsoft.MachineLearningServices` resource provider must be registered in your subscription.
- Contributor or higher role — You need at least Contributor access on the resource group where you will create the Foundry resource.
To register the resource provider via CLI:
```shell
az provider register --namespace Microsoft.MachineLearningServices
az provider show --namespace Microsoft.MachineLearningServices --query "registrationState"
```

Note: Registration can take a few minutes. Wait until the state shows `Registered`.
Creating an Azure AI Foundry Resource
Step 1 — Create a Resource Group (if needed)
```shell
az group create \
  --name rg-ai-foundry \
  --location eastus2
```

Step 2 — Create the AI Foundry Hub
You can create a Foundry hub through the Azure Portal or the CLI.
Portal method:
- Open the Azure Portal and search for Azure AI Foundry.
- Click + Create and select Hub.
- Choose your subscription, resource group, and region.
- Give it a name (e.g. `hub-ai-production`).
- Configure networking (public or private endpoint).
- Click Review + Create.
CLI method:
```shell
az ml workspace create \
  --name hub-ai-production \
  --resource-group rg-ai-foundry \
  --kind hub \
  --location eastus2
```

Step 3 — Create a Project Inside the Hub
A project is a workspace within the hub where you deploy models and manage endpoints.
```shell
az ml workspace create \
  --name project-claude \
  --resource-group rg-ai-foundry \
  --kind project \
  --hub-id /subscriptions/<sub-id>/resourceGroups/rg-ai-foundry/providers/Microsoft.MachineLearningServices/workspaces/hub-ai-production \
  --location eastus2
```

Provisioning Claude Deployments
Once your hub and project exist, you can deploy Claude models from the Model Catalog.
Using the Portal
- Open your AI Foundry project in the Portal.
- Navigate to Model Catalog in the left menu.
- Filter by provider → Anthropic.
- Select the model you want (e.g. Claude Sonnet 4).
- Click Deploy.
- Choose a deployment name (e.g. `claude-sonnet-4-prod`).
- Select the deployment type:
  - Serverless API (Pay-as-you-go) — recommended for most workloads.
  - Managed compute — for dedicated throughput needs.
- Accept the Anthropic usage terms.
- Click Deploy.
Using the CLI
```shell
az ml serverless-endpoint create \
  --resource-group rg-ai-foundry \
  --workspace-name project-claude \
  --name claude-sonnet-4-prod \
  --model-id azureml://registries/azureml-anthropic/models/Claude-sonnet-4
```

Important: Not all Claude models are available in every Azure region. Check the Model Catalog for availability in your target region.
Available Model IDs on Azure Foundry
Model identifiers on Azure AI Foundry differ from both the direct Anthropic API and other cloud providers:
| Anthropic Model | Azure Foundry Model ID |
|---|---|
| claude-opus-4-6-20250514 | Claude-opus-4 |
| claude-sonnet-4-6-20250514 | Claude-sonnet-4 |
| claude-haiku-3-5-20241022 | Claude-3.5-haiku |
Important: Always use the Azure Foundry model ID when deploying or referencing models. The direct Anthropic model names will not work. The deployment name you choose during provisioning is what you use in API calls.
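To keep this translation out of application code, the mapping above can live in a small lookup table — a minimal sketch using the example names from this table (verify the IDs against the Model Catalog for your region before relying on them):

```python
# Map direct Anthropic model names to their Azure AI Foundry catalog IDs.
# These pairs mirror the table above; confirm them against the Model Catalog.
ANTHROPIC_TO_FOUNDRY = {
    "claude-opus-4-6-20250514": "Claude-opus-4",
    "claude-sonnet-4-6-20250514": "Claude-sonnet-4",
    "claude-haiku-3-5-20241022": "Claude-3.5-haiku",
}

def foundry_model_id(anthropic_name: str) -> str:
    """Translate a direct-API model name to its Azure Foundry catalog ID."""
    try:
        return ANTHROPIC_TO_FOUNDRY[anthropic_name]
    except KeyError:
        raise ValueError(f"No Azure Foundry mapping for {anthropic_name!r}")
```

Centralizing the mapping also gives you one place to update when new model versions land in the catalog.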
Endpoint Differences from the Direct API
When using Claude through Azure AI Foundry, several key differences apply compared to the direct Anthropic API:
| Aspect | Direct Anthropic API | Azure AI Foundry |
|---|---|---|
| Base URL | https://api.anthropic.com | https://<deployment-name>.<region>.models.ai.azure.com |
| Auth header | x-api-key: sk-ant-... | Authorization: Bearer <token> or api-key: <key> |
| API path | /v1/messages | /v1/messages (same) |
| API version | anthropic-version header | api-version query parameter |
| Billing | Anthropic account | Azure subscription |
| Rate limits | Per Anthropic tier | Per Azure quota allocation |
| Content filtering | Anthropic safety | Azure AI Content Safety + Anthropic safety |
The request and response body format is largely the same — Azure AI Foundry uses the Messages API format, so your existing message structures, system prompts, and tool definitions carry over.
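That portability can be seen in a small helper that builds a request for either backend from the same Messages-format payload. The deployment name, region, and `api-version` value below are illustrative placeholders, not confirmed values:

```python
import json

def build_request(payload: dict, backend: str, secret: str,
                  deployment: str = "claude-sonnet-4-prod",
                  region: str = "eastus2") -> tuple[str, dict, bytes]:
    """Return (url, headers, body) for the chosen backend.

    The payload dict is identical for both backends; only the URL,
    auth header, and API-version mechanism differ.
    """
    body = json.dumps(payload).encode()
    if backend == "anthropic":
        url = "https://api.anthropic.com/v1/messages"
        headers = {"x-api-key": secret, "anthropic-version": "2023-06-01"}
    elif backend == "azure":
        # api-version below is a hypothetical value; check Azure docs.
        url = (f"https://{deployment}.{region}.models.ai.azure.com"
               "/v1/messages?api-version=2024-06-01")
        headers = {"api-key": secret}
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    headers["Content-Type"] = "application/json"
    return url, headers, body
```

Because the body is built once and shared, migrating between the two backends reduces to swapping the URL and credential plumbing.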
Authentication
Azure AI Foundry supports two authentication methods:
Method 1 — API Key (Simple)
Each serverless deployment generates a primary and secondary key.
```shell
# Retrieve your deployment keys
az ml serverless-endpoint get-credentials \
  --resource-group rg-ai-foundry \
  --workspace-name project-claude \
  --name claude-sonnet-4-prod
```

Use the key in your requests:
```shell
curl -X POST "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com/v1/messages" \
  -H "api-key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6-20250514",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Method 2 — Microsoft Entra ID (Recommended for Production)
Use managed identities or service principals for keyless authentication:
```shell
# Get a token using Azure CLI
TOKEN=$(az account get-access-token \
  --resource https://ml.azure.com \
  --query accessToken -o tsv)

curl -X POST "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com/v1/messages" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6-20250514",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'
```

Best practice: Always use Entra ID authentication in production. Reserve API keys for local development and quick prototyping.
SDK Configuration
Using the Anthropic SDK
The official Anthropic SDK does not natively support Azure AI Foundry the way it supports Bedrock or Vertex AI. Instead, you use the Azure AI Inference SDK or call the REST API directly.
Using the Azure AI Inference SDK — Python
```shell
pip install azure-ai-inference azure-identity
```

```python
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com"
api_key = "<your-api-key>"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(api_key),
)

response = client.complete(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
```

Using Entra ID with the SDK
```python
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

endpoint = "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=DefaultAzureCredential(),
)

response = client.complete(
    messages=[
        {"role": "user", "content": "Explain machine learning briefly."}
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
```

Using the Azure AI Inference SDK — TypeScript
```shell
npm install @azure-rest/ai-inference @azure/core-auth
```

```typescript
import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const endpoint = "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com";
const apiKey = "<your-api-key>";

const client = ModelClient(endpoint, new AzureKeyCredential(apiKey));

async function main() {
  const response = await client.path("/chat/completions").post({
    body: {
      messages: [
        { role: "user", content: "What is the capital of France?" }
      ],
      max_tokens: 1024,
    },
  });

  if (response.status === "200") {
    console.log(response.body.choices[0].message.content);
  }
}

main();
```

Pricing on Azure Foundry
Azure AI Foundry uses pay-as-you-go pricing for serverless Claude deployments. You are billed per token — both input and output tokens are metered.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Opus 4 | ~$15.00 | ~$75.00 |
| Claude Sonnet 4 | ~$3.00 | ~$15.00 |
| Claude 3.5 Haiku | ~$0.80 | ~$4.00 |
Note: Pricing may vary slightly from direct Anthropic pricing. Azure may add a small markup. Always check the Azure Pricing Calculator for current rates in your region.
Key pricing considerations:
- No minimum commitment for serverless deployments.
- Azure Reserved Capacity may offer discounts for high-volume usage.
- Data transfer within the same region is free; cross-region traffic incurs standard Azure networking charges.
- Billing appears as a line item under your Azure AI services consumption.
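For rough capacity planning, the approximate rates in the table above can be turned into a per-request estimate. These figures are illustrative only, not authoritative Azure pricing:

```python
# Back-of-the-envelope cost estimate using the approximate per-1M-token
# rates from the table above (illustrative, not authoritative pricing).
RATES_PER_MILLION = {
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a single request."""
    rate_in, rate_out = RATES_PER_MILLION[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply on Sonnet 4:
# estimate_cost("Claude Sonnet 4", 2000, 500) -> 0.0135
```

Multiply by expected daily request volume to sanity-check a budget before committing to reserved capacity.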
Beta Features and Limitations
Some Anthropic features may not be available immediately on Azure AI Foundry:
| Feature | Status on Azure Foundry |
|---|---|
| Messages API | Fully supported |
| Streaming | Fully supported |
| Vision (image inputs) | Supported |
| Tool use / Function calling | Supported |
| Extended thinking | May have delayed availability |
| Prompt caching | May not be supported |
| Batch API | Not available |
| Computer use | Not available |
| PDF input | Check availability |
Tip: Azure AI Foundry typically lags a few weeks behind the direct Anthropic API for new features. Always check the Azure documentation for the latest feature matrix.
Content Safety Filters
Azure adds an additional content safety layer on top of Claude's built-in safety:
- Requests and responses pass through Azure AI Content Safety.
- You can configure filter severity levels (low, medium, high) per category.
- Categories include: hate, sexual, violence, and self-harm.
- Filtered content returns a `content_filter` response instead of model output.
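One way to handle this defensively is to check the finish reason before reading the message. The `content_filter` finish-reason check below follows the OpenAI-style response shape and is an assumption to verify against your SDK version:

```python
# Defensive handling of a filtered response. The exact shape of a filtered
# result can vary; checking finish_reason == "content_filter" (as in
# OpenAI-style responses) is an assumption to confirm for your SDK version.
def extract_reply(response) -> str:
    """Return the model text, or a placeholder if Content Safety filtered it."""
    choice = response.choices[0]
    if getattr(choice, "finish_reason", None) == "content_filter":
        return "[response withheld by Azure AI Content Safety]"
    return choice.message.content
```

Surfacing a clear placeholder (rather than crashing on a missing message body) keeps chat UIs stable when a filter triggers.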
Complete Example — Building a Chat Application
Here is a full working example that ties everything together:
```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage, AssistantMessage
from azure.identity import DefaultAzureCredential

# Configuration
ENDPOINT = os.environ.get(
    "AZURE_CLAUDE_ENDPOINT",
    "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com"
)

# Use Entra ID in production, API key for development
if os.environ.get("AZURE_CLAUDE_API_KEY"):
    from azure.core.credentials import AzureKeyCredential
    credential = AzureKeyCredential(os.environ["AZURE_CLAUDE_API_KEY"])
else:
    credential = DefaultAzureCredential()

client = ChatCompletionsClient(
    endpoint=ENDPOINT,
    credential=credential,
)

def chat(user_message: str, history: list = None) -> str:
    """Send a message and return Claude's response."""
    if history is None:
        history = []
    messages = [
        SystemMessage(content="You are a helpful assistant for Azure developers."),
        *history,
        UserMessage(content=user_message),
    ]
    response = client.complete(
        messages=messages,
        max_tokens=2048,
        temperature=0.7,
    )
    assistant_reply = response.choices[0].message.content
    history.append(UserMessage(content=user_message))
    history.append(AssistantMessage(content=assistant_reply))
    return assistant_reply

# Usage
if __name__ == "__main__":
    conversation_history = []
    while True:
        user_input = input("You: ")
        if user_input.lower() in ("exit", "quit"):
            break
        reply = chat(user_input, conversation_history)
        print(f"Claude: {reply}")
```

Troubleshooting Common Issues
| Issue | Cause | Solution |
|---|---|---|
| `401 Unauthorized` | Invalid or expired key/token | Regenerate API key or refresh Entra ID token |
| `404 Not Found` | Wrong endpoint URL | Verify deployment name and region in the URL |
| `429 Too Many Requests` | Rate limit exceeded | Request quota increase in Azure Portal |
| `ModelNotFound` | Model not deployed | Deploy the model from Model Catalog first |
| Content filtered | Azure Content Safety triggered | Adjust filter levels or modify prompt |
| Region unavailable | Model not available in chosen region | Deploy in a supported region (e.g. eastus2, westus) |
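The `429 Too Many Requests` case usually also warrants client-side retries while a quota increase is pending. A generic sketch follows; the check for a `status_code` attribute on the exception is an assumption, so adapt it to whatever your client library actually raises:

```python
import random
import time

# Generic retry-with-backoff wrapper for throttled (429) calls.
# The `status_code` attribute check is an assumption; adapt it to the
# exception type your client library raises on throttling.
def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Invoke fn(); on a 429-style error, wait and retry with jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            if getattr(err, "status_code", None) != 429 or attempt == max_attempts - 1:
                raise
            # Exponential backoff: base_delay * 2^attempt, plus jitter
            # proportional to base_delay to spread out retry storms.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

A wrapper like this buys headroom under bursty load, but sustained 429s still mean the quota itself needs raising in the Azure Portal.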
Summary
Azure AI Foundry gives enterprise Azure customers a seamless way to adopt Claude without leaving their existing cloud ecosystem. The key steps are:
- Create an Azure AI Foundry hub and project.
- Deploy the Claude model you need from the Model Catalog.
- Authenticate using Entra ID (production) or API keys (development).
- Call the endpoint using the Azure AI Inference SDK or REST API.
- Monitor usage through Azure Cost Management and the Foundry dashboard.
By keeping everything inside Azure, you get unified billing, centralized identity management, compliance inheritance, and enterprise-grade networking — all while using the same Claude models available on the direct API.