Claude on Microsoft Azure Foundry
Run Claude through Azure — setup and deployment
What Is Azure AI Foundry?
Azure AI Foundry (formerly Azure AI Studio) is Microsoft's unified platform for building, deploying, and managing AI applications on Azure. It provides a single hub where you can discover, test, and consume foundation models from multiple providers — including Anthropic's Claude family.
When you access Claude through Azure AI Foundry, you are not calling the Anthropic API directly. Instead, every request is routed through Microsoft Azure infrastructure, which means billing, identity management, networking, and compliance all stay inside your existing Azure organization.
Think of Azure AI Foundry as a "model catalog + deployment engine" within Azure. You pick the model you want, deploy it to an endpoint, and Azure handles everything else — infrastructure, scaling, and enterprise security.
Why Use Claude on Azure Foundry?
| Reason | Details |
|---|---|
| Azure-native billing | Claude usage appears on your consolidated Azure invoice — no separate Anthropic billing account required. |
| Entra ID (Azure AD) | Access is governed by Microsoft Entra ID roles and policies you already manage — no separate API keys needed. |
| Compliance | Azure compliance certifications (SOC 2, HIPAA, ISO 27001, FedRAMP) apply to model endpoints automatically. |
| Private networking | Reach the API through Azure Private Endpoints and VNet integration — traffic never leaves the Microsoft backbone. |
| Unified tooling | Use the same Azure CLI, Azure Portal, Bicep/ARM templates, and Terraform provider you use for everything else. |
| Enterprise support | A single Microsoft Unified Support contract covers both infrastructure and model access. |
| Content safety | Azure AI Content Safety filters can be layered on top of Claude responses for additional moderation. |
If your organization is already invested in Azure, Foundry is often the path of least resistance for adopting Claude at scale.
Subscription Requirements
Before you can use Claude on Azure AI Foundry, you need:
- An active Azure subscription — Pay-As-You-Go, Enterprise Agreement, or CSP all work.
- Sufficient quota — Claude models require quota allocation in your target region.
- Resource provider registration — The `Microsoft.MachineLearningServices` resource provider must be registered in your subscription.
- Contributor or higher role — You need at least Contributor access on the resource group where you will create the Foundry resource.
To register the resource provider via CLI:
```shell
az provider register --namespace Microsoft.MachineLearningServices
az provider show --namespace Microsoft.MachineLearningServices --query "registrationState"
```

Note: Registration can take a few minutes. Wait until the state shows `Registered`.
Creating an Azure AI Foundry Resource
Step 1 — Create a Resource Group (if needed)
```shell
az group create \
  --name rg-ai-foundry \
  --location eastus2
```

Step 2 — Create the AI Foundry Hub
You can create a Foundry hub through the Azure Portal or the CLI.
Portal method:
- Open the Azure Portal and search for Azure AI Foundry.
- Click + Create and select Hub.
- Choose your subscription, resource group, and region.
- Give it a name (e.g. `hub-ai-production`).
- Configure networking (public or private endpoint).
- Click Review + Create.
CLI method:
```shell
az ml workspace create \
  --name hub-ai-production \
  --resource-group rg-ai-foundry \
  --kind hub \
  --location eastus2
```

Step 3 — Create a Project Inside the Hub
A project is a workspace within the hub where you deploy models and manage endpoints.
```shell
az ml workspace create \
  --name project-claude \
  --resource-group rg-ai-foundry \
  --kind project \
  --hub-id /subscriptions/<sub-id>/resourceGroups/rg-ai-foundry/providers/Microsoft.MachineLearningServices/workspaces/hub-ai-production \
  --location eastus2
```

Provisioning Claude Deployments
Once your hub and project exist, you can deploy Claude models from the Model Catalog.
Using the Portal
- Open your AI Foundry project in the Portal.
- Navigate to Model Catalog in the left menu.
- Filter by provider → Anthropic.
- Select the model you want (e.g. Claude Sonnet 4).
- Click Deploy.
- Choose a deployment name (e.g. `claude-sonnet-4-prod`).
- Select the deployment type:
  - Serverless API (Pay-as-you-go) — recommended for most workloads.
  - Managed compute — for dedicated throughput needs.
- Accept the Anthropic usage terms.
- Click Deploy.
Using the CLI
```shell
az ml serverless-endpoint create \
  --resource-group rg-ai-foundry \
  --workspace-name project-claude \
  --name claude-sonnet-4-prod \
  --model-id azureml://registries/azureml-anthropic/models/Claude-sonnet-4
```

Important: Not all Claude models are available in every Azure region. Check the Model Catalog for availability in your target region.
Available Model IDs on Azure Foundry
Model identifiers on Azure AI Foundry differ from both the direct Anthropic API and other cloud providers:
| Anthropic Model | Azure Foundry Model ID |
|---|---|
| claude-opus-4-6-20250514 | Claude-opus-4 |
| claude-sonnet-4-6-20250514 | Claude-sonnet-4 |
| claude-haiku-3-5-20241022 | Claude-3.5-haiku |
Important: Always use the Azure Foundry model ID when deploying or referencing models. The direct Anthropic model names will not work. The deployment name you choose during provisioning is what you use in API calls.
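To keep this translation out of application code, the mapping above can live in a small lookup table — a minimal sketch using the example names from this table (verify the IDs against the Model Catalog for your region before relying on them):

```python
# Map direct Anthropic model names to their Azure AI Foundry catalog IDs.
# These pairs mirror the table above; confirm them against the Model Catalog.
ANTHROPIC_TO_FOUNDRY = {
    "claude-opus-4-6-20250514": "Claude-opus-4",
    "claude-sonnet-4-6-20250514": "Claude-sonnet-4",
    "claude-haiku-3-5-20241022": "Claude-3.5-haiku",
}

def foundry_model_id(anthropic_name: str) -> str:
    """Translate a direct-API model name to its Azure Foundry catalog ID."""
    try:
        return ANTHROPIC_TO_FOUNDRY[anthropic_name]
    except KeyError:
        raise ValueError(f"No Azure Foundry mapping for {anthropic_name!r}")
```

Centralizing the mapping also gives you one place to update when new model versions land in the catalog.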
Endpoint Differences from the Direct API
When using Claude through Azure AI Foundry, several key differences apply compared to the direct Anthropic API:
| Aspect | Direct Anthropic API | Azure AI Foundry |
|---|---|---|
| Base URL | https://api.anthropic.com | https://<deployment-name>.<region>.models.ai.azure.com |
| Auth header | x-api-key: sk-ant-... | Authorization: Bearer <token> or api-key: <key> |
| API path | /v1/messages | /v1/messages (same) |
| API version | anthropic-version header | api-version query parameter |
| Billing | Anthropic account | Azure subscription |
| Rate limits | Per Anthropic tier | Per Azure quota allocation |
| Content filtering | Anthropic safety | Azure AI Content Safety + Anthropic safety |
The request and response body format is largely the same — Azure AI Foundry uses the Messages API format, so your existing message structures, system prompts, and tool definitions carry over.
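That portability can be seen in a small helper that builds a request for either backend from the same Messages-format payload. The deployment name, region, and `api-version` value below are illustrative placeholders, not confirmed values:

```python
import json

def build_request(payload: dict, backend: str, secret: str,
                  deployment: str = "claude-sonnet-4-prod",
                  region: str = "eastus2") -> tuple[str, dict, bytes]:
    """Return (url, headers, body) for the chosen backend.

    The payload dict is identical for both backends; only the URL,
    auth header, and API-version mechanism differ.
    """
    body = json.dumps(payload).encode()
    if backend == "anthropic":
        url = "https://api.anthropic.com/v1/messages"
        headers = {"x-api-key": secret, "anthropic-version": "2023-06-01"}
    elif backend == "azure":
        # api-version below is a hypothetical value; check Azure docs.
        url = (f"https://{deployment}.{region}.models.ai.azure.com"
               "/v1/messages?api-version=2024-06-01")
        headers = {"api-key": secret}
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    headers["Content-Type"] = "application/json"
    return url, headers, body
```

Because the body is built once and shared, migrating between the two backends reduces to swapping the URL and credential plumbing.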
Authentication
Azure AI Foundry supports two authentication methods:
Method 1 — API Key (Simple)
Each serverless deployment generates a primary and secondary key.
```shell
# Retrieve your deployment keys
az ml serverless-endpoint get-credentials \
  --resource-group rg-ai-foundry \
  --workspace-name project-claude \
  --name claude-sonnet-4-prod
```

Use the key in your requests:
```shell
curl -X POST "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com/v1/messages" \
  -H "api-key: <your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6-20250514",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Method 2 — Microsoft Entra ID (Recommended for Production)
Use managed identities or service principals for keyless authentication:
```shell
# Get a token using Azure CLI
TOKEN=$(az account get-access-token \
  --resource https://ml.azure.com \
  --query accessToken -o tsv)

curl -X POST "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com/v1/messages" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-6-20250514",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ]
  }'
```

Best practice: Always use Entra ID authentication in production. Reserve API keys for local development and quick prototyping.
SDK Configuration
Using the Anthropic SDK
The official Anthropic SDK does not natively support Azure AI Foundry the way it supports Bedrock or Vertex AI. Instead, you use the Azure AI Inference SDK or call the REST API directly.
Using the Azure AI Inference SDK — Python
```shell
pip install azure-ai-inference azure-identity
```

```python
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

endpoint = "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com"
api_key = "<your-api-key>"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(api_key),
)

response = client.complete(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
```

Using Entra ID with the SDK
```python
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

endpoint = "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=DefaultAzureCredential(),
)

response = client.complete(
    messages=[
        {"role": "user", "content": "Explain machine learning briefly."}
    ],
    max_tokens=1024,
)

print(response.choices[0].message.content)
```

Using the Azure AI Inference SDK — TypeScript
```shell
npm install @azure-rest/ai-inference @azure/core-auth
```

```typescript
import ModelClient from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";

const endpoint = "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com";
const apiKey = "<your-api-key>";

const client = ModelClient(endpoint, new AzureKeyCredential(apiKey));

async function main() {
  const response = await client.path("/chat/completions").post({
    body: {
      messages: [
        { role: "user", content: "What is the capital of France?" }
      ],
      max_tokens: 1024,
    },
  });

  if (response.status === "200") {
    console.log(response.body.choices[0].message.content);
  }
}

main();
```

Pricing on Azure Foundry
Azure AI Foundry uses pay-as-you-go pricing for serverless Claude deployments. You are billed per token — both input and output tokens are metered.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Opus 4 | ~$15.00 | ~$75.00 |
| Claude Sonnet 4 | ~$3.00 | ~$15.00 |
| Claude 3.5 Haiku | ~$0.80 | ~$4.00 |
Note: Pricing may vary slightly from direct Anthropic pricing. Azure may add a small markup. Always check the Azure Pricing Calculator for current rates in your region.
Key pricing considerations:
- No minimum commitment for serverless deployments.
- Azure Reserved Capacity may offer discounts for high-volume usage.
- Data transfer within the same region is free; cross-region traffic incurs standard Azure networking charges.
- Billing appears as a line item under your Azure AI services consumption.
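For rough capacity planning, the approximate rates in the table above can be turned into a per-request estimate. These figures are illustrative only, not authoritative Azure pricing:

```python
# Back-of-the-envelope cost estimate using the approximate per-1M-token
# rates from the table above (illustrative, not authoritative pricing).
RATES_PER_MILLION = {
    "Claude Opus 4": (15.00, 75.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude 3.5 Haiku": (0.80, 4.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for a single request."""
    rate_in, rate_out = RATES_PER_MILLION[model]
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply on Sonnet 4:
# estimate_cost("Claude Sonnet 4", 2000, 500) -> 0.0135
```

Multiply by expected daily request volume to sanity-check a budget before committing to reserved capacity.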
Beta Features and Limitations
Some Anthropic features may not be available immediately on Azure AI Foundry:
| Feature | Status on Azure Foundry |
|---|---|
| Messages API | Fully supported |
| Streaming | Fully supported |
| Vision (image inputs) | Supported |
| Tool use / Function calling | Supported |
| Extended thinking | May have delayed availability |
| Prompt caching | May not be supported |
| Batch API | Not available |
| Computer use | Not available |
| PDF input | Check availability |
Tip: Azure AI Foundry typically lags a few weeks behind the direct Anthropic API for new features. Always check the Azure documentation for the latest feature matrix.
Content Safety Filters
Azure adds an additional content safety layer on top of Claude's built-in safety:
- Requests and responses pass through Azure AI Content Safety.
- You can configure filter severity levels (low, medium, high) per category.
- Categories include: hate, sexual, violence, and self-harm.
- Filtered content returns a `content_filter` response instead of model output.
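One way to handle this defensively is to check the finish reason before reading the message. The `content_filter` finish-reason check below follows the OpenAI-style response shape and is an assumption to verify against your SDK version:

```python
# Defensive handling of a filtered response. The exact shape of a filtered
# result can vary; checking finish_reason == "content_filter" (as in
# OpenAI-style responses) is an assumption to confirm for your SDK version.
def extract_reply(response) -> str:
    """Return the model text, or a placeholder if Content Safety filtered it."""
    choice = response.choices[0]
    if getattr(choice, "finish_reason", None) == "content_filter":
        return "[response withheld by Azure AI Content Safety]"
    return choice.message.content
```

Surfacing a clear placeholder (rather than crashing on a missing message body) keeps chat UIs stable when a filter triggers.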
Complete Example — Building a Chat Application
Here is a full working example that ties everything together:
```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage, AssistantMessage
from azure.identity import DefaultAzureCredential

# Configuration
ENDPOINT = os.environ.get(
    "AZURE_CLAUDE_ENDPOINT",
    "https://claude-sonnet-4-prod.eastus2.models.ai.azure.com"
)

# Use Entra ID in production, API key for development
if os.environ.get("AZURE_CLAUDE_API_KEY"):
    from azure.core.credentials import AzureKeyCredential
    credential = AzureKeyCredential(os.environ["AZURE_CLAUDE_API_KEY"])
else:
    credential = DefaultAzureCredential()

client = ChatCompletionsClient(
    endpoint=ENDPOINT,
    credential=credential,
)

def chat(user_message: str, history: list = None) -> str:
    """Send a message and return Claude's response."""
    if history is None:
        history = []
    messages = [
        SystemMessage(content="You are a helpful assistant for Azure developers."),
        *history,
        UserMessage(content=user_message),
    ]
    response = client.complete(
        messages=messages,
        max_tokens=2048,
        temperature=0.7,
    )
    assistant_reply = response.choices[0].message.content
    history.append(UserMessage(content=user_message))
    history.append(AssistantMessage(content=assistant_reply))
    return assistant_reply

# Usage
if __name__ == "__main__":
    conversation_history = []
    while True:
        user_input = input("You: ")
        if user_input.lower() in ("exit", "quit"):
            break
        reply = chat(user_input, conversation_history)
        print(f"Claude: {reply}")
```

Troubleshooting Common Issues
| Issue | Cause | Solution |
|---|---|---|
| `401 Unauthorized` | Invalid or expired key/token | Regenerate API key or refresh Entra ID token |
| `404 Not Found` | Wrong endpoint URL | Verify deployment name and region in the URL |
| `429 Too Many Requests` | Rate limit exceeded | Request quota increase in Azure Portal |
| `ModelNotFound` | Model not deployed | Deploy the model from Model Catalog first |
| Content filtered | Azure Content Safety triggered | Adjust filter levels or modify prompt |
| Region unavailable | Model not available in chosen region | Deploy in a supported region (e.g. eastus2, westus) |
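The `429 Too Many Requests` case usually also warrants client-side retries while a quota increase is pending. A generic sketch follows; the check for a `status_code` attribute on the exception is an assumption, so adapt it to whatever your client library actually raises:

```python
import random
import time

# Generic retry-with-backoff wrapper for throttled (429) calls.
# The `status_code` attribute check is an assumption; adapt it to the
# exception type your client library raises on throttling.
def call_with_backoff(fn, max_attempts: int = 5, base_delay: float = 1.0):
    """Invoke fn(); on a 429-style error, wait and retry with jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            if getattr(err, "status_code", None) != 429 or attempt == max_attempts - 1:
                raise
            # Exponential backoff: base_delay * 2^attempt, plus jitter
            # proportional to base_delay to spread out retry storms.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

A wrapper like this buys headroom under bursty load, but sustained 429s still mean the quota itself needs raising in the Azure Portal.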
Summary
Azure AI Foundry gives enterprise Azure customers a seamless way to adopt Claude without leaving their existing cloud ecosystem. The key steps are:
- Create an Azure AI Foundry hub and project.
- Deploy the Claude model you need from the Model Catalog.
- Authenticate using Entra ID (production) or API keys (development).
- Call the endpoint using the Azure AI Inference SDK or REST API.
- Monitor usage through Azure Cost Management and the Foundry dashboard.
By keeping everything inside Azure, you get unified billing, centralized identity management, compliance inheritance, and enterprise-grade networking — all while using the same Claude models available on the direct API.