Advanced · 20 min read · Module 8, Lesson 8

📈 Project: AI Data Dashboard

Analyze CSV data with Claude's vision and structured outputs to generate insights


Overview

In this project you will build a complete AI-powered data analysis pipeline in Python. The pipeline reads any CSV file, sends the data to Claude, and produces a structured JSON analysis plus a human-readable Markdown report.

By the end you will have a reusable tool that can:

  • Detect column types automatically
  • Identify trends, outliers, and statistical summaries
  • Generate chart descriptions (even without a charting library)
  • Output a polished Markdown report ready for stakeholders

Prerequisites

Requirement             Why
Python 3.10+            async / structural pattern matching
anthropic SDK           Claude API access
pandas                  CSV reading and quick stats
An Anthropic API key    Set as ANTHROPIC_API_KEY

Install the dependencies:

Terminal
pip install anthropic pandas
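Before wiring up the pipeline, it is worth confirming the key is actually visible to Python. A tiny, hypothetical sanity check (not one of the lesson files):

```python
# check_env.py — optional sanity check before making any API calls
import os


def api_key_present(env=os.environ) -> bool:
    """Return True if an Anthropic API key appears to be configured."""
    return bool(env.get("ANTHROPIC_API_KEY"))


if __name__ == "__main__":
    print("API key found" if api_key_present() else "Set ANTHROPIC_API_KEY first")
```

Passing the environment mapping as a parameter keeps the check testable without touching real credentials.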

Step 1 — Project Structure

Create a folder and the files we need:

ai-data-dashboard/
├── dashboard.py          # Main script
├── analysis_schema.py    # Pydantic-style JSON schema
├── report_generator.py   # Markdown report builder
├── sample_data.csv       # Any CSV for testing
└── output/
    ├── analysis.json
    └── report.md

Step 2 — Define the Structured Output Schema

We want Claude to return a predictable JSON structure so the rest of the code can consume it without guessing.

Python
# analysis_schema.py

ANALYSIS_SCHEMA = {
    "type": "object",
    "properties": {
        "dataset_summary": {
            "type": "object",
            "properties": {
                "row_count": {"type": "integer"},
                "column_count": {"type": "integer"},
                "columns": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "dtype": {"type": "string"},
                            "missing_pct": {"type": "number"},
                            "unique_values": {"type": "integer"}
                        },
                        "required": ["name", "dtype", "missing_pct", "unique_values"]
                    }
                }
            },
            "required": ["row_count", "column_count", "columns"]
        },
        "statistics": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "column": {"type": "string"},
                    "mean": {"type": "number"},
                    "median": {"type": "number"},
                    "std_dev": {"type": "number"},
                    "min": {"type": "number"},
                    "max": {"type": "number"}
                },
                "required": ["column", "mean", "median", "std_dev", "min", "max"]
            }
        },
        "trends": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "confidence": {"type": "string"},
                    "affected_columns": {
                        "type": "array",
                        "items": {"type": "string"}
                    }
                },
                "required": ["description", "confidence", "affected_columns"]
            }
        },
        "outliers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "column": {"type": "string"},
                    "value": {"type": "number"},
                    "row_index": {"type": "integer"},
                    "reason": {"type": "string"}
                },
                "required": ["column", "value", "row_index", "reason"]
            }
        },
        "chart_suggestions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "chart_type": {"type": "string"},
                    "title": {"type": "string"},
                    "x_axis": {"type": "string"},
                    "y_axis": {"type": "string"},
                    "description": {"type": "string"}
                },
                "required": ["chart_type", "title", "x_axis", "y_axis", "description"]
            }
        },
        "key_insights": {
            "type": "array",
            "items": {"type": "string"}
        }
    },
    "required": [
        "dataset_summary", "statistics", "trends",
        "outliers", "chart_suggestions", "key_insights"
    ]
}

Why a schema?

When you pass this schema to Claude via the structured output feature, the model is constrained to return valid JSON containing every required field. No manual parsing or regex cleanup is needed.


Step 3 — Read the CSV and Build a Data Summary

Python
# dashboard.py (part 1)
import pandas as pd

from analysis_schema import ANALYSIS_SCHEMA  # used in part 2


def read_csv(path: str) -> pd.DataFrame:
    """Read a CSV and return a DataFrame."""
    df = pd.read_csv(path)
    print(f"Loaded {len(df)} rows, {len(df.columns)} columns from {path}")
    return df


def build_data_summary(df: pd.DataFrame) -> str:
    """Create a text summary Claude can consume."""
    lines = []
    lines.append(f"Dataset: {len(df)} rows x {len(df.columns)} columns")
    lines.append("")
    lines.append("Column info:")
    for col in df.columns:
        dtype = str(df[col].dtype)
        missing = df[col].isna().sum()
        unique = df[col].nunique()
        lines.append(f"  - {col}: type={dtype}, missing={missing}, unique={unique}")
    lines.append("")
    lines.append("First 5 rows (CSV):")
    lines.append(df.head().to_csv(index=False))
    lines.append("")
    lines.append("Descriptive statistics:")
    lines.append(df.describe(include="all").to_string())
    return "\n".join(lines)

Key decisions

  • We send the first 5 rows so Claude understands the shape of the data.
  • We include describe() output so the model has raw stats to reference.
  • Keeping the payload text-based avoids token-heavy base64 images.
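Even so, a wide dataset can make the summary very long (the describe() output alone grows with column count). A simple guard, sketched here with an assumed character budget that is not from the lesson, trims the summary before sending:

```python
# Illustrative sketch: cap the prompt payload for very wide/large CSVs.
MAX_SUMMARY_CHARS = 8000  # rough budget; tune for your model's context window


def truncate_summary(summary: str, limit: int = MAX_SUMMARY_CHARS) -> str:
    """Trim an oversized data summary so the prompt stays within budget."""
    if len(summary) <= limit:
        return summary
    return summary[:limit] + "\n[... summary truncated ...]"
```

You would call this on the result of build_data_summary() before passing it to the analysis step.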

Step 4 — Call Claude with Structured Outputs

Python
# dashboard.py (part 2 — append to the same file)
import json

import anthropic


def analyze_with_claude(data_summary: str) -> dict:
    """Send the data summary to Claude and get structured analysis."""
    client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY env var

    system_prompt = """You are a senior data analyst AI.
You will receive a summary of a CSV dataset including column metadata,
sample rows, and descriptive statistics.

Your task:
1. Summarize the dataset structure.
2. Compute or confirm key statistics for every numeric column.
3. Identify trends (time-based, correlations, patterns).
4. Flag outliers with a clear reason.
5. Suggest charts that would best visualize the data.
6. Provide 3-5 key insights a business user would care about.

Be precise. Use the numbers from the provided statistics.
If you are unsure about a trend, say so in the confidence field."""

    message = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=system_prompt,
        messages=[
            {
                "role": "user",
                "content": (
                    "Analyze this dataset and return a structured JSON analysis.\n\n"
                    + data_summary
                ),
            }
        ],
        # --- STRUCTURED OUTPUT ---
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "data_analysis",
                "schema": ANALYSIS_SCHEMA,
                "strict": True,
            },
        },
    )

    # The response text is guaranteed valid JSON matching our schema
    result = json.loads(message.content[0].text)
    return result

How structured outputs work

  • You pass response_format with your JSON schema.
  • Claude constrains its generation so the output always matches the schema.
  • strict: True means Claude will not add extra keys.
  • You can safely call json.loads() without try/except for malformed JSON.
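Even so, a cheap defensive check costs little and catches a misconfigured request early. A minimal stdlib validator (hypothetical helper; the lesson itself relies on strict mode alone):

```python
import json

# The six top-level keys the report generator depends on.
REQUIRED_KEYS = {"dataset_summary", "statistics", "trends",
                 "outliers", "chart_suggestions", "key_insights"}


def check_analysis(raw: str) -> dict:
    """Parse the model's JSON and confirm the top-level keys we rely on."""
    analysis = json.loads(raw)
    missing = REQUIRED_KEYS - analysis.keys()
    if missing:
        raise ValueError(f"analysis missing keys: {sorted(missing)}")
    return analysis
```

Failing fast here produces a clearer error than a KeyError deep inside the report generator.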

Step 5 — Generate the Markdown Report

Python
# report_generator.py
from datetime import datetime


def generate_report(analysis: dict, csv_name: str) -> str:
    """Turn the structured analysis into a Markdown report."""
    lines = []
    now = datetime.now().strftime("%Y-%m-%d %H:%M")

    # Header
    lines.append("# Data Analysis Report")
    lines.append(f"**Source:** {csv_name}  ")
    lines.append(f"**Generated:** {now}  ")
    lines.append("")

    # Dataset summary
    ds = analysis["dataset_summary"]
    lines.append("## Dataset Overview")
    lines.append(f"- **Rows:** {ds['row_count']}")
    lines.append(f"- **Columns:** {ds['column_count']}")
    lines.append("")
    lines.append("| Column | Type | Missing % | Unique |")
    lines.append("|--------|------|-----------|--------|")
    for col in ds["columns"]:
        lines.append(
            f"| {col['name']} | {col['dtype']} | {col['missing_pct']:.1f}% | {col['unique_values']} |"
        )
    lines.append("")

    # Statistics
    lines.append("## Key Statistics")
    lines.append("| Column | Mean | Median | Std Dev | Min | Max |")
    lines.append("|--------|------|--------|---------|-----|-----|")
    for s in analysis["statistics"]:
        lines.append(
            f"| {s['column']} | {s['mean']:.2f} | {s['median']:.2f} "
            f"| {s['std_dev']:.2f} | {s['min']:.2f} | {s['max']:.2f} |"
        )
    lines.append("")

    # Trends
    lines.append("## Identified Trends")
    for t in analysis["trends"]:
        cols = ", ".join(t["affected_columns"])
        lines.append(f"- **{t['description']}**  ")
        lines.append(f"  Confidence: {t['confidence']} | Columns: {cols}")
    lines.append("")

    # Outliers
    lines.append("## Outliers Detected")
    if analysis["outliers"]:
        lines.append("| Column | Value | Row | Reason |")
        lines.append("|--------|-------|-----|--------|")
        for o in analysis["outliers"]:
            lines.append(f"| {o['column']} | {o['value']} | {o['row_index']} | {o['reason']} |")
    else:
        lines.append("No significant outliers detected.")
    lines.append("")

    # Chart suggestions
    lines.append("## Recommended Charts")
    for i, c in enumerate(analysis["chart_suggestions"], 1):
        lines.append(f"### Chart {i}: {c['title']}")
        lines.append(f"- **Type:** {c['chart_type']}")
        lines.append(f"- **X-axis:** {c['x_axis']}")
        lines.append(f"- **Y-axis:** {c['y_axis']}")
        lines.append(f"- {c['description']}")
        lines.append("")

    # Key insights
    lines.append("## Key Insights")
    for insight in analysis["key_insights"]:
        lines.append(f"- {insight}")
    lines.append("")

    return "\n".join(lines)
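One subtle hazard in the table-building code above: a cell value containing a `|` character (for example, in an outlier's reason text) would break the Markdown row. A tiny escaping helper (hypothetical; not part of the lesson files) guards against it:

```python
def md_escape(value) -> str:
    """Escape pipe characters so a cell value cannot break a Markdown table row."""
    return str(value).replace("|", "\\|")
```

Wrap each interpolated value, e.g. `md_escape(o['reason'])`, before placing it in a table row.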

Step 6 — Wire Everything Together

Python
# dashboard.py (part 3 — append to the same file)
import os
import sys

from report_generator import generate_report


def main():
    if len(sys.argv) < 2:
        print("Usage: python dashboard.py <path-to-csv>")
        sys.exit(1)

    csv_path = sys.argv[1]
    csv_name = os.path.basename(csv_path)

    # 1. Read
    df = read_csv(csv_path)

    # 2. Summarize
    summary = build_data_summary(df)

    # 3. Analyze
    print("Sending data to Claude for analysis...")
    analysis = analyze_with_claude(summary)

    # 4. Save JSON
    os.makedirs("output", exist_ok=True)
    json_path = "output/analysis.json"
    with open(json_path, "w") as f:
        json.dump(analysis, f, indent=2)
    print(f"Structured analysis saved to {json_path}")

    # 5. Generate report
    report = generate_report(analysis, csv_name)
    report_path = "output/report.md"
    with open(report_path, "w") as f:
        f.write(report)
    print(f"Markdown report saved to {report_path}")

    # 6. Print key insights to terminal
    print("\n=== KEY INSIGHTS ===")
    for insight in analysis["key_insights"]:
        print(f"  • {insight}")


if __name__ == "__main__":
    main()

Running the Project

Terminal
# Set your API key
export ANTHROPIC_API_KEY="your-key-here"

# Run with any CSV
python dashboard.py sample_data.csv

Expected output

Loaded 1200 rows, 8 columns from sample_data.csv
Sending data to Claude for analysis...
Structured analysis saved to output/analysis.json
Markdown report saved to output/report.md

=== KEY INSIGHTS ===
  • Revenue grew 23% quarter-over-quarter driven by the Enterprise segment.
  • Customer churn spiked in March — investigate support ticket volume.
  • The "price" column has 3 outliers above $10,000 that may be data entry errors.

Step 7 — Adding a Sample CSV for Testing

Create a quick test file so you can try the pipeline immediately:

Python
# generate_sample.py
import numpy as np
import pandas as pd

np.random.seed(42)
n = 500

data = {
    "date": pd.date_range("2024-01-01", periods=n, freq="D"),
    "revenue": np.random.normal(5000, 1200, n).round(2),
    "customers": np.random.poisson(150, n),
    "region": np.random.choice(["North", "South", "East", "West"], n),
    "product": np.random.choice(["Basic", "Pro", "Enterprise"], n, p=[0.5, 0.35, 0.15]),
    "satisfaction": np.clip(np.random.normal(4.0, 0.8, n), 1, 5).round(1),
}

df = pd.DataFrame(data)

# Inject a few outliers
df.loc[42, "revenue"] = 25000.00
df.loc[99, "revenue"] = -500.00
df.loc[200, "satisfaction"] = 1.0

df.to_csv("sample_data.csv", index=False)
print(f"Generated sample_data.csv with {len(df)} rows")

Step 8 — Example JSON Output

Here is what output/analysis.json looks like (abbreviated):

JSON
{
  "dataset_summary": {
    "row_count": 500,
    "column_count": 6,
    "columns": [
      {
        "name": "date",
        "dtype": "datetime",
        "missing_pct": 0.0,
        "unique_values": 500
      },
      {
        "name": "revenue",
        "dtype": "float64",
        "missing_pct": 0.0,
        "unique_values": 498
      }
    ]
  },
  "statistics": [
    {
      "column": "revenue",
      "mean": 5032.14,
      "median": 4985.50,
      "std_dev": 1245.32,
      "min": -500.0,
      "max": 25000.0
    }
  ],
  "trends": [
    {
      "description": "Revenue shows a slight upward trend over the year",
      "confidence": "medium",
      "affected_columns": ["revenue", "date"]
    }
  ],
  "outliers": [
    {
      "column": "revenue",
      "value": 25000.0,
      "row_index": 42,
      "reason": "Value is 16+ standard deviations above the mean"
    },
    {
      "column": "revenue",
      "value": -500.0,
      "row_index": 99,
      "reason": "Negative revenue likely indicates a data entry error"
    }
  ],
  "chart_suggestions": [
    {
      "chart_type": "line",
      "title": "Revenue Over Time",
      "x_axis": "date",
      "y_axis": "revenue",
      "description": "A line chart showing daily revenue to visualize trends and seasonality."
    }
  ],
  "key_insights": [
    "Average daily revenue is approximately $5,032 with moderate variability.",
    "Two significant outliers in the revenue column require investigation.",
    "Customer satisfaction averages 4.0/5.0 across all regions."
  ]
}

How It All Connects

CSV File
   │
   ▼
┌──────────────────┐
│ read_csv()       │ ← pandas reads the file
│ build_summary()  │ ← text summary for Claude
└───────┬──────────┘
        │
        ▼
┌──────────────────────────────────────┐
│ Claude API (structured output)       │
│  - system prompt: "data analyst"     │
│  - response_format: JSON schema      │
│  - returns validated JSON            │
└───────┬──────────────────────────────┘
        │
        ├──► output/analysis.json
        │
        ▼
┌──────────────────────┐
│ generate_report()    │ ← turns JSON into Markdown
└───────┬──────────────┘
        │
        ├──► output/report.md
        │
        ▼
Terminal: key insights printed

Extending the Project

Enhancement    Description
Add charts     Use matplotlib or plotly to render the suggested charts
Multi-file     Accept a folder of CSVs and produce a combined report
Streaming      Use streaming to show analysis in real time
Vision         Convert a chart image to base64, send to Claude for description
Web UI         Wrap the pipeline in Flask or Streamlit for a browser dashboard

Common Mistakes to Avoid

  1. Sending too much data — Claude has a context window; summarize large CSVs instead of sending every row.
  2. Ignoring schema validation — Always use structured outputs so downstream code never breaks on unexpected JSON shapes.
  3. Hardcoding column names — Keep the pipeline generic so it works with any CSV.
  4. Skipping the system prompt — Without a clear role ("senior data analyst") the analysis will be shallow.
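For mistake 1, one practical approach is to downsample rows evenly before building the summary. A sketch, with an illustrative row cap that is not from the lesson:

```python
import pandas as pd

MAX_ROWS_FOR_PROMPT = 200  # illustrative cap; tune for your context window


def sample_for_prompt(df: pd.DataFrame, max_rows: int = MAX_ROWS_FOR_PROMPT) -> pd.DataFrame:
    """Downsample rows evenly so very large CSVs still fit the context window."""
    if len(df) <= max_rows:
        return df
    step = len(df) // max_rows
    # Take every step-th row, then cap at max_rows to handle rounding.
    return df.iloc[::step].head(max_rows)
```

Even sampling preserves time-ordered structure better than `df.head()`, which only shows the earliest rows.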

Recap

  • You built a full Python pipeline: CSV in, JSON analysis + Markdown report out.
  • Structured outputs guarantee the JSON matches your schema every time.
  • The report generator turns machine-readable data into human-readable documents.
  • This pattern (read → summarize → analyze → report) is reusable across domains.

You now have a production-ready foundation for AI-powered data analysis.