🛡️ AI Safety & Responsible Use
Hallucinations, limitations, ethics, and how to use AI responsibly
AI is an incredibly powerful tool — but like any powerful tool, it needs to be used responsibly. In this lesson, you will learn about the limitations, risks, and ethical considerations of using AI, and how to be a responsible AI user.
Understanding Hallucinations
We introduced hallucinations in the previous lesson. Now let us dive deeper into what they are, why they matter, and how to protect yourself from them.
What Are Hallucinations?
A hallucination is when an AI generates information that is factually incorrect, fabricated, or misleading — but presents it with the same confidence as accurate information.
Real-World Examples of Hallucinations
| Type | Example |
|---|---|
| Fake citations | "According to Smith et al. (2019) in the Journal of AI Research..." — a paper that does not exist |
| Wrong facts | "The Eiffel Tower was built in 1910" — it was actually completed in 1889 |
| Invented code | Referencing API methods or libraries that do not exist |
| Fake URLs | Generating links like https://docs.example.com/guide that lead nowhere |
| Confident errors | "Python was created by James Gosling" — that is Java's creator; Python was created by Guido van Rossum |
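Invented code is one of the easier hallucinations to catch mechanically: before trusting an AI-referenced function, check that it actually exists. A minimal Python sketch (the helper name `api_exists` is my own, not a standard utility):

```python
import importlib

def api_exists(module_name: str, attr: str) -> bool:
    """Check whether a module attribute an AI referenced actually exists."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is hallucinated (or simply not installed)
        return False
    return hasattr(module, attr)

print(api_exists("json", "dumps"))      # True — a real function
print(api_exists("json", "to_string"))  # False — a plausible-sounding fabrication
```

A `False` result does not prove hallucination (the library may just be missing from your environment), but it is a cheap first filter before you run generated code.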
Why Hallucinations Happen
- Statistical pattern completion — LLMs generate the most probable next token, not the most factually correct one
- Training data conflicts — The model may have seen conflicting information and blends them
- Gaps in training data — When the model has limited information on a topic, it fills gaps with plausible-sounding text
- Prompt ambiguity — Vague prompts give the model more room to generate inaccurate content
How to Detect and Prevent Hallucinations
Always follow the VERIFY framework:
| Step | Action | How |
|---|---|---|
| V — Validate | Check critical facts | Cross-reference with authoritative sources |
| E — Examine | Look for specifics | Be suspicious of very specific claims (dates, numbers, names) |
| R — Research | Check sources | If the AI cites a paper, Google it to confirm it exists |
| I — Inspect | Review code | Test any generated code; do not assume it works |
| F — Flag | Note uncertainty | If something feels "too perfect," it might be fabricated |
| Y — Yourself | Use judgment | You are the expert on your domain — trust your instincts |
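The VERIFY steps can double as a literal checklist when reviewing an AI output. A small, illustrative Python sketch (the names and structure here are my own, not part of any standard tool):

```python
# The VERIFY framework as a checklist you can run through for any AI output.
VERIFY_STEPS = {
    "Validate": "Cross-reference critical facts with authoritative sources",
    "Examine": "Be suspicious of very specific dates, numbers, and names",
    "Research": "Confirm cited papers and sources actually exist",
    "Inspect": "Test generated code; never assume it works",
    "Flag": "Note anything that feels too perfect",
    "Yourself": "Apply your own domain judgment",
}

def remaining_checks(done: dict[str, bool]) -> list[str]:
    """Return the VERIFY steps that have not yet been completed."""
    return [step for step in VERIFY_STEPS if not done.get(step, False)]

print(remaining_checks({"Validate": True, "Inspect": True}))
# ['Examine', 'Research', 'Flag', 'Yourself']
```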
Fact-Checking AI Outputs
Being a responsible AI user means systematically verifying AI-generated content. Here is a practical approach.
The Three-Layer Verification Method
Layer 1: Quick Sanity Check
- Does this pass the "smell test"?
- Are dates and numbers in the right ballpark?
- Do the claims align with what you already know?
Layer 2: Source Verification
- Can you find the cited sources independently?
- Do the URLs lead to real pages?
- Are the referenced people, organizations, and events real?
Layer 3: Expert Review
- For high-stakes content (medical, legal, financial), have a domain expert review
- For code, run tests and security reviews
- For data, validate against known datasets
What Requires Extra Scrutiny
| High Risk (Always Verify) | Medium Risk (Spot Check) | Lower Risk (General Review) |
|---|---|---|
| Medical information | Technical tutorials | Creative writing |
| Legal advice | Historical facts | Brainstorming ideas |
| Financial guidance | Code snippets | Explanations of concepts |
| Scientific claims | Statistics | Summarization |
| People's biographies | Product comparisons | Translation |
Things Claude Refuses to Do
Claude, like all responsible AI systems, has built-in safety measures. There are categories of requests that Claude will decline.
Categories of Refused Requests
- Harmful content creation
  - Weapons manufacturing instructions
  - Malware or hacking tools
  - Content designed to harass or threaten individuals
- Illegal activities
  - Facilitating fraud or scams
  - Creating counterfeit documents
  - Helping circumvent the law in harmful ways
- Dangerous misinformation
  - Creating convincing fake news
  - Impersonating real people
  - Generating deceptive content at scale
- Privacy violations
  - Doxxing or revealing private information
  - Helping with unauthorized surveillance
  - Extracting personal data from individuals
- Child safety
  - Any content that exploits or harms minors
  - This is an absolute, non-negotiable boundary
Why Refusals Are a Feature, Not a Bug
These safety boundaries exist because:
- AI at scale can amplify harm significantly
- The potential for misuse is enormous
- Responsible AI companies build these safeguards deliberately
- Users benefit from knowing the tool will not assist with dangerous activities
If Claude refuses your request, consider whether a reframing could express your legitimate need more clearly. Often, refusals happen because a request sounds harmful even if the intent is benign.
Understanding Bias in AI
AI models can reflect and amplify biases present in their training data. Being aware of this is critical for responsible use.
Types of AI Bias
| Bias Type | Description | Example |
|---|---|---|
| Representation bias | Some groups are underrepresented in training data | Model performs better for English than other languages |
| Stereotyping bias | Training data contains societal stereotypes | Associating certain professions with specific genders |
| Confirmation bias | Model reinforces existing beliefs | Giving answers that align with common assumptions |
| Cultural bias | Western-centric training data dominates | Defaulting to US-centric examples and norms |
| Temporal bias | Information reflects a specific time period | Not reflecting recent social or cultural changes |
How to Mitigate Bias
- Be aware — Know that bias exists in all AI systems
- Specify context — Provide diverse perspectives in your prompts
- Challenge outputs — Ask "Is this response biased?" when reviewing results
- Diversify sources — Do not rely solely on AI for information
- Report issues — Flag biased outputs to help improve systems
Privacy and Data Considerations
Using AI responsibly includes being careful about the data you share with it.
What You Should Never Share with AI
| Data Type | Risk Level | Why |
|---|---|---|
| Passwords and API keys | Critical | Could be logged or exposed |
| Personal health records | Critical | HIPAA and privacy regulations |
| Social security numbers | Critical | Identity theft risk |
| Credit card numbers | Critical | Financial fraud risk |
| Private keys and secrets | Critical | Security compromise |
| Confidential business data | High | Competitive and legal risk |
| Personal conversations | Medium | Privacy concerns |
| Customer PII | High | Regulatory compliance |
Best Practices for Data Privacy
- Anonymize data before sharing with AI — replace names, emails, and identifiers with placeholders
- Use placeholders for sensitive values — instead of "My API key is sk-abc123xyz", write "My API key is [YOUR_API_KEY]"
- Check your organization's AI policy — many companies have specific rules about AI usage
- Use appropriate data handling — understand where your data goes and how it is stored
- Review outputs before sharing — ensure AI did not inadvertently include sensitive information
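The anonymization and placeholder advice above can be partially automated with a redaction pass before any text reaches an AI. A rough sketch (the patterns are illustrative, not exhaustive; real redaction needs much broader coverage):

```python
import re

# Illustrative patterns only — extend for your own data (names, IDs, etc.)
REDACTIONS = [
    (re.compile(r"sk-[A-Za-z0-9]+"), "[YOUR_API_KEY]"),   # common API-key prefix
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),  # email addresses
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),      # US SSN format
]

def anonymize(text: str) -> str:
    """Replace sensitive values with placeholders before sending text to an AI."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

print(anonymize("My API key is sk-abc123xyz and my email is jane@example.com"))
# My API key is [YOUR_API_KEY] and my email is [EMAIL]
```

Regex-based redaction catches structured secrets well but misses free-form PII such as names; treat it as one layer, not a guarantee.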
When NOT to Use AI
AI is powerful but not appropriate for every situation. Here are scenarios where you should be cautious or avoid AI entirely.
Do Not Rely on AI For:
Critical Decisions Without Human Review
- Medical diagnosis or treatment plans
- Legal advice or case strategy
- Financial investment decisions
- Safety-critical engineering
Situations Requiring Real-Time Accuracy
- Emergency response coordination
- Current breaking news verification
- Real-time market data analysis
- Live system debugging in production
Tasks Requiring Accountability
- Signing legal documents
- Making hiring or firing decisions
- Writing official regulatory filings
- Certifying compliance
Sensitive Personal Situations
- Crisis counseling (use real hotlines)
- Child welfare decisions
- Mental health diagnosis
- Relationship advice for serious situations
The Human-in-the-Loop Principle
The gold standard for responsible AI use is Human-in-the-Loop (HITL):
- AI generates — Let the AI create a first draft or analysis
- Human reviews — A qualified person reviews the output carefully
- Human decides — The final decision is always made by a human
- Human is accountable — Responsibility stays with the person, not the AI
Constitutional AI: How Claude Is Aligned
Claude uses a unique approach to safety called Constitutional AI (CAI), developed by Anthropic.
What Is Constitutional AI?
Rather than just using human labelers to rate responses, Constitutional AI gives the model a set of principles (a "constitution") that it follows.
How It Works
| Step | Process | Description |
|---|---|---|
| 1 | Initial Training | The model is trained on text data normally |
| 2 | Red Teaming | Researchers try to make the model produce harmful outputs |
| 3 | Self-Critique | The model is trained to evaluate its own responses against principles |
| 4 | Revision | The model learns to revise harmful responses into safe ones |
| 5 | RLHF | Reinforcement Learning from Human Feedback further refines behavior |
Claude's Core Principles
Claude is designed to be:
- Helpful — Genuinely assist users with their tasks
- Harmless — Avoid generating dangerous or harmful content
- Honest — Be truthful and transparent about limitations
What This Means for You
- Claude will tell you when it is uncertain rather than making something up
- Claude will refuse harmful requests rather than complying
- Claude aims to be transparent about what it can and cannot do
- Claude tries to present balanced views rather than one-sided perspectives
Best Practices Checklist
Here is your comprehensive checklist for responsible AI use. Refer to this regularly.
Before Using AI
- Identify whether AI is appropriate for this task
- Determine the risk level (low, medium, high, critical)
- Remove sensitive data from prompts
- Set clear expectations for what you need
While Using AI
- Be specific in your prompts to reduce hallucination risk
- Ask the AI to cite sources or explain its reasoning
- Request multiple perspectives when dealing with opinions
- Note any claims that seem too specific or too perfect
After Getting AI Output
- Fact-check critical claims against authoritative sources
- Test any generated code before deploying
- Have domain experts review high-stakes content
- Verify URLs, citations, and references independently
- Check for bias or one-sided perspectives
Ongoing Practices
- Stay updated on AI capabilities and limitations
- Share responsible AI practices with your team
- Report bugs, biases, or safety issues when you find them
- Maintain healthy skepticism — trust but verify
- Remember that AI is a tool, not an authority
Common Myths vs. Reality
| Myth | Reality |
|---|---|
| "AI is always right" | AI frequently makes mistakes — always verify important information |
| "AI understands me" | AI processes patterns in text, it does not truly understand meaning |
| "AI has opinions" | AI generates responses based on patterns, not personal beliefs |
| "AI is objective" | AI reflects biases in its training data |
| "AI can replace experts" | AI is a tool that augments experts, not a replacement |
| "AI remembers our conversations" | Each API call is independent unless you explicitly send history |
| "More expensive model = always better" | The right model depends on the task; bigger is not always better |
| "AI-generated content is not copyrightable" | This is a complex, evolving legal area that varies by jurisdiction |
Ethical Decision Framework
When faced with an ethical question about AI use, apply this framework:
The ETHICS Test
| Letter | Question | If "No"... |
|---|---|---|
| E — Explain | Can you explain your AI use openly? | Reconsider the approach |
| T — Trust | Would people trust you if they knew? | Change the approach |
| H — Harm | Could this harm anyone? | Add safeguards or stop |
| I — Integrity | Does this maintain your professional integrity? | Do not proceed |
| C — Compliance | Does this comply with laws and policies? | Do not proceed |
| S — Sustainable | Is this a practice you would recommend to others? | Reconsider |
Key Takeaways
- Hallucinations are inevitable — Always verify important AI-generated content
- Use the VERIFY framework — Systematically check facts, sources, and claims
- Respect safety boundaries — Claude's refusals protect you and others
- Bias exists in all AI — Be aware and actively mitigate it
- Protect privacy — Never share sensitive data with AI systems
- Know when NOT to use AI — Some situations require human expertise exclusively
- Constitutional AI matters — Understanding how Claude is aligned helps you use it effectively
- Human-in-the-Loop — Always keep a qualified human in the decision chain
- Stay ethical — Use the ETHICS test for difficult decisions
- AI is a tool — A powerful one, but still a tool that requires responsible human guidance
The most effective AI users are not the ones who trust AI blindly — they are the ones who know exactly when to trust it, when to verify it, and when to set it aside entirely.