Data & Bias

AI learns from human data — and humans aren't always fair.

Where does bias come from?

Bias in AI rarely comes from malicious intent. It usually enters through the data. There are three main sources:

Historical bias

Training data reflects past inequity. If historical data shows women in fewer leadership roles, an AI trained on it will encode that inequality — even if society has changed.

Representation bias

Some groups are underrepresented in training data. A face recognition system trained mostly on one demographic will perform worse for others — sometimes with dangerous consequences.

Measurement bias

The thing being measured is itself flawed. If we use arrest records as a proxy for crime, we encode policing patterns, not actual crime rates — and the AI amplifies existing inequities.
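The measurement bias example can be made concrete with a small sketch. Every number below is hypothetical and chosen only for illustration: two neighbourhoods have exactly the same amount of actual crime, but one is policed more heavily, so a larger share of its offences ends up as recorded arrests.

```python
# Hypothetical toy numbers, not real crime statistics.
POPULATION = 10_000      # residents in each neighbourhood
ACTUAL_OFFENCES = 500    # identical in both neighbourhoods by construction

# Assumed percentage of offences that end in a recorded arrest.
ARREST_PERCENT = {"lightly policed": 20, "heavily policed": 60}


def recorded_arrests(actual_offences, arrest_percent):
    """Arrests that appear in the records, given how much crime is detected."""
    return actual_offences * arrest_percent // 100


def summarise():
    """Compare the true offence rate with the rate the arrest data suggests."""
    rows = {}
    for name, percent in ARREST_PERCENT.items():
        arrests = recorded_arrests(ACTUAL_OFFENCES, percent)
        rows[name] = {
            "arrests": arrests,
            "true_rate": ACTUAL_OFFENCES / POPULATION,
            "arrest_rate": arrests / POPULATION,
        }
    return rows


for name, row in summarise().items():
    print(f"{name}: true rate {row['true_rate']:.1%}, "
          f"rate seen in arrest data {row['arrest_rate']:.1%}")
```

An AI trained on the arrest column would see one neighbourhood as three times riskier than the other, even though the underlying behaviour is identical in this toy setup.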

Why does it matter?

AI systems are being deployed in high-stakes decisions that affect people's lives. When those systems carry bias, the consequences are real.

💼 Hiring

Reuters reported in 2018 that Amazon had scrapped an experimental AI hiring tool after discovering it systematically downgraded CVs that mentioned the word "women's".

🏥 Healthcare

Pulse oximeters and medical AI have been shown to perform worse for patients with darker skin tones.

⚖️ Criminal justice

AI risk scores used in US courts to guide sentencing have been found to be racially biased.

🏦 Finance

AI-powered lending systems have been found to charge higher interest rates to minority applicants.

💡 Scale matters: A small bias in a model deployed to millions of people can cause harm at an enormous scale — far beyond what any individual human decision-maker could.
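To see why scale matters, here is a rough back-of-the-envelope calculation. The figures are assumptions made up for this illustration: the same small bias (10 extra wrong decisions per 1,000) is applied once to a single human decision-maker's yearly caseload and once to a widely deployed model.

```python
# All figures are hypothetical, chosen only to keep the arithmetic simple.
EXTRA_ERRORS_PER_1000 = 10            # a bias of 1 percentage point
HUMAN_DECISIONS_PER_YEAR = 5_000      # assumed caseload of one person
MODEL_DECISIONS_PER_YEAR = 2_000_000  # assumed caseload of one deployed model


def extra_wrong_decisions(decisions, extra_errors_per_1000):
    """Number of additional wrong decisions caused by the bias."""
    return decisions * extra_errors_per_1000 // 1000


human_harm = extra_wrong_decisions(HUMAN_DECISIONS_PER_YEAR, EXTRA_ERRORS_PER_1000)
model_harm = extra_wrong_decisions(MODEL_DECISIONS_PER_YEAR, EXTRA_ERRORS_PER_1000)

print(f"One human decision-maker: {human_harm} people affected per year")
print(f"One deployed model: {model_harm} people affected per year")
```

The bias per decision is the same in both cases; only the number of decisions changes.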

🔬 Bias Audit Tool

Simulate how training data balance affects AI fairness across two groups.

⚙️ Adjust Training Data

Drag the slider to change how much training data comes from each group. See how it affects the AI's accuracy for each group.

Group A: 70% of training data
Group B: 30% of training data

Fairness Score: Moderate bias

There's a noticeable gap. One group is getting worse results.

💡 Try setting Group A to 95% — then watch what happens to Group B's accuracy. This simulates what happens when AI training data is dominated by one group.
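The same experiment can be sketched in code. This is a minimal toy model, not the method the tool above uses: it assumes each group's data follows a bell curve, that the two groups' curves are shifted relative to each other, and that the AI picks a single cut-off halfway between the average "no" example and the average "yes" example in its training data. The names `GROUP_CLASS_MEANS`, `fit_threshold`, `group_accuracy` and `audit` are made up for this example.

```python
from math import erf, sqrt

# Hypothetical toy world: (average score of "no" cases, average score of "yes" cases).
# Group B's scores sit one unit higher than Group A's, so each group has a
# different ideal cut-off (1.0 for A, 2.0 for B).
GROUP_CLASS_MEANS = {"A": (0.0, 2.0), "B": (1.0, 3.0)}


def normal_cdf(z):
    """Probability that a standard normal value falls below z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def fit_threshold(share_a):
    """Cut-off learned from training data in which Group A makes up share_a."""
    share_b = 1.0 - share_a
    mean_no = share_a * GROUP_CLASS_MEANS["A"][0] + share_b * GROUP_CLASS_MEANS["B"][0]
    mean_yes = share_a * GROUP_CLASS_MEANS["A"][1] + share_b * GROUP_CLASS_MEANS["B"][1]
    return (mean_no + mean_yes) / 2.0


def group_accuracy(group, threshold):
    """Accuracy on one group, assuming equal numbers of "no" and "yes" cases."""
    mean_no, mean_yes = GROUP_CLASS_MEANS[group]
    correct_no = normal_cdf(threshold - mean_no)          # "no" cases below the cut-off
    correct_yes = 1.0 - normal_cdf(threshold - mean_yes)  # "yes" cases above the cut-off
    return 0.5 * (correct_no + correct_yes)


def audit(share_a):
    """Return (Group A accuracy, Group B accuracy, gap between them)."""
    threshold = fit_threshold(share_a)
    acc_a = group_accuracy("A", threshold)
    acc_b = group_accuracy("B", threshold)
    return acc_a, acc_b, abs(acc_a - acc_b)


for share in (0.5, 0.7, 0.95):
    acc_a, acc_b, gap = audit(share)
    print(f"Group A share {share:.0%}: A accuracy {acc_a:.1%}, "
          f"B accuracy {acc_b:.1%}, gap {gap:.1%}")
```

In this sketch, a 50/50 split gives both groups the same accuracy, and the gap grows as Group A's share of the training data increases. The exact percentages depend entirely on the assumed numbers, so they will not match the scores shown by the tool.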

⚖️ Did you know? COMPAS is a risk-scoring tool used in US courts to predict whether defendants are likely to reoffend. In 2016, ProPublica reported that Black defendants who did not go on to reoffend were nearly twice as likely as white defendants to have been wrongly flagged as high risk. The company behind the tool disputed the findings, but the case sparked a global conversation about fairness in algorithmic decision-making.

What you've learned

✓ AI learns patterns from data: if the data is biased, the AI will be too
✓ Bias in AI can cause real harm to real people in hiring, healthcare, and justice
✓ Diverse, representative training data is one of the most important steps to fairer AI
✓ Even well-intentioned AI systems can produce unfair outcomes, so critical evaluation is essential
