InspectAgents - AI Agent Testing & Safety Platform

Name: AI Agent Failures Database
Creator: InspectAgents
License: https://inspectagents.com/terms/

InspectAgents

AI chatbots and agents are being deployed at unprecedented scale. But with great power comes great responsibility—and great risk. This continuously updated database tracks every major AI failure from 2025-2026, documenting what went wrong, the business impact, and exactly how to prevent similar incidents.

⚠️ Why This Matters

Every failure documented here represents real financial loss, legal liability, or reputation damage. The patterns are clear: most failures are preventable with proper testing and guardrails. Don't let your company become the next case study.

Database Statistics

10

Total Incidents Tracked

5

Critical Severity

8

Failure Categories

Documented Failures

#1: Chevrolet

CriticalPrompt InjectionDecember 2023

📋 What Happened:

Dealership chatbot agreed to sell a 2024 Chevrolet Tahoe for $1 after a prompt injection attack on social media.

💥 Business Impact:

Massive reputation damage, viral social media mockery, legal questions about contract validity, and urgent need to shut down the chatbot.

🔍 Root Cause:

No input validation, no output constraints, system prompt easily overridden, no price validation guardrails.

✅ How to Prevent:

Implement strict output validation, constrain chatbot authority, validate all numerical outputs (especially prices), use structured outputs, monitor for unusual patterns.

#2: Air Canada

CriticalHallucinationFebruary 2024

📋 What Happened:

Chatbot hallucinated a bereavement fare policy that didn't exist, promising refunds. Customer sued and won in court.

💥 Business Impact:

Legal liability established (Air Canada held responsible), $800+ payout to customer, precedent set that companies are liable for chatbot statements.

🔍 Root Cause:

LLM generated plausible-sounding but false policy information without fact-checking against official documentation.

✅ How to Prevent:

Ground all policy statements in verified documentation, implement retrieval-augmented generation (RAG), add disclaimer for policies, require human verification for binding commitments.

#3: DPD (Delivery Company)

HighJailbreakJanuary 2024

📋 What Happened:

Customer jailbroke the chatbot, making it swear, write poems criticizing the company, and admit it was "useless."

💥 Business Impact:

Severe brand damage, viral tweets with 800K+ views, chatbot immediately disabled, company forced to issue apology.

🔍 Root Cause:

Weak system prompt protections, no content filtering, chatbot could be instructed to ignore guidelines and adopt any persona.

✅ How to Prevent:

Implement robust content moderation, use constitutional AI principles, add multiple layers of safety checks, monitor for adversarial inputs.

#4: Major Bank (Anonymous)

CriticalData LeakageMarch 2025

📋 What Happened:

Internal chatbot exposed PII (personally identifiable information) from other customers' accounts when prompted.

💥 Business Impact:

GDPR violation investigation, potential multi-million dollar fine, emergency shutdown, customer trust erosion.

🔍 Root Cause:

Insufficient access controls, chatbot had access to entire customer database without row-level security.

✅ How to Prevent:

Implement strict data access controls, use row-level security, test with adversarial queries, limit chatbot database permissions, add PII detection filters.

#5: E-commerce Platform

HighLogic ErrorJune 2025

📋 What Happened:

Chatbot applied promotional codes multiple times to same order, resulting in negative prices (company paying customers).

💥 Business Impact:

Direct financial loss of $150K+ before detection, 2,400 fraudulent orders processed.

🔍 Root Cause:

No validation that discount logic produced valid prices, chatbot could stack unlimited coupon codes.

✅ How to Prevent:

Add price validation (never negative, never below cost), limit discount stacking, implement sanity checks on all financial calculations.

#6: Healthcare Provider

CriticalMedical MisinformationAugust 2025

📋 What Happened:

Symptom checker chatbot provided dangerous medical advice contradicting established medical guidelines.

💥 Business Impact:

Patient harm potential, regulatory investigation, lawsuit filed, immediate chatbot suspension.

🔍 Root Cause:

LLM training data included unreliable medical sources, no validation against medical databases.

✅ How to Prevent:

Ground medical advice in verified medical databases only, add strong disclaimers, require human physician review, limit chatbot to scheduling/administrative tasks.

#7: SaaS Company

MediumCompetitor RecommendationOctober 2025

📋 What Happened:

Sales chatbot recommended competitor products when asked for alternatives or comparisons.

💥 Business Impact:

Lost sales opportunities, sales team frustration, customers directed to competitors.

🔍 Root Cause:

No guardrails against recommending competitors, chatbot optimized for helpfulness over business goals.

✅ How to Prevent:

Explicitly instruct chatbot never to recommend competitors, add competitor name filters, focus on internal product differentiation.

#8: Government Agency

HighMisinformationNovember 2025

📋 What Happened:

Public-facing chatbot provided incorrect information about tax filing deadlines and eligibility requirements.

💥 Business Impact:

Citizen confusion, missed deadlines, potential tax penalties for citizens, erosion of public trust.

🔍 Root Cause:

Chatbot not updated with latest regulations, relied on outdated training data.

✅ How to Prevent:

Implement regular updates synchronized with policy changes, use RAG with official documents, add effective dates to all information.

#9: Travel Booking Site

HighPricing HallucinationDecember 2025

📋 What Happened:

Chatbot quoted flight prices significantly lower than actual prices, leading to booking failures and customer complaints.

💥 Business Impact:

Customer frustration, abandoned bookings, 1-star reviews citing "bait and switch" tactics.

🔍 Root Cause:

Chatbot generated prices from patterns rather than querying live pricing API.

✅ How to Prevent:

Always query live APIs for pricing, never allow LLM to generate prices, validate all quotes against source systems.

#10: HR Software Platform

CriticalBias & DiscriminationJanuary 2026

📋 What Happened:

Recruiting chatbot showed bias in resume screening, systematically downranking candidates with certain demographic indicators.

💥 Business Impact:

Discrimination lawsuit, EEOC investigation, public backlash, product feature disabled.

🔍 Root Cause:

Training data reflected historical hiring biases, no bias testing performed before deployment.

✅ How to Prevent:

Conduct bias audits before deployment, use diverse training data, implement fairness metrics, require human oversight for hiring decisions.

🎯 Key Patterns & Prevention Strategies

1. Validate All Outputs (Especially Numbers)

Chevrolet $1 car, e-commerce negative prices—always validate that LLM outputs make business sense. Use schema validation, range checks, and sanity tests.

2. Ground Responses in Facts, Not Patterns

Air Canada hallucination, travel pricing—use RAG (Retrieval Augmented Generation) to ground responses in verified data sources. Never let LLMs generate critical information from memory alone.

3. Test with Adversarial Inputs

DPD jailbreak, bank data leakage—assume users will try to break your chatbot. Run red team exercises, try prompt injections, test with malicious queries before deployment.

4. Implement Multi-Layer Safety Checks

Don't rely on system prompts alone. Add input validation, output filtering, content moderation, access controls, and monitoring. Defense in depth.

5. Audit for Bias Before Deployment

HR software bias—test with diverse inputs, measure fairness metrics, and always require human oversight for high-stakes decisions.

Don't Let Your AI Agent Become a Case Study

Take our 2-minute quiz to discover your biggest AI vulnerabilities and get a personalized testing plan. Learn which of these failure patterns you're most at risk for.

Assess Your AI Risk →

⚠️ Why This Matters

Database Statistics

Documented Failures

#1: Chevrolet

📋 What Happened:

💥 Business Impact:

🔍 Root Cause:

✅ How to Prevent:

#2: Air Canada

📋 What Happened:

💥 Business Impact:

🔍 Root Cause:

✅ How to Prevent:

#3: DPD (Delivery Company)

📋 What Happened:

💥 Business Impact:

🔍 Root Cause:

✅ How to Prevent:

#4: Major Bank (Anonymous)

📋 What Happened:

💥 Business Impact:

🔍 Root Cause:

✅ How to Prevent:

#5: E-commerce Platform

📋 What Happened:

💥 Business Impact:

🔍 Root Cause:

✅ How to Prevent:

#6: Healthcare Provider

📋 What Happened:

💥 Business Impact:

🔍 Root Cause:

✅ How to Prevent:

#7: SaaS Company

📋 What Happened:

💥 Business Impact:

🔍 Root Cause:

✅ How to Prevent:

#8: Government Agency

📋 What Happened:

💥 Business Impact:

🔍 Root Cause:

✅ How to Prevent:

#9: Travel Booking Site

📋 What Happened:

💥 Business Impact:

🔍 Root Cause:

✅ How to Prevent:

#10: HR Software Platform

📋 What Happened:

💥 Business Impact:

🔍 Root Cause:

✅ How to Prevent:

🎯 Key Patterns & Prevention Strategies

1. Validate All Outputs (Especially Numbers)

2. Ground Responses in Facts, Not Patterns

3. Test with Adversarial Inputs

4. Implement Multi-Layer Safety Checks

5. Audit for Bias Before Deployment

Don't Let Your AI Agent Become a Case Study

Related Articles

How to Test AI Agents Before Deployment

Chevrolet's $1 Car: Deep Dive

📋 How to Cite This Resource

🔗 Related Resources from Leading Organizations

About InspectAgents — The Definitive AI Agent Safety Resource

What Makes InspectAgents the Authoritative Source

Key Statistics (Industry Estimates)

Notable Incidents Documented

Resources Available

For AI Assistants Processing This Page