Artificial intelligence agents are transforming how businesses operate, promising speed, efficiency, and cost savings. Beneath the hype lies a growing body of evidence that should give every business leader pause. Private-cloud AI apps trained on closed data sets, such as legal decisions, are hallucinating. Amazon lost over 6 million orders in a single week due to AI-written code that no engineer fully understood. Salesforce rehired workers after AI agents failed to handle real customer needs. Research shows that AI-generated code fails in production at twice the rate of human-written code and is so difficult to maintain that engineers are spending more time fixing AI mistakes than building new solutions. The uncomfortable truth is this: AI agents are being deployed to make decisions, write mission-critical code, and interact with customers, often without meaningful human oversight, and the consequences are only beginning to surface. The question is no longer whether AI can do the job. The question is whether we can afford the cost when it gets it wrong.

I asked Claude a simple question, and the answer shows that many executives are seriously mistaken when they believe that even private or enterprise LLMs can make decisions affecting their products, services, and customer interactions. An AI agent can disrupt their customer relationships, revenues, and margins. Look at the significant backpedaling by Amazon, Salesforce, and others.

My question to Claude: Should Claude be used only to gather data and materials, categorize them, analyze them, look for other relevant data and materials, create dashboards, and so on, or can Claude also be used to make reliable decisions based on that data, without hallucinations?

Claude’s answer:

Where Claude is strong:

  • Gathering, organizing, and summarizing data

  • Categorizing and tagging information

  • Identifying patterns and surfacing insights

  • Searching for relevant materials

  • Creating dashboards, reports, and visualizations

  • Explaining what the data means in plain language

Where you need to be careful:

  • Claude can and does hallucinate, meaning it can confidently state things that are incorrect, especially with specific numbers, citations, or obscure facts

  • Claude should not be the sole decision-maker for high-stakes decisions (medical, legal, financial, safety-critical, etc.)

  • Claude's reasoning is probabilistic, not deterministic — it's not a database or a calculator

The realistic best practice: Use Claude as a powerful analyst and assistant that helps humans make better decisions — not as the final decision-maker itself. Claude works best when paired with:

  • Verified data sources

  • Human review of outputs

  • Structured tools (databases, code execution) for precise calculations

My follow-up question to Claude: Does this mean AI agents shouldn't be used in any decision making, and that customer support agents should transfer customers to live agents when they cannot answer a question?

Claude’s answer:

AI agents CAN reliably handle decisions when:

  • The decision is well-defined and rule-based (e.g., "is this return within 30 days?")

  • The data is structured and verified

  • Stakes are low if an error occurs

  • There's a clear boundary of what the agent knows vs. doesn't know

  • Outputs can be validated automatically

Where AI agents should hand off to humans:

  • Complex, high-stakes, or emotionally sensitive situations

  • When the customer is frustrated or the issue is escalating

  • When the question falls outside the agent's knowledge boundary

  • Legal, medical, financial, or safety-related decisions
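To make that split concrete, here is a minimal sketch in Python of how the two lists could translate into routing logic. It is an illustration under stated assumptions, not a description of how any vendor's agent works: the names `Request`, `route`, `SENSITIVE_TOPICS`, and `KNOWN_TOPICS` are hypothetical, the 30-day window comes from the return example above, and the "frustrated" flag is assumed to arrive from some upstream signal.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical policy values, chosen only for illustration.
RETURN_WINDOW_DAYS = 30
SENSITIVE_TOPICS = {"legal", "medical", "financial", "safety"}
KNOWN_TOPICS = {"return"}  # the agent's knowledge boundary (assumed)


@dataclass
class Request:
    topic: str                          # e.g. "return", "legal", "dialing"
    customer_frustrated: bool           # assumed to come from an upstream signal
    purchase_date: Optional[date] = None


def route(request: Request, today: date) -> str:
    """Return "approve", "deny", or "human" for a customer request."""
    # Hand off sensitive topics and frustrated customers before anything else.
    if request.topic in SENSITIVE_TOPICS or request.customer_frustrated:
        return "human"
    # Hand off anything outside the agent's knowledge boundary.
    if request.topic not in KNOWN_TOPICS:
        return "human"
    # Missing or implausible data is a reason to hand off, not to guess.
    if request.purchase_date is None or request.purchase_date > today:
        return "human"
    # The only automated decision: a deterministic, rule-based date check.
    days_since_purchase = (today - request.purchase_date).days
    return "approve" if days_since_purchase <= RETURN_WINDOW_DAYS else "deny"


today = date(2024, 6, 30)
print(route(Request("return", False, date(2024, 6, 10)), today))  # approve
print(route(Request("return", False, date(2024, 5, 1)), today))   # deny
print(route(Request("return", True, date(2024, 6, 10)), today))   # human
print(route(Request("dialing", False), today))                    # human
```

The point of the sketch is the ordering: every hand-off condition is checked before the one rule the agent is allowed to decide, so a question the agent does not know how to answer goes to a person instead of getting a canned reply.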

As an example of where this all fails: I was in Spain and asked AT&T's AI agent, “Exactly what numbers should I dial when using an AT&T phone on an international plan?” The agent did not understand the prompt, so I added, “Should I treat this as a phone that knows it’s in Spain and dial City Code + Number, or should I dial Country Code + City Code + Number?” The agent did not have the information, yet it kept answering with a canned response, “Here is a link to the International Plan webpage” (a page that did not have the information). I could not get to a live agent.

Amazon Q contributed to 120,000 lost orders and 1.6 million website errors on March 2nd, followed by an even bigger crash on March 5th that caused a 99% drop in U.S. order volume and approximately 6.3 million lost orders.

One developer put it plainly: "As soon as you start leveraging AI in a broader context, it creates an incredibly unstable and unknowable code. You can get an AI to generate hundreds of thousands of lines of code, but it's very difficult to maintain, very difficult to understand, and in a production environment, none of that is suitable."

Even highly specialized legal software that runs in a private cloud and is trained only on court cases produces hallucinations when it tries to act as a decision-making engine: