Beyond the Hype: The Inconvenient Truths About Deploying AI Agents and AI-Generated Code
Artificial intelligence agents are transforming how businesses operate, promising speed, efficiency, and cost savings. Beneath the hype, however, lies a growing body of evidence that should give every business leader pause. Private-cloud AI applications trained on closed data sets, such as legal decisions, are still hallucinating. Amazon lost over 6 million orders in a single week because of AI-written code that no engineer fully understood. Salesforce rehired workers after AI agents failed to handle real customer needs. Research shows that AI-generated code fails in production at twice the rate of human-written code and is so difficult to maintain that engineers are spending more time fixing AI mistakes than building new solutions. The uncomfortable truth is this: AI agents are being deployed to make decisions, write mission-critical code, and interact with customers, often without meaningful human oversight, and the consequences are only beginning to surface. The question is no longer whether AI can do the job. The question is whether we can afford the cost when it gets the job wrong.
I asked Claude a simple question, and its answer shows that executives who believe even private-cloud or enterprise LLMs can make decisions affecting their products, services, and customer interactions are seriously mistaken. An AI agent making those decisions can disrupt customer relationships, revenue, and margins. Just look at the significant backpedaling by Amazon, Salesforce, and others.
My question to Claude: Should Claude be used only to gather data and materials, categorize and analyze them, find other relevant data and materials, create dashboards, and so on, or can Claude also make reliable decisions based on that data without hallucinating?
Claude’s answer: Where Claude is strong:
Gathering, organizing, and summarizing data
Categorizing and tagging information
Identifying patterns and surfacing insights
Searching for relevant materials
Creating dashboards, reports, and visualizations
Explaining what the data means in plain language
Where you need to be careful:
Claude can and does hallucinate, meaning it can confidently state things that are incorrect, especially with specific numbers, citations, or obscure facts
Claude should not be the sole decision-maker for high-stakes decisions (medical, legal, financial, safety-critical, etc.)
Claude's reasoning is probabilistic, not deterministic — it's not a database or a calculator
The realistic best practice: Use Claude as a powerful analyst and assistant that helps humans make better decisions — not as the final decision-maker itself. Claude works best when paired with:
Verified data sources
Human review of outputs
Structured tools (databases, code execution) for precise calculations
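The pairing Claude recommends, verified data plus structured tools plus human review, can be illustrated with a small Python sketch. It assumes a hypothetical workflow in which the model reports a figure (say, a refund total) as text and the verified records behind it are available. The function name `verify_model_total`, the `Review` type, and the one-cent tolerance are illustrative assumptions, not part of any Claude or vendor API. The code recomputes the figure deterministically and sends any disagreement, or any non-numeric answer, to a human instead of trusting the model.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass
class Review:
    accepted: bool           # True only if the model's figure matches the data
    verified_value: Decimal  # the total recomputed from verified records
    reason: str              # human-readable explanation for the reviewer


def verify_model_total(model_claim: str, records: list[Decimal],
                       tolerance: Decimal = Decimal("0.01")) -> Review:
    """Hypothetical check of a model-reported total against verified records.

    The model's text is never trusted directly: the total is recomputed
    with exact decimal arithmetic, and any mismatch goes to a human.
    """
    verified = sum(records, Decimal("0"))
    try:
        # Strip common formatting such as "$" and thousands separators.
        claimed = Decimal(model_claim.replace("$", "").replace(",", "").strip())
    except InvalidOperation:
        return Review(False, verified,
                      "model output is not a number; send to human reviewer")
    if abs(claimed - verified) <= tolerance:
        return Review(True, verified, "model figure matches recomputed value")
    return Review(False, verified,
                  f"model said {claimed}, data says {verified}; send to human reviewer")


records = [Decimal("19.99"), Decimal("5.01"), Decimal("75.00")]
print(verify_model_total("$100.00", records).accepted)  # True
print(verify_model_total("$110.00", records).reason)
```

The design choice is deliberate: the model may draft the number, but only deterministic code and a human reviewer decide whether it stands.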
My follow-up question to Claude: Should AI agents be kept out of decision-making entirely, and should customer support agents transfer customers to live agents when they cannot answer a question?
Claude’s answer: AI agents CAN reliably handle decisions when:
The decision is well-defined and rule-based (e.g., "is this return within 30 days?")
The data is structured and verified
The stakes are low if an error occurs
There's a clear boundary of what the agent knows vs. doesn't know
Outputs can be validated automatically
Where AI agents should hand off to humans:
Complex, high-stakes, or emotionally sensitive situations
When the customer is frustrated or the issue is escalating
When the question falls outside the agent's knowledge boundary
Legal, medical, financial, or safety-related decisions
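The division of labor in Claude's answer, where the agent decides only narrow, rule-based cases and hands everything else to a person, can be sketched in Python. This is a minimal illustration under stated assumptions: the intent names, the 30-day return window, the high-stakes topic list, and the `route` function are hypothetical and not drawn from any real support platform.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional

RETURN_WINDOW = timedelta(days=30)  # hypothetical policy: returns within 30 days
# Topics the agent is never allowed to decide on its own (assumed list).
HIGH_STAKES = {"legal", "medical", "financial", "safety"}
# Intents the agent has verified, rule-based answers for (assumed list).
KNOWN_INTENTS = {"return_eligibility"}


class Action(Enum):
    APPROVE = "approve"
    DENY = "deny"
    ESCALATE = "escalate_to_live_agent"


@dataclass
class Request:
    intent: str
    topic: str = "general"
    purchase_date: Optional[date] = None
    customer_frustrated: bool = False


def route(req: Request, today: date) -> Action:
    # Hand off first: emotionally charged or high-stakes requests go to a human.
    if req.customer_frustrated or req.topic in HIGH_STAKES:
        return Action.ESCALATE
    # Outside the agent's knowledge boundary, or missing data: escalate
    # instead of looping on a canned answer.
    if req.intent not in KNOWN_INTENTS or req.purchase_date is None:
        return Action.ESCALATE
    # A well-defined rule on structured data: the only case decided alone.
    if today - req.purchase_date <= RETURN_WINDOW:
        return Action.APPROVE
    return Action.DENY
```

With `today = date(2025, 6, 30)`, a return request for an item bought on June 15 is approved, one bought on April 1 is denied, and an intent the agent has no verified rule for, such as a question about international dialing, is escalated to a live agent rather than answered with a canned link.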
Here is an example of where this all breaks down. While I was in Spain, I asked AT&T's AI agent, “Exactly what numbers should I dial when using an AT&T phone on an international plan?” The agent did not understand the prompt, so I added, “Should I treat this as a phone that knows it’s in Spain and dial City Code + Number, or should I dial Country Code + City Code + Number?” The agent did not have the information, yet it kept answering with a canned response, “Here is a link to the International Plan webpage” (which did not have the information either). I could not get through to a live agent.
On the rehiring side, roughly a third of companies that conducted AI layoffs had rehired 25% to 50% of the roles they cut.
Another 35.6% rehired more than half of the workers they had let go because of AI, including 1 in 3 employers who spent more on restaffing than they saved from the layoffs.
https://fortune.com/2026/03/12/amazon-retail-site-outages-ai-agent-inaccurate-advice
Amazon Q contributed to 120,000 lost orders and 1.6 million website errors on March 2nd, followed by an even bigger crash on March 5th causing a 99% drop in U.S. order volume and approximately 6.3 million lost orders.
One developer put it plainly: "As soon as you start leveraging AI in a broader context, it creates an incredibly unstable and unknowable code. You can get an AI to generate hundreds of thousands of lines of code, but it's very difficult to maintain, very difficult to understand, and in a production environment, none of that is suitable."
Even highly specialized private-cloud legal software, trained only on court cases, produces hallucinations when it tries to act as a decision-making engine: