What AI Still Can't Do in Financial Services

Banks poured $30 billion into AI in 2025. Yet 70% of AI implementations in banking show no measurable ROI, and only 26% of customers trust AI with their financial data. As someone who works closely with financial institutions deploying these systems, I want to cut through the noise and share what I consistently see holding AI back in practice.

The Integration Wall

Every payment processor, lender, and bank runs on different data formats, proprietary APIs, and incompatible legacy systems. AI agents need unified data streams to function, but financial infrastructure is deliberately fragmented for security and competitive reasons.

In practice, this means teams end up spending more time building translation layers between systems than actually training models. Open banking initiatives are gaining ground, but frameworks like the Model Context Protocol remain emerging technologies rather than established infrastructure. The technical challenge compounds an organizational one: deploying AI isn’t just installing software. It means rewiring how decisions get made, how data flows, and how staff interact with systems. Federal Reserve research shows productivity often drops initially as teams struggle to adapt.
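
To make the translation-layer problem concrete, here is a minimal sketch of the kind of adapter code these teams end up writing, assuming two hypothetical upstream formats (a mainframe export and a partner REST API; every field name below is invented for illustration):

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal

@dataclass
class CanonicalTransaction:
    """Unified schema that downstream models consume."""
    account_id: str
    amount: Decimal          # always positive; direction captured separately
    direction: str           # "debit" or "credit"
    timestamp: datetime
    currency: str

def from_legacy_core(row: dict) -> CanonicalTransaction:
    """Adapter for a hypothetical mainframe export: signed amounts, epoch seconds."""
    amount = Decimal(row["AMT"])
    return CanonicalTransaction(
        account_id=row["ACCT_NO"],
        amount=abs(amount),
        direction="debit" if amount < 0 else "credit",
        timestamp=datetime.fromtimestamp(int(row["POST_TS"])),
        currency=row.get("CCY", "USD"),
    )

def from_partner_api(payload: dict) -> CanonicalTransaction:
    """Adapter for a hypothetical partner REST API: ISO-8601 timestamps, minor units."""
    return CanonicalTransaction(
        account_id=payload["accountId"],
        amount=Decimal(payload["amountMinor"]) / 100,
        direction=payload["type"].lower(),
        timestamp=datetime.fromisoformat(payload["postedAt"]),
        currency=payload["currency"],
    )
```

Multiply this by every core system, partner, and data vendor an institution touches, and the "more time on plumbing than on models" pattern becomes obvious.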

Bias Doesn’t Disappear; It Scales

AI doesn’t create bias. It scales existing bias with mathematical precision. When a model trains on decades of credit decisions that reflect historical redlining and discriminatory lending practices, it doesn’t detect the discrimination. It learns the patterns that reproduce those historical outcomes.

The Hello Digit case makes this concrete. Their automated savings algorithm repeatedly caused the overdrafts it was designed to prevent, resulting in a $2.7 million CFPB penalty. Freddie Mac and Fannie Mae’s mortgage algorithms reproduced racial discrimination patterns without engineers ever explicitly programming them. Even when protected characteristics are removed from models, proxy variables (ZIP codes, employment history, spending patterns) quietly carry the same signals.
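
A small synthetic example shows the proxy mechanism concretely. The numbers and ZIP groupings below are invented purely for illustration; the point is that a protected attribute dropped from the features can still be recovered from a correlated proxy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Synthetic illustration: ZIP codes whose composition reflects historical segregation.
# `group` is the protected attribute; `zip_code` looks like an innocent feature.
group = rng.integers(0, 2, n)
zip_code = np.where(group == 1,
                    rng.choice([10, 11, 12], n),   # group 1 concentrated in these ZIPs
                    rng.choice([12, 13, 14], n))   # group 0 concentrated in these

# "Fairness through unawareness": drop `group`, keep `zip_code`.
# A trivial lookup table still recovers the protected attribute.
majority = {z: int(group[zip_code == z].mean() > 0.5) for z in np.unique(zip_code)}
recovered = np.array([majority[z] for z in zip_code])

print(f"protected attribute recovered from ZIP alone: {(recovered == group).mean():.0%}")
# ~83% in expectation here, despite the model never seeing `group` directly.
```

Real credit data has dozens of such proxies interacting at once, which is why simply deleting the protected column does not make a model fair.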

Courts have confirmed that deploying a model that produces discriminatory outcomes is legally equivalent to implementing a discriminatory policy, regardless of intent.

Hallucinations Are Not Acceptable Quirks

In consumer applications, AI hallucinations are a nuisance. In financial services, they are compliance violations. Large language models are designed to generate plausible-sounding text, not necessarily accurate text. They present fabricated loan calculations and invented regulatory requirements with the same confidence they display when correct.

NatWest partnered with IBM to build its AI assistant Cora+ with safeguards against exactly this failure mode. Wells Fargo’s 2018 mortgage modification error illustrates the stakes: a single calculation mistake didn’t affect one file. It caused more than 500 people to lose their homes and denied hundreds more the loan modifications they qualified for. When AI makes a calculation error at scale, every downstream process that depends on that output is compromised.
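
One common mitigation is to treat any generated number as untrusted until it matches a deterministic calculation. Here is a minimal sketch of that guardrail using the standard amortisation formula; this is an illustrative pattern, not NatWest’s or IBM’s actual implementation:

```python
from decimal import Decimal, ROUND_HALF_UP

def monthly_payment(principal: Decimal, annual_rate: Decimal, months: int) -> Decimal:
    """Standard amortisation formula: M = P * r(1+r)^n / ((1+r)^n - 1)."""
    r = annual_rate / 12
    factor = (1 + r) ** months
    raw = principal * r * factor / (factor - 1)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

def check_model_figure(model_payment: Decimal, principal: Decimal,
                       annual_rate: Decimal, months: int,
                       tolerance: Decimal = Decimal("0.01")) -> bool:
    """Reject any generated figure that deviates from the deterministic calculation."""
    expected = monthly_payment(principal, annual_rate, months)
    return abs(model_payment - expected) <= tolerance

# A $300,000 loan at 6% over 30 years should come to ~$1,798.65/month.
assert check_model_figure(Decimal("1798.65"), Decimal("300000"), Decimal("0.06"), 360)
```

The language model drafts the customer-facing text; the arithmetic never comes from it.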

Regulatory Compliance and AI’s Experimental Nature Are in Direct Tension

The EU AI Act requires institutions to explain exactly how AI reached every decision, prove the model doesn’t discriminate, and demonstrate continuous monitoring for drift. DORA, which took effect in early 2025, requires real-time incident detection and full traceability across infrastructure.

The practical problem: certifying a system that learns and adapts continuously is inherently difficult. Financial institutions end up freezing model versions for regulatory review, which means cutting-edge AI capability will always outpace compliant AI deployment. Only 9% of UK bank executives feel prepared for upcoming AI regulations.
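
In practice, the continuous monitoring these regulations demand often comes down to something as unglamorous as a Population Stability Index check run on every scoring batch. A minimal sketch, with synthetic data and the rule-of-thumb thresholds model-risk teams commonly use:

```python
import numpy as np

def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between the training-time (expected) and live (actual) distributions.

    Common rule of thumb in bank model-risk teams:
      < 0.10 stable; 0.10-0.25 investigate; > 0.25 significant drift.
    """
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    actual = np.clip(actual, edges[0], edges[-1])    # keep live outliers in range
    e_pct = np.histogram(expected, edges)[0] / len(expected)
    a_pct = np.histogram(actual, edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)               # avoid log(0) on empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

# Synthetic illustration: the live applicant pool has shifted since training.
rng = np.random.default_rng(1)
baseline = rng.normal(650, 50, 50_000)   # e.g. credit scores at model build time
live = rng.normal(635, 60, 50_000)
print(f"PSI = {population_stability_index(baseline, live):.3f}")
```

Checks like this are tractable for a frozen model version; they are much harder to certify for a system whose behaviour changes underneath them.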

Operating across jurisdictions compounds this further. What satisfies the EU may not satisfy UK or US requirements. There is no global standard.

Where AI Actually Delivers Today

The gap between hype and production reality is worth naming directly. AI is working well in fraud detection (around 95% accuracy with acceptable false positive rates), transaction monitoring, document processing, and basic customer-facing queries. These work because they are narrow, well-defined problems with clear success metrics and tolerable error rates.
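
The pairing of accuracy with false positive rates in that fraud figure matters, because fraud is rare: on imbalanced data, a model that flags nothing still scores above 99% accuracy. A quick illustration with invented confusion-matrix counts:

```python
def fraud_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Confusion-matrix metrics; accuracy alone hides false-alarm volume."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),   # legitimate transactions flagged
        "recall": tp / (tp + fn),                # fraud actually caught
        "precision": tp / (tp + fp),             # flagged transactions that were fraud
    }

# Illustrative numbers: 1,000,000 transactions, 0.2% fraud rate,
# 95% of fraud caught, 2% of legitimate traffic falsely flagged.
m = fraud_metrics(tp=1_900, fp=19_960, tn=978_040, fn=100)
print(m)  # accuracy ≈ 98%, but nearly 20,000 customers receive false alerts
```

A "tolerable error rate" is therefore a business judgment about false-alarm volume, not a single headline number.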

What remains stuck in pilot purgatory: autonomous credit decisioning, unsupervised lending decisions, fully automated compliance monitoring. These require levels of reliability and explainability that current AI cannot consistently deliver at scale under regulatory scrutiny.

Grasshopper Bank’s CTO puts it plainly: AI might assist in portfolio monitoring, but the final credit approval must always be human. Klarna’s story reinforces the point from a different angle. After publicly claiming their chatbot replaced 700 employees, CEO Sebastian Siemiatkowski reversed course, lifted an 18-month hiring freeze, and acknowledged that customers need the option of human contact, especially under financial stress.
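
Structurally, that "final approval is human" requirement tends to be enforced in code rather than in policy documents: the model can only ever emit a recommendation, and a separate human-invoked step owns the state change. A minimal sketch of the pattern (all types and names here are illustrative):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Decision(Enum):
    PENDING_REVIEW = "pending_review"
    APPROVED = "approved"
    DECLINED = "declined"

@dataclass
class CreditRecommendation:
    applicant_id: str
    model_score: float                      # advisory only, never a final decision
    rationale: list[str]                    # evidence shown to the human reviewer
    decision: Decision = Decision.PENDING_REVIEW
    reviewed_by: Optional[str] = None

def recommend(applicant_id: str, model_score: float,
              rationale: list[str]) -> CreditRecommendation:
    """The model side can only ever create a PENDING_REVIEW record."""
    return CreditRecommendation(applicant_id, model_score, rationale)

def finalize(rec: CreditRecommendation, reviewer_id: str,
             approve: bool) -> CreditRecommendation:
    """Only this human-invoked step can move a file out of review."""
    rec.decision = Decision.APPROVED if approve else Decision.DECLINED
    rec.reviewed_by = reviewer_id           # retained for the audit trail
    return rec
```

The design choice is that no code path lets the model's output reach APPROVED on its own.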

Strategic Decisions Remain Entirely Human

No AI trained on historical data would have predicted that people wanted to split restaurant bills through an app, invest spare change automatically, or rent homes to strangers. None of these behaviours existed in any training set.

The 2025 growth of prediction markets, stablecoin frameworks like the GENIUS Act, and the shift to embedded finance were all driven by people who recognised changing customer behaviour and regulatory opportunity before any data made it visible. AI excels at scaling these ideas once they exist. It does not generate them.

McKinsey estimates AI could add $200-340 billion annually to global banking. That value comes from AI executing and scaling strategies that humans designed, not from AI inventing those strategies independently.

The Practical Takeaway for Financial Leaders

The institutions winning with AI are not those with the most advanced algorithms. They’re the ones that have been clear-eyed about what AI can and cannot do, built governance frameworks before deployment, and designed workflows where humans retain decision authority for high-stakes outcomes.

The question worth asking is not “what can AI do?” It’s “what should we build, and how can AI help us build and deliver it faster and at greater scale than we could otherwise?” The first question treats AI as the objective. The second treats it as a tool in the service of a strategy. That distinction makes all the difference.
