The $2 Trillion Bet: The AI Economy Just Flipped. Here's What CDOs Should Do About It.

Nidhi Vichare, May 28, 2026

Cheaper AI is costing enterprises more. That sounds wrong. It is not. Token prices fell 60-80% this year. Then Uber burned its entire 2026 AI budget in four months. The paradox is structural, and the data leaders who understand it will be the ones who keep their budgets.

TL;DR

The AI infrastructure buildout is the largest capital allocation in technology history. The winners will not be the companies that spend the most. They will be the ones that govern the spend.

Big Tech will pour $720 billion into AI infrastructure in 2026 alone. Token prices have fallen 60-80% in a single year. But enterprise AI bills are exploding, not shrinking. Uber burned its entire 2026 AI budget in four months. 85% of companies miss their AI cost forecasts by more than 10%. The paradox is real: cheaper AI is costing enterprises more, because volume growth is outpacing price declines. Meanwhile, 76% of organizations admit their governance cannot keep pace with AI adoption. CDOs who solve the cost-governance equation now are the ones who keep their budget when the CFO comes asking.

$720B: Big Tech AI capex, 2026

60-80%: drop in per-token costs

76%: governance can't keep pace

The Infrastructure Arms Race

Four companies are making the largest concentrated capital bet in technology history. Amazon, Microsoft, Google, and Meta will collectively spend between $690 billion and $720 billion on AI infrastructure in 2026, up roughly 90% from 2025. Wall Street projects the combined figure crosses $1 trillion by 2027.

Figure: The $720 Billion Infrastructure Sprint: Big Tech AI Capex 2025 vs 2026

Where is the money going? GPU clusters, custom silicon, and data centers. NVIDIA's data center segment generated $197.3 billion in FY2026 revenue alone. The company has booked 800,000 to 850,000 wafers of TSMC's advanced packaging capacity, consuming over half of total global output and starving competitors of supply. Microsoft attributed $25 billion of its record 2026 budget specifically to rising memory chip and component costs. This is not software spend. This is physical infrastructure at a scale the industry has never attempted. (I break down the full economics, from GPU pricing to nuclear power deals, in the AI Infrastructure chapter on ai.nidhivichare.com.)

The data platforms are making their own bets. Snowflake acquired Observe for ~$1 billion, is acquiring Natoma (an enterprise MCP platform for AI agents), and deepened its $200 million OpenAI partnership for native model access inside Cortex AI. Databricks acquired Neon and launched Lakebase, embedding serverless Postgres directly into the Lakehouse. A telling detail: over 80% of Neon database instances were created automatically by AI agents, not humans.

The platforms are not adding AI features to their data products. They are rebuilding their data products around AI agents. If your data strategy still treats AI as a workload that sits on top of the platform, you are already behind the vendors.

And the connective tissue between all of it is MCP. The Model Context Protocol hit 97 million monthly downloads in just 16 months. Over 9,600 MCP servers are registered, with 41% of surveyed software organizations now running MCP in limited or broad production. Anthropic donated MCP to the Linux Foundation's new Agentic AI Foundation in December 2025, with OpenAI, AWS, Google, Microsoft, and GitHub as supporting members. MCP is now on the same governance path as Kubernetes. That is not a niche protocol. That is infrastructure.

The Cost Paradox

Figure: The Cost Paradox: per-token cost is falling, total enterprise spend is rising

This is the section most analysis gets wrong. The headline story is that AI costs are plummeting. Token prices have fallen 60-80% in the past year. GPT-4-class performance that cost $30 per million input tokens in early 2024 now costs $2-3. DeepSeek V4, released in April 2026, delivers frontier-level agentic performance at $0.30 per million input tokens. The cheapest models today are more capable than the most expensive models from 18 months ago.

The industry celebrates this deflation. Enterprises budget for it. And then the bills arrive.

Uber burned its entire 2026 AI budget in four months. Claude Code adoption jumped from 32% to 84% of its 5,000-engineer organization. Monthly API costs per engineer ranged from $500 to $2,000. 95% of Uber engineers now use AI tools monthly, and 70% of committed code originates from AI. The COO is now publicly questioning whether the company can afford this level of productivity at scale.

Andrej Karpathy saw this coming. He coined "vibe coding" in February 2025. By December 2025, the latest models crossed a threshold: he stopped correcting his AI agents and started trusting them. At Sequoia AI Ascent this spring, he named what comes after: agentic engineering, the discipline of preserving the quality bar of professional software while coordinating stochastic, capable agents. The standard is blunt: vibe coding is no excuse for introducing vulnerabilities. Uber got the speed. It skipped the discipline.

Uber is not an outlier. Microsoft reportedly canceled most of its direct Claude Code licenses, redirecting engineers to GitHub Copilot CLI. A Mavvrik survey found that 85% of companies miss their AI cost forecasts by more than 10%, with 84% reporting that AI spending cuts gross margins by over six percentage points.

The paradox is structural. Think of token pricing like electricity pricing for a factory. The per-kilowatt-hour cost keeps dropping. But if you install 100x more machines on the floor, your electric bill still goes up. Enterprise agent systems routinely consume millions of tokens per workflow. Multi-agent orchestration multiplies that further. Total spend is price times volume, and volume is growing faster than price is falling. The net effect: despite dramatic price reductions, total AI inference spend is growing.
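
The arithmetic behind the paradox fits in a few lines. This is a minimal sketch, not a forecast model: the function names are mine, and the 10x volume figure is a hypothetical input chosen for illustration, not a number from any of the surveys cited here.

```python
def spend_multiplier(price_drop: float, volume_growth: float) -> float:
    """Return new_spend / old_spend.

    price_drop: fractional fall in per-token price (0.70 means 70% cheaper).
    volume_growth: new token volume divided by old token volume.
    """
    if not 0 <= price_drop < 1:
        raise ValueError("price_drop must be in [0, 1)")
    return (1 - price_drop) * volume_growth


def break_even_growth(price_drop: float) -> float:
    """Volume growth at which total spend is exactly unchanged."""
    if not 0 <= price_drop < 1:
        raise ValueError("price_drop must be in [0, 1)")
    return 1 / (1 - price_drop)


# Illustrative inputs only: a 70% price drop (middle of the 60-80% range)
# and a hypothetical 10x rise in token volume from agent workflows.
print(round(spend_multiplier(0.70, 10), 2))  # 3.0
print(round(break_even_growth(0.70), 2))     # 3.33
```

Under those assumptions, a 70% price cut is wiped out once volume grows more than about 3.3x, and a 10x increase triples the bill.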

I cover this pattern in depth in the Token Efficiency & Agent Economics chapter on ai.nidhivichare.com, where I teach the engineering discipline of turning AI from a money pit into a profit engine. The core insight: 67% of AI compute is now spent on inference, not training. Training is a capital expenditure you pay once. Inference is an operating expense that scales with every user, every query, every agent invocation. Most enterprises go through a predictable four-phase cycle:

Phase	Timeline	What Happens	Monthly Cost
Prototyping	Months 1-3	Build proof-of-concept. Nobody tracks tokens.	$500
Token Maxing	Months 3-9	It works. Teams stuff context windows. Use best model for everything.	$50,000
The Reckoning	Months 9-18	Finance notices the bill. Per-user economics do not work.	$200,000+
Optimization	Months 12-24	Apply efficiency patterns. Token usage drops 50-95% while maintaining quality.	$10,000-$50,000

Uber is in Phase 3: The Reckoning. Karpathy's own research points to the fix: he demonstrated that an 8:1 token-to-parameter ratio outperforms the classic Chinchilla 20:1 rule, and built GPT-2 level training for $100 (down from $43,000 in 2019). The efficiency gains exist. They just have not reached enterprise operations yet.

Most enterprises scaling AI agents will hit the same wall. The margin math is unforgiving: if your agent costs $0.50 per interaction and earns $0.30 in revenue, you lose $20,000 per month at 100,000 interactions. At one million interactions, you lose $200,000. This is not a pricing problem. It is an engineering problem. And the fix is token efficiency: doing the same work with 50-99% fewer tokens while maintaining quality.
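
The margin math above can be checked directly. A minimal sketch, with amounts in integer cents to avoid floating-point drift; the function names are hypothetical, and the break-even helper assumes per-interaction cost scales linearly with tokens consumed, which is a simplification.

```python
def monthly_margin_usd(cost_cents: int, revenue_cents: int, interactions: int) -> float:
    """Monthly profit (positive) or loss (negative) in dollars."""
    return (revenue_cents - cost_cents) * interactions / 100


def break_even_cost_cut(cost_cents: int, revenue_cents: int) -> float:
    """Fraction by which per-interaction cost must fall to break even.

    Assumption: cost is proportional to tokens consumed, so this is also
    the minimum token reduction needed.
    """
    if cost_cents <= revenue_cents:
        return 0.0
    return 1 - revenue_cents / cost_cents


print(monthly_margin_usd(50, 30, 100_000))       # -20000.0
print(monthly_margin_usd(50, 30, 1_000_000))     # -200000.0
print(round(break_even_cost_cut(50, 30), 2))     # 0.4
```

On these numbers, a 40% cut in cost per interaction reaches break-even, and anything beyond that is margin.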

The providers themselves are not immune. OpenAI generated approximately $3.7 billion in revenue in 2025 and still lost an estimated $5 billion, losing roughly $1.35 for every dollar earned. OpenAI, Google, Anthropic, and Meta are all pricing inference below cost to capture market share. That subsidy will not last forever. When it ends, enterprises without token efficiency discipline will face a second reckoning.

The Governance Gap

Figure: The Governance Gap: adoption is racing ahead, governance is standing still

Karpathy has long described software evolving in stages: Software 1.0 (explicit code), Software 2.0 (learned weights via neural networks), and now Software 3.0 (programming by prompting, where LLMs act as the interpreter). When 70% of committed code originates from AI agents, as it does at Uber, governance is no longer optional. It is the engineering discipline that makes Software 3.0 work at enterprise scale.

Here is the number that should concern every data leader: 80% of enterprise applications shipped or updated in Q1 2026 embed at least one AI agent. But only one in five companies has a mature governance model for autonomous AI agents.

The gap is not theoretical. 67% of executives believe their company has already suffered a data leak or breach from unapproved AI tools. 36% lack any formal plan for supervising AI agents. And 76% say their company's AI governance does not keep pace with employee use of AI technology.

The regulatory environment is adding urgency. The EU AI Act becomes fully applicable on August 2, 2026. The Colorado AI Act takes effect on June 30, 2026, requiring impact assessments and appeal rights for high-risk AI systems. California's AI Transparency Act follows on August 2, 2026, mandating watermarks and detection tools for AI-generated content. A single chatbot deployed across regions may need EU Article 50 labels, Colorado assessments, and California consumer notices simultaneously. Multistate compliance stacking is now a real operational concern, not a future risk.
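
Compliance stacking is easy to picture as a lookup. The sketch below is an illustration only, not legal guidance: the region keys and the `obligations_for` helper are hypothetical, and the obligations listed are just the three named in the paragraph above.

```python
# Obligations as named in the text; real requirements depend on the
# system's risk classification and need legal review.
OBLIGATIONS = {
    "EU": {"EU AI Act Article 50 label"},
    "US-CO": {"Colorado impact assessment"},
    "US-CA": {"California consumer notice"},
}


def obligations_for(regions):
    """Return the sorted union of obligations for the deployment regions."""
    unknown = [r for r in regions if r not in OBLIGATIONS]
    if unknown:
        # Fail loudly: an unmapped region is a governance gap, not a no-op.
        raise KeyError(f"no obligations mapped for: {unknown}")
    required = set()
    for region in regions:
        required |= OBLIGATIONS[region]
    return sorted(required)


print(obligations_for(["EU", "US-CO", "US-CA"]))
```

The design choice worth copying is the failure mode: deploying to a region nobody has mapped should raise an error rather than return an empty list.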

The governance gap is not just a risk management problem. It is a cost problem. Ungoverned AI agents consume tokens without budget constraints, access data without audit trails, and make decisions without explainability. Governance and cost control are the same discipline.

The industry is responding, but slowly. 56% of enterprises now name a dedicated "AI agent owner" or "agentic ops" lead, up from 11% in 2024. 75% of data leaders say employees need upskilling in data literacy, and 74% say they need AI literacy training. The Informatica Global CDO Report for 2026 identifies data governance and AI literacy as the key accelerators for AI adoption. The bottleneck, as always, is not model capability. 57% of leaders say the barrier is data reliability.

The CDO Playbook

The infrastructure is being built. The costs are real. The governance is lagging. What should a CDO do about it? Three moves, in priority order.

01 Budget for Usage, Not Unit Cost

The Uber story is the canary. Per-token pricing is a misleading input for enterprise AI budgets. A 60% drop in token cost means nothing when adoption drives a 500% increase in volume. Build consumption caps at the department and workflow level. Implement chargebacks so teams feel the cost of the tokens they consume. Track value-per-token, not total tokens used. That metric, the transition from measuring volume to measuring value, is the hallmark of a mature AI organization.
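
Here is one way caps, chargebacks, and value-per-token could fit together. It is a sketch under stated assumptions: the `UsageRecord` fields, the flat price per million tokens, and the idea that each call carries an attributed dollar value are all hypothetical, and real attribution is much harder than a single field.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    department: str
    workflow: str
    tokens: int
    value_usd: float  # business value attributed to the call (hypothetical)


def chargeback_report(records, price_per_million_usd, caps_tokens):
    """Aggregate usage per department, price it, and flag cap breaches."""
    tokens = defaultdict(int)
    value = defaultdict(float)
    for r in records:
        tokens[r.department] += r.tokens
        value[r.department] += r.value_usd

    report = {}
    for dept, used in tokens.items():
        cap = caps_tokens.get(dept)  # None means no cap configured
        report[dept] = {
            "tokens": used,
            "charge_usd": round(used / 1_000_000 * price_per_million_usd, 2),
            "over_cap": cap is not None and used > cap,
            "value_per_million_tokens_usd": (
                round(value[dept] / used * 1_000_000, 2) if used else 0.0
            ),
        }
    return report


records = [
    UsageRecord("support", "ticket-triage", 4_000_000, 1200.0),
    UsageRecord("support", "reply-draft", 2_000_000, 300.0),
    UsageRecord("engineering", "code-review", 9_000_000, 900.0),
]
caps = {"support": 5_000_000, "engineering": 10_000_000}
for dept, row in chargeback_report(records, 3.0, caps).items():
    print(dept, row)
```

In this made-up data, support is over its cap but produces more value per token than engineering, which is exactly the conversation a chargeback is supposed to start.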

The patterns exist: model routing (send 40% of requests to budget models), prompt caching (90% savings on repeated inputs), early commitment (classify before reasoning to save 70-80% on routine tasks), and batch APIs (50% cost reduction for non-real-time work). I teach all of these in the Token Efficiency & Agent Economics chapter. The engineering discipline is mature. The enterprise adoption is not.
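
To see how two of these patterns compound, here is a rough cost estimator for routing plus prompt caching. It is a back-of-envelope sketch, not a router: the function and its parameters are my own, the prices reuse the $3 and $0.30 per million input tokens mentioned earlier, and the 50% cached share is a hypothetical input.

```python
def blended_input_cost(tokens_millions, premium_price, budget_price,
                       budget_share=0.0, cached_share=0.0, cache_discount=0.90):
    """Estimate input-token cost in USD after routing and prompt caching.

    budget_share:   fraction of tokens routed to the budget model.
    cached_share:   fraction of tokens served from the prompt cache.
    cache_discount: price reduction applied to cached tokens.
    Assumes routing and caching apply independently, which is a simplification.
    """
    for name, v in (("budget_share", budget_share),
                    ("cached_share", cached_share),
                    ("cache_discount", cache_discount)):
        if not 0 <= v <= 1:
            raise ValueError(f"{name} must be in [0, 1]")
    routed_price = budget_share * budget_price + (1 - budget_share) * premium_price
    cache_factor = 1 - cached_share * cache_discount
    return tokens_millions * routed_price * cache_factor


# 1,000M input tokens per month at $3.00 premium and $0.30 budget pricing.
baseline = blended_input_cost(1000, 3.00, 0.30)
routed = blended_input_cost(1000, 3.00, 0.30, budget_share=0.40)
routed_cached = blended_input_cost(1000, 3.00, 0.30, budget_share=0.40,
                                   cached_share=0.50)
print(round(baseline, 2))       # 3000.0
print(round(routed, 2))         # 1920.0
print(round(routed_cached, 2))  # 1056.0
```

Under these assumptions, routing alone cuts the bill by 36%, and adding a 50% cache hit rate brings the total reduction to roughly 65%.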

Key insight: The difference between an AI product that loses $200,000/month and one that generates $200,000/month is not the model. It is token efficiency. Same quality. Same user experience. The difference is entirely engineering.

02 Govern the Agent, Not Just the Model

Most AI governance frameworks were built for models: bias testing, accuracy metrics, approval workflows. Agents are fundamentally different. They act autonomously, chain tools together, access data without preview, and make decisions in sequences no human designed. Governing a model is like approving a document. Governing an agent is like supervising an employee.

56% of enterprises now name a dedicated agentic ops lead. If you have not designated one, you are behind. The role owns the agent registry, consumption governance (token budgets per agent, per workflow, per department), and audit trails for explainability and compliance. MCP is the enabling standard, with 97 million monthly downloads and Linux Foundation governance. But the protocol alone does not solve governance. Your agents still need a contract.
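
What might a minimal agent contract look like in practice? The sketch below is an in-memory illustration, not a product design and not part of MCP: the class names, fields, and the hard-stop policy on budget overruns are all assumptions of mine.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class BudgetExceeded(Exception):
    """Raised when a call would push an agent past its token budget."""


@dataclass
class AgentRecord:
    agent_id: str
    owner: str
    department: str
    monthly_token_budget: int
    tokens_used: int = 0


@dataclass
class AgentRegistry:
    agents: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, record: AgentRecord) -> None:
        if record.agent_id in self.agents:
            raise ValueError(f"agent already registered: {record.agent_id}")
        self.agents[record.agent_id] = record
        self._audit(record.agent_id, "registered", 0)

    def record_usage(self, agent_id: str, tokens: int, action: str) -> None:
        # A KeyError here is deliberate: unregistered agents cannot spend.
        agent = self.agents[agent_id]
        if agent.tokens_used + tokens > agent.monthly_token_budget:
            self._audit(agent_id, f"denied:{action}", tokens)
            raise BudgetExceeded(f"{agent_id} would exceed its monthly budget")
        agent.tokens_used += tokens
        self._audit(agent_id, action, tokens)

    def _audit(self, agent_id: str, event: str, tokens: int) -> None:
        # Every decision, including denials, leaves a timestamped trail.
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id,
            "event": event,
            "tokens": tokens,
        })


registry = AgentRegistry()
registry.register(AgentRecord("invoice-bot", "a.owner", "finance", 1_000_000))
registry.record_usage("invoice-bot", 600_000, "reconcile-ledger")
try:
    registry.record_usage("invoice-bot", 500_000, "reconcile-ledger")
except BudgetExceeded as exc:
    print("blocked:", exc)
print(len(registry.audit_log), "audit entries")
```

The point of the sketch is that registry, budget, and audit trail live in one place, so a denied call is both a cost control and a compliance record.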

Key insight: The EU AI Act becomes fully applicable in August 2026. Colorado and California laws take effect this summer. If your agents do not have governance contracts today, you are building compliance debt that compounds with every deployment.

03 Measure Causation, Not Correlation

The $720 billion being spent on AI infrastructure will face CFO scrutiny. Every enterprise AI initiative will eventually be asked the same question: "Did the AI investment cause the revenue, or would it have happened anyway?"

Most teams cannot answer that question. They measure correlation: the AI launched, revenue went up, therefore the AI drove the revenue. That logic does not survive a CFO who knows the economy shifted, a competitor exited, and three other features launched the same quarter.

Causation requires a holdback group set up before launch. A subset of users that does not receive the AI feature, measured under the same market conditions as the treatment group. The delta between them is the platform's actual lift. Not estimated. Not modeled. Observed.
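
The mechanics are simple enough to sketch. This is a minimal illustration, not the methodology referenced below: the salt, the 10% holdback size, and the function names are hypothetical, and the result is a point estimate that still needs a confidence interval before it goes in front of a CFO.

```python
import hashlib
from statistics import mean


def in_holdback(user_id: str, holdback_pct: int = 10,
                salt: str = "ai-feature-v1") -> bool:
    """Deterministically assign a user to the holdback group before launch.

    Hashing a salted ID gives a stable, roughly uniform split without
    storing an assignment table.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < holdback_pct


def observed_lift(treated, holdback):
    """Compare the same metric, over the same period, across both groups."""
    if not treated or not holdback:
        raise ValueError("both groups need at least one observation")
    treated_mean, holdback_mean = mean(treated), mean(holdback)
    absolute = treated_mean - holdback_mean
    relative = absolute / holdback_mean if holdback_mean else float("nan")
    return {"treated_mean": treated_mean, "holdback_mean": holdback_mean,
            "absolute_lift": absolute, "relative_lift": relative}


# Made-up revenue-per-user figures, for illustration only.
print(observed_lift(treated=[12, 10, 14], holdback=[10, 9, 11]))
```

Because assignment depends only on the user ID and the salt, the split is fixed before launch and cannot be adjusted after the results come in.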

I have written extensively about this framework and applied it at Samsung Ads to causally attribute $1B+ in ad revenue across four streams. The full methodology is in Most AI Strategies Will Quietly Fail in 2027.

Key insight: The teams that build causal measurement into their AI strategy from day one walk into the 2027 budget review with per-stream causal numbers. The teams that do not will be the ones explaining why their AI initiative deserves another year.

The Bet

The $2 trillion in AI infrastructure that Big Tech will deploy by 2030 is reshaping every enterprise data strategy. The compute is cheaper than it has ever been. The models are more capable than they have ever been. The agents are more autonomous than they have ever been.

None of that matters if you cannot control the cost, govern the access, and prove the return.

The CDOs who win are not the ones who deploy the most AI. They are the ones who govern the spend, engineer the efficiency, and prove the causation. That is the only bet that survives a budget review.

The infrastructure is someone else's problem. The ROI is yours.

Figure: The AI Cost Reckoning: Why cheaper AI is costing enterprises more

Sources: Tom's Hardware (Big Tech Capex $725B); Futurum Group (AI Capex 2026); Fortune (Uber AI Budget); Deloitte (State of AI 2026); Informatica (CDO Insights 2026); Linux Foundation (Agentic AI Foundation); TokenMix (AI Pricing War 2026); SiliconANGLE (Snowflake Agentic Enterprise); OneTrust (AI Regulation 2026); Agentic AI Institute (Enterprise Adoption 2026).

About the author

Nidhi Vichare is an enterprise AI and data executive, platform architect, and author of The Meaning Layer. She writes about enterprise AI strategy, data architecture, causal measurement, AI ROI, agentic systems, and modern leadership for senior data and AI leaders.
