The Financial Times broke the story yesterday, and the headline alone tells you everything:

"We created a monster": companies rein in AI usage as costs strain budgets

Amazon. Walmart. Uber. The companies that spent the last two years racing to put AI tools in every employee's hands are now installing speed bumps. Caps on API usage. Canceled licenses. Internal memos telling employees to stop "tokenmaxxing" — burning as many AI tokens as possible just because they can.

This isn't a minor budget trim. This is a reversal.

And if you're building a business on top of someone else's API, you need to understand exactly what's happening — because the signals coming out of the enterprise right now are the canary in the coal mine for the entire AI economy.

The Numbers That Broke the Camel's Back

Let's start with concrete data points, because the scale of what's happening is genuinely wild.

Uber burned through its entire 2026 AI budget in four months. The company's CTO Praveen Neppalli Naga told The Information in April that aggressive internal adoption — complete with leaderboards encouraging AI usage — had blown past the full-year forecast by early Q2. The company is now spending more on AI inference than on human engineers for some categories of work. That isn't a typo.

Microsoft, which you'd think would be all-in on AI for obvious reasons, reportedly canceled most of its internal Claude Code licenses, moving thousands of developers back to GitHub Copilot CLI instead. The stated reason was "converging on a single tool." Multiple sources told The Verge and Fortune that the real reason was cost. Microsoft's own internal AI tab had become one of its fastest-growing operational expenses. Six months after pushing widespread adoption, the plug got pulled.

Nvidia's VP of Applied Deep Learning Bryan Catanzaro put it bluntly: "For my team, the cost of compute is far beyond the costs of the employees." A senior leader at Nvidia, the company selling the picks and shovels, is telling you his team's AI bill is higher than its payroll.

Swan AI founder Amos Bar-Joseph publicly shared a $113,000 monthly AI bill for a team of four people. Four. People. One hundred and thirteen thousand dollars a month.

And the FT reports that Amazon, Walmart, and other early adopters are introducing caps or discouraging "wasteful" AI usage patterns across the board.

The punchline from the Economist, published five days ago: "The rage for tokenmaxxing is coming to an end."

How Did We Get Here?

The mechanism is straightforward, and it's baked into the business model of every major AI API provider.

Token-based pricing means more usage = more revenue for the provider, more cost for the customer. This is obvious in retrospect. But companies spent 2024 and early 2025 treating AI adoption as a binary metric ("are we using enough AI?") with no governor on what "enough" meant. Internal leaderboards tracked who consumed the most tokens. Amazon executives reportedly pushed employees to "tokenmaxx." Meta ran an internal leaderboard called "Claudeonomics."

You know what happens when you tell people to use as much of a paid resource as possible and measure their performance by how much they consume? They consume a lot. Shocking, I know.

The problem isn't unit costs, which have been falling: Gartner predicts inference on a trillion-parameter model will cost ~90% less by 2030. The problem is that total costs are exploding because consumption is growing even faster. Goldman Sachs forecasts agentic AI will drive a 24-fold increase in token consumption by 2030, hitting 120 quadrillion tokens per month. That's not a typo either.

The paradox: as AI gets cheaper per token, the systems that use it eat exponentially more tokens. Cheaper inference doesn't mean cheaper AI. It means more AI, which means more total spend.
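
To make the paradox concrete, here is a back-of-the-envelope sketch that combines the two forecasts quoted above. Treating Gartner's per-token decline and Goldman's consumption growth as if they applied to the same workload is my assumption, not something either forecast claims, so read the result as an illustration rather than a prediction.

```python
# Figures quoted above; combining them into one number is an illustrative assumption.
UNIT_COST_DROP = 0.90      # Gartner: inference costs ~90% less by 2030
CONSUMPTION_GROWTH = 24.0  # Goldman Sachs: 24-fold token consumption by 2030


def relative_spend(cost_drop: float, growth: float) -> float:
    """Return total spend relative to today (today = 1.0)."""
    remaining_unit_cost = 1.0 - cost_drop
    return remaining_unit_cost * growth


multiplier = relative_spend(UNIT_COST_DROP, CONSUMPTION_GROWTH)
print(f"Total spend vs. today: {multiplier:.1f}x")
```

Under those assumptions, each token costs a tenth of what it does today and the total bill still more than doubles.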

The Three Cost Drivers That Matter

The specific forces driving the crunch are worth naming:

1. Agent proliferation. The Economist: "AI agents — bots that can read, interpret and act — use masses of processing power and have started to run up huge bills." A single agentic workflow chains together multiple model calls — planning, execution, reflection, tool-use. Each step burns tokens. And as every major SaaS product adds its own agent, the cumulative bill for a company running hundreds of software subscriptions becomes catastrophic.

2. Chain-of-thought and reasoning model usage. Advanced reasoning models (o3, Claude Opus 4.5, etc.) can generate 10-100x more tokens per request than a simple completion. Every internal-use application that was switched from GPT-4o to a reasoning model quietly multiplied its per-query cost, often by an order of magnitude or more.

3. The API pricing transition. Both OpenAI and Anthropic are shifting toward usage-based pricing as they head toward their IPOs. The "all you can eat" enterprise deals that masked the per-token cost are being restructured. Companies are seeing the real numbers for the first time, and they're not happy about it.
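
The first driver is easy to underestimate, so here is a rough sketch of how a chained agent run compares with a single completion. Everything in it is hypothetical: the prices are placeholders rather than any provider's real rates, and the token counts per step are made up to show the shape of the problem (context grows as each step feeds the next).

```python
from dataclasses import dataclass

# Hypothetical prices in USD per million tokens; not any provider's actual rates.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00


@dataclass
class Step:
    name: str
    input_tokens: int
    output_tokens: int


def step_cost(step: Step) -> float:
    """Cost of one model call in USD."""
    total = step.input_tokens * PRICE_IN_PER_M + step.output_tokens * PRICE_OUT_PER_M
    return total / 1_000_000


def workflow_cost(steps: list[Step]) -> float:
    """Cost of a whole chain of model calls in USD."""
    return sum(step_cost(s) for s in steps)


# Illustrative token counts only.
single_call = [Step("completion", 2_000, 500)]
agent_run = [
    Step("planning", 4_000, 1_500),
    Step("execution", 6_000, 2_000),
    Step("tool-use", 8_000, 1_000),
    Step("reflection", 10_000, 3_000),
]

ratio = workflow_cost(agent_run) / workflow_cost(single_call)
print(f"single call: ${workflow_cost(single_call):.4f}")
print(f"agent run:   ${workflow_cost(agent_run):.4f} ({ratio:.1f}x)")
```

Swap in your own prices and measured token counts. The point is structural: the multiplier comes from the number of steps and the growing context, not from any one expensive call.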

What This Means for Developers

If you're building on top of the big API providers, there are three implications you need to internalize.

First, your downstream customers are getting budget-conscious. If Uber and Microsoft are capping AI spend, the companies a tier below them are going to be even more sensitive. If you're selling an AI-powered product, expect procurement to ask harder questions about token consumption, cost per query, and ROI. The era of "it uses AI, therefore it's valuable" is ending.
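
If procurement is going to ask about cost per query, it helps to have the number before they do. Below is a minimal sketch of a spend tracker that reports cost per query and projects month-end spend against a cap. The class name, the cap, and the per-query cost are all hypothetical, and the projection is a naive linear run rate, which is an assumption rather than a forecasting method.

```python
class SpendTracker:
    """Hypothetical helper: tracks API spend against a monthly cap."""

    def __init__(self, monthly_cap_usd: float) -> None:
        self.monthly_cap_usd = monthly_cap_usd
        self.spent_usd = 0.0
        self.queries = 0

    def record(self, cost_usd: float) -> None:
        """Log the cost of one completed query."""
        self.spent_usd += cost_usd
        self.queries += 1

    def cost_per_query(self) -> float:
        """Average cost per query so far (0.0 if nothing is recorded)."""
        return self.spent_usd / self.queries if self.queries else 0.0

    def projected_month_spend(self, day: int, days_in_month: int = 30) -> float:
        """Linear run-rate projection from spend through the given day."""
        if day < 1:
            raise ValueError("day must be at least 1")
        return self.spent_usd / day * days_in_month

    def will_exceed_cap(self, day: int, days_in_month: int = 30) -> bool:
        return self.projected_month_spend(day, days_in_month) > self.monthly_cap_usd


# Illustrative numbers: 4,000 queries at $0.125 each by day 10 of the month.
tracker = SpendTracker(monthly_cap_usd=1_000.0)
for _ in range(4_000):
    tracker.record(0.125)

print(f"cost per query:  ${tracker.cost_per_query():.3f}")
print(f"projected spend: ${tracker.projected_month_spend(day=10):.2f}")
print(f"over cap:        {tracker.will_exceed_cap(day=10)}")
```

A tracker like this would have flagged a four-month budget burn in the first few weeks, which is the whole argument for measuring run rate instead of adoption.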

Second, the "cheap AI" narrative from open-weight models is about to get much louder. The FT story is breaking at exactly the moment when DeepSeek V4, Qwen 3.6, and GLM-5-2 are delivering frontier-competitive performance at a fraction of the API cost. The Hacker News thread on the FT piece captured this perfectly: "Deepseek and GLM do slop just as well as Opus 4.8 and GPT 5.5 at a fraction of the cost."

The enterprise cost crunch is an accelerant for the open-weight adoption curve. When your CFO sees that the budget blew through in four months and an open model can do 85% of the job for 10% of the cost, the conversation shifts fast.
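
Here is what that conversation looks like as arithmetic. This sketch reads "85% of the job" as 85% of requests routed to an open model at 10% of the frontier price, with the rest staying on the frontier model. That reading is my assumption, and it ignores hosting, evaluation, and the cost of deciding which requests go where.

```python
def blended_cost(open_share: float, open_relative_cost: float) -> float:
    """Blended cost per request, relative to sending everything to the frontier model (1.0)."""
    if not 0.0 <= open_share <= 1.0:
        raise ValueError("open_share must be between 0 and 1")
    frontier_share = 1.0 - open_share
    return open_share * open_relative_cost + frontier_share * 1.0


# Illustrative split taken from the figures in the paragraph above.
blended = blended_cost(open_share=0.85, open_relative_cost=0.10)
savings = 1.0 - blended
print(f"blended cost: {blended:.3f} of the all-frontier bill")
print(f"savings:      {savings:.1%}")
```

Under that assumed split, the bill falls to roughly a quarter of the all-frontier cost, which is the kind of number that ends a budget meeting quickly.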

Third, platform dependency just got riskier. The same Forbes analysis that broke the Uber/Microsoft cost data also reported that OpenAI has missed key revenue targets, with CFO Sarah Friar privately expressing concern that the company "might not be able to pay for future computing contracts if revenue doesn't grow fast enough." The AI giants are under their own financial pressure — which means API pricing, access, and service levels are all variables, not constants.

Is This a Correction or a Ceiling?

The answer is: it depends on what we're talking about.

For AI replacing human labor, the grand thesis that drove $740 billion in global tech CapEx this year, the numbers don't work yet. When Nvidia's own VP says his team's compute bill exceeds its salary bill, the unit economics of AI-for-human-replacement are in negative territory. That could change as costs fall and capabilities improve, but it hasn't changed yet.

For AI as a productivity multiplier for knowledge workers, the thesis is still intact, but it needs financial discipline. The companies that win will be the ones that measure ROI in outcomes, not token counts. The Business Insider piece on the emerging "AI caste system" captured the risk: teams with big AI budgets will look like they had the best ideas because they had the resources, not because their ideas had merit. Sunk-cost fallacy looms large.

For the frontier API business model, this is the most interesting question. The enterprise pullback is happening on the consumption side while the providers are simultaneously burning cash and heading toward public markets. Something has to give. Either prices come down dramatically (which the providers can't afford), or value delivery improves dramatically (which the models haven't done yet), or we're entering a period of contraction in the API market that benefits open-weight alternatives.

The Bottom Line

The FT story isn't a surprise to anyone who's been watching the financials. The surprise is that it took this long to spill into public view.

Enterprise AI spending has gone from "all you can eat" to "actually, can we see the menu?" in about six months. The tokenmaxxing era is over. What comes next will be defined not by who can burn the most tokens, but by who can extract the most value per token.

That's a healthier dynamic for the industry in the long run. But it's going to be painful for companies that built their strategies on cheap, unlimited AI — because that world just ended, and the bill collectors are knocking.

This story was triggered by the Financial Times report "'We created a monster': companies rein in AI usage as costs strain budgets" (June 18, 2026). Supporting data from the Economist (June 14), Fortune (May 22), Forbes (May 27), Axios (May 28), and Business Insider (June 12).