The speed of the blowout was staggering. Uber rolled out Claude Code in late 2025 and actively encouraged usage through internal leaderboards that ranked developers by token consumption. By February, Claude Code usage had nearly doubled. By March, 84% of Uber developers were classified as agentic-coding users, and 65–72% of code inside IDE-based tools was AI-generated. Uber's internal AI coding agent now produces roughly 1,800 code changes per week. The company essentially gamified maximum token usage, and it got exactly what it incentivized.
The root cause wasn't just enthusiasm. Uber built its budget around a per-seat SaaS mental model that worked for two decades of predictable software licensing. Generative AI pricing operates on a fundamentally different principle: every token processed costs money, and the bill scales with how heavily people use the tool, not with how many have access. Gartner reports that agentic workflows burn 5x to 30x more tokens per task than static chatbot interactions, creating a cost curve that traditional forecasting cannot accommodate.
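The difference between the two billing models can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names, seat price, token price, task counts, and tokens-per-task figure are all hypothetical assumptions, not Uber's or any vendor's actual numbers. The only values taken from the text are the 5x and 30x multipliers attributed to Gartner.

```python
def per_seat_cost(seats: int, price_per_seat: float) -> float:
    """Per-seat licensing: the bill depends only on how many people have access."""
    return seats * price_per_seat


def token_cost(
    tasks: int,
    tokens_per_task: int,
    price_per_million_tokens: float,
    multiplier: float = 1.0,
) -> float:
    """Usage-based pricing: the bill depends on how heavily the tool is used.

    `multiplier` scales tokens per task, e.g. 5.0 to 30.0 for agentic
    workflows relative to a chatbot-style baseline (range cited in the text).
    """
    total_tokens = tasks * tokens_per_task * multiplier
    return total_tokens * price_per_million_tokens / 1_000_000


if __name__ == "__main__":
    # All inputs below are hypothetical placeholders chosen for round numbers.
    seats, seat_price = 100, 20.0
    tasks, baseline_tokens, token_price = 1000, 2000, 10.0

    print("per-seat bill:", per_seat_cost(seats, seat_price))
    for m in (1.0, 5.0, 30.0):
        bill = token_cost(tasks, baseline_tokens, token_price, multiplier=m)
        print(f"token bill at {m:g}x tokens per task:", bill)
```

Under these assumptions the per-seat bill is fixed once the headcount is known, while the token bill moves in direct proportion to the multiplier. That proportionality is why a forecast built on seat counts has no term for the variable that actually drives the cost.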
Uber measured its expenses but not its gains. How much time was saved per engineer? How many bugs were avoided? What moved in revenue or rider experience? The company didn't have clear answers.
In a May 2026 interview with Business Insider, Uber's operations chief Andrew Macdonald made the tension explicit. After conversations with senior engineering leaders, Macdonald said it was becoming "harder to justify" the money the company is spending on AI "tokenmaxxing." He acknowledged that higher token usage was not translating into a proportional increase in useful consumer features: "That link is not there yet, right? I think maybe implicitly there is more that is getting shipped, but it's very hard to draw a line between one of those stats and, 'Okay, now the business is moving faster.'"
The CTO himself acknowledged the company is "back to the drawing board" on AI cost governance. The internal dynamic reveals a classic incentive mismatch: leadership pushed tool adoption aggressively, with leaderboards, public rankings, and CTO encouragement, and then discovered that unconstrained token consumption creates runaway costs with no natural governor. Engineers rationally used the tools as much as they were rewarded for using them. The business now rationally questions whether any of that consumption moves the needle on margins, rider experience, or revenue.
Uber is not an outlier. Microsoft has reported similar findings: AI-powered coding assistants can be more expensive than the human labor they are meant to augment. The structural challenge is the same across the enterprise: generative AI tools are priced per token, their value is difficult to isolate and measure, and the incentives inside engineering organizations push toward maximum consumption rather than maximum efficiency.
Gartner's 5x–30x token multiplier for agentic workflows applies across the industry. Anthropic's Claude Code alone hit $2.5 billion in annualized revenue by February 2026, up from $1 billion in November 2025, the fastest enterprise software ramp in history. The spending is real. The returns are not yet visible.
The Uber case surfaces a challenge no major company has solved: how do you budget for a technology whose cost scales with usage, whose output quality is difficult to measure, and whose adoption you need to encourage to stay competitive — all while the CFO needs to see a clear P&L impact? Until enterprises build governance models that connect token spend to specific, measurable business outcomes, the "tokenmaxxing" problem will spread beyond Uber. The company that figures out how to measure and optimize AI's real return on investment — rather than just its token consumption — will have an advantage that matters far more than any leaderboard ranking.