According to reporting by the Financial Times, the gaming behavior became severe enough that it measurably increased Amazon's computing costs. An Amazon senior vice president, Dave Treadwell, reportedly told staff, "Please don't use AI just for the sake of using AI." Amazon later confirmed the leaderboard had been deprecated, with a spokesperson telling Business Insider that the tool "was never intended to promote the use of AI for usage's sake." The company is now pivoting from tracking raw token counts to a metric it calls "normalised deployments," intended to measure productive AI-driven work rather than volume.
In December 2025, Microsoft began giving thousands of employees in its Experiences + Devices division (spanning the Windows, Microsoft 365, Teams, Outlook, and Surface engineering teams) access to Anthropic's Claude Code. The experiment proved popular, but token-based billing quickly became a financial problem. Multiple reports indicate the program consumed its full annual AI budget within months, and on May 14, 2026, the company began canceling most internal licenses.
The hard deadline for the transition is June 30, 2026, the final day of Microsoft's fiscal year. That timing suggests the cancellation is as much about budget hygiene as about product strategy. Affected engineers are being directed to GitHub Copilot CLI, a tool Microsoft owns outright. The company has emphasized that Anthropic's Claude models remain accessible through Microsoft Foundry and inside Microsoft 365 Copilot, but the interface and the cost-ownership model are changing significantly.
Perhaps the most dramatic example of runaway cost comes from Uber. CTO Praveen Neppalli Naga confirmed to The Information in April 2026 that the company had already exhausted its full-year AI tools budget, less than four months into the fiscal year. The primary driver was the rapid, broad adoption of Anthropic's Claude Code across a workforce of roughly 5,000 engineers following a December 2025 rollout.
Uber also relied on an internal leaderboard that ranked engineering teams by AI usage volume, which accelerated Claude Code adoption from 32% to 84% of developers within two months. By April, 95% of Uber engineers were using AI tools monthly, and 70% of committed code was AI-generated. Individual engineers were reportedly incurring between $500 and $2,000 per month in API costs.
Despite these staggering adoption numbers, the business case has proven elusive. Uber COO Andrew Macdonald publicly stated on the Rapid Response podcast that he could not draw a direct connection between the AI spending and consumer product improvements. "That link is not there yet," he said. "Maybe implicitly there's more that is getting shipped, but it's very hard to draw a line between one of those stats and 'Okay now we're actually producing like 25% more useful consumer features.'" CTO Naga told The Information, "I'm back to the drawing board because the budget I thought I would need is blown away already."
At the root of many of these incidents is a management failure captured by Goodhart's Law, in its popular formulation: "When a measure becomes a target, it ceases to be a good measure." Companies eager to demonstrate AI adoption created internal leaderboards ranking employees or teams by token consumption or AI tool invocation counts. Workers, behaving rationally, optimized for the metric rather than the outcome. The result was an explosion of low-value, unnecessary AI calls that boosted leaderboard rankings but added no business value, while directly inflating infrastructure costs.
The practice was not limited to Amazon and Uber. Multiple reports indicate tokenmaxxing has been observed at other major technology companies, though Amazon's public removal of its leaderboard has become the most visible symbol of the practice's failure.
The common thread across these incidents is not that AI tools have failed, but that measuring and rewarding raw consumption creates perverse incentives that can be more expensive than the work AI is meant to replace. Companies are now pivoting away from adoption volume as a metric and toward questions of measurable business value: did AI assistance actually improve what was shipped?
What began as a race to adopt AI is turning into a forced exercise in cost discipline. The era of "consume as many tokens as possible" is ending, and the era of "justify the cost with actual output" is beginning.