This means a short text query may consume very little quota, while a long conversation using advanced tools or coding assistance can drain it quickly. Google says this approach better reflects the reality that some AI tasks require far more computing resources than others.
The compute‑based system applies across Gemini’s subscription tiers, but each plan receives a different usage budget.
According to Google’s support documentation:
At Google I/O 2026, the company also introduced the $100/month AI Ultra plan, which includes significantly larger usage limits, such as a limit in the Antigravity coding environment that is five times higher than AI Pro's.
In other words, the key differentiator between plans is no longer just feature access—it is the amount of compute budget available to the user.
The immediate problem was predictability. Under the old system, users could roughly estimate how many prompts they had left. With compute‑based quotas, that predictability largely disappeared.
Because usage depends on task complexity, heavy workloads can burn through quota much faster than expected. Examples include:
Reports from developers and users showed that intensive sessions could exhaust the five‑hour allowance or even the weekly cap after only a few work sessions.
Many subscribers felt blindsided by the change. Some described the new limits as a "bait‑and‑switch," arguing that their paid plans now delivered less practical usage than before, even though Google had not technically downgraded the subscription tiers themselves.
The backlash prompted a rapid response from Google, especially within Antigravity, the company’s AI‑powered coding tool.
Within days of the rollout:
Shortly afterward, the company tripled limits again, further expanding the available compute budget for developers using the tool.
The quick adjustments suggest Google underestimated how quickly real‑world workflows—particularly coding and agent‑based tasks—would consume the new quotas.
The controversy highlights a growing tension across the AI industry.
Modern AI systems do not have uniform costs. A short text reply might be cheap to generate, while long‑context reasoning, coding agents, or video generation can require dramatically more compute resources. That makes simple “message count” limits increasingly unrealistic for providers to sustain.
Compute‑based quotas solve the economic side of that problem—but they introduce a usability challenge. When limits depend on opaque calculations about complexity, users may struggle to predict how much access they actually have.
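To make the predictability problem concrete, here is a minimal Python sketch of a compute-weighted quota with a rolling five-hour window. It is not Google's implementation: the class `ComputeQuota`, the helper `requests_until_exhausted`, the task names, the per-task costs, and the 1,000-unit budget are all hypothetical values chosen for illustration, since Google has not published its accounting in this form.

```python
from collections import deque

# Hypothetical per-task costs in abstract "compute units".
# These numbers are illustrative assumptions, not published figures.
TASK_COST = {
    "short_text": 1,
    "long_context_chat": 20,
    "coding_agent_run": 150,
}

FIVE_HOURS = 5 * 60 * 60  # rolling window length, in seconds


class ComputeQuota:
    """Rolling-window budget measured in compute units, not messages."""

    def __init__(self, budget, window_seconds=FIVE_HOURS):
        self.budget = budget
        self.window = window_seconds
        self.events = deque()  # (timestamp, cost) pairs, oldest first

    def used(self, now):
        # Discard charges that have aged out of the rolling window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        return sum(cost for _, cost in self.events)

    def try_charge(self, task, now):
        # Accept the request only if its cost still fits in the budget.
        cost = TASK_COST[task]
        if self.used(now) + cost > self.budget:
            return False
        self.events.append((now, cost))
        return True


def requests_until_exhausted(task, budget=1000):
    """Count how many back-to-back requests of one type fit in the budget."""
    quota = ComputeQuota(budget)
    count = 0
    while quota.try_charge(task, now=0):
        count += 1
    return count


if __name__ == "__main__":
    for task in TASK_COST:
        print(task, requests_until_exhausted(task))
```

Under these assumed numbers, the same budget covers 1,000 short text replies but only 6 coding-agent runs. A user who thinks in terms of "messages left" has no stable figure to rely on unless the provider exposes the per-task cost.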
Google’s Gemini rollout illustrates this trade‑off clearly: a system designed to align usage with compute cost ended up confusing users and triggering backlash almost immediately.
For AI companies scaling large models, the challenge going forward is balancing three competing forces:
As AI assistants become more powerful—and more compute‑intensive—that balancing act is likely to become one of the defining product challenges of the industry.