In response, Google introduced a hard cap on how much quota a single prompt can consume. This does not return the system to simple per-prompt counting, but it does prevent a single operation from instantly triggering a multi-hour lockout. The practical outcome is that Pro users can now run multiple complex queries per five-hour window rather than just one.
Perhaps the most egregious original sin of the compute-based system was that failed requests consumed quota. An internal server error, timeout, or bug that produced no useful output still ticked down the user’s usage timer.
Google has now eliminated this entirely. Quota is deducted only for successful completions. Woodward summed it up plainly: “If a request fails, you won't be charged. Our system mistakes are on us, not you.” This removes a significant source of invisible quota drain that made limits feel even smaller than they actually were.
A specific bug in the Gemini Omni-powered video tool caused some users to see their full quota exhausted after just one or two video generations. The bug was particularly painful because it made it impossible to iterate on a video project or correct mistakes without facing total lockout.
Google confirmed the bug has been resolved. To compensate and improve the offering, the company simultaneously doubled the Omni video generation limit for Google AI Ultra subscribers, giving them more breathing room immediately.
To give all users a reliable option that will never strand them, Google exempted Gemini 3.1 Flash-Lite prompts from all quota calculations. Flash-Lite queries now cost zero compute toward the five-hour or weekly limits. This guarantees that basic text and lighter coding tasks can continue uninterrupted even if a user’s Pro or Ultra quota is fully depleted.
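The three accounting rules described so far (a per-prompt cap, no charge for failed requests, and the Flash-Lite exemption) can be pictured with a short Python sketch. This is an illustration only, not Google's implementation: the cap value, the quota units, the model labels, and the names `Request` and `quota_charge` are all hypothetical, since Google has not published the underlying numbers or logic.

```python
from dataclasses import dataclass

# Hypothetical values: Google has not published the real cap or model identifiers.
PER_PROMPT_CAP = 25.0            # most quota units a single prompt may consume
EXEMPT_MODELS = {"flash-lite"}   # models that never count toward quota


@dataclass
class Request:
    model: str        # illustrative label, not an official model ID
    raw_cost: float   # estimated compute cost in made-up quota units
    succeeded: bool   # False for server errors, timeouts, and similar failures


def quota_charge(req: Request) -> float:
    """Return the quota units to deduct for one request."""
    if not req.succeeded:
        return 0.0    # failed requests are free
    if req.model in EXEMPT_MODELS:
        return 0.0    # exempt models cost nothing
    # A single prompt can never consume more than the cap.
    return min(req.raw_cost, PER_PROMPT_CAP)
```

Under these assumed numbers, a successful Pro prompt with a raw cost of 80 units would be charged only 25, while the same prompt would cost nothing if it failed or ran on the exempt model.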
Much of the original frustration stemmed from the lack of an accurate meter. Users often had no idea they were approaching the limit until the service cut them off, especially since a single complex prompt could jump from 0% to 100% consumption instantly.
Google committed to providing more detailed usage breakdowns and improved notifications, particularly for compute-heavy tasks like Deep Research. The company is also working on a more comprehensive dashboard that should help users understand their consumption in real time rather than being surprised by a hard stop.
A smaller quality-of-life improvement ensures that your chosen model (e.g., Gemini 3.1 Pro) remains sticky across sessions. It will change only if you switch manually or if hitting your limit triggers an automatic fallback to a lighter model like Flash. This prevents the frustrating experience of starting a task on Pro and discovering that the app has silently demoted you.
The core architecture of the new system remains in place. Google still uses a compute-based model rather than a simple message-based one, and the five-hour rolling window with a weekly hard cap still applies to paid plans. The company has also signaled that it eventually intends to sell pay-as-you-go top-up AI credits in the Gemini app, allowing heavy users to buy more compute directly.
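For readers who want to picture how a five-hour rolling window and a weekly cap interact, here is a minimal Python sketch. It rests on several assumptions: the limits are arbitrary numbers, the weekly cap is treated as a rolling seven-day span (Google may instead reset it on a fixed schedule), and `QuotaTracker` and `try_charge` are hypothetical names, not part of any Google API.

```python
from collections import deque

FIVE_HOURS = 5 * 60 * 60        # seconds
ONE_WEEK = 7 * 24 * 60 * 60     # seconds; assumed rolling, not a fixed reset


class QuotaTracker:
    """Tracks compute charges against a short rolling window and a weekly cap."""

    def __init__(self, window_limit: float, weekly_limit: float):
        self.window_limit = window_limit
        self.weekly_limit = weekly_limit
        self.events = deque()   # (timestamp_seconds, units), oldest first

    def _used_since(self, now: float, span: float) -> float:
        # Sum the charges that are still inside the given time span.
        return sum(units for ts, units in self.events if now - ts < span)

    def try_charge(self, now: float, units: float) -> bool:
        """Record the charge and return True if both limits allow it."""
        # Forget charges that are older than a week.
        while self.events and now - self.events[0][0] >= ONE_WEEK:
            self.events.popleft()
        if self._used_since(now, FIVE_HOURS) + units > self.window_limit:
            return False        # five-hour window exhausted
        if self._used_since(now, ONE_WEEK) + units > self.weekly_limit:
            return False        # weekly cap exhausted
        self.events.append((now, units))
        return True
```

The point of the sketch is that the two limits are checked independently: usage can age out of the five-hour window and free up short-term capacity while still counting toward the weekly total.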