The rollout was swift and structured, moving from a limited test to full public availability in just over two weeks:
minimal, low, medium, high) for developers The consumer-facing app offers up to three distinct levels, though not all are available to every user :
On the developer side, the API offers more granular control with four levels (minimal, low, medium, high). A critical change for developers was the introduction of Gemini 3.5 Flash at Google I/O, which shifted the API default from high to medium. This means code written for earlier Gemini 3 models would automatically use a lower (faster, less expensive) reasoning level unless updated .
The choice between Standard and Extended is a trade-off between speed and depth, which also affects your usage quotas. While Google doesn't publish a simple "Extended uses 2x more quota" rule, the impact is real because of how usage limits are structured .
The Verdict: Use Standard for quick answers, simple rewrites, and chat-based tasks where speed is paramount. Switch to Extended when you need a robust analysis of a complex problem, deep code generation, or multi-step mathematical reasoning. Just be mindful that it will consume your available quota more quickly .
Thinking Levels is not an isolated feature. It's the consumer-facing tip of a three-tier reasoning strategy that Google has been assembling over the last year.
1. Deep Think: The Specialist High-Compute Mode
Deep Think is Google's most specialized reasoning product, built for advanced science, research, and engineering challenges. It's an exclusive mode for Google AI Ultra subscribers, built on top of Gemini 3.1 Pro, and is designed for tasks like turning a sketch into a 3D-printable model or solving frontier-level mathematical proofs . Google gave it a "major upgrade" in February 2026, and it exists as a distinct product from the daily-use Thinking Levels toggle
.
2. Thinking Levels: The Everyday Reasoning Dial
This is the feature discussed above, giving consumers daily, per-task control over reasoning effort on standard Flash and Pro models. It makes configurable reasoning accessible without requiring a top-tier subscription.
3. Gemini 3.5 Flash: The Developer-Facing Architecture
Unveiled at Google I/O 2026, Gemini 3.5 Flash is the technical backbone that ships the new thinking-level architecture to developers. Positioned as Google's "strongest agentic/coding model yet," it outperforms the older Gemini 3.1 Pro on key benchmarks while running roughly 4x faster . Crucially, it introduced the
thinking_level enum as the new standard, replacing the old thinking_budget parameter and defaulting to medium for faster, more cost-effective responses . This model brings the same conceptual framework behind the app's Standard/Extended toggle directly into the hands of developers.
In practice, this means the entire Gemini ecosystem—from a free user asking a quick question on the web to an enterprise developer building complex AI agents—is now operating on a single, unified reasoning architecture. You control the depth. Google's models handle the rest.
Comments
0 comments