Google’s Gemini 3.1 Flash-Lite GA release is an enterprise operations story: a low-latency Gemini 3.1 option has moved from preview to a generally available model ID, while the preview endpoint is on a short shutdown schedule [2]. That makes the next decision practical rather than speculative: which workloads should move first, what will token economics look like, and how quickly can teams migrate safely?
What changed with GA
Google’s Gemini API release notes list gemini-3.1-flash-lite as released on May 7, 2026, describing it as the generally available version of Gemini 3.1 Flash-Lite and optimized for speed, scale, and cost efficiency [2]. Google Cloud also says Gemini 3.1 Flash-Lite is generally available on the Gemini Enterprise Agent Platform and describes it as designed for ultra-low latency and high-volume tasks [
3].
The model-name change matters because the preview endpoint has a dated end-of-life path. gemini-3.1-flash-lite-preview begins deprecating on May 11, 2026 and is scheduled to shut down on May 25, 2026 [2]. New evaluations should therefore target , while existing preview deployments should be moved before the shutdown date .




