Answer published 4 days ago · Last edited 2 days ago · 29 sources

Google Cuts Memory Use by 72% with QAT Models for Gemma 4: What the Breakthrough Makes Possible

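Google's Gemma 4 QAT (quantization-aware training) checkpoints reduce memory use by about 72% compared with 16-bit precision [2][5]. The 31B model can now run on a single consumer GPU, and the smallest model, E2B, is compressed to about 1 GB [5][6]. The checkpoints are available in all five sizes: E2B, E4B, 12B, 26B A4B (MoE), and 31B [2][4]. They ship in GGUF and in the compressed-tensors format for vLLM, along with a mobile-optimized schema. However, naive conversion to Q4_0 degrades accuracy, so the choice of conversion method matters [5][18].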



Image: Gemma 4 QAT model compression unlocking mobile and consumer GPU deployment, illustrated as a large neural network being compressed into a smartphone. Caption: Google's QAT checkpoints compress Gemma 4 models by roughly 72%, enabling deployment on hardware from smartphones to consumer GPUs.

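Google has officially released quantization-aware training (QAT) checkpoints for every model in the Gemma 4 family. This approach differs fundamentally from conventional post-training quantization (PTQ), which compresses an already-trained 16-bit model after the fact.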

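QAT builds a simulation of quantization into the training process itself, so the model learns to compensate for the loss of precision. As a result, the final 4-bit model keeps performance close to the original while cutting memory use by about 72%. For developers and researchers who had given up on deploying large models because of hardware constraints, this changes the picture considerably.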

Why QAT Beats Conventional Quantization

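Standard post-training quantization (PTQ) converts the weights of a fully trained model, for example from bfloat16 to int4. The problem is that the model was never trained to operate at low precision, so quality often degrades significantly.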
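
To make the PTQ step concrete, here is a minimal sketch of symmetric round-to-nearest quantization to the int4 range. It is an illustration only: the function names, the sample weights, and the choice of a single per-tensor scale are assumptions, not details taken from Google's release.

```python
def quantize_int4(weights):
    """Symmetric round-to-nearest quantization to the int4 range [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole list (assumed); the largest weight maps to code 7.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float weights."""
    return [c * scale for c in codes]


# Hypothetical weights standing in for one row of a trained layer.
weights = [0.02, -0.15, 0.40, -0.91, 0.07, 0.33, -0.48, 1.20]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)

# Rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, round(scale, 4), round(max_error, 4))
```

Small weights such as 0.02 and 0.07 collapse to zero here because a single outlier (1.20) sets the step size. A model that was never trained with this rounding has no opportunity to adapt to it.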

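QAT, by contrast, integrates quantization directly into the training loop. Because the model sees quantized values during the forward and backward passes, it becomes robust to the more constrained numerical representation. This is why the results are described as "near original performance".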
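
The idea of training against quantized values can be sketched with "fake quantization" and a straight-through estimator (STE), a common way to implement QAT. The source does not say how Google implemented it, so the STE, the fixed step size `SCALE`, the toy data, and the learning rate below are all assumptions chosen for illustration.

```python
SCALE = 0.25  # assumed fixed step size of the 4-bit grid


def fake_quant(w, scale=SCALE):
    """Snap w to the nearest int4 grid point but keep it as a float."""
    code = max(-8, min(7, round(w / scale)))
    return code * scale


def loss_and_grads(latent, data):
    """Mean squared error computed with fake-quantized weights.

    The gradient is taken with respect to the quantized weights and applied
    unchanged to the latent full-precision weights (straight-through).
    """
    q = [fake_quant(w) for w in latent]
    grads = [0.0] * len(latent)
    loss = 0.0
    for x, y in data:
        err = sum(qi * xi for qi, xi in zip(q, x)) - y
        loss += err * err / len(data)
        for i, xi in enumerate(x):
            grads[i] += 2.0 * err * xi / len(data)
    return loss, grads


# Toy regression task: the targets come from weights that are not on the grid.
inputs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
data = [(x, 0.37 * x[0] - 0.81 * x[1]) for x in inputs]

latent = [0.0, 0.0]
initial_loss, _ = loss_and_grads(latent, data)
for _ in range(200):
    _, grads = loss_and_grads(latent, data)
    latent = [w - 0.1 * g for w, g in zip(latent, grads)]
final_loss, _ = loss_and_grads(latent, data)

print(initial_loss, final_loss, [fake_quant(w) for w in latent])
```

The loss is always measured with the quantized weights, so the optimizer is steered toward grid points that work well rather than toward full-precision values that are rounded afterwards. In a real network the remaining parameters can also adjust to the rounding, which this two-weight toy cannot show.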

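The official checkpoints released here use the W4A16 scheme: weights are stored as 4-bit integers and activations are kept in 16 bits, with a group_size of 32 and compressed-tensors as the storage format. This approach balances memory savings against throughput.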
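
A rough sketch of what group-wise 4-bit weights look like, and of the storage arithmetic behind them, follows. It assumes symmetric quantization with one 16-bit scale per group of 32 weights and no zero points. Those details, along with the helper names and sample data, are illustrative assumptions and are not a description of the compressed-tensors on-disk layout.

```python
GROUP_SIZE = 32   # group_size reported for the checkpoints
WEIGHT_BITS = 4   # the "W4" in W4A16
SCALE_BITS = 16   # assumption: one 16-bit scale stored per group


def quantize_groupwise(weights, group_size=GROUP_SIZE):
    """Quantize each block of group_size weights with its own scale."""
    groups = []
    for start in range(0, len(weights), group_size):
        block = weights[start:start + group_size]
        max_abs = max(abs(w) for w in block)
        scale = max_abs / 7 if max_abs > 0 else 1.0
        codes = [max(-8, min(7, round(w / scale))) for w in block]
        groups.append((codes, scale))
    return groups


def dequantize_groupwise(groups):
    """Rebuild approximate 16-bit-style weights for use with 16-bit activations."""
    return [c * s for codes, s in groups for c in codes]


# Hypothetical weights: the second group is 10x larger than the first.
weights = [((i * 37) % 64 - 32) / 100 * (1 if i < 32 else 10) for i in range(64)]
groups = quantize_groupwise(weights)
restored = dequantize_groupwise(groups)

# Storage cost per weight under the assumptions above.
bits_per_weight = WEIGHT_BITS + SCALE_BITS / GROUP_SIZE
reduction = 1 - bits_per_weight / 16
print(bits_per_weight, reduction)
```

Under these assumptions each weight costs 4.5 bits, a reduction of 71.875% relative to 16 bits, which is consistent with the roughly 72% figure quoted for the checkpoints. The real saving also depends on which layers stay in higher precision, so treat this as back-of-the-envelope arithmetic and not as an official breakdown. Per-group scales also keep the small-magnitude group from being crushed by the large-magnitude one.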
