What should I do next in practice?

The E2B model can run with just 3.2 GB of RAM, making it ideal for smartphones and edge devices, while the 12B model suits 8 GB GPUs (~7 GB) [3][5][7].


Answers | Published 4 days ago | Last edited 2 days ago | 22 sources

Gemma 4 QAT Is Now 72% Lighter: Google's AI Models Are Ready to Run on Your Phone and PC

Google provides official Quantization-Aware Training (QAT) checkpoints for all five Gemma 4 model sizes: E2B, E4B, 12B, 26B A4B, and 31B [1][4][5]. QAT simulates quantization during training, allowing 4-bit models to retain near-original performance while cutting memory use by roughly 72% ...


Figure: Google's QAT checkpoints compress Gemma 4 models by roughly 72%, enabling deployment on hardware from smartphones to consumer GPUs.

What Is QAT and Why Is It Revolutionary?

In short, Quantization-Aware Training (QAT) is a technique in which an AI model is trained while accounting for the compression (quantization) it will later undergo. This differs from the older approach, Post-Training Quantization (PTQ), which compresses a model only after training has fully completed. PTQ often causes a noticeable drop in performance.

With QAT, the model "learns" to compensate for the loss of precision during the training phase. As a result, when the model is compressed to a 4-bit format (int4), it uses up to 72% less memory yet still retains performance close to that of the original 16-bit (BF16) model. This is like shrinking an image file without making it blurry, because the shrinking technique was planned from the start.

Google's official checkpoints use the W4A16 scheme for the dense Gemma 4 models. This means the model's weights are stored as 4-bit integers while activations remain in 16-bit, with group_size=32 and the compressed-tensors format.
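To make "accounting for quantization during training" concrete, here is a minimal sketch of the fake-quantization step that QAT-style training typically relies on: weights are rounded to 4-bit integers and immediately converted back to floats, so the rest of the forward pass sees the rounding error. It assumes symmetric round-to-nearest quantization with a single scale for the whole list; the function name `fake_quantize_int4` is hypothetical, and nothing here is taken from Google's actual training recipe.

```python
def fake_quantize_int4(weights):
    """Quantize to signed 4-bit integers, then dequantize back to floats.

    Assumption: symmetric quantization with one scale for the whole list.
    """
    qmin, qmax = -8, 7  # representable range of a signed 4-bit integer
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)  # nothing to scale
    scale = max_abs / qmax
    # Round to the nearest integer level and clamp into the int4 range.
    quantized = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    # Dequantize: these are the values the forward pass would actually use.
    return [q * scale for q in quantized]


weights = [0.70, -0.35, 0.10, 0.02]
simulated = fake_quantize_int4(weights)
errors = [abs(w - s) for w, s in zip(weights, simulated)]
print(simulated)
print(errors)
```

In a real training loop the loss would be computed from the `simulated` values while the optimizer keeps updating the full-precision weights, which is how the model can adapt to the rounding error; that gradient handling is not shown here.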
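As a back-of-the-envelope check on how 4-bit weights with group_size=32 relate to the roughly 72% figure, the sketch below computes the effective bits per weight. It assumes one 16-bit scale is stored per group of 32 weights and no zero-point, and it ignores any layers left unquantized; these are illustrative assumptions, not details confirmed by the sources, and the function name is hypothetical.

```python
def w4a16_weight_savings(weight_bits=4, group_size=32, scale_bits=16, baseline_bits=16):
    """Return (effective bits per weight, fractional saving vs. the baseline).

    Assumption: each group of `group_size` weights shares one scale of
    `scale_bits` bits, and no zero-point is stored.
    """
    effective_bits = weight_bits + scale_bits / group_size
    saving = 1 - effective_bits / baseline_bits
    return effective_bits, saving


effective_bits, savings = w4a16_weight_savings()
print(f"{effective_bits:.2f} bits per weight, {savings:.1%} smaller than BF16")
# prints "4.50 bits per weight, 71.9% smaller than BF16"
```

Under these assumptions the per-weight storage cost is consistent with the article's figure of about 72%, although real checkpoints may differ because of metadata and unquantized tensors.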


Model	Type & Effective Parameters	4-bit Memory (Q4_0)	Savings vs BF16
Gemma 4 E2B	Dense, 2.3B parameters	~3.2 GB	~72% lower
Gemma 4 E4B	Dense, 4.5B parameters	~5 GB	~72% lower
Gemma 4 12B	Unified multimodal model (text, image, audio)	~7 GB	~72% lower
Gemma 4 26B A4B	Mixture of Experts (MoE), ~3.8B active parameters	~15 GB	~72% lower
Gemma 4 31B	Dense, 30.7B parameters	~18–20 GB	~72% lower
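The table can be turned into a quick sizing helper. The sketch below picks the largest model whose approximate 4-bit footprint fits a given memory budget. The footprints are copied from the table (using the upper end of the range for the 31B model); the `headroom_gb` allowance for runtime overhead such as the KV cache is an illustrative assumption, and the function name is hypothetical.

```python
# Approximate 4-bit memory footprints in GB, copied from the table above.
# For the 31B model the upper end of the ~18-20 GB range is used.
MEMORY_GB = {
    "Gemma 4 E2B": 3.2,
    "Gemma 4 E4B": 5.0,
    "Gemma 4 12B": 7.0,
    "Gemma 4 26B A4B": 15.0,
    "Gemma 4 31B": 20.0,
}


def largest_model_that_fits(available_gb, headroom_gb=1.0):
    """Return the largest-footprint model that fits, or None if none does.

    Assumption: `headroom_gb` is reserved for runtime overhead.
    """
    budget = available_gb - headroom_gb
    candidates = [(gb, name) for name, gb in MEMORY_GB.items() if gb <= budget]
    return max(candidates)[1] if candidates else None


print(largest_model_that_fits(8))   # Gemma 4 12B
print(largest_model_that_fits(24))  # Gemma 4 31B
```

Actual memory use also depends on context length and the inference runtime, so treat the result as a starting point rather than a guarantee.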


Model Lineup: From Phones to Workstations

Choosing a Format: Which One Is Right for You?

Practical Implications: Advanced AI at Your Fingertips

Important Caveat: Quality Depends on the Format