Answer published 4 days ago. Last edited the day before yesterday. 24 sources.

A Deep Dive into Gemma 4 QAT: Google's New AI Models Are Stronger Yet Use 72% Less RAM

Google has released official QAT versions of every Gemma 4 model: E2B, E4B, 12B, 26B A4B, and 31B [1][4][5]. Quantization lowers the precision of the numbers a model computes with, and using int4 (4 bits) shrinks the data to as little as a quarter of its BF16 size [2]. QAT differs from conventional compression (PTQ) because it simulates compression from...

[Image: a large neural network being compressed into a smartphone. Caption: Google's QAT checkpoints compress Gemma 4 models by roughly 72%, enabling deployment on hardware from smartphones to consumer GPUs.]

On June 4, 2026, Google set a new bar for access to advanced AI by releasing Quantization-Aware Training (QAT) checkpoints for the Gemma 4 model family. This is more than a routine update: it fundamentally changes how we can run large language models on our own personal devices.

What Is Quantization, and What Makes QAT Better?

At the heart of all this is quantization: lowering the numerical precision used to store and compute a model's parameters. Normally a model stores its values as 16-bit floating-point numbers (BF16), whereas the QAT checkpoints store weights in just 4 bits (int4), cutting the data to a quarter of its original size.
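The 4x figure is plain arithmetic on bits per weight. The sketch below uses a hypothetical helper, `weight_storage_gb` (not part of any Gemma tooling), to estimate weights-only storage as parameters × bits ÷ 8, using the 30.7B parameter count this article gives for the 31B model. It deliberately ignores quantization scales, any tensors kept at higher precision, the KV cache, and runtime overhead, so real memory use will be higher than what it prints.

```python
# Back-of-the-envelope weight storage: bytes = parameters * bits / 8.
# Assumption: weights only; ignores quantization scales, higher-precision
# tensors, KV cache and runtime overhead, so real usage is higher.


def weight_storage_gb(num_params: float, bits_per_weight: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    params = 30.7e9  # parameter count quoted for the 31B model
    bf16 = weight_storage_gb(params, 16)
    int4 = weight_storage_gb(params, 4)
    print(f"BF16: {bf16:.1f} GB, int4: {int4:.1f} GB, ratio: {bf16 / int4:.0f}x")
```

The ratio depends only on the bit widths (16 ÷ 4), which is why the same 4x applies to every model size before overheads are counted.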

Shrinking the data, however, usually costs model quality, and that is exactly the problem QAT is designed to solve.
Conventional Post-Training Quantization (PTQ) compresses a model after training has finished, which often degrades its performance.
Quantization-Aware Training (QAT) takes a different approach: it simulates that compression during training itself, so the model learns to adapt and compensate for the resulting errors from the start [4]. As a result, the 4-bit version performs very close to the original 16-bit version.
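To make the PTQ/QAT distinction concrete, here is a toy sketch in plain Python. It is not Google's training recipe: it assumes symmetric per-tensor int4 quantization, a three-weight linear model trained with gradient descent, and the straight-through estimator (STE), a common way to let gradients pass through the non-differentiable rounding step. PTQ trains in full precision and quantizes once at the end; QAT runs the forward pass through `fake_quant` during training while updating full-precision "shadow" weights. All names (`fake_quant`, `train`, `mse`) and the data are illustrative.

```python
# Toy PTQ-vs-QAT comparison (illustrative sketch, not Google's training recipe).
# Assumptions: symmetric per-tensor int4 quantization, a tiny linear model
# trained with plain gradient descent, and the straight-through estimator (STE).

INT4_MIN, INT4_MAX = -8, 7


def fake_quant(weights):
    """Round weights to int4 levels and map them back to floats."""
    max_abs = max(abs(w) for w in weights) or 1.0  # guard against all-zero input
    scale = max_abs / INT4_MAX
    return [max(INT4_MIN, min(INT4_MAX, round(w / scale))) * scale for w in weights]


def mse(weights, xs, ys):
    """Mean squared error of the linear model y_hat = sum(w_i * x_i)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(xs)


def train(xs, ys, dim, qat, steps=500, lr=0.05):
    """Gradient descent; with qat=True the forward pass sees fake-quantized weights."""
    w = [0.1] * dim  # full-precision "shadow" weights that receive the updates
    for _ in range(steps):
        w_fwd = fake_quant(w) if qat else w
        grads = [0.0] * dim
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w_fwd, x)) - y
            for i in range(dim):
                # STE: pretend d fake_quant(w) / dw == 1, so the gradient
                # computed at the quantized weights updates the float weights.
                grads[i] += 2 * err * x[i] / len(xs)
        w = [wi - lr * gi for wi, gi in zip(w, grads)]
    return w


if __name__ == "__main__":
    xs = [[1.0, 0.5, -0.3], [0.2, -1.0, 0.8], [-0.7, 0.3, 1.0], [0.9, 0.9, -0.2]]
    true_w = [0.83, -0.41, 0.27]  # made-up target weights for the toy
    ys = [sum(w * xi for w, xi in zip(true_w, x)) for x in xs]
    w_ptq = fake_quant(train(xs, ys, 3, qat=False))  # PTQ: quantize after training
    w_qat = fake_quant(train(xs, ys, 3, qat=True))   # QAT: train through fake_quant
    print("PTQ error:", mse(w_ptq, xs, ys))
    print("QAT error:", mse(w_qat, xs, ys))
```

On a problem this small the two errors may come out close, in either order; the point is the mechanics: QAT lets the optimizer see the rounding error while it is still training, whereas PTQ only meets it after training is over.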

For the dense Gemma 4 models (those that are not MoE), the checkpoints use a W4A16 scheme: weights are stored as 4-bit integers while activations stay at 16 bits, with … and distributed in … format.
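A W4A16 layer can be sketched as follows, under stated assumptions: symmetric quantization with one scale per group of 4 weights (a hypothetical group size chosen for readability), two signed 4-bit codes packed per byte, and Python floats standing in for 16-bit activations. None of this reflects the actual layout of Gemma 4's checkpoint formats, and production kernels fuse the dequantization into the matrix multiply rather than looping in Python.

```python
# Minimal W4A16-style sketch: int4 weight codes + per-group float scales,
# floating-point activations. Assumptions: symmetric quantization, a
# hypothetical group size of 4, Python floats standing in for 16-bit values.

GROUP = 4  # hypothetical group size, chosen for readability


def quantize_row(row):
    """Return (codes, scales): codes in [-8, 7], one scale per GROUP weights."""
    codes, scales = [], []
    for start in range(0, len(row), GROUP):
        group = row[start:start + GROUP]
        scale = (max(abs(v) for v in group) / 7) or 1.0  # all-zero group -> 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(v / scale))) for v in group)
    return codes, scales


def pack_int4(codes):
    """Pack pairs of signed 4-bit codes into bytes (two's-complement nibbles)."""
    padded = codes + [0] * (len(codes) % 2)
    return bytes((lo & 0xF) | ((hi & 0xF) << 4)
                 for lo, hi in zip(padded[0::2], padded[1::2]))


def unpack_int4(packed, n):
    """Inverse of pack_int4: recover the first n signed 4-bit codes."""
    out = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble > 7 else nibble)
    return out[:n]


def w4a16_matvec(packed_rows, scales_rows, n_cols, activations):
    """Compute y = W @ x, dequantizing each int4 weight on the fly."""
    result = []
    for packed, scales in zip(packed_rows, scales_rows):
        codes = unpack_int4(packed, n_cols)
        acc = 0.0
        for j, (c, a) in enumerate(zip(codes, activations)):
            acc += c * scales[j // GROUP] * a  # dequantized weight * activation
        result.append(acc)
    return result


if __name__ == "__main__":
    weights = [[0.7, -0.35, 0.0, 0.14, 1.4, -1.4, 0.2, 0.6],
               [-0.1, 0.05, 0.3, -0.9, 0.45, 0.0, -0.6, 0.8]]  # made-up 2x8 layer
    x = [1.0, 0.5, -1.0, 2.0, 0.25, -0.5, 1.5, 0.0]           # made-up activations
    packed, scales = [], []
    for row in weights:
        codes, s = quantize_row(row)
        packed.append(pack_int4(codes))
        scales.append(s)
    print("bytes per row:", [len(p) for p in packed])  # 8 weights -> 4 bytes
    print("W4A16 result:", w4a16_matvec(packed, scales, 8, x))
    print("float result:", [sum(w * a for w, a in zip(row, x)) for row in weights])
```

Keeping activations in 16 bits avoids quantizing the values that change with every input, while the stored weights, which dominate memory, shrink to half a byte each plus a small per-group scale.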

Model	Type	Memory (QAT, 4-bit)	Savings vs. BF16
E2B	Dense, 2.3B effective parameters	~1–3.2 GB	~72–75%
E4B	Dense, 4.5B effective parameters	~3–5 GB	~72–75%
12B	Dense; text, image, and audio input	~7 GB	~72%
26B A4B	MoE; only ~3.8B parameters active	~15 GB	~72%
31B	Dense, 30.7B parameters	~18–20 GB	~72%
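The savings column can be sanity-checked with a little arithmetic. Moving every weight from 16 bits to 4 bits caps the saving at 75%. Per-group scales push the average bits per weight slightly above 4; the group size of 32 with 16-bit scales used below is a hypothetical example, not Gemma 4's documented configuration, but it shows how a figure near 72% can arise. For the 26B A4B MoE row, the ~15 GB figure appears consistent with storing all ~26B parameters at about 4 bits (26B × 0.5 bytes ≈ 13 GB) rather than only the ~3.8B active ones (≈ 1.9 GB), since an MoE model normally keeps every expert in memory even though each token uses only a few.

```python
# Sanity-check the "savings vs BF16" column with bits-per-weight arithmetic.
# Assumption: weights dominate memory; the group-size/scale example below is
# hypothetical and not Gemma 4's documented quantization layout.

BF16_BITS = 16


def effective_bits(weight_bits: int, scale_bits: int, group_size: int) -> float:
    """Average bits per weight when each group of weights shares one scale."""
    return weight_bits + scale_bits / group_size


def savings_vs_bf16(bits_per_weight: float) -> float:
    """Percent of BF16 weight storage saved at a given average bits/weight."""
    return 100.0 * (1.0 - bits_per_weight / BF16_BITS)


if __name__ == "__main__":
    print(f"Ceiling (pure 4-bit): {savings_vs_bf16(4):.1f}%")
    bits = effective_bits(4, 16, 32)  # hypothetical: one BF16 scale per 32 weights
    print(f"With per-group scales: {bits} bits/weight, {savings_vs_bf16(bits):.1f}% saved")
```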

Meet the 5 Models and Their Smaller Footprints

An Important Caveat from Google

Which Format Should You Choose?

Will It Run on Your Machine? Check These Specs

How Does This Change the Game?