Answer published 2 weeks ago · Last edited 3 days ago · 30 sources

Google's AI can't even spell its own name? Unpacking why LLMs can't count letters

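Google AI Overviews gets even basic spelling wrong (for example, insisting that "Google" contains two "P"s) because large language models process text as subword tokens rather than letter by letter, so they have no built-in concept of counting letters. The same underlying architecture has also been tied to the May 2026 "disregard" instruction-leak incident (searching for a simple instruction word could break the search interface) and to the two classic blunders of 2024: telling people to eat rocks for their minerals and to add glue to pizza so the cheese sticks better.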



Image: Google AI Overviews spelling failure on a search results page, showing incorrect letter counts for the word "Google". Google's AI Overviews feature confidently miscounts letters in its own company name, exposing fundamental tokenization limits in large language models.

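In late May 2026, Google's AI search feature, AI Overviews, once again became a target of online ridicule after users discovered it could not handle a kindergarten-level task: counting letters.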

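Ask it how many "P"s are in the word "Google" and the AI will confidently answer "two", even though the word contains none. Ask how many "R"s are in "poop" and it insists there is "exactly 1"; in fact there are none. It also claims "journalism" has two "D"s (it has none), then spells the word out as "j-o-u-r-n-a-d-i-s-m".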
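
For contrast, a program that actually operates on characters gets these counts right trivially. The sketch below is a minimal illustration in Python; `count_letter` is a hypothetical helper name chosen for this example, not part of any Google tool.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


# The letter questions from the viral screenshots, plus two letters that do appear.
examples = [("Google", "p"), ("poop", "r"), ("journalism", "d"), ("Google", "o"), ("poop", "p")]
for word, letter in examples:
    print(f"{word!r} contains {count_letter(word, letter)} {letter!r}")
```

Because the string is walked character by character, the answers are exact: zero "p"s in "Google", zero "r"s in "poop", zero "d"s in "journalism". An LLM never gets this character-level view of its input.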

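These are not isolated technical glitches but the latest embarrassing chapter for a feature Google has consistently positioned as the future of search. The problems trace back to the same underlying architectural tension that earlier led the AI to tell people to eat rocks and put glue on pizza, and even to leak its own system instructions when someone typed a single English word.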

How do Google AI Overviews work?

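Google AI Overviews is a generative AI feature built directly into Google Search. It launched to more than 100 million U.S. users in May 2024 and later expanded to more regions. Built on the same kind of large language model technology that powers ChatGPT, it places an AI-written summary at the top of the results page instead of only listing blue links. Google's ambition is to make search more conversational and direct, but in practice the feature keeps exposing fundamental weaknesses in how LLMs process information.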

Why LLMs can't count letters: unpacking the tokenization problem

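The root cause of these spelling failures is not an ordinary software bug but a well-documented, architecture-level limitation of large language models: subword tokenization. Several peer-reviewed papers have analyzed this failure mode in detail.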

Here is what is happening under the hood:

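LLMs don't look at individual letters. They break text into "tokens" (chunks of one or more characters) using algorithms such as Byte-Pair Encoding (BPE). A common word like "Google" may become a single token, while "journalism" may be split into subword pieces such as ['journ', 'alism']. The model never sees the raw character sequence directly; it receives only the IDs of those tokens.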
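
To make this concrete, here is a toy BPE trainer and encoder in Python. It is a minimal sketch under stated assumptions: a tiny made-up word-frequency corpus, plain character-level merges, and function names (`train_bpe`, `encode`, `merge_pair`) invented for illustration. Production tokenizers add byte-level handling, pre-tokenization rules and vocabularies of tens of thousands of tokens, and nothing here reflects Google's actual tokenizer.

```python
from collections import Counter


def pair_counts(corpus: dict[tuple[str, ...], int]) -> Counter:
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    counts: Counter = Counter()
    for symbols, freq in corpus.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    merged: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(words: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules, always merging the most frequent pair."""
    corpus = {tuple(word): freq for word, freq in words.items()}  # start from single characters
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        counts = pair_counts(corpus)
        if not counts:
            break  # every word is already a single token
        best = max(counts, key=counts.get)  # ties go to the pair counted first
        merges.append(best)
        corpus = {merge_pair(symbols, best): freq for symbols, freq in corpus.items()}
    return merges


def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by replaying the learned merges in training order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus (word -> frequency); "google" is deliberately the most common word.
corpus = {"google": 50, "journal": 20, "journalism": 3, "tourism": 10}
merges = train_bpe(corpus, num_merges=10)

# Give every token an integer ID, as a tokenizer vocabulary would.
vocab: dict[str, int] = {}
for word in corpus:
    for token in encode(word, merges):
        vocab.setdefault(token, len(vocab))

for word in ["google", "journalism"]:
    tokens = encode(word, merges)
    ids = [vocab[t] for t in tokens]  # this integer list is all the model actually receives
    print(word, tokens, ids)
```

With these settings the frequent word "google" is merged into a single token with ID 0, while the rarer "journalism" stays split into several pieces. Either way the model's input is just the integers: to answer "how many o's are in google?" it would need to have learned the spelling hidden behind token 0, knowledge it only picks up indirectly from its training data.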
