
Why does Google's AI misspell "Google"? An underlying flaw with no quick fix

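Google's AI Overview makes basic spelling mistakes because large language models process text as tokens (for example, the whole word "Google" as a single chunk) rather than reading it letter by letter. After users caught it miscounting the letters in "Google", "poop", and "journalism", Google acknowledged that "counting letters in words is a known challenge for LLMs."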


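In late May 2026, users noticed Google's AI Overview feature making grade-school spelling mistakes. Asked "How many 'p's are there in the word 'Google'?", the AI confidently answered "two", when in fact there are none. It even claimed that "journalism" contains two "d"s, and in the same reply spelled the word "j-o-u-r-n-a-d-i-s-m".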

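Google acknowledged the error the next day, saying in a statement: "Counting letters in words is a known challenge for large language models (LLMs), and we are working to address this specific issue."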

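These are not random bugs. They are a predictable consequence of how every mainstream large language model processes text, and they expose a blind spot that is unlikely to be patched in the short term.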

The tokenization problem: why LLMs don't read letter by letter

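Humans understand a word as a sequence of individual letters. An LLM does something quite different: it splits text into tokens, which may be whole words, subword chunks, or occasionally single characters, depending on a predefined vocabulary built by an algorithm such as byte-pair encoding (BPE).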
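To make the idea of a BPE-built vocabulary concrete, here is a minimal sketch of the merge-learning loop in Python. It is a toy illustration, not the tokenizer Google or any production model uses: the corpus, the number of merges, and the function names (`learn_bpe_merges`, `merge_pair`, `encode`) are all assumptions made up for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe_merges(corpus_words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy BPE)."""
    # Every word starts out as a tuple of single characters.
    words = Counter(tuple(w) for w in corpus_words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pair_counts = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the new merge rule to every word in the corpus.
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[merge_pair(symbols, best)] += freq
        words = merged_words
    return merges


def encode(word, merges):
    """Tokenize `word` by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical mini-corpus: "google" is frequent, so its letters get merged early.
corpus = ["google"] * 10 + ["ogle"] * 4 + ["goggles"] * 3
merges = learn_bpe_merges(corpus, num_merges=4)
print(encode("google", merges))
```

After a few merges, the frequent word is represented by fewer symbols than it has letters. The model is then trained on those merged symbols, not on the characters they were built from.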

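The word "Google" might be encoded as a single token, ["Google"], or as two tokens such as ["Go", "ogle"], depending on the tokenizer's vocabulary. A common word like this will almost never be encoded as ["G", "o", "o", "g", "l", "e"], so the model has no native way to see the individual letters inside a token.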
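The contrast between what the model receives and what a letter count needs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TOY_VOCAB` and its IDs are invented, and the greedy longest-match splitter is a simple stand-in for a real BPE tokenizer, not the algorithm any production model uses.

```python
# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
TOY_VOCAB = {"Google": 0, "Go": 1, "ogle": 2, "journal": 3, "ism": 4}


def greedy_tokenize(text, vocab):
    """Split text by longest match against vocab, falling back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No vocabulary entry starts here: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


def to_ids(tokens, vocab):
    """Map tokens to integer IDs; -1 marks tokens missing from the toy vocabulary."""
    return [vocab.get(t, -1) for t in tokens]


def count_letter(word, letter):
    """Character-level count, which requires access to the raw string."""
    return word.lower().count(letter.lower())


tokens = greedy_tokenize("Google", TOY_VOCAB)
print(tokens, to_ids(tokens, TOY_VOCAB))   # what the model is given: opaque IDs
print(count_letter("Google", "p"))         # what the question actually asks about
```

The integer IDs carry no information about spelling. Counting letters is trivial on the raw string, but a model that only ever sees the IDs has to recall spelling from patterns in its training data instead of inspecting the characters.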
