
The root cause: why Google's AI can't even spell "Google"

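Google's AI Overview confidently gave a wrong answer, claiming that the word "Google" contains two 'p's. This happens because an LLM processes words as chunks called "tokens" and never sees the individual letters. Google acknowledged the problem, calling letter counting in words "a known challenge for LLMs." Researchers, however, point out that it is an inherent blind spot of the transformer architecture itself.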


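In late May 2026, users discovered that AI Overview, Google's generative search feature, was making spelling mistakes worthy of a six-year-old. Asked "How many 'p's are in the word Google?", the AI confidently answered "two." The correct answer, of course, is zero: "Google" contains no 'p' at all. In the same response, the AI claimed that "journalism" contains two 'd's and spelled it "j-o-u-r-n-a-d-i-s-m."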

A day later, Google acknowledged the error in an official statement: "Counting letters in words is a known challenge for LLMs (large language models), and we are currently working to address it."

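These mistakes are not a simple system glitch, however. They are the predictable result of a structural blind spot rooted in how every major large language model processes text. That blind spot looks unlikely to be patched away any time soon.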

The tokenization trap: why AI can't read letters

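Humans perceive a word as a sequence of individual letters (characters). LLMs take a fundamentally different approach and break text into pieces called **tokens**. A token is defined by a vocabulary built in advance by an algorithm such as BPE (Byte Pair Encoding). It may be a whole word, a subword fragment, or sometimes a single character.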
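The sketch below makes the idea of a BPE-built vocabulary concrete. It is a minimal toy in Python and rests on several assumptions. The corpus is tiny and hypothetical. Starting symbols are plain characters rather than bytes. Ties are broken by first occurrence. Production tokenizers differ in all of these respects, and the source does not say which tokenizer Google's models use. The names `train_bpe`, `apply_merge`, and `tokenize` are illustrative and are not any library's API.

The sketch shows one core mechanism. The algorithm repeatedly merges the most frequent adjacent pair of symbols, so a frequently seen word gradually collapses into one or a few multi-letter tokens.

```python
from collections import Counter


def apply_merge(symbols: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every left-to-right occurrence of `pair` with one merged symbol."""
    out: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def train_bpe(corpus: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a word -> frequency corpus (toy)."""
    # Start with every word split into single characters.
    words = {word: list(word) for word in corpus}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs: Counter = Counter()
        for word, symbols in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += corpus[word]
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; on a tie, max() keeps the pair counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = {word: apply_merge(symbols, best) for word, symbols in words.items()}
    return merges


def tokenize(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Split `word` into tokens by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return symbols


if __name__ == "__main__":
    toy_corpus = {"Google": 10}  # hypothetical corpus for illustration only
    for n in (0, 2, 5):
        print(n, tokenize("Google", train_bpe(toy_corpus, n)))
```

Take `{"Google": 10}` as the corpus. With zero merges the word stays as six single letters. With two merges, `tokenize` returns `["Goo", "g", "l", "e"]`. With five merges, the whole word becomes the single token `"Google"`, and at that point the letters are no longer visible as separate units.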

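For example, depending on the tokenizer, the word "Google" is encoded either as a single token, ["Google"], or as two tokens such as ["Go", "ogle"]. A common word like this is not encoded letter by letter as ["G", "o", "o", "g", "l", "e"]. As a result, the model has no direct representation of the individual characters inside a token.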
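The gap between seeing letters and seeing tokens can be shown in a few lines of Python. This is an illustrative sketch that rests on two explicit assumptions. First, `TOY_VOCAB` and its integer IDs are invented. Second, `encode` uses greedy longest-match, which is a simplification and not a real BPE merge table.

The sketch shows that the model's actual input for "Google" is a short list of integers. A letter can be counted from those integers only if you can map each ID back to its spelling, and the model never receives that mapping directly.

```python
# Hypothetical toy vocabulary (token string -> integer ID). The IDs are made up;
# real vocabularies hold tens of thousands of entries.
TOY_VOCAB = {"Go": 1012, "ogle": 2087, "journal": 3301, "ism": 415}
ID_TO_TOKEN = {i: tok for tok, i in TOY_VOCAB.items()}


def encode(word: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match encoding (a simplification, not how BPE encodes)."""
    ids: list[int] = []
    i = 0
    while i < len(word):
        # Try the longest remaining prefix first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                ids.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i:]!r}")
    return ids


def count_in_text(word: str, letter: str) -> int:
    """What any program holding the raw string does: inspect the letters."""
    return word.count(letter)


def count_from_ids(ids: list[int], letter: str) -> int:
    """Works only with an ID -> spelling table. The model is never handed one;
    it must learn each token's spelling indirectly from training data."""
    return sum(ID_TO_TOKEN[t].count(letter) for t in ids)


if __name__ == "__main__":
    ids = encode("Google", TOY_VOCAB)
    print("model input:", ids)  # two opaque integers, no letters
    print("'p' counted in text:", count_in_text("Google", "p"))
    print("'o' counted via lookup:", count_from_ids(ids, "o"))
```

Here `encode("Google", TOY_VOCAB)` yields `[1012, 2087]`. The string-level count of 'p' is 0, and the lookup-based count of 'o' is 2. A real model has no such lookup table in its input. Whatever it "knows" about how `ogle` is spelled comes from patterns in its training data, so its letter counts can be confidently wrong.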
