Answer published 2 months ago · Last edited 2 months ago · 12 sources

How Anthropic uses an AI "microscope" to take apart Claude's hidden reasoning

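Anthropic uses mechanistic interpretability to decompose some of Claude's internal activations into features, then traces how those features combine into circuits. This is engineering progress toward an AI "microscope," not complete mind reading [9][10]. The focus shifts from looking only at what Claude finally answered to examining how input words pass through some of the model's internal pathways and become output words [9][10].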


Figure: Abstract illustration of an AI microscope examining Claude's hidden internal reasoning circuits. Anthropic's interpretability work aims to map parts of Claude's internal computation into human-legible features and circuits.

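If you assume Anthropic is hunting inside Claude for a "true inner voice" that it can read out word for word, you have misunderstood the work. What the company is doing looks more like building a scientific instrument: it uses mechanistic interpretability tools to make some of the internal computation behind Claude's answers observable and testable. Anthropic itself describes this as a step toward an AI "microscope."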

First, get this straight: an AI "microscope" is not mind reading

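Large language models do not come with a human-readable manual explaining how each word is generated. Anthropic points out that behind every word a model writes are billions of computations; even for the model's developers, these computations are hard to interpret directly without specialized tools.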

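That is why the "microscope" metaphor matters. Anthropic is not claiming it can uncover a hidden, private chain-of-thought inside Claude. It wants to build tools that let researchers inspect part of the computation underneath Claude's written answers.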

The first layer of the map: turning activations into "features"

Anthropic's earlier interpretability research focused on finding understandable concepts inside the model, which it calls "features."

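In plain terms, a running model produces a large number of internal signals that at first look like an opaque wall of numbers. A feature works like a handle: researchers can name, observe, and test a particular pattern of internal activity instead of seeing only a pile of uninterpretable values.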
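The "handle" idea can be made concrete with a toy sketch. The Python below is not Anthropic's method or code. The three feature names, their direction vectors, the four-dimensional activation, and the `encode` and `decode` helpers are all invented for illustration. The directions are deliberately orthonormal so that a plain dot product recovers each feature's strength; real interpretability work has to learn such directions from data, and they are generally far less clean.

```python
# Toy illustration only: every name and number here is hypothetical.
# Each feature is a direction in a 4-dimensional activation space.
# The directions are orthonormal, so a dot product reads off each strength.
FEATURES = {
    "mentions_bridge": [0.5, 0.5, 0.5, 0.5],
    "is_question": [0.5, -0.5, 0.5, -0.5],
    "french_text": [0.5, 0.5, -0.5, -0.5],
}


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def encode(activation, threshold=1e-9):
    """Map a raw activation vector to the named features that are active."""
    strengths = {}
    for name, direction in FEATURES.items():
        strength = dot(activation, direction)
        if strength > threshold:  # keep only positively active features
            strengths[name] = strength
    return strengths


def decode(strengths):
    """Rebuild an activation vector from named feature strengths."""
    dim = len(next(iter(FEATURES.values())))
    rebuilt = [0.0] * dim
    for name, strength in strengths.items():
        for i, component in enumerate(FEATURES[name]):
            rebuilt[i] += strength * component
    return rebuilt


if __name__ == "__main__":
    raw = [1.5, 1.5, 0.5, 0.5]  # the opaque "wall of numbers"
    active = encode(raw)
    print(active)          # named features and their strengths
    print(decode(active))  # the same activation, rebuilt from features
```

In this toy setting the opaque vector becomes a short list of named quantities that a researcher could inspect one at a time. Changing one strength before calling `decode` gives a rough picture of what "testing" a feature might look like, under the same invented assumptions.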
