studioglobal

Answer · Published 2 months ago · Last edited 2 months ago · 12 sources

How Anthropic Is Mapping Claude’s Hidden Reasoning

To understand Claude’s hidden reasoning, Anthropic is mapping internal activations into ‘features’ and then into ‘circuits’; the company describes this as progress toward an AI ‘microscope’ [9][10]. Rather than only evaluating Claude’s final answer, this approach tries to look at the computational pathways that turn input words into outpu...

Figure: Abstract illustration of an AI microscope examining Claude’s hidden internal reasoning circuits. Anthropic’s interpretability work aims to map parts of Claude’s internal computation into human-legible features and circuits.

From the outside, Anthropic’s Claude looks fluent and conversational; on the inside, it is an equally complex computational system. The company’s new interpretability effort can be put simply: asking Claude why it gave a particular answer is not enough; Anthropic wants to make parts of the computation running inside it directly readable.

What the AI ‘microscope’ means

Large language models do not ship a clean, human-readable receipt with their answers showing which path they took to reach each word. According to Anthropic, a model’s response strategy is encoded in the billions of computations performed for every word, and those computations are not self-evidently intelligible even to the model’s developers.

This is where the ‘microscope’ metaphor comes in. Anthropic is not claiming to have found a hidden paragraph or private chain-of-thought written inside Claude’s mind. The company’s goal is to build tools that let researchers observe, name, and test parts of the computation running beneath Claude’s written answers.

Step one: turning activations into ‘features’

Inside an AI model lies a vast space of raw numbers and activations. For a general reader, picture thousands of indicator lights blinking inside a machine, with no label saying which light corresponds to what. Anthropic’s earlier interpretability work focused on finding recognizable concepts in that darkness, which the company calls ‘features’.

A feature, broadly, is a pattern of internal activity that researchers try to identify and test as a concept, behavior, or signal. Instead of treating the model as a wall of opaque numbers, this gives researchers a handle on some of the concepts that become active inside it.

So the question is no longer only what Claude said. It is also which concepts or signals became active inside it while it was producing the answer.
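The idea of a feature as a readable pattern in raw activations can be made concrete with a toy sketch. This is only an illustration under stated assumptions: it models a feature as a direction in activation space and scores it with a clipped dot product, which is an assumption of this example and not something the source attributes to Anthropic. The vectors, labels, and the helper `feature_activation` are all hypothetical.

```python
# Toy illustration only: all vectors, labels, and numbers are made up.
# Assumption: a "feature" is modeled as a direction in activation space.

def feature_activation(activation, direction):
    """Return how strongly `activation` expresses `direction` (never negative)."""
    score = sum(a * d for a, d in zip(activation, direction))
    return max(0.0, score)

# Hypothetical 4-dimensional activation vector for a single token.
activation = [0.5, -1.0, 2.0, 0.0]

# Hypothetical feature directions with human-readable labels.
features = {
    "question": [0.0, 0.0, 1.0, 0.0],
    "negation": [0.0, 1.0, 0.0, 0.0],
    "place_name": [1.0, 0.0, 0.0, 1.0],
}

# Map the opaque vector to a small table of labeled feature strengths.
active = {name: feature_activation(activation, d) for name, d in features.items()}
```

Here `active` turns one opaque vector into a handful of named scores, which is the shift the text describes: from asking only what the model said to asking which labeled signals were active while it said it.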

Step two: linking features into ‘circuits’

Anthropic’s new emphasis is not only on finding individual features but on linking them into computational ‘circuits’. According to the company, this work tries to expose parts of the pathway that runs from input words to output words.

The distinction matters. A single feature can indicate that a concept is present or has activated inside the model. A circuit can help explain how several internal components influence one another and together shape a response. For behavior that looks like reasoning, that pathway (which signal acts after which, and in combination with what) matters a great deal.
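The difference between a lone feature and a circuit can be sketched with a tiny hand-built graph. This is a toy under loud assumptions: the nodes, weights, and the `run` helper are hypothetical, and the intervention used here (zeroing out a node, often called ablation) is a generic interpretability technique chosen for illustration, not a description of Anthropic’s circuit-tracing tools.

```python
# Toy illustration only: the graph, weights, and names are hypothetical.
# Each edge (source, target) carries a weight; a node's value is the
# weighted sum of its upstream nodes, clipped at zero.

EDGES = {
    ("input_word", "feature_a"): 1.0,
    ("input_word", "feature_b"): 0.5,
    ("feature_a", "output_word"): 2.0,
    ("feature_b", "output_word"): 1.0,
}
ORDER = ["feature_a", "feature_b", "output_word"]  # topological order


def run(input_value, ablated=()):
    """Propagate a value through the toy graph, zeroing any ablated nodes."""
    values = {"input_word": input_value}
    for node in ORDER:
        total = sum(w * values[src] for (src, dst), w in EDGES.items() if dst == node)
        values[node] = 0.0 if node in ablated else max(0.0, total)
    return values


# Compare the output with and without one intermediate feature.
baseline = run(1.0)["output_word"]
without_a = run(1.0, ablated={"feature_a"})["output_word"]
effect_of_a = baseline - without_a
```

Knowing that `feature_a` is active says little on its own; comparing `baseline` with `without_a` shows how much of the output actually flows through it, which is the kind of question a circuit-level view is meant to answer.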

Testing on Claude 3.5 Haiku: an attempt to read ‘AI biology’

In March 2025, Anthropic said it was sharing two papers: one extends the feature-level work to circuit tracing, and the other applies the same toolset to Claude 3.5 Haiku. The Claude 3.5 Haiku study carried out deep studies of simple tasks, which Anthropic presented as representative examples of ten important model behaviors.

The company described this work as like looking at ‘AI biology’. The phrase signals that Anthropic does not want only to grade the model from the outside: whether the answer is correct, whether the language is fluent, whether safety rules were followed. It wants to understand the internal mechanisms, so that it can learn why the model behaves in a particular way.

Asking Claude for an explanation and looking inside it are different things

When you ask Claude, ‘Why did you give that answer?’, its explanation is ultimately generated text too. Anthropic’s interpretability research goes after the underlying computations that produce that text.

That is why circuit tracing is a different kind of evidence. It is not an attempt to prompt the model and listen to its reasoning. It is an attempt to inspect parts of the computational pathway using tools that translate neural activity into more understandable structures.

What this can show, and what it cannot

This approach can make parts of Claude’s internals more legible: which features appear relevant, how those features are connected, and which pathways appear to be involved in producing a response. Researchers can then compare internal mechanisms alongside the final output, rather than relying on the model’s answer alone.

But Anthropic’s own framing is also cautious. The company describes the work as progress toward a ‘microscope’ and as revealing ‘parts’ of the pathway from input words to output words. It would therefore be wrong to treat the current tools as a complete decoder of every computation in Claude, or as a reliable transcript of ‘what was thought’ inside the model.

Conclusion

To make Claude’s hidden reasoning more understandable, Anthropic is working at three levels: turning internal activations into interpretable features, tracing those features as circuits, and then applying that map to real model behaviors. The result is a partial scientific map of Claude’s computation: not full mind-reading, and not a complete explanation of every answer.


Sources