Answer · Published 2 months ago · Last edited last month · 14 sources

ZAYA1‑8B‑Diffusion‑Preview: When AI writes 16 tokens at once instead of one at a time

Zyphra's ZAYA1‑8B‑Diffusion‑Preview is a diffusion version of its ZAYA1‑8B MoE model. It drafts 16 tokens at a time and claims a decoding speedup of 4.6× to 7.7×. Instead of conventional token-by-token autoregressive generation, it uses parallel block generation, which can make more efficient use of the GPU. If this...


Figure: Conceptual illustration of a diffusion language model generating multiple tokens in parallel. Diffusion-style language models can draft multiple tokens simultaneously instead of generating them sequentially.

A diffusion-based approach to faster language models

AI startup Zyphra has introduced an experimental language model called ZAYA1‑8B‑Diffusion‑Preview. What sets it apart is that it replaces conventional autoregressive decoding with diffusion-style generation.

Most language models generate only one token (a word or part of a word) at a time. This model, by contrast, can draft a block of 16 tokens at once. According to the company, this is what gives decoding a speedup of roughly 4.6× to 7.7×, although real-world performance will depend on how the model is used.
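Why drafting 16 tokens per block can cut latency becomes clearer once you count sequential model calls. The sketch below is illustrative only. It assumes a hypothetical `steps_per_block` parameter, meaning the number of parallel refinement passes spent on each 16-token block, which the source does not specify. It compares call counts only and does not reproduce Zyphra's measured figures.

```python
import math

BLOCK_SIZE = 16  # tokens drafted per block, per the description above


def autoregressive_passes(num_tokens: int) -> int:
    """Autoregressive decoding: one sequential forward pass per generated token."""
    return num_tokens


def block_diffusion_passes(num_tokens: int, steps_per_block: int,
                           block_size: int = BLOCK_SIZE) -> int:
    """Block decoding: each block of `block_size` tokens is refined in
    `steps_per_block` parallel passes (steps_per_block is a hypothetical knob)."""
    num_blocks = math.ceil(num_tokens / block_size)
    return num_blocks * steps_per_block


def idealized_speedup(num_tokens: int, steps_per_block: int) -> float:
    """Ratio of sequential passes only; ignores that a pass over a whole
    block may cost more than a single-token pass."""
    return autoregressive_passes(num_tokens) / block_diffusion_passes(
        num_tokens, steps_per_block)


if __name__ == "__main__":
    for steps in (1, 2, 4, 8, 16):
        speedup = idealized_speedup(512, steps)
        print(f"steps_per_block={steps:2d}: idealized speedup {speedup:.1f}x")
```

For example, with 512 tokens and 2 passes per block, 512 sequential passes shrink to 32 × 2 = 64, an idealized 8×. Measured speedups such as the claimed 4.6× to 7.7× also depend on the cost of each pass and on how many refinement passes the decoder actually needs.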

Notably, the model was not trained from scratch as a diffusion model. Instead, Zyphra converted its existing autoregressive model to diffusion-style generation. This suggests that other existing LLMs might be converted in a similar way in the future.
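The source does not describe how the conversion was carried out. One pattern described in published work on adapting autoregressive models to block-wise diffusion replaces the causal attention mask with a block-causal one. Under that mask, tokens inside the block being drafted can see one another, while attention across blocks stays causal. The sketch below only builds the two masks. Treating this as Zyphra's method is an assumption, and the function names are hypothetical.

```python
from typing import List

Mask = List[List[bool]]


def causal_mask(seq_len: int) -> Mask:
    """Standard autoregressive mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def block_causal_mask(seq_len: int, block_size: int) -> Mask:
    """mask[i][j] is True if query position i may attend to key position j.

    Positions attend bidirectionally within their own block and to every
    position in earlier blocks, but never to later blocks."""
    return [[(j // block_size) <= (i // block_size) for j in range(seq_len)]
            for i in range(seq_len)]


if __name__ == "__main__":
    # Small sizes for readability; the model described above drafts 16-token blocks.
    for name, mask in (("causal", causal_mask(6)),
                       ("block-causal", block_causal_mask(6, block_size=3))):
        print(name)
        for row in mask:
            print("".join("1" if allowed else "." for allowed in row))
```

The only difference between the two masks is inside each block. There, the causal mask hides later positions, while the block-causal mask exposes them, so all positions in a block can be predicted in the same pass.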

Base model: ZAYA1‑8B

Diffusion‑Preview is built on Zyphra's ZAYA1‑8B model, a reasoning model with a mixture-of-experts (MoE) architecture.

It has more than 8 billion parameters in total, but only about 760 million of them are active during inference.
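A quick calculation from the figures above shows how small the active share is. Here 8 billion is treated as a lower bound on the total parameter count.

```latex
\frac{\text{active parameters}}{\text{total parameters}}
  \le \frac{0.76 \times 10^{9}}{8 \times 10^{9}} = 0.095
```

So, per token, roughly 9.5% of the weights or fewer take part in the computation.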

An MoE model contains many smaller "expert" networks, and only a few of them are activated for each token. This lets the model preserve its capability while reducing compute cost.
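A toy sketch of top-k routing makes the "only a few experts per token" idea concrete. Everything in it is hypothetical and chosen for illustration: the sizes (8 experts, top-2 routing, 4-dimensional vectors), the linear router, and the scaling "experts". ZAYA1‑8B's actual router and expert configuration are not described here. The gates are renormalized over the chosen experts only, which is one common variant among several.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale: float) -> Callable[[Vector], Vector]:
    """Hypothetical stand-in for an expert feed-forward network."""
    return lambda v: [scale * x for x in v]


def moe_layer(token: Vector, router_weights: List[Vector],
              experts: List[Callable[[Vector], Vector]], top_k: int = 2):
    """Toy top-k mixture-of-experts layer for one token vector.

    The router scores every expert, but only the top_k best-scoring experts
    are evaluated; their outputs are mixed with softmax gates renormalized
    over the chosen experts."""
    logits = [sum(w * x for w, x in zip(wv, token)) for wv in router_weights]
    chosen = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:top_k]
    gates = softmax([logits[e] for e in chosen])
    out = [0.0] * len(token)
    for gate, e in zip(gates, chosen):
        y = experts[e](token)  # the other experts never run for this token
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    random.seed(0)
    dim, num_experts = 4, 8  # hypothetical sizes, not ZAYA1-8B's configuration
    experts = [make_expert(float(e + 1)) for e in range(num_experts)]
    router = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
    token = [random.gauss(0.0, 1.0) for _ in range(dim)]
    output, used = moe_layer(token, router, experts, top_k=2)
    print(f"experts evaluated: {used} out of {num_experts}")
```

The compute saving comes from the loop: only `top_k` expert functions are called per token, even though all experts' parameters exist in the model.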

Why autoregressive models are slow

Most of today's large language models use autoregressive generation: each new token depends on all of the text before it. As a result, tokens must be produced one after another, and each one requires its own pass through the model.
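A minimal greedy decoding loop shows this sequential dependency. `toy_model` is a hypothetical stand-in for a real forward pass. Call t+1 cannot start until call t has returned its token, so generating N tokens takes N sequential calls.

```python
from typing import Callable, List, Tuple

Token = int


def generate_autoregressive(model: Callable[[List[Token]], Token],
                            prompt: List[Token],
                            max_new_tokens: int) -> Tuple[List[Token], int]:
    """Greedy autoregressive decoding: each new token needs a separate model
    call on the full prefix, including the token from the previous call."""
    tokens = list(prompt)
    calls = 0
    for _ in range(max_new_tokens):
        next_token = model(tokens)  # depends on every token generated so far
        calls += 1
        tokens.append(next_token)
    return tokens, calls


def toy_model(prefix: List[Token]) -> Token:
    """Hypothetical stand-in for an LLM forward pass over a 10-token vocabulary."""
    return (sum(prefix) + len(prefix)) % 10


if __name__ == "__main__":
    tokens, calls = generate_autoregressive(toy_model, prompt=[1, 2], max_new_tokens=3)
    print(tokens, calls)
```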
