
Xiaomi MiMo sets a record: trillion-parameter model inference exceeds 1,000 tokens/s on just 8 general-purpose GPUs

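In June 2026, Xiaomi MiMo and TileRT jointly released UltraSpeed mode, which for the first time pushes a trillion-parameter model past 1,000 tokens/s of generation speed on a single standard 8-GPU node, without relying on custom chips. Three key techniques lie behind the speedup: FP4 mixed-precision quantization applied only to the MoE expert layers, DFlash block-level parallel speculative decoding, and the TileRT persistent-kernel engine paired with a warp-specialized heterogeneous pipeline.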


[Figure: Conceptual visualization of Xiaomi MiMo-V2.5-Pro-UltraSpeed achieving over 1,000 tokens per second on a trillion-parameter model using standard GPUs.]

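On June 8, 2026, Xiaomi's MiMo technical team and its inference-systems partner TileRT officially released the MiMo-V2.5-Pro-UltraSpeed mode, a high-speed inference solution built for the MiMo-V2.5-Pro model family. The headline figure in the official announcement: on a fully standard 8-GPU general-purpose server, a trillion-parameter model generated more than 1,000 tokens per second. Xiaomi Group CEO Lei Jun stressed on his personal social media account that this is the first time in the industry that a model of this scale has reached that speed.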

The truth behind the speed milestone

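Unlike many approaches that depend on expensive custom chips, Xiaomi and TileRT report throughput above 1,000 tokens per second, with peaks approaching 1,200 tokens per second in some demonstrations, all on an ordinary 8-GPU node. Xiaomi describes the result as breaking the industry's long-standing "impossible triangle": the belief that high speed and strong model capability can only be had together on specialized hardware.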

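Xiaomi CEO Lei Jun celebrated the milestone in a social media post, calling it an "industry first" in crossing the 1,000 tokens/s threshold with a trillion-parameter model.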
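To put the reported rates in perspective, the short sketch below converts a throughput figure into average time per token and total generation time. The 1,000 and 1,200 tokens/s values come from the figures reported above; the 2,000-token response length and both function names are hypothetical, chosen only for illustration. It also assumes the rate applies to a single generation stream, which the announcement as summarized here does not spell out.

```python
def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time spent per generated token, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decode rate, in seconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 2000  # hypothetical response length, not from the source
    for rate in (1000.0, 1200.0):  # reported threshold and approximate peak
        latency = per_token_latency_ms(rate)
        total = generation_time_s(response_tokens, rate)
        print(f"{rate:.0f} tokens/s: {latency:.2f} ms per token, "
              f"{total:.2f} s for {response_tokens} tokens")
```

At 1,000 tokens/s the average budget is one millisecond per token, so every stage of the decode loop has to fit inside that window.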
