
Xiaomi MiMo Hits 1,000 Tokens/Second on a Trillion-Parameter Model Using Standard GPUs

Xiaomi and TileRT announced MiMo-V2.5-Pro-UltraSpeed in June 2026, which Xiaomi describes as the first trillion-parameter model to exceed 1,000 tokens per second in decode speed on a single standard 8-GPU server rather than custom chips. The speed milestone is achieved through three coordinated techniques: FP4 mixed-precision quantization targeting MoE...


[Image: Conceptual visualization of Xiaomi MiMo-V2.5-Pro-UltraSpeed achieving over 1,000 tokens per second on a trillion-parameter model using standard GPUs.]

On June 8, 2026, Xiaomi's MiMo team and inference partner TileRT released MiMo-V2.5-Pro-UltraSpeed, a high-speed inference mode for the MiMo-V2.5-Pro model family. The announcement centered on a single claim: a 1-trillion-parameter model decoding at over 1,000 tokens per second, which Xiaomi describes as a first at that scale, on a single standard 8-GPU commodity node rather than custom hardware.

The Speed Milestone

Xiaomi and TileRT reported sustained throughput above 1,000 tokens per second, with demos peaking near 1,200 tokens per second, on a standard 8-GPU server. Xiaomi frames the result as breaking what it calls the industry's "impossible triangle" of speed, capability, and general-purpose GPU compatibility. Xiaomi CEO Lei Jun highlighted the milestone in a social post, describing it as the industry's first time crossing 1,000 tokens/s on a trillion-parameter model.
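
To put the reported figures in perspective, the short sketch below converts a decode rate into average per-token latency and total decode time. The two rates (1,000 and 1,200 tokens per second) come from the announcement; the 2,000-token response length and the function names are illustrative assumptions, and the arithmetic ignores prefill and network time.

```python
def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time per decoded token, in milliseconds."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return 1000.0 / tokens_per_second


def decode_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to decode num_tokens at a steady rate (prefill and network time ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 2000  # hypothetical response length, not from the announcement
    for rate in (1000.0, 1200.0):  # reported sustained and peak decode rates
        latency = per_token_latency_ms(rate)
        total = decode_seconds(response_tokens, rate)
        print(f"{rate:.0f} tok/s: {latency:.2f} ms per token, "
              f"{total:.2f} s for {response_tokens} tokens")
```

At the sustained rate this works out to about one millisecond per token, so a 2,000-token response would take roughly two seconds of decode time under these assumptions.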

Three Techniques Behind the Speedup

1. FP4 Mixed-Precision Quantization

2. DFlash Speculative Decoding

3. TileRT Persistent Kernel Engine with Warp Specialization

Pricing: "3× the Price, 10× the Output Experience"

Limited Trial Window and Access Rules

Open-Source Release

What This Means for Developers