What separates this partnership from a standard ASIC design engagement is its scope. FuriosaAI and Broadcom are not just designing a faster chip; they are building a unified, rack-scale inference platform that addresses the system-level bottlenecks of hyperscale AI data centers.
Charlie Kawwas, president of Broadcom's Semiconductor Solutions Group, framed the partnership around systems-level performance: "Inference performance is no longer defined solely by raw compute... By pairing Furiosa's TCP architecture with Broadcom's market-leading XPU Technology and IP Platform, Ethernet scale-up and fabric switches, we are building a platform that addresses the key bottlenecks of large-scale agentic AI." The system is being built with an all-to-all-capable topology to handle the complex communication patterns required by Mixture-of-Experts (MoE) AI models.
FuriosaAI enters this partnership with proven commercial silicon. Its second-generation chip, called RNGD, is already in mass production on TSMC's 5nm process. The RNGD is a 180W TDP PCIe card that delivers 512 teraFLOPS of FP8 performance with 48GB of HBM3 memory and 1.5 TB/s of bandwidth. While that represents roughly 1/9th the peak compute of an Nvidia B200, it does so at about 1/5th the power consumption.
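To put those ratios side by side, here is a minimal sketch that works only from the figures quoted above: the 512 teraFLOPS and 180W ratings for the RNGD, and the approximate 1/9 and 1/5 ratios against the B200. The function names are illustrative, and the ratios are the article's rounded values rather than measured numbers, so the output is a rough peak-rating comparison and says nothing about delivered inference throughput.

```python
# Figures quoted in the text for the RNGD card.
RNGD_FP8_TFLOPS = 512.0
RNGD_TDP_WATTS = 180.0

# Approximate ratios quoted in the text relative to an Nvidia B200.
# These are rounded values from the article, not measured data.
COMPUTE_RATIO = 1 / 9  # RNGD peak compute / B200 peak compute
POWER_RATIO = 1 / 5    # RNGD power draw / B200 power draw


def tflops_per_watt(tflops: float, watts: float) -> float:
    """Peak compute delivered per watt of rated power."""
    return tflops / watts


def relative_peak_efficiency(compute_ratio: float, power_ratio: float) -> float:
    """Peak compute per watt of chip A relative to chip B, from the two ratios."""
    return compute_ratio / power_ratio


rngd_eff = tflops_per_watt(RNGD_FP8_TFLOPS, RNGD_TDP_WATTS)
rel_eff = relative_peak_efficiency(COMPUTE_RATIO, POWER_RATIO)

print(f"RNGD peak FP8 per watt: {rngd_eff:.2f} TFLOPS/W")
print(f"RNGD peak compute per watt relative to B200: {rel_eff:.2f}x")
```

On these rounded figures the RNGD works out to about 2.84 peak FP8 teraFLOPS per watt, and the two ratios combine to 5/9, or roughly 0.56 times the B200's peak compute per watt.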
The RNGD has been validated by major Korean enterprises, including Samsung SDS and LG AI Research, where LG is running its Exaone model family on the hardware. This commercial traction gives the startup a foundation of credibility as it targets the global hyperscale market with its third-generation platform.
A core differentiator is FuriosaAI's software stack. The company's SDK uses a general compiler that automatically maps PyTorch code directly to its silicon, bypassing the need for hand-tuned CUDA kernels. Its Virtual ISA provides developers with low-level control without the complexity of GPU programming.
FuriosaAI's design philosophy is that traditional GPUs carry a "legacy tax" from their graphics origins. The GPUs' SIMT architecture, the company argues, struggles with the irregular memory access patterns common in modern AI inference workloads. FuriosaAI's Tensor Contraction Processor (TCP) is a clean-sheet architecture that prioritizes high-bandwidth data movement and massive tensor operations over thread management, aiming for superior performance-per-watt and token density in power-constrained data center racks.
The FuriosaAI deal is the latest in a sweeping custom-silicon strategy by Broadcom. In October 2025, OpenAI announced a multi-year partnership with Broadcom to co-develop and deploy 10 gigawatts of custom AI accelerators and networking hardware, with the first deployment targeted for the second half of 2026 using both 3nm and 2nm designs. Broadcom's roster of custom ASIC partners also includes Microsoft, Amazon, Meta, and Google, all of which are investing billions to design purpose-built chips for their specific AI workloads.
This wave of partnerships reflects a structural shift in the market. According to research firm TrendForce, ASIC-based AI servers are projected to account for 27.8% of total AI server shipments in 2026, a multi-year high, and are forecast to grow to nearly 40% of the market by 2030. The growth rate of custom AI chips is telling: TrendForce data shows custom AI chip shipments from cloud providers are on track to grow 44.6% in 2026, nearly triple the 16.1% growth rate projected for merchant GPUs.
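The arithmetic behind those TrendForce figures can be checked with a short sketch. It uses only the percentages quoted above; treating "nearly 40%" as exactly 40% and spreading the change evenly over the four years from 2026 to 2030 are simplifying assumptions made here for illustration, and the function names are hypothetical.

```python
# Percentages quoted in the text (TrendForce projections).
ASIC_SHARE_2026 = 0.278     # ASIC share of AI server shipments, 2026
ASIC_SHARE_2030 = 0.40      # "nearly 40%" is treated as 40% (assumption)
CUSTOM_GROWTH_2026 = 0.446  # custom AI chip shipment growth, 2026
GPU_GROWTH_2026 = 0.161     # merchant GPU shipment growth, 2026


def growth_multiple(rate_a: float, rate_b: float) -> float:
    """How many times larger growth rate A is than growth rate B."""
    return rate_a / rate_b


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual rate that takes `start` to `end` over `years` years."""
    return (end / start) ** (1 / years) - 1


multiple = growth_multiple(CUSTOM_GROWTH_2026, GPU_GROWTH_2026)
share_growth = implied_annual_growth(ASIC_SHARE_2026, ASIC_SHARE_2030, 2030 - 2026)

print(f"Custom chip growth vs merchant GPU growth: {multiple:.2f}x")
print(f"Implied annual growth in ASIC share, 2026-2030: {share_growth:.1%}")
```

The first figure comes out at about 2.77, consistent with "nearly triple." The second describes growth in ASIC market share under the stated assumptions, not growth in unit shipments.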
While Nvidia still holds roughly 70% of the AI chip market, its share is expected to decline as hyperscalers pivot toward custom silicon that can deliver better efficiency for their unique software stacks. The FuriosaAI–Broadcom platform is a direct play into this trend, attempting to leapfrog from a validated 180W inference card to a 2nm, Ethernet-fabric-based system designed for the world's largest data centers.