The second critical piece is the ShuffleBox, a completely passive optical device. One of the long-standing objections to random graph networks has been cabling complexity: connecting everything in a flat random topology seems like an unmanageable tangle. The ShuffleBox solves this by internally shuffling fiber connections, making the cabling complexity comparable to a fat-tree — and it consumes zero power .
RNG's shift from hierarchical to flat yields hard numbers that explain why AWS made the transition its new default. AWS reports that RNG networks use 69% fewer routers than equivalent fat-tree designs while delivering up to 33% better throughput . Network equipment electricity consumption drops by roughly 40%, driven largely by eliminating entire powered switch tiers
.
The cost savings range from 9% to 45% depending on scale and configuration . At AWS's size, those percentages translate into massive absolute numbers, with some projections estimating cumulative savings of up to $200 billion
.
The rollout began quietly. AWS deployed RNG in production for the first time in Dublin, Ireland in late 2024 . Throughout 2025, the company expanded the architecture to additional European sites, including data centers in Spain and Germany
. By April 2026, RNG was no longer an experiment — the research paper states plainly that "RNG is now the default datacenter network for most workloads at Amazon"
.
Importantly, this was not a disruptive, big-bang migration. AWS handled the transition as a gradual, transparent infrastructure upgrade that sits below the virtualization layer. EC2 instances, VPCs, load balancers, and every other AWS service see the same logical network they always have. Customers did not need to change a single line of code, security group rule, or API call — the physical and routing layers changed underneath them .
RNG has implications that go far beyond an interesting academic paper. A 9–45% network cost reduction at AWS scale creates a structural pricing advantage. AWS can choose to pass those savings to customers through lower prices, or reinvest them into AI compute capacity while competitors still bear the higher cost of fat-tree infrastructure .
The performance gain is especially meaningful for AI and machine learning workloads. Distributed training across hundreds or thousands of GPUs is notoriously sensitive to network bottlenecks. The 33% throughput improvement and vastly richer path diversity directly accelerate training runs, improve GPU utilization, and reduce time-to-completion for large models . When even minor network delays can add days to a training job, a flat network with edge-disjoint paths becomes a genuine competitive weapon
.
For the rest of the cloud industry, RNG represents a raised bar. Google Cloud, Microsoft Azure, and other hyperscalers predominantly still use Clos and fat-tree variants. AWS's demonstrated ability to deploy random-graph networking at production scale — turning a decades-old theoretical idea into operational reality — pressures rivals to either develop similar flattening approaches or accept permanently higher per-bit networking costs .
Comments
0 comments