When a Siri query is too complex for on-device processing, Apple turns to the cloud, but not just any cloud. According to a May 28, 2026 report from The Information, Apple will run some Siri queries on a licensed version of Gemini inside Google Cloud. Apple has approved the use of Nvidia confidential computing for that cloud processing, a hardware-level security feature that keeps data encrypted even during AI inference.
This marks a significant evolution from the original framing of the deal. Initially, it was understood that Gemini would run exclusively inside Apple’s own Private Cloud Compute (PCC) infrastructure. The newer reporting clarifies that Apple struggled to get the full trillion-plus-parameter model working efficiently on its internal servers and has now turned to Google Cloud with Nvidia’s assistance for some cloud queries. Importantly, user data is not retained by Google in this arrangement.
Apple’s most strategic advantage in this deal isn’t cloud access; it’s model distillation. Google has given Apple “complete access” to the full Gemini model inside its own data centers, not merely API-level access. Apple can use that access to perform knowledge distillation: a process where the large Gemini “teacher” model generates high-quality responses and reveals internal reasoning steps, which are then used to train much smaller “student” models that run locally on Apple devices.
These distilled models are optimized for Apple’s custom silicon (A-series and M-series chips) and can operate without an internet connection. Crucially, the student models learn to imitate Gemini’s internal computations, not just its surface-level outputs, producing more capable on-device AI than would be possible through simple fine-tuning.
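To make the teacher-student idea concrete, here is a minimal sketch of a textbook knowledge-distillation objective. It is not Apple's or Google's actual training code, which has not been published. Everything in it is an assumption for illustration: the function names, the temperature of 2.0, the weighting factor `alpha`, and the use of a mean-squared-error term on hidden activations as one common way to make a student imitate a teacher's internal computations rather than only its outputs.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities of temperature-scaled logits, computed stably."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exponentiating to avoid overflow
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def output_distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    A higher temperature flattens both distributions so the student also
    learns how the teacher ranks the less likely answers.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def hidden_state_loss(teacher_hidden, student_hidden):
    """Mean squared error between intermediate activations.

    Assumes (hypothetically) that both vectors already share one dimension;
    real systems usually need a learned projection between the two models.
    """
    squared = [(t - s) ** 2 for t, s in zip(teacher_hidden, student_hidden)]
    return sum(squared) / len(squared)


def distillation_objective(teacher_logits, student_logits,
                           teacher_hidden, student_hidden, alpha=0.5):
    """Weighted mix of output matching and internal-state matching."""
    out_term = output_distillation_loss(teacher_logits, student_logits)
    hid_term = hidden_state_loss(teacher_hidden, student_hidden)
    return alpha * out_term + (1.0 - alpha) * hid_term


if __name__ == "__main__":
    # Toy numbers only: three candidate tokens, two hidden units.
    loss = distillation_objective(
        teacher_logits=[4.0, 1.0, 0.5],
        student_logits=[2.5, 1.5, 0.2],
        teacher_hidden=[0.8, -0.3],
        student_hidden=[0.5, -0.1],
    )
    print(f"combined distillation loss: {loss:.4f}")
```

The loss falls to zero only when the student reproduces both the teacher's output distribution and its intermediate activations, which is the property the reporting describes. API-level access would typically expose only the output side of this objective.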
This is the core of Apple’s on-device AI strategy: Gemini-level reasoning for everyday queries without sending user data to any cloud server.
Despite the new reliance on Google Cloud and Nvidia chips for some queries, Apple is not abandoning Private Cloud Compute. On Apple’s Q1 2026 earnings call, CEO Tim Cook stated that the new Siri will “continue to run on the device and run in Private Cloud Compute” while maintaining Apple’s “industry-leading privacy standards.”
The branding distinction is important: PCC represents Apple’s own infrastructure, running on Apple Silicon servers with stateless, ephemeral computation where user data is never stored or accessible even by Apple. The newer Google Cloud arrangement using Nvidia confidential computing operates as a parallel, specialized cloud tier that is still privacy-protected at the hardware level but distinct from the PCC architecture Apple originally described.
WWDC 2026 is expected to be Apple’s most AI-focused developer conference yet, with Siri’s redesign taking center stage. Apple’s messaging will emphasize that most Siri queries are handled on-device by distilled student models, providing instant responses, offline operation, and complete data privacy.
According to multiple reports, Apple will also announce an “Extensions” framework in iOS 27 that lets users select their default AI engine (Gemini, ChatGPT, or Claude) for specific tasks, while Apple Intelligence remains the default privacy-first layer. Siri itself is being rebuilt as a full chatbot with a standalone app, iMessage-style chat interface, and Dynamic Island integration.
The key competitive advantage Apple plans to emphasize: on-device processing means user data never leaves the device for the vast majority of queries, a claim that purely cloud-based assistants cannot match.
A note on sourcing: The most recent details—specifically Apple’s use of Nvidia confidential computing inside Google Cloud—come from The Information (May 28, 2026). While the outlet is credible, this is a single-report development. The earlier parts of the story, including the $1B deal structure, distillation rights, and PCC architecture, are corroborated by Apple’s own earnings call statements, the Google-Apple joint announcement, and multiple independent reports from Bloomberg and others.