Apple’s long-awaited Siri overhaul isn’t a simple chatbot upgrade. It’s a complete re-architecture built on a three-layer AI stack: on-device distilled models, Private Cloud Compute (PCC), and a massive licensed Gemini model running on Google Cloud with Nvidia confidential computing. The entire project is framed around privacy, and it’s all set to be showcased at WWDC 2026.
On January 12, 2026, Apple and Google announced a multi-year collaboration worth roughly $1 billion per year. The core of the deal: the next generation of Apple Foundation Models (codenamed AFM v10) is built on a custom 1.2-trillion-parameter Gemini model, about eight times larger than Apple’s previous cloud-based AI models. The two companies stated that these models will power future Apple Intelligence features, including a more personalized Siri due later in 2026.
Unlike Apple’s existing ChatGPT integration, which explicitly hands off complex queries to OpenAI with visible branding, the Gemini-powered Siri will run invisibly in the background. The experience remains white-labeled, with no Google branding; users simply see “Siri.”
When a Siri query is too complex for on-device processing, Apple turns to the cloud, but not just any cloud. According to a May 28, 2026, report from The Information, Apple will run some Siri queries on a licensed version of Gemini inside Google Cloud. Apple has approved the use of Nvidia confidential computing for that cloud processing, a hardware-level security feature that keeps data encrypted even during AI inference.
This marks a significant evolution from the original framing of the deal. Initially, it was understood that Gemini would run exclusively inside Apple’s own PCC infrastructure. The newer reporting indicates that Apple struggled to get the full trillion-plus-parameter model running efficiently on its internal servers and has turned to Google Cloud, with Nvidia’s assistance, for some cloud queries. According to the report, Google does not retain user data under this arrangement.
Apple’s most strategic advantage in this deal isn’t cloud access; it’s model distillation. Google has given Apple “complete access” to the full Gemini model inside its own data centers, not merely API-level access. Apple can use that access to perform knowledge distillation: a process in which the large Gemini “teacher” model generates high-quality responses and exposes its internal reasoning steps, which are then used to train much smaller “student” models that run locally on Apple devices.
These distilled models are optimized for Apple’s custom silicon (A-series and M-series chips) and can operate without an internet connection. Crucially, the student models learn to imitate Gemini’s internal computations, not just its surface-level outputs, which yields more capable on-device AI than simply fine-tuning on Gemini’s final answers would.
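Apple has not disclosed its training recipe, so the following is only a generic illustration of the technique: a minimal pure-Python sketch of a standard soft-target distillation loss (a temperature-softened KL divergence between teacher and student output distributions, blended with ordinary cross-entropy on the true label), plus a simple feature-matching term standing in for “imitating internal computations.” The function names, the temperature of 2.0, the 0.5 weighting, and the assumption that the student’s hidden state has already been projected to the teacher’s width are all illustrative choices, not details from the reporting.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, label, temperature=2.0, alpha=0.5):
    """Soft-target KL (scaled by T^2) blended with hard-label cross-entropy."""
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_term = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    hard_term = -math.log(softmax(student_logits)[label])
    return alpha * soft_term + (1 - alpha) * hard_term

def feature_matching_loss(teacher_hidden, student_hidden_projected):
    """Mean squared error between intermediate activations.

    Assumes the student's hidden vector was already projected to the
    teacher's width (an illustrative simplification)."""
    n = len(teacher_hidden)
    return sum((t - s) ** 2 for t, s in zip(teacher_hidden, student_hidden_projected)) / n

teacher = [4.0, 1.0, 0.5]
mismatched_student = [0.0, 3.0, 0.0]
# A student whose logits match the teacher's incurs a lower loss.
print(distillation_loss(teacher, teacher, label=0) < distillation_loss(teacher, mismatched_student, label=0))  # True
```

During training, the student’s weights would be updated to minimize a combination of these terms, pulling both its outputs and its internal activations toward the teacher’s.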
This is the core of Apple’s on-device AI strategy: carrying as much of Gemini’s reasoning ability as possible into everyday queries without sending user data to any cloud server.
Despite the new reliance on Google Cloud and Nvidia chips for some queries, Apple is not abandoning Private Cloud Compute. On Apple’s Q1 2026 earnings call, CEO Tim Cook stated that the new Siri will “continue to run on the device and run in Private Cloud Compute” while maintaining Apple’s “industry-leading privacy standards.”
The distinction is more than branding: PCC is Apple’s own infrastructure, running on Apple Silicon servers with stateless, ephemeral computation in which user data is never stored and is inaccessible even to Apple. The newer Google Cloud arrangement using Nvidia confidential computing operates as a parallel, specialized cloud tier. It is still privacy-protected at the hardware level, but it is distinct from the PCC architecture Apple originally described.
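None of the reporting describes how Siri decides which tier handles a given request. As a purely hypothetical illustration of the three-tier design described above, the sketch below routes a query by an assumed complexity score: offline or simple queries stay on the device, mid-complexity queries go to PCC, and the hardest go to the licensed Gemini model on Google Cloud. The `Query` type, the `estimated_complexity` score, and both thresholds are invented for this example.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    ON_DEVICE = "on-device distilled model"
    PRIVATE_CLOUD_COMPUTE = "Apple Private Cloud Compute"
    GEMINI_CONFIDENTIAL = "licensed Gemini on Google Cloud (Nvidia confidential computing)"

@dataclass
class Query:
    text: str
    estimated_complexity: float  # hypothetical 0..1 score from an on-device classifier
    online: bool

# Hypothetical cut-offs; Apple has not disclosed any routing policy.
ON_DEVICE_MAX = 0.5
PCC_MAX = 0.8

def route(query: Query) -> Tier:
    """Pick the least-exposed tier that can plausibly handle the query."""
    if not query.online or query.estimated_complexity <= ON_DEVICE_MAX:
        return Tier.ON_DEVICE  # offline queries can only run locally
    if query.estimated_complexity <= PCC_MAX:
        return Tier.PRIVATE_CLOUD_COMPUTE
    return Tier.GEMINI_CONFIDENTIAL

print(route(Query("Set a timer for ten minutes", 0.1, online=True)).value)
# on-device distilled model
```

The design choice being illustrated is that escalation is ordered by privacy exposure: a request only leaves the device, and only reaches infrastructure Apple does not own, when the cheaper tiers are assumed to be insufficient.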
WWDC 2026 is expected to be Apple’s most AI-focused developer conference yet, with Siri’s redesign taking center stage. Apple’s messaging is expected to emphasize that most Siri queries are handled on-device by distilled student models, providing instant responses, offline operation, and complete data privacy.
According to multiple reports, Apple will also announce an “Extensions” framework in iOS 27 that lets users select their default AI engine (Gemini, ChatGPT, or Claude) for specific tasks, while Apple Intelligence remains the default privacy-first layer. Siri itself is being rebuilt as a full chatbot with a standalone app, an iMessage-style chat interface, and Dynamic Island integration.
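The reports describe the Extensions behavior but not its API. Assuming per-task preferences are stored as a simple mapping, the fallback rule can be sketched as follows: a user’s choice applies only when it names one of the supported third-party engines, and everything else falls back to Apple Intelligence. The task names, `resolve_engine`, and the preference structure are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical names: the reports describe the feature, not its API.
DEFAULT_ENGINE = "Apple Intelligence"
THIRD_PARTY_ENGINES = {"Gemini", "ChatGPT", "Claude"}

def resolve_engine(task: str, preferences: Dict[str, str]) -> str:
    """Pick the engine for a task.

    A per-task user choice wins only if it names a supported third-party
    engine; anything else (no choice, unknown engine) falls back to the
    privacy-first default.
    """
    choice: Optional[str] = preferences.get(task)
    if choice in THIRD_PARTY_ENGINES:
        return choice
    return DEFAULT_ENGINE

prefs = {"writing": "Claude", "image_generation": "Gemini"}
print(resolve_engine("writing", prefs))          # Claude
print(resolve_engine("summarize_email", prefs))  # Apple Intelligence
```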
The key competitive advantage Apple plans to emphasize is that on-device processing keeps user data on the device for the vast majority of queries, a claim that purely cloud-based assistants cannot match.
A note on sourcing: The most recent details—specifically Apple’s use of Nvidia confidential computing inside Google Cloud—come from The Information (May 28, 2026). While the outlet is credible, this is a single-report development. The earlier parts of the story, including the $1B deal structure, distillation rights, and PCC architecture, are corroborated by Apple’s own earnings call statements, the Google-Apple joint announcement, and multiple independent reports from Bloomberg and others.