iOS 27 Siri's new brain: a Gemini-distilled on-device model, with Google Cloud fallback
TL;DR
iOS 27 Siri will run a small model distilled from Google's full Gemini for on-device queries; some requests will go to a licensed Gemini in Google Cloud. Apple pays Google ~$1B/year for it.
A week before WWDC 2026 (June 8), The Information published the core details of Apple's AI architecture.
Why turn to Google? Apple has prioritized on-device AI but runs into hardware limits with larger models. Google's full Gemini has trillions of parameters, and its compute requirements are too high for Apple's Private Cloud Compute infrastructure to run.
Two-layer architecture. Apple uses Google's large Gemini as the teacher to distill small models that run on iPhone, iPad, and Mac, so on-device AI features do not need a cloud call for every request. Distillation has limits, though: some Siri requests still go to Google Cloud, where a licensed Gemini variant processes them. Apple is also bringing in NVIDIA AI chips to supplement compute.
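For readers unfamiliar with distillation, here is a minimal sketch of the textbook soft-label objective, in which a small student model is trained to match a large teacher's output distribution. The report does not describe Apple's actual training recipe, so the function names, the temperature value, and the loss form below are illustrative assumptions, not Apple's or Google's method.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical helper: this is the standard soft-label loss, assumed here
    only for illustration.
    """
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# A student that matches the teacher incurs no loss; a mismatched one is penalized.
teacher = [2.0, 1.0, 0.1]
print(distillation_loss(teacher, [2.0, 1.0, 0.1]))
print(distillation_loss(teacher, [0.1, 1.0, 2.0]))
```

Minimizing this loss over many prompts pushes the small model toward the large model's behavior, but a student with far fewer parameters cannot match the teacher everywhere, which is consistent with some requests still needing the cloud model.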
Privacy. The new Siri's cloud requests run through Apple's Private Cloud Compute architecture, using stateless ephemeral Apple Silicon servers. Google can't access user data and can't use Siri interactions to train Gemini.
Cost. Apple and Google formally confirmed a multi-year partnership this January; Apple pays Google ~$1B/year for a custom Gemini built for Private Cloud Compute.
What else is in iOS 27. Beyond the default Gemini, users can choose Claude, ChatGPT, or other AI services as Siri's "brain" via the new Extensions framework. The full experience is expected to arrive with iOS 27 in September.
Apple uses Google's model to train its own, then uses Google's cloud to run the requests its own hardware can't. That is not a failure but a microcosm of pragmatic AI-era engineering.
via The Information
