Apple and Google On-Device AI Models Make Cloud Inference Optional in 2026
Apple's third-generation Foundation Models, unveiled at WWDC on June 8, 2026, and Google's Gemma 4 family, released on April 2, 2026, mark a turning point for on-device artificial intelligence. Apple's model stores around 20 billion parameters in flash memory but activates only one to four billion per request, using a technique called Instruction-Following Pruning to keep RAM usage low. Google's Gemma 4 edge models apply comparable efficiency techniques, including per-layer embeddings and mixture-of-experts architectures, to run capable AI within tight memory budgets. When inference runs entirely on-device, the marginal cost per query is effectively zero, which removes the per-token billing that made agentic and high-frequency AI features economically impractical. The shift also delivers offline functionality and stronger data privacy, since user data does not have to leave the device to reach a remote server.
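To make the stored-versus-active parameter figures concrete, here is a rough back-of-the-envelope sketch. The parameter counts come from the summary above; the 4-bit weight precision is an assumption chosen purely for illustration, not a figure reported for either model, and the estimate ignores activations, caches, and runtime overhead.

```python
# Illustrative arithmetic only. ASSUMED_BITS is a hypothetical quantization
# level, not a number taken from the article.
GB = 1e9  # decimal gigabytes


def weight_bytes(params: float, bits_per_weight: int) -> float:
    """Bytes needed to hold `params` weights at the given precision."""
    return params * bits_per_weight / 8


STORED_PARAMS = 20e9        # "around 20 billion" parameters kept in flash
ACTIVE_RANGE = (1e9, 4e9)   # "one to four billion" activated per request
ASSUMED_BITS = 4            # assumption: 4 bits per weight

# Footprint of the full model in flash versus the active subset in RAM.
flash_gb = weight_bytes(STORED_PARAMS, ASSUMED_BITS) / GB
ram_gb = [weight_bytes(p, ASSUMED_BITS) / GB for p in ACTIVE_RANGE]

# Fraction of the stored parameters that is active for a single request.
active_share = [p / STORED_PARAMS for p in ACTIVE_RANGE]

print(f"weights in flash: {flash_gb:.1f} GB")
print(f"active weights in RAM: {ram_gb[0]:.1f} to {ram_gb[1]:.1f} GB")
print(f"active share: {active_share[0]:.0%} to {active_share[1]:.0%}")
```

Under that assumed precision, only 5% to 20% of the stored weights would need to be resident in RAM for any one request, which is the sense in which sparse activation keeps memory usage low.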
This is an AI-generated summary. ShortSingh links to the original source for the complete article.