Google Gemma 4 — Open Source AI That Runs on Your Phone
Google released Gemma 4 under Apache 2.0, capable of running locally on Android phones. We break down benchmarks, compare it to other open models, and explore what local AI on mobile means for developers.
Google just released Gemma 4, the latest iteration of its open-source model family, and this time the story is different: it runs locally on Android phones. No cloud. No API calls. No network round trips. Just pure on-device inference.
Under the Apache 2.0 license, Gemma 4 is free for commercial use, modification, and distribution. This is a significant move in the ongoing open-source AI race, and it has implications that reach far beyond the developer community.
What Is Gemma 4?
Gemma 4 is Google's fourth-generation open model family, built on the same research and technology that powers Gemini. It comes in multiple sizes:
| Model | Parameters | Quantized Size | Target Device |
|---|---|---|---|
| Gemma 4 Nano | 2B | ~1.4 GB | Android phones, IoT |
| Gemma 4 Small | 9B | ~5.2 GB | High-end phones, tablets |
| Gemma 4 Medium | 27B | ~15 GB | Laptops, desktops |
| Gemma 4 Large | 70B | ~40 GB | Servers, cloud |
The Nano and Small variants are the ones designed for mobile. Google optimized them using quantization (4-bit and 8-bit), distillation, and architecture tweaks to run efficiently on mobile NPUs and GPUs.
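The quantized sizes in the table follow from simple arithmetic: parameters times bits per weight, plus some overhead for tensors kept at higher precision. Here is a back-of-envelope sketch; the 10% overhead figure is an assumption, not a published number:

```kotlin
import java.util.Locale

// Rough on-disk size: parameters × bits-per-weight / 8 bytes, plus an
// assumed ~10% overhead for embeddings, tokenizer, and metadata stored
// at higher precision than the quantized weights.
fun estimatedSizeGb(paramsBillions: Double, bitsPerWeight: Int, overhead: Double = 0.10): Double =
    paramsBillions * bitsPerWeight / 8.0 * (1 + overhead)

// Locale-independent one-decimal formatting for display.
fun fmt(x: Double) = String.format(Locale.ROOT, "%.1f", x)

fun main() {
    println("Nano 2B @ INT4:  ~" + fmt(estimatedSizeGb(2.0, 4)) + " GB")  // ~1.1 GB
    println("Small 9B @ INT4: ~" + fmt(estimatedSizeGb(9.0, 4)) + " GB")  // ~5.0 GB
}
```

The estimates land close to but slightly under the table's figures (~1.4 GB and ~5.2 GB), which suggests some layers ship at 8-bit or FP16 rather than pure INT4.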
Benchmarks: How Does It Perform?
Early benchmark results show Gemma 4 punching above its weight class:
| Benchmark | Gemma 4 Nano (2B) | Gemma 4 Small (9B) | Gemma 3 Small (9B) | Llama 4 Scout (9B) |
|---|---|---|---|---|
| MMLU | 58.2 | 72.4 | 64.1 | 70.8 |
| HumanEval | 41.5 | 62.8 | 51.2 | 59.4 |
| GSM8K | 64.7 | 78.3 | 69.5 | 75.1 |
| MT-Bench | 6.8 | 8.1 | 7.2 | 7.8 |
The 9B model outperforms its same-size predecessor across the board and edges out Llama 4 Scout on every benchmark shown. The Nano model, while limited, is remarkably capable for something that fits on a phone.
Running Locally on Android
This is where Gemma 4 gets interesting. Google has integrated Gemma 4 support into ML Kit and Android's AI Edge framework, making it straightforward to run inference on-device.
How It Works
1. Model download: Apps can bundle the model or download it on first launch
2. On-device inference: Uses the phone's NPU (Neural Processing Unit) or GPU
3. No internet required: Everything runs locally after download
4. Privacy-first: User data never leaves the device
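The download-then-local flow in steps 1-3 reduces to a one-time fetch: if the weights are already on disk, skip the network entirely. A minimal sketch of that decision; `ensureModel` and the injected downloader are hypothetical helpers for illustration, not real ML Kit APIs:

```kotlin
import java.io.File

// Hypothetical first-launch helper: reuse the weights if they are
// already on disk, otherwise download them exactly once. After this
// returns, inference needs no internet connection at all.
fun ensureModel(modelDir: File, variant: String, download: (String) -> ByteArray): File {
    val modelFile = File(modelDir, "$variant.bin")
    if (!modelFile.exists()) {
        modelDir.mkdirs()
        modelFile.writeBytes(download(variant))  // one-time fetch on first launch
    }
    return modelFile
}
```

In a real app the downloader would stream from a CDN with checksum verification and resume support; apps that bundle the model in the APK skip this step entirely.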
Performance on Real Devices
| Device | Gemma 4 Nano (tokens/sec) | Gemma 4 Small (tokens/sec) |
|---|---|---|
| Pixel 10 | 28 | 12 |
| Samsung S26 | 25 | 11 |
| iPhone 17 (via GGUF) | 22 | 9 |
| Budget phone ($300) | 14 | 5 |
At 28 tokens per second on a Pixel 10, the Nano model generates text faster than most people can read it, and because there is no network round trip, responses start immediately, with none of the latency variance of a cloud API.
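To put 28 tokens per second in perspective, convert it to a reading speed. The 0.75 words-per-token ratio below is a common rule of thumb for English text, not a Gemma-specific figure:

```kotlin
// Decode speed expressed as a reading speed, using the rough
// heuristic of ~0.75 English words per token.
fun wordsPerMinute(tokensPerSec: Double, wordsPerToken: Double = 0.75): Double =
    tokensPerSec * wordsPerToken * 60

fun main() {
    // 28 tok/s → 1260 words per minute, several times faster than the
    // 200-300 wpm a typical adult reads, so output feels instantaneous.
    println(wordsPerMinute(28.0))  // 1260.0
}
```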
Comparing to Other Open Models
Gemma 4 vs Llama 4 Scout
Meta's Llama 4 Scout (9B) was the previous open-source champion in this size class. Gemma 4 Small edges it out on most benchmarks, but the real differentiator is mobile optimization:
- Gemma 4: First-class Android/iOS support via ML Kit, NPU-optimized
- Llama 4: Better for server-side inference, larger community ecosystem
Gemma 4 vs Mistral Small
Mistral's small models are fast but not designed for mobile. Gemma 4 wins on:
- On-device inference speed
- Memory efficiency
- Mobile framework integration
Mistral wins on:
- Raw quality per parameter
- Server-side throughput
- Multilingual performance
Gemma 4 vs Phi-4 Mini
Microsoft's Phi-4 Mini is the closest competitor for on-device AI. It's slightly smaller (3.8B) and focuses on reasoning. Phi-4 Mini has better math performance, but Gemma 4 Nano is faster on mobile hardware.
Use Cases for On-Device AI
1. Smart Keyboard with Context Awareness
A keyboard app that understands the full context of your conversation and suggests complete, relevant replies without sending anything to a server.
2. Offline Document Summarization
Summarize PDFs, articles, and notes directly on your phone during a flight or in a dead zone. No upload, no waiting.
3. Private Code Assistant
Developers can get code completion and explanation on their phone or laptop without sending proprietary code to any API.
4. Voice-First Interfaces
Combined with on-device speech-to-text, Gemma 4 enables fully local voice assistants that understand context and nuance.
5. Accessibility Features
Real-time captioning, text simplification, and visual description — all running locally, all private.
The Developer Experience
Getting started with Gemma 4 on Android is straightforward:
```kotlin
// Using ML Kit with Gemma 4
val model = GemmaModel.Builder()
    .setModelVariant(GemmaModel.Variant.NANO)
    .setQuantization(Quantization.INT4)
    .build()

val response = model.generate("Explain quantum computing simply")
```
For cross-platform deployment, you can use the MediaPipe framework, or llama.cpp with the GGUF-quantized versions available on Hugging Face.
Why This Matters
Google releasing a production-ready, Apache 2.0, mobile-optimized model is a watershed moment for several reasons:
Privacy Becomes Default
When inference happens on-device, there's no data to leak, no API to monitor, no server to breach. Privacy isn't a policy — it's an architecture.
Developing Markets Get AI
Billions of people have phones but unreliable internet. On-device AI means they get smart features regardless of connectivity.
Reduced Infrastructure Costs
For app developers, on-device inference means zero API costs at scale. No more worrying about per-token pricing as your user base grows.
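The savings are easy to estimate. A sketch with illustrative numbers; the user count, per-user usage, and per-token price below are assumptions, not quoted rates:

```kotlin
// Monthly API spend avoided by moving inference on-device.
// All inputs are illustrative assumptions.
fun monthlyApiCostUsd(users: Long, tokensPerUserPerDay: Long, pricePerMillionTokens: Double): Double =
    users * tokensPerUserPerDay * 30 / 1_000_000.0 * pricePerMillionTokens

fun main() {
    // 100k users × 5k tokens/day at $0.50 per million tokens
    println(monthlyApiCostUsd(100_000, 5_000, 0.50))  // 7500.0 per month
}
```

The key property is that the on-device figure stays at zero as `users` grows, while the API bill scales linearly with the user base.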
The App Ecosystem Shifts
We're going to see a new category of apps that are AI-first and cloud-optional. This changes the economics of building AI-powered software.
Limitations to Know About
- Hallucinations: Still present, especially in the Nano model
- Context windows: Mobile models have shorter context (4K-8K tokens) vs cloud models (128K+)
- Battery impact: Sustained inference drains battery noticeably
- Storage: Even quantized models take 1-5 GB of storage
- Quality gap: The 2B Nano model is useful but not comparable to GPT-5 or Claude for complex tasks
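The short context window is the limitation most likely to bite in practice, for example when summarizing long documents. The standard workaround is to split the input into window-sized chunks and process them one at a time. A sketch, assuming a rough 4-characters-per-token ratio and a token budget that leaves headroom for the instruction prompt and the reply:

```kotlin
// Split text on paragraph boundaries into chunks that fit an on-device
// context window. maxTokens sits well below the 8K limit to leave room
// for the prompt and the generated output.
fun chunkForContext(text: String, maxTokens: Int = 6000, charsPerToken: Int = 4): List<String> {
    val budget = maxTokens * charsPerToken
    val chunks = mutableListOf<String>()
    val current = StringBuilder()
    for (para in text.split("\n\n")) {
        // Start a new chunk when adding this paragraph would overflow.
        if (current.isNotEmpty() && current.length + para.length + 2 > budget) {
            chunks.add(current.toString())
            current.setLength(0)
        }
        if (current.isNotEmpty()) current.append("\n\n")
        current.append(para)
    }
    if (current.isNotEmpty()) chunks.add(current.toString())
    return chunks
}
```

One caveat: a single paragraph longer than the budget still produces an oversized chunk, so production code would also split inside paragraphs as a fallback.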
What's Next
Google has signaled that future Gemma releases will focus on:
- Multimodal capabilities (image + text on-device)
- Fine-tuning on-device (personalized models without cloud)
- Agent frameworks for on-device AI agents
The trajectory is clear: the phone in your pocket is becoming the primary AI inference device.
Getting Started
- Hugging Face: google/gemma-4
- Kaggle Models: Free Gemma 4 notebooks and benchmarks
- Android AI Edge: Official Google documentation for on-device inference
- MediaPipe: Cross-platform inference framework
Bottom Line
Gemma 4 isn't the smartest model ever built. But it might be the most important one released this year. By making capable AI run locally on phones under an Apache 2.0 license, Google is pushing the industry toward a future where AI is private, free, and always available — no cloud required.
For developers, the question isn't whether to add on-device AI to your app. It's how fast you can ship it.