On April 2, 2026, Google DeepMind released Gemma 4 — its most capable open model family to date, purpose-built for advanced reasoning and agentic workflows. Available under a fully permissive Apache 2.0 license, Gemma 4 delivers frontier-class multimodal intelligence across four model sizes, from edge devices to high-performance workstations.
Gemma 4 ships in four variants, each targeting a different deployment scenario:

- Gemma 4 E2B: the smallest edge model, built for resource-constrained on-device use
- Gemma 4 E4B: a larger edge model for more demanding on-device workloads
- Gemma 4 26B (MoE): a sparse mixture-of-experts model that activates only about 4B parameters per forward pass
- Gemma 4 31B: the flagship dense model, aimed at high-performance workstations
All models are natively multimodal, processing text and images with variable resolution support. The edge models (E2B and E4B) additionally support native audio input for real-time speech understanding on-device. Video processing is supported across the family.
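As a concrete illustration, here is a minimal multimodal inference sketch using the Hugging Face Transformers image-text-to-text pipeline. The checkpoint name google/gemma-4-e4b-it and the image URL are placeholder assumptions; check the Hub for the actual repository IDs.

```python
from transformers import pipeline

# Minimal multimodal inference sketch. The model ID below is a placeholder
# for a Gemma 4 edge checkpoint; consult the Hugging Face Hub for the
# actual repository names.
pipe = pipeline("image-text-to-text", model="google/gemma-4-e4b-it")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

outputs = pipe(text=messages, max_new_tokens=64, return_full_text=False)
print(outputs[0]["generated_text"])
```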
Gemma 4’s flagship 31B dense model delivers results competitive with models many times its size:
The 31B model currently ranks #3 among all open models on the LMArena text leaderboard (score ~1,452), with the 26B MoE close behind at #6 (~1,441) — despite using only 4B active parameters per forward pass.
Under the hood, Gemma 4 introduces several architectural advances built on the Gemini 3 foundation, including a sparse mixture-of-experts design in the 26B variant and native multimodal processing with variable-resolution image support.
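To see how a 26B-parameter MoE can run with only about 4B active parameters per forward pass, consider a toy top-k routing layer. This is a generic sketch of mixture-of-experts routing; the expert count, dimensions, and gating details are illustrative and not Gemma 4's actual configuration.

```python
import torch
import torch.nn.functional as F

# Toy top-k mixture-of-experts layer. All sizes here are illustrative,
# not Gemma 4's actual configuration.
n_experts, top_k, d_model, d_ff = 8, 2, 64, 256

experts = torch.nn.ModuleList(
    torch.nn.Sequential(
        torch.nn.Linear(d_model, d_ff),
        torch.nn.GELU(),
        torch.nn.Linear(d_ff, d_model),
    )
    for _ in range(n_experts)
)
router = torch.nn.Linear(d_model, n_experts)

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    """Route each token to its top_k experts; the rest stay idle."""
    gate = F.softmax(router(x), dim=-1)                 # (tokens, n_experts)
    weights, idx = torch.topk(gate, top_k, dim=-1)      # keep top_k experts per token
    weights = weights / weights.sum(-1, keepdim=True)   # renormalize over chosen experts
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        hit = (idx == e).any(dim=-1)                    # tokens that selected expert e
        if hit.any():
            w = weights[hit][idx[hit] == e].unsqueeze(-1)
            out[hit] = out[hit] + w * expert(x[hit])
    return out

tokens = torch.randn(4, d_model)
print(moe_forward(tokens).shape)  # torch.Size([4, 64]); only top_k of n_experts ran per token
```

Because each token passes through only top_k of the n_experts feed-forward blocks, total parameter count and per-token compute decouple, which is why the 26B MoE's inference cost tracks its roughly 4B active parameters rather than its full size.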
Perhaps the most significant change from previous Gemma releases is the shift to a full Apache 2.0 license — granting unrestricted commercial use, modification, and redistribution with no royalties or usage restrictions. This positions Gemma 4 directly against other permissively licensed models like Meta’s Llama family and Mistral’s offerings.
The models are already available on Hugging Face with Day 1 support across major frameworks including Transformers, llama.cpp (GGUF quantizations), MLX for Apple Silicon, and ONNX for edge deployment. Google is also integrating Gemma 4 as the foundation for the next generation of Gemini Nano on Android devices.
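For local inference against a GGUF quantization, the llama-cpp-python bindings offer a quick path. The file name below is hypothetical and depends on which quantized artifacts end up published.

```python
from llama_cpp import Llama

# Hypothetical GGUF file name; the actual quantized artifacts depend on
# what gets published for Gemma 4.
llm = Llama(model_path="gemma-4-31b-it-Q4_K_M.gguf", n_ctx=8192)

result = llm(
    "Summarize the Apache 2.0 license in two sentences.",
    max_tokens=128,
)
print(result["choices"][0]["text"])
```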
