Google Releases Gemma 4: Frontier Open Models Under Apache 2.0

On April 2, 2026, Google DeepMind released Gemma 4 — its most capable open model family to date, purpose-built for advanced reasoning and agentic workflows. Available under a fully permissive Apache 2.0 license, Gemma 4 delivers frontier-class multimodal intelligence across four model sizes, from edge devices to high-performance workstations.


Gemma 4 promotional banner from Google
Image credit: Google Blog

Four Models, One Family

Gemma 4 ships in four variants, each targeting a different deployment scenario:

  • E2B (2.3B effective parameters) — ultra-compact edge model with 128K context, native audio input
  • E4B (4.5B effective parameters) — mid-range edge model with 128K context, native audio input
  • 26B A4B (26B total parameters, 3.8B active) — Mixture-of-Experts model with a 256K context window; only ~4B parameters are active per forward pass
  • 31B Dense (30.7B parameters) — the flagship dense model with a 256K context window

All models are natively multimodal, processing text, images, and video with variable-resolution support. The edge models (E2B and E4B) additionally accept native audio input for real-time, on-device speech understanding.
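To make the multimodal interface concrete, here is a minimal sketch of what a mixed image-and-text request might look like in the Hugging Face chat-message format. The model ID and the exact content-part schema are assumptions for illustration; check the official Gemma 4 model cards for the real names.

```python
# Sketch of a multimodal request in the Hugging Face chat-message format.
# The model ID and content-part schema here are assumptions for
# illustration, not confirmed details from the release.

MODEL_ID = "google/gemma-4-e4b-it"  # hypothetical repo name

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},
            {"type": "text", "text": "Summarize the trend in this chart."},
        ],
    }
]

# With a real checkpoint, this would be handed to a pipeline, e.g.:
#   from transformers import pipeline
#   pipe = pipeline("image-text-to-text", model=MODEL_ID)
#   print(pipe(text=messages)[0]["generated_text"])

# Here we just verify the request is well-formed.
part_types = [part["type"] for part in messages[0]["content"]]
print(part_types)  # ['image', 'text']
```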

Benchmark Performance

Gemma 4’s flagship 31B dense model posts results competitive with models many times its size:

  • AIME 2026: 89.2% (31B) / 88.3% (26B A4B)
  • MMLU Pro: 85.2% (31B) / 82.6% (26B A4B)
  • LiveCodeBench v6: 80.0% (31B) / 77.1% (26B A4B)
  • GPQA Diamond: 84.3% (31B) / 82.3% (26B A4B)
  • Codeforces Elo: 2,150 (31B)
  • MMMU Pro (Vision): 76.9% (31B) / 73.8% (26B A4B)
  • MATH-Vision: 85.6% (31B) / 82.4% (26B A4B)

The 31B model currently ranks #3 among all open models on the LMArena text leaderboard (score ~1,452), with the 26B MoE close behind at #6 (~1,441) — despite using only 4B active parameters per forward pass.

Gemma 4 benchmark comparison chart
Image credit: Hugging Face
Gemma 4 performance comparison across model sizes
Image credit: Hugging Face

Architecture Innovations

Under the hood, Gemma 4 introduces several architectural advances built on the Gemini 3 foundation:

  • Alternating Attention: Interleaves local sliding-window attention (512–1024 tokens) with global full-context layers, balancing efficiency and long-range reasoning
  • Per-Layer Embeddings (PLE): A second embedding table feeds residual signals into every decoder layer for richer representations
  • Shared KV Cache: Later layers reuse key-value states from earlier ones, cutting memory without sacrificing quality
  • Dynamic Vision Encoder: Supports variable aspect ratios with configurable token budgets (70 to 1,120 tokens per image), letting developers trade off detail for speed
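The alternating-attention pattern above can be sketched as a per-layer mask schedule. The 3-local-to-1-global layer ratio and the 512-token window below are illustrative assumptions within the ranges the article mentions, not Gemma 4's published hyperparameters.

```python
# Illustrative sketch of alternating local/global attention.
# The 3:1 local-to-global ratio and the 512-token window are assumptions
# for illustration, not Gemma 4's published hyperparameters.

def layer_schedule(num_layers: int, locals_per_global: int = 3) -> list[str]:
    """Return 'local'/'global' labels, one per decoder layer."""
    pattern = ["local"] * locals_per_global + ["global"]
    return [pattern[i % len(pattern)] for i in range(num_layers)]

def can_attend(q: int, k: int, kind: str, window: int = 512) -> bool:
    """Causal mask: local layers also restrict keys to a sliding window."""
    if k > q:                      # causal: never attend to the future
        return False
    if kind == "global":           # full-context layer sees everything
        return True
    return q - k < window          # sliding-window layer

sched = layer_schedule(8)
print(sched)  # ['local', 'local', 'local', 'global', ...]
print(can_attend(600, 50, "local"))   # False: outside the 512-token window
print(can_attend(600, 50, "global"))  # True: global layer sees it
```

Interleaving the two layer types keeps most of the KV cache local (cheap) while the periodic global layers preserve long-range information flow.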

Apache 2.0 and the Open Model Landscape

Perhaps the most significant change from previous Gemma releases is the shift to a full Apache 2.0 license — granting unrestricted commercial use, modification, and redistribution with no royalties or usage restrictions. This positions Gemma 4 directly against other permissively licensed models like Meta’s Llama family and Mistral’s offerings.

The models are already available on Hugging Face with Day 1 support across major frameworks including Transformers, llama.cpp (GGUF quantizations), MLX for Apple Silicon, and ONNX for edge deployment. Google is also integrating Gemma 4 as the foundation for the next generation of Gemini Nano on Android devices.
