Moonshot AI Releases Kimi K2.6 with 256K Context and 300-Agent Swarms

Moonshot AI released Kimi K2.6 on April 20, 2026: an open-weight trillion-parameter Mixture-of-Experts model that tops headline agentic and coding benchmarks against GPT-5.4 and Claude Opus 4.6, paired with an Agent Swarm system scaled up to 300 sub-agents running 4,000 coordinated steps. The weights are available on Hugging Face under a Modified MIT License, and the model is live on Kimi.com, the Kimi App, the official API, and the Kimi Code CLI.

Stylized visualization of a sparse Mixture-of-Experts model: a central glowing core surrounded by hundreds of inactive expert nodes and a small number of active experts, with an outer ring of sub-agent satellites.
Illustration generated by AI

What’s new in K2.6

K2.6 keeps the overall Kimi K2 architecture — a 1 trillion-parameter MoE with 32 billion activated parameters per token, 384 experts (8 selected plus 1 shared), 61 layers, 64 attention heads, and Multi-head Latent Attention (MLA) — but extends the context window to 256K tokens across all variants and folds in a 400M-parameter MoonViT vision encoder for native image and video input. The tokenizer retains a 160K vocabulary, and the model ships with native INT4 quantization support for efficient serving on vLLM, SGLang, and KTransformers.
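The sparse routing described above is what keeps only 32B of the 1T parameters active per token. A minimal sketch of top-k expert selection, using toy dimensions and a random router rather than anything from the actual K2.6 implementation:

```python
import numpy as np

# Toy sketch of K2.6-style sparse MoE routing: 384 routed experts,
# top-8 selected per token, plus 1 shared expert that always fires.
# The hidden dimension and router weights here are illustrative.
N_EXPERTS, TOP_K, D = 384, 8, 64

rng = np.random.default_rng(0)
router_w = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def route(token: np.ndarray):
    """Return indices and normalized weights of the top-k routed experts."""
    logits = token @ router_w                 # one score per expert
    top = np.argsort(logits)[-TOP_K:]         # pick the 8 highest-scoring
    w = np.exp(logits[top] - logits[top].max())
    return top, w / w.sum()                   # softmax over the selected 8

token = rng.standard_normal(D)
experts, weights = route(token)
# Only these 8 routed experts (plus the shared one) process this token;
# the other 376 expert FFNs are skipped entirely.
```

Because each token touches just 8 of 384 routed experts, compute per token scales with the activated 32B parameters rather than the full trillion.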

Two inference modes are exposed: a Thinking Mode with full chain-of-thought reasoning (recommended temperature 1.0) and an Instant Mode for lower-latency responses (temperature 0.6, top-p 0.95). A preserve_thinking option lets agents carry reasoning traces across multi-turn tool-calling loops.
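Assuming an OpenAI-compatible chat-completions payload (only `temperature`, `top_p`, and `preserve_thinking` come from the release notes; the model name and payload shape are illustrative), the two modes map to requests roughly like this:

```python
# Hypothetical request builder for the two K2.6 inference modes.
# Field names beyond temperature / top_p / preserve_thinking are
# assumptions, not the documented Kimi API surface.
def make_payload(mode: str, messages: list) -> dict:
    base = {"model": "kimi-k2.6", "messages": messages}
    if mode == "thinking":
        # Full chain-of-thought; keep reasoning traces across tool calls.
        base.update(temperature=1.0, preserve_thinking=True)
    elif mode == "instant":
        # Lower-latency responses with the recommended sampling settings.
        base.update(temperature=0.6, top_p=0.95)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return base

payload = make_payload("instant", [{"role": "user", "content": "hello"}])
```

In practice an agent loop would send the "thinking" payload on every turn of a tool-calling session so the carried reasoning trace stays coherent.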

Benchmarks vs GPT-5.4 and Claude Opus 4.6

Moonshot positions K2.6 as a frontier-tier model on agentic and coding workloads. Key numbers reported at release:

  • SWE-Bench Pro: 58.6 — ahead of GPT-5.4 (57.7), Claude Opus 4.6 at max effort (53.4), Gemini 3.1 Pro (54.2), and K2.5 (50.7).
  • SWE-Bench Verified: 80.2.
  • Terminal-Bench 2.0: 66.7 (vs 50.8 for K2.5).
  • LiveCodeBench v6: 89.6.
  • Humanity’s Last Exam (HLE-Full, with tools): 54.0 — leading GPT-5.4 (52.1), Claude Opus 4.6 (53.0), and Gemini 3.1 Pro (51.4).
  • BrowseComp (Agent Swarm): 86.3, up from 78.4 on K2.5.
  • DeepSearchQA F1: 92.5.
  • AIME 2026 / HMMT 2026 / GPQA-Diamond: 96.4 / 92.7 / 90.5.

Kimi K2.6 brand logo from the official Hugging Face model card.
Image credit: Moonshot AI (Hugging Face)

Long-horizon coding and Agent Swarm

The most striking parts of the release are the long-horizon demonstrations. In one case study, K2.6 ran continuously for 12+ hours to port and optimize a Qwen3.5-0.8B inference engine in Zig on a Mac, making 4,000+ tool calls across 14 iterations and raising throughput from roughly 15 to 193 tokens per second — about 20% faster than LM Studio on the same hardware. In a second case, the model spent 13 hours refactoring exchange-core, an eight-year-old open-source financial matching engine, producing a 185% throughput gain in the medium-traffic profile and 133% in the performance profile after modifying more than 4,000 lines of code.

Agent Swarm, the multi-agent orchestration layer introduced in K2.5, now scales to 300 sub-agents and 4,000 coordinated steps (up from 100 and 1,500). Moonshot’s examples include 100 sub-agents matching a single CV against 100 California job listings and producing 100 tailored resumes, and a research pipeline that turned an astrophysics paper into a reusable skill and generated a 40-page report, a 20,000-entry dataset, and 14 astronomy charts. A new “Claw Groups” research preview lets agents running on different devices and different underlying models collaborate with a human in a shared workspace, with K2.6 acting as the adaptive coordinator.

Editorial image accompanying coverage of the Kimi K2.6 release.
Image credit: SiliconANGLE

What this means

K2.6 continues the trajectory set by Kimi K2 in mid-2025 and K2.5 earlier this year: an openly licensed Chinese model that trades blows with the leading US closed systems on coding and agentic benchmarks, released with weights and deployment recipes rather than an API-only posture. The 256K context, native video input, and a three-fold jump in swarm size make the model particularly interesting for long-running autonomous workflows — the kind of 12-hour, thousand-tool-call task that is still awkward to run on most frontier APIs. For developers and research groups that already built on K2.5, the reusable deployment configs and Kimi Code CLI make K2.6 close to a drop-in upgrade.
