Moonshot AI Releases Kimi K2.5 with Agent Swarm and Frontier Vision

On January 27, 2026, Moonshot AI released Kimi K2.5, a major upgrade to its open-weight model lineup that combines a trillion-parameter Mixture-of-Experts architecture with native multimodal reasoning and a novel multi-agent coordination system called Agent Swarm. The release positions Kimi K2.5 as one of the most capable open-weight models available today, outperforming GPT-5.2 and Claude Opus 4.5 on several key benchmarks.

Kimi K2.5 logo (image credit: Moonshot AI via Hugging Face)

Architecture and Scale

Kimi K2.5 is built on a Mixture-of-Experts (MoE) architecture with 1 trillion total parameters, of which only 32 billion are activated per token — a design that enables frontier-level performance while remaining practical to deploy. The model has 61 layers (including one dense layer), 384 experts with 8 selected per token, and uses Multi-head Latent Attention (MLA) with a 256K-token context window. Its vocabulary covers 160,000 tokens.
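The routing scheme described above (384 experts, 8 selected per token) can be sketched as a toy top-k MoE layer. The dimensions, random weights, and naive per-token loop below are illustrative simplifications, not Moonshot's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 384, 8   # K2.5: 384 experts, 8 active per token

def moe_layer(x, router_w, expert_w):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router_w                                  # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]          # top-k expert indices
    gates = np.take_along_axis(logits, top, axis=-1)
    gates = np.exp(gates - gates.max(-1, keepdims=True))
    gates /= gates.sum(-1, keepdims=True)                  # softmax over selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                            # naive dispatch loop
        for k in range(top_k):
            out[t] += gates[t, k] * (x[t] @ expert_w[top[t, k]])
    return out

x = rng.standard_normal((4, d_model))
router_w = rng.standard_normal((d_model, n_experts)) * 0.02
expert_w = rng.standard_normal((n_experts, d_model, d_model)) * 0.02
y = moe_layer(x, router_w, expert_w)   # only 8 of 384 experts run per token
```

Because only the selected experts execute, per-token compute scales with the 32B active parameters rather than the full trillion, which is what keeps deployment practical.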

Crucially, vision was not bolted on as an afterthought. Moonshot trained K2.5 from the start on 15 trillion tokens mixing visual and textual data, with a 400-million-parameter MoonViT-3D vision encoder. This means image and video understanding developed alongside language reasoning rather than being integrated post hoc. The model processes text, images, and video in a unified representation and can generate code, documents, spreadsheets, and presentations directly from visual inputs.

Agent Swarm: Parallel AI Coordination

The most distinctive feature of Kimi K2.5 is Agent Swarm, a multi-agent orchestration system that decomposes complex tasks into subtasks executed by up to 100 parallel sub-agents in real time. Each sub-agent can specialize — one may handle research, another coding, another analysis — and the orchestrator synthesizes their results.
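The decompose-dispatch-synthesize loop can be illustrated with a minimal sketch; the three sub-agent roles and the string-joining "synthesis" step are hypothetical stand-ins, and the real system reportedly scales to 100 sub-agents:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical specialized sub-agents (placeholders for model-backed workers).
def research(task):  return f"research notes on {task!r}"
def code(task):      return f"candidate patch for {task!r}"
def analyze(task):   return f"analysis of {task!r}"

def orchestrate(task):
    """Fan one task out to specialized sub-agents in parallel, then merge results."""
    sub_agents = [research, code, analyze]
    with ThreadPoolExecutor(max_workers=len(sub_agents)) as pool:
        results = list(pool.map(lambda agent: agent(task), sub_agents))
    return " | ".join(results)   # stand-in for the orchestrator's synthesis step

report = orchestrate("fix flaky login test")
```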

To train this capability, Moonshot developed a technique called Parallel Agent Reinforcement Learning (PARL), which freezes sub-agent weights while training only the orchestrator model. This approach addresses key challenges in multi-agent RL: training instability, credit-assignment ambiguity across agents, and a failure mode they call “serial collapse,” where the orchestrator defaults to running agents sequentially rather than in parallel.
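A toy version of the PARL idea (frozen sub-agents, trainable orchestrator) might look like the REINFORCE sketch below. The success rates, baseline, and learning rate are invented for illustration; only the structure (sub-agent policies held fixed while the dispatcher learns) mirrors the technique:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen sub-agents: under PARL their weights are never updated, so here they
# are just fixed (hypothetical) success probabilities.
sub_agent_skill = np.array([0.2, 0.9, 0.5])

theta = np.zeros(3)   # orchestrator logits: the only trainable parameters

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr, baseline = 0.1, 0.5
for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(3, p=p)                             # dispatch sub-agent a
    reward = float(rng.random() < sub_agent_skill[a])  # frozen agent executes
    grad = -p
    grad[a] += 1.0                                     # grad of log pi(a)
    theta += lr * (reward - baseline) * grad           # update orchestrator only

best = int(np.argmax(theta))   # orchestrator learns to favor the strongest agent
```

Keeping the sub-agents frozen means credit for a good outcome can only flow to the dispatch decision, which is one way to sidestep the credit-assignment ambiguity the article describes.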

The performance gains from Agent Swarm are substantial. On BrowseComp — a benchmark measuring the ability to find specific information across the web — Kimi K2.5 jumps from 60.6% in single-agent mode to 78.4% with Agent Swarm, outperforming GPT-5.2 Pro. On WideSearch, the F1 score rises from 72.7% to 79.0%, exceeding Claude Opus 4.5. Across qualifying tasks, wall-clock execution time falls by a factor of 3–4.5 thanks to parallelization.

Benchmark Performance

Beyond agentic tasks, Kimi K2.5 posts strong results across math, coding, vision, and long-context benchmarks:

  • AIME 2025: 96.1
  • HMMT 2025 (February): 95.4
  • GPQA-Diamond: 87.6
  • HLE-Full (with tools): 50.2 — above GPT-5.2’s 45.5
  • SWE-Bench Verified: 76.8
  • LiveCodeBench v6: 85.0
  • OCRBench: 92.3
  • MathVista: 90.1
  • VideoMME: 87.4
  • MMMU-Pro: 78.5

Across 17 image and video benchmarks, Kimi K2.5 achieved the top score on 9, competing against models including GPT-5.2 set to extended thinking and Gemini 3 Pro.

Four Operating Modes

The model ships with four modes: Instant for quick responses, Thinking for complex step-by-step reasoning, Agent for structured content generation (documents, spreadsheets, presentations), and Agent Swarm (currently in beta) for large-scale parallel tasks. This tiered interface lets users match compute and latency to their task complexity.
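In client code, mode selection would presumably be a request parameter. The sketch below only builds a payload; the `"mode"` field name, its values, and the `kimi-k2.5` model identifier are assumptions to check against Moonshot's API documentation:

```python
# Hypothetical request payload for an OpenAI-style chat endpoint.
def build_request(prompt, mode="instant"):
    """Assemble a chat request; mode names mirror the four tiers in the article."""
    assert mode in {"instant", "thinking", "agent", "agent-swarm"}
    return {
        "model": "kimi-k2.5",   # assumed model identifier
        "mode": mode,           # assumed parameter name
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize this RFC", mode="thinking")
```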

Open Weights and Availability

Kimi K2.5 model weights are publicly available on Hugging Face and GitHub under a modified MIT license. The model is supported by vLLM, SGLang, and KTransformers for inference, requiring at minimum transformers version 4.57.1. A free web interface is available at kimi.com, with API access through the Moonshot developer platform.
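As a deployment sketch, serving the open weights with vLLM might look like the following. The Hugging Face repo id and parallelism flags are assumptions to verify against the model card; only the transformers version floor comes from the release notes:

```shell
pip install "transformers>=4.57.1" vllm   # minimum transformers version per the release
# Repo id below is assumed — confirm on the model card before downloading.
vllm serve moonshotai/Kimi-K2.5 --tensor-parallel-size 8 --max-model-len 262144
```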

What This Means

Kimi K2.5 represents a meaningful step in the open-weight AI landscape. Its Agent Swarm capability — backed by a purpose-built training regime rather than a prompt-engineering wrapper — suggests that multi-agent coordination is becoming a first-class feature of frontier models rather than a research prototype. For developers and researchers, the combination of 256K context, strong vision grounding, and open weights makes K2.5 a compelling base for building complex agentic applications without proprietary dependencies.

The model also reinforces the continued competitiveness of Chinese AI labs. Moonshot, backed by Alibaba, has now released two consecutive open-weight models that benchmark at or above GPT- and Claude-class performance on key evaluations — a trend that shows no sign of slowing.
