On February 24, 2026, Alibaba’s Qwen team expanded the Qwen 3.5 family with three new open-weight models — Qwen3.5-27B, Qwen3.5-35B-A3B, and Qwen3.5-122B-A10B — each designed to deliver frontier-level intelligence at a fraction of the compute cost of the 397B flagship released a week earlier. The release makes a pointed argument: with the right architecture, smaller can outperform bigger.
The Qwen 3.5 Medium series ships three distinct models under the Apache 2.0 open-source license, available immediately on Hugging Face:

- Qwen3.5-27B: a dense model suited to everyday local use
- Qwen3.5-35B-A3B: a sparse mixture-of-experts (MoE) model with 35B total and 3B active parameters
- Qwen3.5-122B-A10B: a sparse MoE model with 122B total and 10B active parameters
All three models share the same capabilities as the flagship: native multimodal understanding (text, images, video), 201 language support, built-in thinking mode with <think>...</think> reasoning traces, and native tool-calling for agentic applications.
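Since the thinking mode emits its reasoning inside explicit `<think>...</think>` tags, downstream code usually wants to separate the trace from the final reply. A minimal sketch, assuming the model emits at most one think block before its answer (the helper name is ours, not part of any Qwen SDK):

```python
import re

def split_reasoning(output: str) -> tuple[str, str]:
    """Separate a <think>...</think> reasoning trace from the final answer.

    Assumes at most one think block, emitted before the reply,
    as in Qwen's thinking mode.
    """
    match = re.search(r"<think>(.*?)</think>", output, flags=re.DOTALL)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = output[match.end():].strip()
    return reasoning, answer

# Example completion with a reasoning trace
thought, answer = split_reasoning("<think>27 * 3 = 81</think>The answer is 81.")
```

Frameworks with a built-in reasoning parser (see the vLLM flag later in this piece) do this server-side, but a fallback like the above is handy when consuming raw completions.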
The Qwen3.5-27B leads the medium lineup on several key evaluations despite its compact size.
The Qwen3.5-35B-A3B punches well above its weight at just 3B active parameters.
Its reported scores put the 35B-A3B ahead of many 70B+ dense models from prior generations, a result of Alibaba's emphasis on architectural efficiency and reinforcement learning during post-training rather than brute-force scaling.
All Qwen 3.5 models use a Gated Delta Network architecture, a hybrid that combines gating (which lets the model rapidly clear stale information from its recurrent memory) with the delta rule (which makes precise, targeted updates to that memory rather than overwriting it wholesale). In the sparse MoE variants, this is paired with expert routing: the 35B-A3B, for instance, selects 8 routed experts plus 1 shared expert from a pool of 256 for each token.
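The routing step can be sketched in a few lines. This is generic top-k softmax gating with the pool sizes quoted above (256 routed experts, 8 selected per token, plus an always-on shared expert); Qwen's actual router details, such as normalization and load balancing, are not described in this release, so treat this as an illustration only:

```python
import math

NUM_EXPERTS = 256  # routed expert pool, per the release notes
TOP_K = 8          # routed experts selected per token
# ...plus 1 shared expert that every token always passes through

def route_token(router_logits: list[float], top_k: int = TOP_K):
    """Pick the top-k experts for one token, softmax-normalized over the winners.

    Generic top-k gating sketch, not Qwen's exact router.
    """
    # Indices of the k largest logits
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i])[-top_k:]
    # Softmax over the selected logits only (numerically stabilized)
    m = max(router_logits[i] for i in top)
    exp = [math.exp(router_logits[i] - m) for i in top]
    z = sum(exp)
    return top, [e / z for e in exp]
```

Each token thus activates only 9 small expert networks out of 257, which is why a 35B-parameter model runs with a 3B-parameter compute budget.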
The practical benefit is significant. Because the recurrent layers keep a fixed-size state, memory and compute scale roughly linearly with context length rather than quadratically. For the 35B-A3B running locally, that means contexts up to 1M tokens fit on a single 32GB GPU, opening up use cases like full-codebase analysis or long-document research without cloud API costs.
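A back-of-envelope comparison makes the point. With standard attention, the KV cache grows linearly with context length; a Gated-DeltaNet-style layer instead carries a fixed d×d state per head, independent of how many tokens have been seen. The dimensions below are made up for illustration (Qwen's real layer counts and head sizes are not given in this release):

```python
def kv_cache_bytes(tokens: int, layers: int, heads: int, head_dim: int,
                   dtype_bytes: int = 2) -> int:
    """Standard attention: KV cache grows linearly with context length."""
    return tokens * layers * 2 * heads * head_dim * dtype_bytes  # K and V

def linear_state_bytes(layers: int, heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Linear-attention / delta-rule layer: fixed head_dim x head_dim state
    per head, independent of context length."""
    return layers * heads * head_dim * head_dim * dtype_bytes

# Hypothetical dimensions, for illustration only
LAYERS, HEADS, HEAD_DIM = 48, 16, 128
cache = kv_cache_bytes(262_144, LAYERS, HEADS, HEAD_DIM)   # ~103 GB at full context
state = linear_state_bytes(LAYERS, HEADS, HEAD_DIM)        # ~25 MB, at any context
```

Under these (hypothetical) dimensions, a pure-attention model would need ~103 GB of cache at the native context window, while the recurrent state stays in the tens of megabytes regardless of length, which is what makes million-token contexts plausible on a single GPU.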
The native context window is 262,144 tokens across all models, with YaRN scaling extending this to over 1 million tokens. Models ship in BF16 and support 4-bit quantization with near-lossless accuracy — the quantized 35B-A3B fits on consumer hardware with room to spare.
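The "fits on consumer hardware" claim is easy to sanity-check with weight-storage arithmetic. This counts weights only, ignoring quantization-format overhead, KV cache, and activation buffers:

```python
def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate storage for model weights alone, in GB.

    Excludes quantization metadata, KV cache, and runtime buffers.
    """
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

bf16_gb = weight_footprint_gb(35, 16)  # 70.0 GB: needs a multi-GPU or server setup
int4_gb = weight_footprint_gb(35, 4)   # 17.5 GB: fits a 24-32GB consumer GPU
```

At 4 bits per parameter, the 35B-A3B's weights come to roughly 17.5 GB, comfortably inside a 24GB or 32GB consumer card, whereas the unquantized BF16 weights alone would need about 70 GB.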
The models are supported by all major inference frameworks out of the box:
```shell
# Run Qwen3.5-35B-A3B with vLLM
vllm serve Qwen/Qwen3.5-35B-A3B \
    --port 8000 \
    --tensor-parallel-size 8 \
    --max-model-len 262144 \
    --reasoning-parser qwen3
```
SGLang, Hugging Face Transformers, llama.cpp, and MLX (Apple Silicon) are all supported. Quantized GGUF versions are available for llama.cpp users running fully locally.
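Once the vLLM server above is running, it exposes an OpenAI-compatible endpoint at `http://localhost:8000/v1/chat/completions`. A minimal sketch of building a request body for it; the `chat_template_kwargs`/`enable_thinking` toggle follows Qwen's published vLLM usage pattern, but treat the exact field names as an assumption:

```python
import json

def chat_request(model: str, user_message: str, thinking: bool = True) -> str:
    """Build an OpenAI-compatible chat-completions payload for a local
    vLLM server. The enable_thinking toggle is passed through to the
    chat template (assumed field names; check your server version)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "chat_template_kwargs": {"enable_thinking": thinking},
    }
    return json.dumps(payload)

# POST this body to http://localhost:8000/v1/chat/completions
body = chat_request("Qwen/Qwen3.5-35B-A3B", "Summarize this codebase.")
```

Because the endpoint is OpenAI-compatible, existing agent frameworks and SDKs can point at the local server with only a base-URL change.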
The Qwen 3.5 Medium series makes frontier multimodal, agentic AI accessible to researchers and developers who cannot or do not want to depend on cloud APIs. The 35B-A3B model in particular is notable: with only 3B of its 35B parameters active per token, it rivals dense models that would require multiple high-end GPUs to serve.
The release also signals a broader shift in Alibaba’s Qwen strategy. Rather than releasing one headline model and stopping, the team is systematically filling the compute-efficiency frontier — giving users options from a 27B dense daily driver to a 397B agentic powerhouse, all under the same open license.
For researchers at institutions like NYU Shanghai, the Medium series offers a compelling local deployment story: capable enough for complex research tasks, efficient enough to run without datacenter resources.
