Alibaba’s Qwen team released Qwen 3.5 on February 16, 2026, marking a significant architectural leap with its flagship 397B-parameter mixture-of-experts model built for the agentic AI era. Unlike previous generations where vision was bolted on as an afterthought, Qwen 3.5 was trained from scratch on text, images, and video simultaneously — making it one of the first truly native multimodal foundation models capable of autonomous action across digital environments.
The flagship Qwen3.5-397B-A17B model uses a sparse Mixture-of-Experts (MoE) architecture that activates only 17 billion parameters per forward pass despite having 397 billion total parameters. This design, combined with a hybrid attention mechanism that interleaves Gated DeltaNet linear-attention layers with standard full-attention layers, allows the model to achieve remarkable inference efficiency: approximately 45 tokens per second on an 8×H100 GPU cluster.
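The sparse-activation idea can be sketched with a toy top-k MoE layer. The dimensions, expert count, and router below are illustrative stand-ins, not Qwen 3.5's real configuration; the point is only to show how routing each token to a few experts means a small fraction of total expert parameters does work on any forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse-MoE layer (illustrative sizes, not Qwen 3.5's real ones).
d_model, n_experts, top_k = 64, 16, 2

# One weight matrix per expert: these are the "total" parameters.
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]
# A simple linear router scoring each expert per token.
router = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_forward(x):
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ router                            # (n_tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Softmax over only the selected experts' scores.
        sel = logits[t, top[t]]
        w = np.exp(sel - sel.max())
        w /= w.sum()
        for weight, e in zip(w, top[t]):
            out[t] += weight * (x[t] @ experts[e])
    return out

x = rng.standard_normal((8, d_model))
y = moe_forward(x)
# Only top_k of n_experts expert matrices touch each token.
active_frac = top_k / n_experts
```

With 2 of 16 experts per token, 12.5% of expert parameters are active per pass, which is the same mechanism (at very different scale) behind 17B active out of 397B total.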
Among the key technical specifications, inference throughput stands out: compared to Qwen3-Max, Qwen 3.5 is roughly 8.6× faster at a 32K context length and 19× faster at 256K context, a difference that makes long-document and multimodal workflows substantially more practical at scale.
Qwen 3.5 posts competitive numbers across a wide range of evaluations.
The model surpasses Claude Opus 4.5 on multimodal benchmarks and posts competitive results against GPT-5.2, while remaining fully open-weight and available for local deployment.
The headline capability distinguishing Qwen 3.5 from prior models is its visual agentic interface control. Because the model was trained natively on UI screenshots alongside text and video, it can interpret and interact with graphical interfaces — clicking buttons, filling forms, and executing multi-step workflows across mobile and desktop applications without human intervention.
This positions Qwen 3.5 as a direct competitor to agent-oriented systems such as Anthropic's Computer Use and Google's Project Mariner. The model can process images up to 1344×1344 resolution and 60-second video clips, enabling it to watch a screen recording and then reproduce the demonstrated workflow autonomously.
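The multi-step workflow capability boils down to an observe-act loop: capture the screen, ask the model for the next action, execute it, repeat. The sketch below shows that loop shape only. Qwen 3.5's actual agent API and action schema are not described in this text, so the model call is a scripted stub and the `Action` format is purely illustrative.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str            # "click", "type", or "done" (illustrative schema)
    target: str = ""     # UI element identifier (illustrative)
    text: str = ""       # text to enter, for "type" actions

def stub_model(screenshot: bytes, goal: str, history: list) -> Action:
    """Stand-in for the real model: a scripted login flow."""
    script = [
        Action("click", target="username_field"),
        Action("type", target="username_field", text="alice"),
        Action("click", target="submit_button"),
        Action("done"),
    ]
    return script[len(history)]

def run_agent(goal: str, capture_screen, max_steps: int = 10) -> list:
    """Observe-act loop: screenshot in, one UI action out, until done."""
    history = []
    for _ in range(max_steps):
        action = stub_model(capture_screen(), goal, history)
        if action.kind == "done":
            break
        history.append(action)  # a real harness would execute it here
    return history

steps = run_agent("log in as alice", capture_screen=lambda: b"fake-png-bytes")
```

A real harness would replace `stub_model` with a call to the hosted model and actually dispatch clicks and keystrokes, but the control flow stays this simple.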
Alibaba reports approximately 60% lower inference cost per token compared to its predecessor, with the hosted Qwen3.5-Plus API priced at around $0.18 per million tokens. The open-weight model is available on Hugging Face under Apache 2.0, meaning developers can download, fine-tune, and self-host it on their own infrastructure.
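Those two figures let you do a back-of-the-envelope comparison. Note the predecessor's per-token price below is implied by combining the $0.18 figure with the ~60% reduction; it is not quoted anywhere in this text.

```python
# Figures from the text: hosted Qwen3.5-Plus at ~$0.18 per million
# tokens, approximately 60% cheaper per token than its predecessor.
PLUS_PRICE_PER_M = 0.18   # USD per million tokens
COST_REDUCTION = 0.60     # "approximately 60% lower"

# Implied (derived, not quoted) predecessor price per million tokens.
predecessor_price_per_m = PLUS_PRICE_PER_M / (1 - COST_REDUCTION)

def monthly_cost(tokens_per_month: float, price_per_m: float) -> float:
    """Cost in USD for a given monthly token volume."""
    return tokens_per_month / 1_000_000 * price_per_m

# Example workload: 500M tokens per month.
tokens = 500_000_000
plus_cost = monthly_cost(tokens, PLUS_PRICE_PER_M)
old_cost = monthly_cost(tokens, predecessor_price_per_m)
```

At 500M tokens a month this works out to $90 on Qwen3.5-Plus versus an implied $225 on the predecessor's pricing.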
Both deployment options are live: the open-weight Qwen3.5-397B-A17B for self-hosting and a hosted “Qwen3.5-Plus” variant for API access with the extended 1M token context. The broad language support — 201 languages versus 82 in the previous generation — combined with the native multimodal architecture makes it one of the more versatile open frontier models currently available.
