Alibaba’s Qwen team has just released Qwen3‑235B‑A22B‑Instruct‑2507, an enhanced instruction-tuned large language model with 235B total parameters (22B active), designed for chat, reasoning, coding, and long-context understanding.
| Specification | Details |
|---|---|
| Architecture | Mixture-of-Experts (MoE), causal LLM |
| Parameter Size | 235B total, 22B activated |
| Layers & Heads | 94 layers; 64 query heads, 4 key/value heads (GQA) |
| Experts | 128 experts, 8 activated per token |
| Context Length | 262,144 tokens (≈256K) |
| Inference Mode | Non-thinking only (no `<think>...</think>` blocks) (Hugging Face) |
| License | Apache 2.0 (open-source) (Hugging Face, GitHub) |
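These figures can be sanity-checked without downloading the weights by reading the published configuration with Hugging Face `transformers`. A minimal sketch follows; the MoE-specific field names (`num_experts`, `num_experts_per_tok`) are an assumption based on the usual Qwen MoE config convention and may differ across library versions:

```python
from transformers import AutoConfig

# Fetches only the config JSON, not the 235B weights
# (requires a transformers release with Qwen3-MoE support).
config = AutoConfig.from_pretrained("Qwen/Qwen3-235B-A22B-Instruct-2507")

print("layers:         ", config.num_hidden_layers)                      # expected 94
print("query heads:    ", config.num_attention_heads)                    # expected 64
print("key/value heads:", config.num_key_value_heads)                    # expected 4
print("experts:        ", getattr(config, "num_experts", None))          # expected 128
print("active experts: ", getattr(config, "num_experts_per_tok", None))  # expected 8
print("context length: ", config.max_position_embeddings)                # expected 262144
```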
On industry-standard benchmarks, Qwen3‑235B‑A22B‑Instruct‑2507 shows strong results, performing competitively with leading open-source and closed-source systems. (arXiv)
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Qwen/Qwen3-235B-A22B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
# Build a chat-formatted prompt and generate up to 16K new tokens
messages = [{"role": "user", "content": "Give me a short introduction to Mixture-of-Experts models."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(inputs, max_new_tokens=16384)
print(tokenizer.decode(output_ids[0][inputs.shape[-1]:], skip_special_tokens=True))
Supports accelerated inference via SGLang, vLLM, Ollama, llama.cpp, LMStudio, and more. (Hugging Face, GitHub)
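As one example, once the model is being served by one of these engines (both vLLM and SGLang expose an OpenAI-compatible endpoint), it can be queried with the standard `openai` Python client. In the sketch below, the base URL, API key, and served model name are placeholders for a local deployment:

```python
from openai import OpenAI

# Placeholder endpoint and key; vLLM and SGLang both serve an
# OpenAI-compatible /v1 API when hosting the model locally.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",  # name as registered with the serving engine
    messages=[{"role": "user", "content": "Summarize the Qwen3 MoE architecture in two sentences."}],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```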
Also available in FP8 quantized format for improved speed and memory efficiency. (Hugging Face)
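A sketch of loading the quantized checkpoint through the same `transformers` code path; the `-FP8` repository name is an assumption based on Qwen's usual naming and should be verified on the Hub, and FP8 execution additionally needs recent library versions and FP8-capable GPUs:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed repository name; confirm the exact FP8 checkpoint on the Hugging Face Hub.
fp8_name = "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"
tokenizer = AutoTokenizer.from_pretrained(fp8_name)
# FP8 weights require FP8-capable hardware (e.g. Hopper-class GPUs) and a recent transformers release.
model = AutoModelForCausalLM.from_pretrained(fp8_name, torch_dtype="auto", device_map="auto")
```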
From r/LocalLLaMA:
“I’ve been kinda disappointed in Qwen3‑235’s non‑thinking quality… now, an inherent non‑thinking, improved Qwen3‑235B? It feels like a dream come true.” (Reddit)
Users appreciate performance gains and native non-thinking behavior, though some remain skeptical about real-world advantages over closed-source models. (Hugging Face)