The Wan2.2 project (hosted at Wan‑Video/Wan2.2 on GitHub) marks another leap forward in large‑scale, open, consumer‑accessible video generative models from the Wan‑AI team at Alibaba Cloud. Released on July 28, 2025, it brings major technical advances over the previous Wan2.1 release.
Wan2.2’s A14B models use a Mixture‑of‑Experts (MoE) design that combines two specialized expert sub‑models: one for the early, high‑noise denoising steps and another for later fine‑detail refinement. While the model totals ~27 billion parameters, only ~14 billion are active at each inference step, so inference cost stays roughly that of a single 14B model while capacity and performance grow.
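The routing here is coarse: rather than per‑token gating, each whole denoising step is handed to one expert or the other depending on how noisy the latent still is. Below is a minimal sketch of that idea, assuming a simple timestep threshold; the boundary value, class name, and argument names are illustrative, not taken from the repository.

```python
import torch

# Hypothetical boundary: fraction of the diffusion schedule above which the
# latent is considered "high noise". The real switch point in Wan2.2 is
# derived from the noise schedule, not hard-coded like this.
HIGH_NOISE_BOUNDARY = 0.875


class TwoExpertDenoiser(torch.nn.Module):
    """Sketch of Wan2.2-style two-expert MoE routing (names are illustrative)."""

    def __init__(self, high_noise_expert: torch.nn.Module, low_noise_expert: torch.nn.Module):
        super().__init__()
        self.high_noise_expert = high_noise_expert  # early steps: global layout and motion
        self.low_noise_expert = low_noise_expert    # late steps: fine-detail refinement

    def forward(self, latents: torch.Tensor, t: float, cond: torch.Tensor) -> torch.Tensor:
        # t in [0, 1], with 1 = pure noise. Exactly ONE 14B expert runs per
        # step, which is why the ~27B total behaves like a 14B model at
        # inference time.
        expert = self.high_noise_expert if t >= HIGH_NOISE_BOUNDARY else self.low_noise_expert
        return expert(latents, t, cond)
```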
The training pipeline includes finely labeled aesthetic data (lighting, composition, color, contrast), enabling precise control over cinematic style in generated videos.
Compared to Wan2.1, Wan2.2 is trained on 65.6% more images and 83.2% more videos, significantly improving generalization across motion patterns, semantics, and visual quality. As a result, it achieves top performance among both open‑source and closed‑source models on the Wan‑Bench 2.0 benchmark.
The TI2V‑5B variant unifies text‑to‑video (T2V) and image‑to‑video (I2V) generation in a single high‑compression model. Its custom Wan2.2‑VAE compresses video aggressively (a 4×16×16 ratio across time, height, and width), enabling 720p output at 24 fps in under 9 minutes on an RTX 4090, among the fastest results at that resolution on consumer hardware.
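To make that compression concrete, here is a back‑of‑the‑envelope sketch of the latent grid a short 720p clip would occupy under the 4×16×16 (T×H×W) ratio; the frame count and the causal "+1" frame handling are assumptions for illustration, not measurements from the model.

```python
# Illustrative latent-size arithmetic for a high-compression video VAE,
# assuming a 4x16x16 (time x height x width) compression ratio.
frames, height, width = 121, 720, 1280   # ~5 s of 24 fps 720p video (assumed)

t_ratio, h_ratio, w_ratio = 4, 16, 16
latent_t = (frames - 1) // t_ratio + 1   # causal VAEs typically keep frame 0 uncompressed (assumption)
latent_h = height // h_ratio             # 45
latent_w = width // w_ratio              # 80

pixel_volume = frames * height * width
latent_volume = latent_t * latent_h * latent_w
print(f"latent grid: {latent_t} x {latent_h} x {latent_w}")
print(f"spatiotemporal reduction: ~{pixel_volume / latent_volume:.0f}x (ignoring channel counts)")
```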
| Model | Configuration | Supports | Notes |
|---|---|---|---|
| A14B (MoE) | 2×14B experts (~27B total, ~14B active) | Text‑to‑Video, Image‑to‑Video | Excellent quality; memory cost similar to a single 14B expert |
| TI2V‑5B | Dense 5B + high‑compression VAE | Unified T2V & I2V | Efficient; 720p@24fps on a single consumer GPU |
```bash
git clone https://github.com/Wan-Video/Wan2.2.git
cd Wan2.2
pip install -r requirements.txt
```
Model weights can be downloaded via the Hugging Face or ModelScope CLI; available checkpoints include Wan2.2-T2V-A14B, Wan2.2-I2V-A14B, and Wan2.2-TI2V-5B. The TI2V‑5B model supports both T2V and I2V at 720p.
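For example, the same download can be scripted with the huggingface_hub Python client; the repo ID follows the Wan‑AI organization's naming on the Hub, and the target directory is arbitrary.

```python
from huggingface_hub import snapshot_download

# Download the unified 5B checkpoint; swap repo_id for the A14B variants,
# e.g. "Wan-AI/Wan2.2-T2V-A14B" or "Wan-AI/Wan2.2-I2V-A14B".
snapshot_download(
    repo_id="Wan-AI/Wan2.2-TI2V-5B",
    local_dir="./Wan2.2-TI2V-5B",
)
```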
Inference scripts expose options for prompt extension and memory offloading to reduce VRAM use. ComfyUI and Diffusers integrations were already available as of the July 28 release.
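As a sketch of the Diffusers path, assuming a recent diffusers release that ships the Wan pipelines and a "-Diffusers" repo ID on the Hub (check the model card for the exact name and recommended settings):

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# The repo id below follows Wan-AI's Hub naming convention; treat it and the
# generation settings as assumptions, not authoritative values.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # memory offloading to fit consumer VRAM

frames = pipe(
    prompt="A corgi surfing at sunset, cinematic lighting",
    height=704,
    width=1280,
    num_frames=121,
    guidance_scale=5.0,
).frames[0]
export_to_video(frames, "output.mp4", fps=24)
```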
Wan2.2 is open to community projects and has already been integrated into ComfyUI workflows and Hugging Face Spaces. Ongoing development includes plans for multi‑GPU inference support, additional model checkpoints, and deeper integration with ComfyUI and Diffusers.
Community discussion on GitHub includes questions about maximum video length and MoE tuning; the project is under active development, with new issues and contributions appearing daily.
Wan2.2 delivers a major upgrade over Wan2.1 in video generative performance, efficiency, and aesthetic control. With both high‑fidelity MoE models and a streamlined high‑compression variant, it balances quality with accessibility. For anyone interested in working with open video generation models, Wan2.2 is a standout release.