Tencent Open-Sources HY-World 2.0: A Multi-Modal 3D World Model

On April 15, 2026, Tencent open-sourced HY-World 2.0, a multi-modal world model framework that can reconstruct, generate, and simulate 3D worlds from text, images, or video. Unlike video-based world models that produce flat, non-editable pixel sequences, HY-World 2.0 outputs real 3D assets — meshes, Gaussian splats (3DGS), and point clouds — that can be directly imported into engines and tools like Unity, Unreal Engine, and Blender.


HY-World 2.0 teaser showing 3D world generation from text and image inputs
Image credit: Tencent HY-World 2.0 on Hugging Face

What Is HY-World 2.0?

Tencent positions HY-World 2.0 as the first open-source 3D world model to reach state-of-the-art quality, with results it says are comparable to closed-source systems like World Labs’ Marble. The framework accepts diverse inputs — a text prompt, a single photograph, multi-view images, or video — and transforms them into navigable, editable 3D scenes.

The system consists of a four-stage pipeline:

  1. HY-Pano 2.0 — generates 360-degree panoramas from text or images
  2. WorldNav — plans camera trajectories through the generated scene
  3. WorldStereo 2.0 — expands panoramas into full navigable 3D worlds
  4. WorldMirror 2.0 — a unified feed-forward model (~1.2 billion parameters) that simultaneously predicts depth maps, surface normals, camera parameters, 3D point clouds, and 3DGS attributes in a single forward pass

HY-World 2.0 architecture overview showing the four-stage pipeline from input to 3D world output
Image credit: Tencent HY-World 2.0 on Hugging Face
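The four stages above chain together into a single text-to-world flow. The sketch below shows that orchestration in miniature; every function and class name here is an illustrative assumption, not the actual HY-World 2.0 API, and the stage bodies are stubs standing in for the real models.

```python
# Hypothetical sketch of the four-stage HY-World 2.0 pipeline described above.
# All names are assumptions for illustration, not Tencent's real interfaces.
from dataclasses import dataclass, field


@dataclass
class Panorama:                 # stage 1 output: a 360-degree panorama
    source: str


@dataclass
class Trajectory:               # stage 2 output: a planned camera path
    waypoints: list


@dataclass
class World3D:                  # stages 3-4 output: the navigable 3D world
    panorama: Panorama
    trajectory: Trajectory
    attributes: dict = field(default_factory=dict)


def hy_pano(prompt: str) -> Panorama:
    """Stage 1 (HY-Pano 2.0): text or image -> panorama (stub)."""
    return Panorama(source=prompt)


def world_nav(pano: Panorama) -> Trajectory:
    """Stage 2 (WorldNav): plan a camera trajectory through the scene (stub)."""
    return Trajectory(waypoints=["start", "mid", "end"])


def world_stereo(pano: Panorama, traj: Trajectory) -> World3D:
    """Stage 3 (WorldStereo 2.0): expand the panorama into a 3D world (stub)."""
    return World3D(pano, traj)


def world_mirror(world: World3D) -> World3D:
    """Stage 4 (WorldMirror 2.0): a single forward pass predicting depth,
    normals, camera parameters, point clouds, and 3DGS attributes (stub)."""
    world.attributes = {
        k: "predicted"
        for k in ("depth", "normals", "cameras", "points", "gaussians")
    }
    return world


def generate_world(prompt: str) -> World3D:
    """Run all four stages in order, as described in the article."""
    pano = world_nav_input = hy_pano(prompt)
    traj = world_nav(world_nav_input)
    return world_mirror(world_stereo(pano, traj))


world = generate_world("a medieval courtyard at dusk")
print(sorted(world.attributes))
# → ['cameras', 'depth', 'gaussians', 'normals', 'points']
```

The point of the sketch is the data flow: each stage's output is the next stage's input, and WorldMirror's single forward pass fills in all five prediction heads at once.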

Why 3D Assets Instead of Video?

Previous world models like Sora and Gen-3 produce pixel-based videos — visually impressive, but fundamentally limited. HY-World 2.0 takes a different approach by generating persistent, editable 3D assets. Here’s how the two approaches compare:

| Aspect | Video World Models | HY-World 2.0 |
| --- | --- | --- |
| Output | Non-editable pixel videos | Editable meshes and 3DGS |
| Duration | Limited (under 1 minute) | Unlimited; assets persist permanently |
| 3D Consistency | Poor (flickering, drifting) | Inherently consistent |
| Real-Time Rendering | Per-frame inference, high latency | Real time on consumer GPUs |
| Engine Compatibility | Video files only | Direct import into Blender, UE, Unity, Isaac Sim |

Performance Benchmarks

WorldStereo 2.0 achieves strong results on camera-controlled 3D generation. In rotation error, it scores 0.492° (down from 0.762° in v1), and translation error drops to 0.968m from 1.245m. On single-view 3D reconstruction benchmarks, it achieves an F1 score of 41.43 on Tanks-and-Temples and 51.27 on MipNeRF360.

WorldMirror 2.0 also shows strong reconstruction accuracy across standard benchmarks, scoring 0.012 accuracy and 0.016 completeness (distance errors, so lower is better) on 7-Scenes, outperforming methods like Pow3R and MapAnything. The model supports flexible input resolutions from 50K to 500K pixels.
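The 50K–500K figure is a total-pixel budget rather than a fixed width and height, which makes it easy to check whether a given image size fits. The helper below assumes that interpretation (total pixel count), which the text does not spell out:

```python
# Check whether a resolution fits the reported 50K-500K pixel input range.
# Bounds come from the text; reading them as total pixel count is an assumption.
def in_supported_range(width: int, height: int,
                       lo: int = 50_000, hi: int = 500_000) -> bool:
    """True if width * height falls within the stated pixel budget."""
    return lo <= width * height <= hi


print(in_supported_range(224, 224))    # ~50K pixels  → True
print(in_supported_range(704, 704))    # ~496K pixels → True
print(in_supported_range(1920, 1080))  # ~2.07M pixels → False
```

So a 224x224 crop just clears the lower bound, while full 1080p frames would need downscaling before inference.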

WorldMirror 2.0 benchmark comparison showing reconstruction quality against competing methods
Image credit: Tencent HY-World 2.0 on Hugging Face

What’s Available Now — and What’s Coming

As of April 15, 2026, Tencent has released the technical report along with WorldMirror 2.0 inference code and model weights on Hugging Face and GitHub. The model requires Python 3.10 and CUDA 12.4, and supports both single-GPU and multi-GPU inference via FSDP. A Gradio web demo is included for quick experimentation.

Still coming soon: the full world generation inference code, HY-Pano 2.0 model weights, WorldNav code, and WorldStereo 2.0 model weights. The release is under Tencent’s HY-World 2.0 Community License.

What This Means

HY-World 2.0 represents a meaningful shift in how AI can create 3D environments. By producing actual 3D assets rather than video approximations, it bridges the gap between generative AI and practical 3D production workflows in gaming, simulation, robotics, and film. Its open-source release — with results rivaling closed-source alternatives like Marble — gives researchers and developers a powerful foundation for spatial AI applications. For those working with Tencent’s broader Hunyuan ecosystem, HY-World 2.0 complements existing tools for 3D object generation, portrait animation, and machine translation.
