Tencent Open-Sources HY-World 2.0: A Multi-Modal 3D World Model

On April 15, 2026, Tencent open-sourced HY-World 2.0, a multi-modal world model framework that can reconstruct, generate, and simulate 3D worlds from text, images, or video. Unlike video-based world models that produce flat, non-editable pixel sequences, HY-World 2.0 outputs real 3D assets — meshes, Gaussian splats (3DGS), and point clouds — that can be directly imported into engines and tools like Unity, Unreal Engine, and Blender.


HY-World 2.0 teaser showing 3D world generation from text and image inputs
Image credit: Tencent HY-World 2.0 on Hugging Face

What Is HY-World 2.0?

Tencent positions HY-World 2.0 as the first open-source 3D world model to reach state-of-the-art quality, with results it says are comparable to closed-source systems like World Labs’ Marble. The framework accepts diverse inputs — a text prompt, a single photograph, multi-view images, or video — and transforms them into navigable, editable 3D scenes.

The system consists of a four-stage pipeline:

  1. HY-Pano 2.0 — generates 360-degree panoramas from text or images
  2. WorldNav — plans camera trajectories through the generated scene
  3. WorldStereo 2.0 — expands panoramas into full navigable 3D worlds
  4. WorldMirror 2.0 — a unified feed-forward model (~1.2 billion parameters) that simultaneously predicts depth maps, surface normals, camera parameters, 3D point clouds, and 3DGS attributes in a single forward pass

HY-World 2.0 architecture overview showing the four-stage pipeline from input to 3D world output
Image credit: Tencent HY-World 2.0 on Hugging Face
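The four stages above chain together into a single text-to-world flow. The sketch below shows that orchestration in miniature; every function and class name here is an illustrative assumption, not the actual HY-World 2.0 API, and the stage bodies are stubs standing in for the real models.

```python
# Hypothetical sketch of the four-stage HY-World 2.0 pipeline described above.
# All names are assumptions for illustration, not Tencent's real interfaces.
from dataclasses import dataclass, field


@dataclass
class Panorama:                 # stage 1 output: a 360-degree panorama
    source: str


@dataclass
class Trajectory:               # stage 2 output: a planned camera path
    waypoints: list


@dataclass
class World3D:                  # stages 3-4 output: the navigable 3D world
    panorama: Panorama
    trajectory: Trajectory
    attributes: dict = field(default_factory=dict)


def hy_pano(prompt: str) -> Panorama:
    """Stage 1 (HY-Pano 2.0): text or image -> panorama (stub)."""
    return Panorama(source=prompt)


def world_nav(pano: Panorama) -> Trajectory:
    """Stage 2 (WorldNav): plan a camera trajectory through the scene (stub)."""
    return Trajectory(waypoints=["start", "mid", "end"])


def world_stereo(pano: Panorama, traj: Trajectory) -> World3D:
    """Stage 3 (WorldStereo 2.0): expand the panorama into a 3D world (stub)."""
    return World3D(pano, traj)


def world_mirror(world: World3D) -> World3D:
    """Stage 4 (WorldMirror 2.0): a single forward pass predicting depth,
    normals, camera parameters, point clouds, and 3DGS attributes (stub)."""
    world.attributes = {
        k: "predicted"
        for k in ("depth", "normals", "cameras", "points", "gaussians")
    }
    return world


def generate_world(prompt: str) -> World3D:
    """Run all four stages in order, as described in the article."""
    pano = world_nav_input = hy_pano(prompt)
    traj = world_nav(world_nav_input)
    return world_mirror(world_stereo(pano, traj))


world = generate_world("a medieval courtyard at dusk")
print(sorted(world.attributes))
# → ['cameras', 'depth', 'gaussians', 'normals', 'points']
```

The point of the sketch is the data flow: each stage's output is the next stage's input, and WorldMirror's single forward pass fills in all five prediction heads at once.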

Why 3D Assets Instead of Video?

Previous world models like Sora and Gen-3 produce pixel-based videos — visually impressive, but fundamentally limited. HY-World 2.0 takes a different approach by generating persistent, editable 3D assets. Here’s how the two approaches compare:

| Aspect | Video World Models | HY-World 2.0 |
| --- | --- | --- |
| Output | Non-editable pixel videos | Editable meshes and 3DGS |
| Duration | Limited (under 1 minute) | Unlimited; assets persist permanently |
| 3D Consistency | Poor (flickering, drifting) | Inherently consistent |
| Real-Time Rendering | Per-frame inference, high latency | Real time on consumer GPUs |
| Engine Compatibility | Video files only | Direct import into Blender, UE, Unity, Isaac Sim |

Performance Benchmarks

WorldStereo 2.0 achieves strong results on camera-controlled 3D generation. In rotation error, it scores 0.492° (down from 0.762° in v1), and translation error drops to 0.968m from 1.245m. On single-view 3D reconstruction benchmarks, it achieves an F1 score of 41.43 on Tanks-and-Temples and 51.27 on MipNeRF360.

WorldMirror 2.0 also shows strong reconstruction accuracy across standard benchmarks, scoring 0.012 accuracy and 0.016 completeness (distance errors, so lower is better) on 7-Scenes, outperforming methods like Pow3R and MapAnything. The model supports flexible input resolutions from 50K to 500K pixels.
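The 50K–500K figure is a total-pixel budget rather than a fixed width and height, which makes it easy to check whether a given image size fits. The helper below assumes that interpretation (total pixel count), which the text does not spell out:

```python
# Check whether a resolution fits the reported 50K-500K pixel input range.
# Bounds come from the text; reading them as total pixel count is an assumption.
def in_supported_range(width: int, height: int,
                       lo: int = 50_000, hi: int = 500_000) -> bool:
    """True if width * height falls within the stated pixel budget."""
    return lo <= width * height <= hi


print(in_supported_range(224, 224))    # ~50K pixels  → True
print(in_supported_range(704, 704))    # ~496K pixels → True
print(in_supported_range(1920, 1080))  # ~2.07M pixels → False
```

So a 224x224 crop just clears the lower bound, while full 1080p frames would need downscaling before inference.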

WorldMirror 2.0 benchmark comparison showing reconstruction quality against competing methods
Image credit: Tencent HY-World 2.0 on Hugging Face

What’s Available Now — and What’s Coming

As of April 15, 2026, Tencent has released the technical report along with WorldMirror 2.0 inference code and model weights on Hugging Face and GitHub. The model requires Python 3.10 and CUDA 12.4, and supports both single-GPU and multi-GPU inference via FSDP. A Gradio web demo is included for quick experimentation.

Still coming soon: the full world generation inference code, HY-Pano 2.0 model weights, WorldNav code, and WorldStereo 2.0 model weights. The release is under Tencent’s HY-World 2.0 Community License.

What This Means

HY-World 2.0 represents a meaningful shift in how AI can create 3D environments. By producing actual 3D assets rather than video approximations, it bridges the gap between generative AI and practical 3D production workflows in gaming, simulation, robotics, and film. Its open-source release — with results rivaling closed-source alternatives like Marble — gives researchers and developers a powerful foundation for spatial AI applications. For those working with Tencent’s broader Hunyuan ecosystem, HY-World 2.0 complements existing tools for 3D object generation, portrait animation, and machine translation.
