## 🎥 Video
## 🔥 News
- **[April 15, 2026]**: 🚀 Release the HY-World 2.0 technical report & partial code!
- **[April 15, 2026]**: 🤗 Open-source WorldMirror 2.0 inference code and model weights!
- **[Coming Soon]**: Release the full HY-World 2.0 (World Generation) inference code.
- **[Coming Soon]**: Release Panorama Generation (HY-Pano 2.0) model weights & code.
- **[Coming Soon]**: Release Trajectory Planning (WorldNav) code.
- **[Coming Soon]**: Release World Expansion (WorldStereo 2.0) model weights & inference code.
## 📑 Table of Contents
- [🌍 Introduction](#-introduction)
- [✨ Highlights](#-highlights)
- [🧩 Architecture](#-architecture)
- [📋 Open-Source Plan](#-open-source-plan)
- [🎁 Model Zoo](#-model-zoo)
- [🤗 Get Started](#-get-started)
- [🎮 Performance](#-performance)
- [🎬 More Examples](#-more-examples)
- [📜 Citation](#-citation)
## 🌍 Introduction
**HY-World 2.0** is a multi-modal world model framework for **world generation** and **world reconstruction**. It accepts diverse input modalities (text, single-view images, multi-view images, and videos) and produces 3D world representations (meshes / Gaussian Splatting). It offers two core capabilities:
- **World Generation** (text / single image → 3D world): synthesizes high-fidelity, navigable 3D scenes through a four-stage pipeline: a) panorama generation with HY-Pano 2.0, b) trajectory planning with WorldNav, c) world expansion with WorldStereo 2.0, and d) world composition with WorldMirror 2.0 & 3DGS learning.
- **World Reconstruction** (multi-view images / video → 3D): powered by WorldMirror 2.0, a unified feed-forward model that simultaneously predicts depth, surface normals, camera parameters, 3D point clouds, and 3DGS attributes in a single forward pass.
HY-World 2.0 is the **first open-source state-of-the-art** 3D world model, delivering results comparable to closed-source systems such as Marble. We will release all model weights, code, and technical details to facilitate reproducibility and advance research in this field.
### Why 3D World Models?
Existing world models, such as Genie 3, Cosmos, and HY-World 1.5 (WorldPlay + WorldCompass), generate pixel-level videos: essentially "watching a movie" that vanishes once playback ends. **HY-World 2.0 takes a fundamentally different approach**: it directly produces editable, persistent 3D assets (meshes / 3DGS) that can be imported into engines such as Blender, Unity, Unreal Engine, and Isaac Sim. It is more like "building a playable game" than recording a clip. This paradigm shift natively resolves many long-standing pain points of video world models:
| | Video World Models | 3D World Model (HY-World 2.0) |
|--|---|---|
| **Output** | Pixel videos (non-editable) | Real 3D assets: meshes / 3DGS (fully editable) |
| **Playable Duration** | Limited (typically < 1 min) | Unlimited; assets persist permanently |
| **3D Consistency** | Poor (flickering, artifacts across views) | Native; inherently consistent in 3D |
| **Real-Time Rendering** | Requires per-frame inference; high latency | Consumer GPUs can render in real time |
| **Controllability** | Weak (imprecise character control, no real physics) | Precise: zero-error control, real physics collision, accurate lighting |
| **Inference Cost** | Accumulates with every interaction | One-time generation; rendering cost ≈ 0 |
| **Engine Compatibility** | ❌ Video files only | ✅ Directly importable into Blender / UE / Isaac Sim |
| | $\color{IndianRed}{\textsf{Watch a video, then it's gone}}$ | $\color{RoyalBlue}{\textbf{Build a world, keep it forever}}$ |
All of the above are real 3D assets (not generated videos), entirely created by HY-World 2.0 and captured from live real-time interaction.
## ✨ Highlights
- **Real 3D Worlds, Not Just Videos**
Unlike video-only world models (e.g., Genie 3, HY-World 1.5), HY-World 2.0 generates **real 3D assets** (3DGS, meshes, and point clouds) that are freely explorable, editable, and directly importable into **Unity / Unreal Engine / Isaac**. From a single text prompt or image, create navigable 3D worlds in diverse styles: realistic, cartoon, game, and more.
- **Instant 3D Reconstruction from Photos & Videos**
Powered by **WorldMirror 2.0**, a unified feed-forward model that predicts dense point clouds, depth maps, surface normals, camera parameters, and 3DGS from multi-view images or casual videos in a single forward pass. It supports flexible-resolution inference (50K–500K pixels) with SOTA accuracy: capture a video, get a digital twin.
- **Interactive Character Exploration**
Go beyond viewing and **play inside your generated worlds**. HY-World 2.0 supports first-person navigation and a third-person character mode, letting users freely explore AI-generated streets, buildings, and landscapes with physics-based collision. Visit [our product page]() for a free trial.
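The flexible-resolution range quoted above (50K–500K pixels) is a per-view pixel budget. The helper below sketches how an input frame could be rescaled to fit such a budget; the function name and the patch-multiple rounding are our illustration, not the repository's API.

```python
import math

def fit_to_pixel_budget(width, height, budget, multiple=14):
    """Scale (width, height) so width * height <= budget, keeping aspect ratio.

    Dimensions are rounded down to a multiple (a ViT-style patch size of 14
    is an assumption here, not a documented WorldMirror requirement).
    """
    scale = min(math.sqrt(budget / (width * height)), 1.0)  # never upscale
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h
```

For example, a 1920x1080 frame exceeds a 500K budget and gets downscaled, while a 200x200 frame is only snapped to the patch multiple.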
## 🧩 Architecture
- **Refer to our tech report for more details.**
A systematic pipeline of HY-World 2.0, *Panorama Generation* (HY-Pano 2.0) → *Trajectory Planning* (WorldNav) → *World Expansion* (WorldStereo 2.0) → *World Composition* (WorldMirror 2.0 + 3DGS), automatically transforms text or a single image into a high-fidelity, navigable 3D world (3DGS / mesh outputs).
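The four-stage flow can be sketched as a simple data pipeline. Every function below is an illustrative stand-in showing stage order and data flow only, not the repository's API:

```python
# Hypothetical stand-ins for the four stages; real APIs live in their modules.
def hy_pano_generate(prompt):            # 1. Panorama Generation (HY-Pano 2.0)
    return {"stage": "panorama", "source": prompt}

def worldnav_plan(panorama):             # 2. Trajectory Planning (WorldNav)
    return {"stage": "trajectory", "from": panorama["stage"]}

def worldstereo_expand(panorama, traj):  # 3. World Expansion (WorldStereo 2.0)
    return {"stage": "expanded_views", "from": traj["stage"]}

def worldmirror_compose(views):          # 4. World Composition (WorldMirror 2.0 + 3DGS)
    return {"stage": "3dgs_world", "from": views["stage"]}

def generate_world(prompt):
    pano = hy_pano_generate(prompt)
    traj = worldnav_plan(pano)
    views = worldstereo_expand(pano, traj)
    return worldmirror_compose(views)
```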
## 📋 Open-Source Plan
- ✅ Technical Report
- ✅ WorldMirror 2.0 Code & Model Checkpoints
- ⬜ Full Inference Code for World Generation (WorldNav + World Composition)
- ⬜ Panorama Generation (HY-Pano 2.0) Model & Code ([HunyuanWorld 1.0](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0) available as an interim alternative)
- ⬜ World Expansion (WorldStereo 2.0) Model & Code ([WorldStereo](https://github.com/FuchengSu/WorldStereo) available as an interim alternative)
## 🎁 Model Zoo
### World Reconstruction – WorldMirror Series
| Model | Description | Params | Date | Hugging Face |
|-------|-------------|--------|------|--------------|
| WorldMirror 2.0 | Multi-view / video → 3D reconstruction | ~1.2B | 2026 | [Download](https://huggingface.co/tencent/HY-World-2.0/tree/main/HY-WorldMirror-2.0) |
| WorldMirror 1.0 | Multi-view / video → 3D reconstruction (legacy) | ~1.2B | 2025 | [Download](https://huggingface.co/tencent/HunyuanWorld-Mirror/tree/main) |
### Panorama Generation
| Model | Description | Params | Date | Hugging Face |
|-------|-------------|--------|------|--------------|
| HY-PanoGen | Text / image → 360° panorama | – | Coming Soon | – |
### World Generation
| Model | Description | Params | Date | Hugging Face |
|-----------------|-------------|-----|------|--------------|
| WorldStereo 2.0 | Panorama → navigable 3DGS world | – | Coming Soon | – |
We recommend referring to our previous works, [WorldStereo](https://github.com/FuchengSu/WorldStereo) and [WorldMirror](https://github.com/Tencent-Hunyuan/HunyuanWorld-Mirror), for background knowledge on world generation and reconstruction.
## 🤗 Get Started
### Install Requirements
We recommend CUDA 12.4 for installation.
```bash
# 1. Clone the repository
git clone https://github.com/Tencent-Hunyuan/HY-World-2.0
cd HY-World-2.0
# 2. Create conda environment
conda create -n hyworld2 python=3.10
conda activate hyworld2
# 3. Install PyTorch (CUDA 12.4)
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124
# 4. Install dependencies
pip install -r requirements.txt
# 5. Install FlashAttention
# (Recommended) Install FlashAttention-3
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/hopper
python setup.py install
cd ../../
rm -rf flash-attention
# For simpler installation, you can also use FlashAttention-2
pip install flash-attn --no-build-isolation
```
### Code Usage – Panorama Generation (HY-Pano 2.0)
*Coming soon.*
### Code Usage – World Generation (WorldNav, WorldStereo 2.0, and 3DGS)
*Coming soon.*
**We recommend referring to our previous work, [WorldStereo](https://github.com/FuchengSu/WorldStereo), for the open-source preview version of WorldStereo 2.0.**
### Code Usage – WorldMirror 2.0
WorldMirror 2.0 supports the following usage modes:
- [Code Usage](#code-usage--worldmirror-20)
- [Gradio App](#gradio-app--worldmirror-20)
We provide a `diffusers`-like Python API for WorldMirror 2.0. Model weights are automatically downloaded from Hugging Face on first run.
```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

# Weights are downloaded from Hugging Face on first use
pipeline = WorldMirrorPipeline.from_pretrained('tencent/HY-World-2.0')

# Reconstruct from a folder of multi-view images (or extracted video frames)
result = pipeline('path/to/images')
```
**With Prior Injection (Camera & Depth):**
```python
result = pipeline(
    'path/to/images',
    prior_cam_path='path/to/prior_camera.json',
    prior_depth_path='path/to/prior_depth/',
)
```
> For the detailed structure of camera/depth priors and how to prepare them, see [Prior Preparation Guide](DOCUMENTATION.md#prior-injection).
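The exact prior file schema is documented in the guide above. Conceptually, a camera prior amounts to pinhole intrinsics K plus world-to-camera extrinsics [R|t]; a generic sketch (all names are ours) of how such priors relate 3D points to pixels:

```python
import numpy as np

def make_intrinsics(fx, fy, cx, cy):
    """Pinhole intrinsic matrix K."""
    return np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]])

def project(points_world, K, R, t):
    """Project (N, 3) world points to pixels with extrinsics [R|t]
    (world -> camera convention)."""
    cam = points_world @ R.T + t     # rotate/translate into camera frame
    uvw = cam @ K.T                  # apply intrinsics
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide
```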
**CLI:**
```bash
# Single GPU
python -m hyworld2.worldrecon.pipeline --input_path path/to/images

# Multi-GPU
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --use_fsdp --enable_bf16
```
> **Important:** In multi-GPU mode, the number of input images must be **>= the number of GPUs**. For example, with `--nproc_per_node=8`, provide at least 8 images.
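The constraint follows from each rank needing at least one view to process. A minimal illustration (the round-robin split is ours; the repository's actual distribution logic may differ):

```python
def shard_images(image_paths, world_size):
    """Split image paths into one non-empty shard per GPU rank.

    With fewer images than ranks, some rank would receive an empty
    shard, hence the >= constraint.
    """
    n = len(image_paths)
    if n < world_size:
        raise ValueError(f"need at least {world_size} images, got {n}")
    # Round-robin: rank r gets images r, r + world_size, r + 2*world_size, ...
    return [image_paths[r::world_size] for r in range(world_size)]
```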
### Gradio App – WorldMirror 2.0
We provide an interactive [Gradio](https://www.gradio.app/) web demo for WorldMirror 2.0. Upload images or videos and visualize 3DGS, point clouds, depth maps, normal maps, and camera parameters in your browser.
```bash
# Single GPU
python -m hyworld2.worldrecon.gradio_app

# Multi-GPU
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.gradio_app \
    --use_fsdp --enable_bf16
```
For the full list of Gradio app arguments (port, share, local checkpoints, etc.), see [DOCUMENTATION.md](DOCUMENTATION.md#gradio-app).
## 🎮 Performance
For full benchmark results, please refer to the [technical report](https://3d-models.hunyuan.tencent.com/world/).
### WorldStereo 2.0 – Camera Control
### WorldMirror 2.0 – Point Map Reconstruction
**Point Map Reconstruction on 7-Scenes, NRGBD, and DTU.** We report the mean Accuracy and Completeness of WorldMirror under different input configurations. **Bold** results are best. "L / M / H" denote low / medium / high inference resolution. "+ all priors" denotes injection of camera extrinsics, camera intrinsics, and depth priors.
| Method | Config | 7-Scenes (scene) Acc. ↓ | Comp. ↓ | NRGBD (scene) Acc. ↓ | Comp. ↓ | DTU (object) Acc. ↓ | Comp. ↓ |
|--------|--------|------|------|------|------|------|------|
| WorldMirror 1.0 | L | 0.043 | 0.055 | 0.046 | 0.049 | 1.476 | 1.768 |
| WorldMirror 1.0 | L + all priors | 0.021 | 0.026 | 0.022 | 0.020 | 1.347 | 1.392 |
| WorldMirror 1.0 | M | 0.043 | 0.049 | 0.041 | 0.045 | 1.017 | 1.780 |
| WorldMirror 1.0 | M + all priors | 0.018 | 0.023 | 0.016 | 0.014 | 0.735 | 0.935 |
| WorldMirror 1.0 | H | 0.079 | 0.087 | 0.077 | 0.093 | 2.271 | 2.113 |
| WorldMirror 1.0 | H + all priors | 0.042 | 0.041 | 0.078 | 0.082 | 1.773 | 1.478 |
| WorldMirror 2.0 | L | 0.041 | 0.052 | 0.047 | 0.058 | 1.352 | 2.009 |
| WorldMirror 2.0 | L + all priors | 0.019 | 0.024 | 0.017 | 0.015 | 1.100 | 1.201 |
| WorldMirror 2.0 | M | 0.033 | 0.046 | 0.039 | 0.047 | 1.005 | 1.892 |
| WorldMirror 2.0 | M + all priors | 0.013 | 0.017 | **0.013** | **0.013** | 0.690 | 0.876 |
| WorldMirror 2.0 | H | 0.037 | 0.040 | 0.046 | 0.053 | 0.845 | 1.904 |
| WorldMirror 2.0 | H + all priors | **0.012** | **0.016** | 0.015 | 0.016 | **0.554** | **0.771** |
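Accuracy and Completeness are the standard point-cloud reconstruction metrics: the mean distance from each predicted point to its nearest ground-truth point, and vice versa. A brute-force NumPy sketch (real benchmarks use KD-trees and dataset-specific alignment, omitted here):

```python
import numpy as np

def accuracy_completeness(pred, gt):
    """Compute (accuracy, completeness) for (N, 3) / (M, 3) point arrays.

    Accuracy: mean distance from predicted points to nearest GT point.
    Completeness: mean distance from GT points to nearest prediction.
    """
    # Pairwise (N, M) Euclidean distance matrix via broadcasting
    d = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)
    acc = d.min(axis=1).mean()
    comp = d.min(axis=0).mean()
    return acc, comp
```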
### WorldMirror 2.0 – Prior Comparison
**Comparison with Pow3R and MapAnything under Different Prior Conditions.** Results are averaged on 7-Scenes, NRGBD, and DTU datasets. Pow3R (pro) refers to the original Pow3R with Procrustes alignment.
## 🎬 More Examples
## 📚 Documentation
For detailed usage guides, parameter references, output format specifications, and prior injection instructions, see **[DOCUMENTATION.md](DOCUMENTATION.md)**.
## 📜 Citation
If you find HY-World 2.0 useful for your research, please cite:
```bibtex
@article{hyworld22026,
  title={HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds},
  author={Tencent HY-World Team},
  journal={arXiv preprint},
  year={2026}
}

@article{hunyuanworld2025tencent,
  title={HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels},
  author={Team HunyuanWorld},
  journal={arXiv preprint},
  year={2025}
}
```
## 📧 Contact
Please email tengfeiwang12@gmail.com with questions or feedback.
## 🙏 Acknowledgements
We would like to thank [HunyuanWorld 1.0](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0), [WorldMirror](https://github.com/Tencent-Hunyuan/HunyuanWorld-Mirror), [WorldPlay](https://github.com/Tencent-Hunyuan/HY-WorldPlay), [WorldStereo](https://github.com/FuchengSu/WorldStereo), and [HunyuanImage](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0) for their great work.