Commit 97c4df80

Duplicate from tencent/HY-World-2.0

Co-authored-by: TencentOpen <TencentOpen@users.noreply.huggingface.co>
- .gitattributes +53 -0
- DOCUMENTATION.md +458 -0
- DOCUMENTATION_zh.md +458 -0
- HY-WorldMirror-2.0/README.md +10 -0
- HY-WorldMirror-2.0/config.json +37 -0
- HY-WorldMirror-2.0/model.safetensors +3 -0
- License.txt +82 -0
- README.md +490 -0
- README_zh.md +475 -0
- assets/hyworld2_en.mp4 +3 -0
- assets/interactive.gif +3 -0
- assets/mesh.gif +3 -0
- assets/mesh_en.gif +3 -0
- assets/overview.png +3 -0
- assets/prior_comparison2_wm2.png +0 -0
- assets/qrcode/discord.png +0 -0
- assets/qrcode/wechat.png +0 -0
- assets/qrcode/x.png +0 -0
- assets/qrcode/xiaohongshu.png +0 -0
- assets/recon.gif +3 -0
- assets/recon_en.gif +3 -0
- assets/screenshot_1.gif +3 -0
- assets/screenshot_10.gif +3 -0
- assets/screenshot_2.gif +3 -0
- assets/screenshot_3.gif +3 -0
- assets/screenshot_4.gif +3 -0
- assets/screenshot_5.gif +3 -0
- assets/screenshot_6.gif +3 -0
- assets/screenshot_7.gif +3 -0
- assets/screenshot_8.gif +3 -0
- assets/screenshot_9.gif +3 -0
- assets/teaser.png +3 -0
.gitattributes
ADDED
@@ -0,0 +1,53 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
assets/interactive.gif filter=lfs diff=lfs merge=lfs -text
assets/mesh_en.gif filter=lfs diff=lfs merge=lfs -text
assets/mesh.gif filter=lfs diff=lfs merge=lfs -text
assets/overview.png filter=lfs diff=lfs merge=lfs -text
assets/recon.gif filter=lfs diff=lfs merge=lfs -text
assets/screenshot_1.gif filter=lfs diff=lfs merge=lfs -text
assets/screenshot_10.gif filter=lfs diff=lfs merge=lfs -text
assets/screenshot_2.gif filter=lfs diff=lfs merge=lfs -text
assets/screenshot_3.gif filter=lfs diff=lfs merge=lfs -text
assets/screenshot_4.gif filter=lfs diff=lfs merge=lfs -text
assets/screenshot_5.gif filter=lfs diff=lfs merge=lfs -text
assets/screenshot_6.gif filter=lfs diff=lfs merge=lfs -text
assets/screenshot_7.gif filter=lfs diff=lfs merge=lfs -text
assets/screenshot_8.gif filter=lfs diff=lfs merge=lfs -text
assets/screenshot_9.gif filter=lfs diff=lfs merge=lfs -text
assets/teaser.png filter=lfs diff=lfs merge=lfs -text
assets/hyworld2_en.mp4 filter=lfs diff=lfs merge=lfs -text
assets/recon_en.gif filter=lfs diff=lfs merge=lfs -text
DOCUMENTATION.md
ADDED
@@ -0,0 +1,458 @@
# HunyuanWorld 2.0 — Documentation

This document provides detailed usage guides, parameter references, and output format specifications for each component of HunyuanWorld 2.0.

## Table of Contents
- [WorldMirror 2.0 (World Reconstruction)](#worldmirror-20-world-reconstruction)
  - [Overview](#overview)
  - [Python API](#python-api)
    - [`WorldMirrorPipeline.from_pretrained`](#worldmirrorpipelinefrom_pretrained)
    - [`WorldMirrorPipeline.__call__`](#worldmirrorpipelinecall)
  - [CLI Reference](#cli-reference)
  - [Output Format](#output-format)
    - [File Structure](#file-structure)
    - [Prediction Dictionary](#prediction-dictionary)
  - [Prior Injection](#prior-injection)
    - [Camera Parameters (JSON)](#camera-parameters-json)
    - [Depth Maps (Folder)](#depth-maps-folder)
    - [Combining Priors](#combining-priors)
  - [Multi-GPU Inference](#multi-gpu-inference)
  - [Advanced Options](#advanced-options)
    - [Disabling Prediction Heads](#disabling-prediction-heads)
    - [Mask Filtering](#mask-filtering)
    - [Point Cloud Compression](#point-cloud-compression)
  - [Gradio App](#gradio-app)
- [Panorama Generation](#panorama-generation)
- [World Generation](#world-generation)

---

## WorldMirror 2.0 (World Reconstruction)

### Overview

WorldMirror 2.0 is a unified feed-forward model for comprehensive 3D geometric prediction from multi-view images or video. It simultaneously generates:

- **3D point clouds** in world coordinates
- **Per-view depth maps** in the camera frame
- **Surface normals** in camera coordinates
- **Camera poses** (c2w) and **intrinsics**
- **3D Gaussian Splatting** attributes (means, scales, rotations, opacities, SH coefficients)

Key improvements over WorldMirror 1.0:

- **Normalized RoPE** for flexible-resolution inference
- **Depth mask prediction** for robust handling of invalid pixels
- **Sequence Parallel + FSDP + BF16** for efficient multi-GPU inference

---

### Python API

#### `WorldMirrorPipeline.from_pretrained`

Factory method that loads the model and creates a pipeline instance.

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained(
    pretrained_model_name_or_path="tencent/HY-World-2.0",
    subfolder="HY-WorldMirror-2.0",
    config_path=None,
    ckpt_path=None,
    use_fsdp=False,
    enable_bf16=False,
    fsdp_cpu_offload=False,
    disable_heads=None,
)
```

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `pretrained_model_name_or_path` | `str` | `"tencent/HY-World-2.0"` | HuggingFace repo ID or local path |
| `subfolder` | `str` | `"HY-WorldMirror-2.0"` | Subfolder inside the repo containing the WorldMirror checkpoint (`model.safetensors` + config) |
| `config_path` | `str` | `None` | Training config YAML (used with `ckpt_path` for custom checkpoints) |
| `ckpt_path` | `str` | `None` | Checkpoint file (`.ckpt` / `.safetensors`). When provided together with `config_path`, the model is loaded from the local checkpoint instead of HuggingFace |
| `use_fsdp` | `bool` | `False` | Shard parameters across GPUs via Fully Sharded Data Parallel |
| `enable_bf16` | `bool` | `False` | Use bfloat16 precision (except in numerically critical layers) |
| `fsdp_cpu_offload` | `bool` | `False` | Offload FSDP parameters to CPU (saves GPU memory at the cost of speed) |
| `disable_heads` | `list[str]` | `None` | Heads to disable and free from memory. Options: `"camera"`, `"depth"`, `"normal"`, `"points"`, `"gs"` |

**Notes:**
- Distributed mode is auto-detected from the `WORLD_SIZE` environment variable (set by `torchrun`).
- When using multiple GPUs, each rank must call `from_pretrained` — the method handles `dist.init_process_group` internally.

---

#### `WorldMirrorPipeline.__call__`

Runs inference on a set of images or a video.

```python
result = pipeline(
    input_path,
    output_path="inference_output",
    **kwargs,
)
```

Returns the output directory path (`str`), or `None` if the input was skipped.

**Inference Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input_path` | `str` | *(required)* | Directory of images or path to a video file |
| `output_path` | `str` | `"inference_output"` | Root output directory |
| `target_size` | `int` | `952` | Maximum inference resolution (longest edge). Images are resized and center-cropped to the nearest multiple of 14 |
| `fps` | `int` | `1` | FPS for extracting frames from video input |
| `video_strategy` | `str` | `"new"` | Video frame extraction strategy: `"new"` (motion-aware) or `"old"` (uniform FPS) |
| `video_min_frames` | `int` | `1` | Minimum number of frames to extract from a video |
| `video_max_frames` | `int` | `32` | Maximum number of frames to extract from a video |
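The resize + center-crop rule for `target_size` can be illustrated with a minimal sketch. This assumes the longest edge is scaled to `target_size` and both edges are then truncated down to multiples of 14; the actual preprocessing code may differ in rounding details.

```python
def processed_size(w: int, h: int, target_size: int = 952, patch: int = 14):
    """Sketch of the assumed resize + crop rule: scale so the longest edge
    equals target_size, then crop each edge to a multiple of 14."""
    scale = target_size / max(w, h)
    new_w, new_h = round(w * scale), round(h * scale)
    return (new_w // patch) * patch, (new_h // patch) * patch

# e.g. a 1920x1080 input:
print(processed_size(1920, 1080))
```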
**Save Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `save_depth` | `bool` | `True` | Save per-view depth maps (PNG visualization + NPY raw values) |
| `save_normal` | `bool` | `True` | Save per-view surface normal maps (PNG) |
| `save_gs` | `bool` | `True` | Save 3D Gaussian Splatting as `gaussians.ply` |
| `save_camera` | `bool` | `True` | Save camera parameters as `camera_params.json` |
| `save_points` | `bool` | `True` | Save the depth-derived point cloud as `points.ply` |
| `save_colmap` | `bool` | `False` | Save a COLMAP-format sparse reconstruction (`sparse/0/`) |
| `save_conf` | `bool` | `False` | Save depth confidence maps |
| `save_sky_mask` | `bool` | `False` | Save sky segmentation masks |

**Mask Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `apply_sky_mask` | `bool` | `True` | Filter out sky regions from point clouds and Gaussians |
| `apply_edge_mask` | `bool` | `True` | Filter out edge/discontinuity regions |
| `apply_confidence_mask` | `bool` | `False` | Filter out low-confidence predictions |
| `sky_mask_source` | `str` | `"auto"` | Sky mask method: `"auto"` (ONNX + model fusion), `"model"` (model predictions only), `"onnx"` (external segmentation only) |
| `model_sky_threshold` | `float` | `0.45` | Threshold for model-based sky detection |
| `confidence_percentile` | `float` | `10.0` | Percentile threshold for confidence filtering (bottom N% removed) |
| `edge_normal_threshold` | `float` | `1.0` | Normal edge detection tolerance |
| `edge_depth_threshold` | `float` | `0.03` | Depth edge detection relative tolerance |

**Compression Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `compress_pts` | `bool` | `True` | Compress point clouds via voxel merging + random sampling |
| `compress_pts_max_points` | `int` | `2,000,000` | Maximum number of points after compression |
| `compress_pts_voxel_size` | `float` | `0.002` | Voxel size for point cloud merging |
| `max_resolution` | `int` | `1920` | Maximum resolution for saved output images |
| `compress_gs_max_points` | `int` | `5,000,000` | Maximum number of Gaussians after voxel pruning |

**Prior Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `prior_cam_path` | `str` | `None` | Path to a camera parameters JSON file |
| `prior_depth_path` | `str` | `None` | Path to a directory containing depth map files |

**Rendered Video Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `save_rendered` | `bool` | `False` | Render an interpolated fly-through video from the Gaussian splats |
| `render_interp_per_pair` | `int` | `15` | Number of interpolated frames between each camera pair |
| `render_depth` | `bool` | `False` | Also render a depth visualization video |

**Misc Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `log_time` | `bool` | `True` | Print a timing report and save `pipeline_timing.json` |
| `strict_output_path` | `str` | `None` | If set, save results directly to this path without `<case_name>/<timestamp>` subdirectories |

---

### CLI Reference

All `__call__` parameters are exposed as CLI arguments:

```bash
python -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --output_path inference_output \
    --target_size 952 \
    --prior_cam_path path/to/camera_params.json \
    --prior_depth_path path/to/depth_dir/
```

**Boolean flag conventions:**

| Enable | Disable |
|--------|---------|
| `--save_colmap` | *(omit)* |
| `--save_conf` | *(omit)* |
| `--save_sky_mask` | *(omit)* |
| `--apply_sky_mask` *(default on)* | `--no_sky_mask` |
| `--apply_edge_mask` *(default on)* | `--no_edge_mask` |
| `--apply_confidence_mask` | *(omit)* |
| `--compress_pts` *(default on)* | `--no_compress_pts` |
| `--log_time` *(default on)* | `--no_log_time` |
| `--save_depth` *(default on)* | `--no_save_depth` |
| `--save_normal` *(default on)* | `--no_save_normal` |
| `--save_gs` *(default on)* | `--no_save_gs` |
| `--save_camera` *(default on)* | `--no_save_camera` |
| `--save_points` *(default on)* | `--no_save_points` |
| `--save_rendered` | *(omit)* |
| `--render_depth` | *(omit)* |

**Additional CLI-only arguments:**

| Argument | Description |
|----------|-------------|
| `--config_path` | Training config YAML for custom checkpoint loading |
| `--ckpt_path` | Local checkpoint file path |
| `--use_fsdp` | Enable FSDP multi-GPU sharding |
| `--enable_bf16` | Enable bfloat16 mixed precision |
| `--fsdp_cpu_offload` | Offload FSDP params to CPU |
| `--disable_heads` | Space-separated list of heads to disable (e.g. `--disable_heads camera normal`) |
| `--no_interactive` | Exit after the first inference (skip the interactive prompt loop) |

---

### Output Format

#### File Structure

```
inference_output/
└── <case_name>/
    └── <timestamp>/
        ├── depth/
        │   ├── depth_0000.png      # Normalized depth visualization
        │   ├── depth_0000.npy      # Raw float32 depth values [H, W]
        │   └── ...
        ├── normal/
        │   ├── normal_0000.png     # Normal map visualization (RGB)
        │   └── ...
        ├── camera_params.json      # Camera extrinsics & intrinsics
        ├── gaussians.ply           # 3D Gaussian Splatting (standard format)
        ├── points.ply              # Colored point cloud
        ├── sparse/                 # COLMAP format (if --save_colmap)
        │   └── 0/
        │       ├── cameras.bin
        │       ├── images.bin
        │       └── points3D.bin
        ├── rendered/               # Rendered video (if --save_rendered)
        │   ├── rendered_rgb.mp4
        │   └── rendered_depth.mp4  # (if --render_depth)
        └── pipeline_timing.json    # Performance timing report
```

#### Prediction Dictionary

When using the Python API, `pipeline(...)` internally produces a `predictions` dictionary with the following keys:

```python
# Geometry
predictions["depth"]        # [B, S, H, W, 1] — Z-depth in camera frame
predictions["depth_conf"]   # [B, S, H, W] — Depth confidence
predictions["normals"]      # [B, S, H, W, 3] — Surface normals in camera coords
predictions["normals_conf"] # [B, S, H, W] — Normal confidence
predictions["pts3d"]        # [B, S, H, W, 3] — 3D point maps in world coords
predictions["pts3d_conf"]   # [B, S, H, W] — Point cloud confidence

# Camera
predictions["camera_poses"]  # [B, S, 4, 4] — Camera-to-world (c2w), OpenCV convention
predictions["camera_intrs"]  # [B, S, 3, 3] — Camera intrinsic matrices
predictions["camera_params"] # [B, S, 9] — Compact camera vector (translation, quaternion, fov_v, fov_u)

# 3D Gaussian Splatting
predictions["splats"]["means"]     # [B, N, 3] — Gaussian centers
predictions["splats"]["scales"]    # [B, N, 3] — Gaussian scales
predictions["splats"]["quats"]     # [B, N, 4] — Gaussian rotations (quaternions)
predictions["splats"]["opacities"] # [B, N] — Gaussian opacities
predictions["splats"]["sh"]        # [B, N, 1, 3] — Spherical harmonics (degree 0)
predictions["splats"]["weights"]   # [B, N] — Per-Gaussian confidence weights
```

Where `B` = batch size (always 1 for inference), `S` = number of input views, `H, W` = image dimensions, `N` = total Gaussians (`S × H × W`).
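Since depth is Z-depth in the OpenCV camera convention and poses are c2w, a saved depth map can be lifted into world coordinates with standard pinhole unprojection. The sketch below is illustrative, not part of the official API; the depth map and camera values are placeholders standing in for `depth_0000.npy` and entries of `camera_params.json`.

```python
import numpy as np

def depth_to_world_points(depth, K, c2w):
    """Unproject a [H, W] Z-depth map with intrinsics K (3x3) and pose c2w (4x4)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Pixel rays in camera coordinates: x = (u - cx) * z / fx, y = (v - cy) * z / fy
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    pts_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Camera-to-world: rotate, then translate
    return pts_cam @ c2w[:3, :3].T + c2w[:3, 3]

depth = np.ones((4, 4), dtype=np.float32)  # stand-in for np.load("depth_0000.npy")
K = np.array([[525.0, 0.0, 2.0], [0.0, 525.0, 2.0], [0.0, 0.0, 1.0]])
c2w = np.eye(4)
pts = depth_to_world_points(depth, K, c2w)
print(pts.shape)  # one 3D point per pixel
```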

---

### Prior Injection

WorldMirror 2.0 accepts three types of geometric priors as conditioning inputs. Priors are automatically detected from the provided files.

| Prior Type | Condition | Input Format |
|------------|-----------|--------------|
| Camera Pose | `cond_flags[0]` | c2w 4×4 matrix (OpenCV convention) |
| Depth Map | `cond_flags[1]` | Per-view float depth maps |
| Intrinsics | `cond_flags[2]` | 3×3 intrinsic matrix |

#### Camera Parameters (JSON)

The camera parameter file follows the same format as the `camera_params.json` output by the pipeline:

```json
{
    "num_cameras": 2,
    "extrinsics": [
        {
            "camera_id": 0,
            "matrix": [
                [0.98, 0.01, -0.17, 0.52],
                [-0.01, 0.99, 0.01, -0.03],
                [0.17, -0.01, 0.98, 1.20],
                [0.0, 0.0, 0.0, 1.0]
            ]
        }
    ],
    "intrinsics": [
        {
            "camera_id": 0,
            "matrix": [
                [525.0, 0.0, 320.0],
                [0.0, 525.0, 240.0],
                [0.0, 0.0, 1.0]
            ]
        }
    ]
}
```

**Field descriptions:**

| Field | Description |
|-------|-------------|
| `camera_id` | Integer index (`0`, `1`, `2`, ...) or image filename stem without extension (e.g., `"image_0001"`) |
| `extrinsics.matrix` | 4×4 camera-to-world (c2w) transformation matrix, OpenCV coordinate convention |
| `intrinsics.matrix` | 3×3 camera intrinsic matrix in pixels (`fx, fy` = focal lengths; `cx, cy` = principal point) |

**Important notes:**
- The `extrinsics` and `intrinsics` lists can be provided independently or together. An empty list `[]` or a missing key means that prior is unavailable.
- **Intrinsics resolution:** Values should correspond to the **original image resolution**. The pipeline automatically adjusts for the inference-time resize + center-crop.
- **Extrinsics alignment:** The pipeline automatically normalizes all extrinsics relative to the first view, consistent with training behavior.
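A prior camera file in the format above can be assembled with a few lines of Python. This is an illustrative sketch, not official tooling; the identity poses are placeholders for real calibration data.

```python
import json
import numpy as np

def write_camera_priors(path, c2ws, Ks):
    """Write c2w 4x4 matrices and 3x3 intrinsics in the camera_params.json layout."""
    params = {
        "num_cameras": len(c2ws),
        "extrinsics": [
            {"camera_id": i, "matrix": np.asarray(m, dtype=float).tolist()}
            for i, m in enumerate(c2ws)
        ],
        "intrinsics": [
            {"camera_id": i, "matrix": np.asarray(k, dtype=float).tolist()}
            for i, k in enumerate(Ks)
        ],
    }
    with open(path, "w") as f:
        json.dump(params, f, indent=2)

K = [[525.0, 0.0, 320.0], [0.0, 525.0, 240.0], [0.0, 0.0, 1.0]]
write_camera_priors("camera_params.json", [np.eye(4), np.eye(4)], [K, K])
```

Passing the resulting file via `prior_cam_path` injects both the pose and intrinsics priors; omit either list to provide only one.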

#### Depth Maps (Folder)

Depth maps are stored as individual files in a directory. Filenames should match the input image filenames. Supported formats: `.npy`, `.exr`, `.png` (16-bit).

```
prior_depth/
├── image_0001.npy   # float32, shape [H, W]
├── image_0002.npy
└── ...
```
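Exporting your own depth priors in the `.npy` variant of this layout is straightforward. A minimal sketch (the constant depth values and file stems are placeholders; real maps would come from a sensor or another depth estimator):

```python
import os
import numpy as np

os.makedirs("prior_depth", exist_ok=True)
for stem in ["image_0001", "image_0002"]:
    # One float32 [H, W] array per input image, named after the image stem.
    depth = np.full((480, 640), 2.5, dtype=np.float32)
    np.save(os.path.join("prior_depth", f"{stem}.npy"), depth)
```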

#### Combining Priors

Priors can be freely combined. Examples:

```bash
# Intrinsics only
python -m hyworld2.worldrecon.pipeline --input_path images/ \
    --prior_cam_path camera_intrinsics_only.json

# Depth only
python -m hyworld2.worldrecon.pipeline --input_path images/ \
    --prior_depth_path depth_maps/

# Camera poses + intrinsics + depth
python -m hyworld2.worldrecon.pipeline --input_path images/ \
    --prior_cam_path camera_params.json \
    --prior_depth_path depth_maps/
```

---

### Multi-GPU Inference

WorldMirror 2.0 supports **Sequence Parallel (SP)** inference across multiple GPUs: token sequences are sharded across ranks in the ViT backbone, and the DPT heads process frames in parallel.

> **Requirement:** The number of input images must be **>= the number of GPUs** (`nproc_per_node`). For example, with 8 GPUs you need at least 8 input images. The pipeline raises an error if this condition is not met.

```bash
# 2 GPUs with FSDP + bf16
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --use_fsdp --enable_bf16

# 4 GPUs
torchrun --nproc_per_node=4 -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --use_fsdp --enable_bf16
```

```python
# Python API (inside a torchrun script)
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained(
    'tencent/HY-World-2.0',
    use_fsdp=True,
    enable_bf16=True,
)
pipeline('path/to/images')
```

**What happens under the hood:**
1. `from_pretrained` auto-detects `WORLD_SIZE > 1` and initializes `torch.distributed`.
2. The model is loaded on rank 0 and broadcast via `sync_module_states=True`.
3. FSDP shards parameters across the SP process group.
4. DPT prediction heads split frames across ranks and `AllGather` the results.
5. Post-processing (mask computation, saving) runs on rank 0 only.

---

### Advanced Options

#### Disabling Prediction Heads

To save memory when you only need specific outputs:

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained(
    'tencent/HY-World-2.0',
    disable_heads=["normal", "points"],  # frees ~200M parameters
)
```

Available heads: `"camera"`, `"depth"`, `"normal"`, `"points"`, `"gs"`.

#### Mask Filtering

The pipeline supports three types of output filtering to improve point cloud and Gaussian quality:
1. **Sky mask** (`apply_sky_mask=True`): removes sky regions using an ONNX-based segmentation model, optionally fused with model-predicted depth masks.
2. **Edge mask** (`apply_edge_mask=True`): removes points at depth/normal discontinuities (object boundaries).
3. **Confidence mask** (`apply_confidence_mask=False`): removes the bottom N% of points by prediction confidence.

These masks are applied independently to both the `points.ply` (depth-based) and `gaussians.ply` (GS-based) outputs. The GS output uses its own depth predictions for edge detection when available.
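The percentile-based confidence filter can be sketched as follows. This is an assumed, illustrative implementation of the behavior described above (drop the bottom `confidence_percentile` percent of points by confidence), not the pipeline's actual code:

```python
import numpy as np

def confidence_mask(conf, percentile=10.0):
    """Boolean mask keeping points whose confidence is at or above the
    bottom-N% threshold."""
    threshold = np.percentile(conf, percentile)
    return conf >= threshold

conf = np.array([0.1, 0.5, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.05])
mask = confidence_mask(conf)
print(mask.sum())  # number of points kept
```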

#### Point Cloud Compression

When `compress_pts=True` (the default), the depth-derived point cloud undergoes:
1. **Voxel merging**: points within each voxel (size controlled by `compress_pts_voxel_size`) are merged via weighted averaging.
2. **Random subsampling**: if the result exceeds `compress_pts_max_points`, points are uniformly subsampled.

Similarly, Gaussians are voxel-pruned (weighted averaging of means, scales, quaternions, colors, opacities) and optionally subsampled to `compress_gs_max_points`.
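The two-stage scheme can be sketched with plain NumPy. This is an illustrative version using unweighted averaging per voxel (the pipeline's weighted averaging and actual code may differ):

```python
import numpy as np

def compress_points(pts, voxel_size=0.002, max_points=2_000_000, seed=0):
    # Stage 1: merge points that fall into the same voxel by averaging them.
    keys = np.floor(pts / voxel_size).astype(np.int64)
    _, inv, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inv = inv.reshape(-1)  # guard against NumPy version differences in inverse shape
    merged = np.zeros((counts.size, 3))
    np.add.at(merged, inv, pts)
    merged /= counts[:, None]
    # Stage 2: uniform random subsampling if still above the budget.
    if merged.shape[0] > max_points:
        idx = np.random.default_rng(seed).choice(merged.shape[0], max_points, replace=False)
        merged = merged[idx]
    return merged

pts = np.random.default_rng(0).uniform(0, 0.01, size=(1000, 3))
out = compress_points(pts, voxel_size=0.002, max_points=50)
print(out.shape)
```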

---

### Gradio App

An interactive web demo for WorldMirror 2.0. Upload images or videos and visualize 3DGS, point clouds, depth maps, normal maps, and camera parameters in your browser.

**Quick start:**

```bash
# Single GPU
python -m hyworld2.worldrecon.gradio_app

# Multi-GPU
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.gradio_app \
    --use_fsdp --enable_bf16
```

**With a local checkpoint:**

```bash
python -m hyworld2.worldrecon.gradio_app \
    --config_path /path/to/config.yaml \
    --ckpt_path /path/to/checkpoint.safetensors
```

**With a public link (e.g., for Colab or remote servers):**

```bash
python -m hyworld2.worldrecon.gradio_app --share
```

**Arguments:**

| Argument | Default | Description |
|----------|---------|-------------|
| `--port` | `8081` | Server port |
| `--host` | `0.0.0.0` | Server host |
| `--share` | `False` | Create a public Gradio link |
| `--examples_dir` | `./examples/worldrecon` | Path to the example scenes directory |
| `--config_path` | `None` | Training config YAML (used with `--ckpt_path`) |
| `--ckpt_path` | `None` | Local checkpoint file (`.ckpt` / `.safetensors`) |
| `--use_fsdp` | `False` | Enable FSDP multi-GPU sharding |
| `--enable_bf16` | `False` | Enable bfloat16 mixed precision |
| `--fsdp_cpu_offload` | `False` | Offload FSDP params to CPU (saves GPU memory) |

> **Important:** In multi-GPU mode, the number of input images must be **>= the number of GPUs**.

---

## Panorama Generation

*Coming soon.*

This section will document the panorama generation model, including:
- Text-to-panorama and image-to-panorama APIs
- Model architecture (MMDiT-based implicit perspective-to-ERP mapping)
- Configuration parameters
- Output formats

---

## World Generation

*Coming soon.*

This section will document the world generation pipeline, including:
- Trajectory planning configuration
- World expansion with memory-driven video generation
- World composition (point cloud expansion + 3DGS optimization)
- End-to-end generation from text/image to a navigable 3D world
DOCUMENTATION_zh.md
ADDED
@@ -0,0 +1,458 @@
| 1 |
+
# HunyuanWorld 2.0 — 文档
|
| 2 |
+
本文档提供了 HunyuanWorld 2.0 各组件的详细使用指南、参数参考和输出格式说明。
|
| 3 |
+
|
| 4 |
+
## 目录
|
| 5 |
+
- [WorldMirror 2.0(世界重建)](#worldmirror-20世界重建)
|
| 6 |
+
- [概述](#概述)
|
| 7 |
+
- [Python API](#python-api)
|
| 8 |
+
- [`WorldMirrorPipeline.from_pretrained`](#worldmirrorpipelinefrom_pretrained)
|
| 9 |
+
- [`WorldMirrorPipeline.__call__`](#worldmirrorpipelinecall)
|
| 10 |
+
- [命令行参考](#命令行参考)
|
| 11 |
+
- [输出格式](#输出格式)
|
| 12 |
+
- [文件结构](#文件结构)
|
| 13 |
+
- [预测字典](#预测字典)
|
| 14 |
+
- [先验注入](#先验注入)
|
| 15 |
+
- [相机参数(JSON)](#相机参数json)
|
| 16 |
+
- [深度图(文件夹)](#深度图文件夹)
|
| 17 |
+
- [组合先验](#组合先验)
|
| 18 |
+
- [多卡推理](#多卡推理)
|
| 19 |
+
- [高级选项](#高级选项)
|
| 20 |
+
- [禁用预测头](#禁用预测头)
|
| 21 |
+
- [掩码过滤](#掩码过滤)
|
| 22 |
+
- [点云压缩](#点云压缩)
|
| 23 |
+
- [Gradio 应用](#gradio-应用)
|
| 24 |
+
- [全景生成](#全景生成)
|
| 25 |
+
- [世界生成](#世界生成)
|
| 26 |
+
|
| 27 |
+
---
|
| 28 |
+
## WorldMirror 2.0(世界重建)
|
| 29 |
+
### 概述
|
| 30 |
+
WorldMirror 2.0 是一个统一的前馈模型,用于从多视图图像或视频进行全面的3D几何预测。它能同时生成:
|
| 31 |
+
- **3D 点云**(世界坐标系)
|
| 32 |
+
- **逐视图深度图**(相机坐标系)
|
| 33 |
+
- **表面法线**(相机坐标系)
|
| 34 |
+
- **相机位姿**(c2w)和**内参**
|
| 35 |
+
- **3D 高斯点云**属性(均值、尺度、旋转、不透明度、球谐系数)
|
| 36 |
+
|
| 37 |
+
相比 WorldMirror 1.0 的关键改进:
|
| 38 |
+
- **归一化 RoPE** 支持灵活分辨率推理
|
| 39 |
+
- **深度掩码预测** 实现稳健的无效像素处理
|
| 40 |
+
- **序列并行 + FSDP + BF16** 实现高效多卡推理
|
| 41 |
+
|
| 42 |
+
---
|
| 43 |
+
### Python API
|
| 44 |
+
#### `WorldMirrorPipeline.from_pretrained`
|
| 45 |
+
工厂方法,用于加载模型并创建 Pipeline 实例。
|
| 46 |
+
|
| 47 |
+
```python
|
| 48 |
+
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline
|
| 49 |
+
|
| 50 |
+
pipeline = WorldMirrorPipeline.from_pretrained(
|
| 51 |
+
pretrained_model_name_or_path="tencent/HY-World-2.0",
|
| 52 |
+
subfolder="HY-WorldMirror-2.0",
|
| 53 |
+
config_path=None,
|
| 54 |
+
ckpt_path=None,
|
| 55 |
+
use_fsdp=False,
|
| 56 |
+
enable_bf16=False,
|
| 57 |
+
fsdp_cpu_offload=False,
|
| 58 |
+
disable_heads=None,
|
| 59 |
+
)
|
| 60 |
+
```
|
| 61 |
+
|
| 62 |
+
| 参数 | 类型 | 默认值 | 描述 |
|
| 63 |
+
|------|------|--------|------|
|
| 64 |
+
| `pretrained_model_name_or_path` | `str` | `"tencent/HY-World-2.0"` | HuggingFace 仓库 ID 或本地路径 |
|
| 65 |
+
| `subfolder` | `str` | `"HY-WorldMirror-2.0"` | 仓库内包含 WorldMirror 检查点的子文件夹(`model.safetensors` + 配置文件) |
|
| 66 |
+
| `config_path` | `str` | `None` | 训练配置 YAML(与 `ckpt_path` 配合用于自定义检查点) |
|
| 67 |
+
| `ckpt_path` | `str` | `None` | 检查点文件(`.ckpt` / `.safetensors`)。与 `config_path` 一起使用时,从本地检查点加载模型而非 HuggingFace |
|
| 68 |
+
| `use_fsdp` | `bool` | `False` | 通过完全分片数据并行(FSDP)在多卡间分片参数 |
|
| 69 |
+
| `enable_bf16` | `bool` | `False` | 使用 bfloat16 精度(数值敏感层除外) |
|
| 70 |
+
| `fsdp_cpu_offload` | `bool` | `False` | 将 FSDP 参数卸载到 CPU(节省显存但降低速度) |
|
| 71 |
+
| `disable_heads` | `list[str]` | `None` | 要禁用并释放内存的预测头。可选:`"camera"`、`"depth"`、`"normal"`、`"points"`、`"gs"` |
|
| 72 |
+
|
| 73 |
+
**说明:**
|
| 74 |
+
- 分布式模式通过 `WORLD_SIZE` 环境变量(由 `torchrun` 设置)自动检测。
|
| 75 |
+
- 使用多卡时,每个 rank 都需要调用 `from_pretrained`——该方法内部处理 `dist.init_process_group`。
|
| 76 |
+
|
| 77 |
+
---
|
| 78 |
+
#### `WorldMirrorPipeline.__call__`
|
| 79 |
+
对一组图像或视频运行推理。
|
| 80 |
+
|
| 81 |
+
```python
|
| 82 |
+
result = pipeline(
|
| 83 |
+
input_path,
|
| 84 |
+
output_path="inference_output",
|
| 85 |
+
**kwargs,
|
| 86 |
+
)
|
| 87 |
+
```
|
| 88 |
+
|
| 89 |
+
返回输出目录路径(`str`),如果输入被跳过则返回 `None`。
|
| 90 |
+
|
| 91 |
+
**推理参数:**
|
| 92 |
+
|
| 93 |
+
| 参数 | 类型 | 默认值 | 描述 |
|
| 94 |
+
|------|------|--------|------|
|
| 95 |
+
| `input_path` | `str` | *(必填)* | 图像目录或视频文件路径 |
|
| 96 |
+
| `output_path` | `str` | `"inference_output"` | 输出根目录 |
|
| 97 |
+
| `target_size` | `int` | `952` | 最大推理分辨率(最长边)。图像将被缩放 + 中心裁剪到最近的 14 的倍数 |
|
| 98 |
+
| `fps` | `int` | `1` | 从视频输入提取帧的帧率 |
|
| 99 |
+
| `video_strategy` | `str` | `"new"` | 视频帧提取策略:`"new"`(运动感知)或 `"old"`(均匀 FPS) |
|
| 100 |
+
| `video_min_frames` | `int` | `1` | 从视频中提取的最少帧数 |
|
| 101 |
+
| `video_max_frames` | `int` | `32` | 从视频中提取的最多帧数 |
|
| 102 |
+
|
| 103 |
+
**保存参数:**
|
| 104 |
+
|
| 105 |
+
| 参数 | 类型 | 默认值 | 描述 |
|
| 106 |
+
|------|------|--------|------|
|
| 107 |
+
| `save_depth` | `bool` | `True` | 保存逐视图深度图(PNG 可视化 + NPY 原始值) |
|
| 108 |
+
| `save_normal` | `bool` | `True` | 保存逐视图表面法线图(PNG) |
|
| 109 |
+
| `save_gs` | `bool` | `True` | 保存 3D 高斯点云为 `gaussians.ply` |
|
| 110 |
+
| `save_camera` | `bool` | `True` | 保存相机参数为 `camera_params.json` |
|
| 111 |
+
| `save_points` | `bool` | `True` | 保存基于深度的点云为 `points.ply` |
|
| 112 |
+
| `save_colmap` | `bool` | `False` | 保存 COLMAP 格式的稀疏重建(`sparse/0/`) |
|
| 113 |
+
| `save_conf` | `bool` | `False` | 保存深度置信度图 |
|
| 114 |
+
| `save_sky_mask` | `bool` | `False` | 保存天空分割掩码 |
|
| 115 |
+
|
| 116 |
+
**掩码参数:**
|
| 117 |
+
|
| 118 |
+
| 参数 | 类型 | 默认值 | 描述 |
|
| 119 |
+
|------|------|--------|------|
|
| 120 |
+
| `apply_sky_mask` | `bool` | `True` | 从点云和高斯中过滤天空区域 |
|
| 121 |
+
| `apply_edge_mask` | `bool` | `True` | 过滤边缘/不连续区域 |
|
| 122 |
+
| `apply_confidence_mask` | `bool` | `False` | 过滤低置信度预测 |
|
| 123 |
+
| `sky_mask_source` | `str` | `"auto"` | 天空掩码方法:`"auto"`(ONNX + 模型融合)、`"model"`(仅模型预测)、`"onnx"`(仅外部分割) |
|
| 124 |
+
| `model_sky_threshold` | `float` | `0.45` | 基于模型的天空检测阈值 |
|
| 125 |
+
| `confidence_percentile` | `float` | `10.0` | 置信度过滤的百分位阈值(移除最低 N%) |
|
| 126 |
+
| `edge_normal_threshold` | `float` | `1.0` | 法线边缘检测容差 |
|
| 127 |
+
| `edge_depth_threshold` | `float` | `0.03` | 深度边缘检测相对容差 |
|
| 128 |
+
|
| 129 |
+
**压缩参数:**
|
| 130 |
+
|
| 131 |
+
| 参数 | 类型 | 默认值 | 描述 |
|
| 132 |
+
|------|------|--------|------|
|
| 133 |
+
| `compress_pts` | `bool` | `True` | 通过体素合并 + 随机采样压缩点云 |
|
| 134 |
+
| `compress_pts_max_points` | `int` | `2,000,000` | 压缩后的最大点数 |
|
| 135 |
+
| `compress_pts_voxel_size` | `float` | `0.002` | 点云合并的体素大小 |
|
| 136 |
+
| `max_resolution` | `int` | `1920` | 保存输出图像的最大分辨率 |
|
| 137 |
+
| `compress_gs_max_points` | `int` | `5,000,000` | 体素剪枝后的最大高斯数 |
|
| 138 |
+
|
| 139 |
+
**先验参数:**
|
| 140 |
+
|
| 141 |
+
| 参数 | 类型 | 默认值 | 描述 |
|
| 142 |
+
|------|------|--------|------|
|
| 143 |
+
| `prior_cam_path` | `str` | `None` | 相机参数 JSON 文件路径 |
|
| 144 |
+
| `prior_depth_path` | `str` | `None` | 深度图文件夹路径 |
|
| 145 |
+
|
| 146 |
+
**渲染视频参数:**
|
| 147 |
+
|
| 148 |
+
| 参数 | 类型 | 默认值 | 描述 |
|
| 149 |
+
|------|------|--------|------|
|
| 150 |
+
| `save_rendered` | `bool` | `False` | 从高斯点云渲染插值飞行视频 |
|
| 151 |
+
| `render_interp_per_pair` | `int` | `15` | 每对相机之间的插值帧数 |
|
| 152 |
+
| `render_depth` | `bool` | `False` | 同时渲染深度可视化视频 |
|
| 153 |
+
|
| 154 |
+
**其他参数:**
|
| 155 |
+
|
| 156 |
+
| 参数 | 类型 | 默认值 | 描述 |
|
| 157 |
+
|------|------|--------|------|
|
| 158 |
+
| `log_time` | `bool` | `True` | 打印计时报告并保存 `pipeline_timing.json` |
|
| 159 |
+
| `strict_output_path` | `str` | `None` | 若指定,结果直接保存到该路径下,不创建 `<case_name>/<timestamp>` 子目录 |
|
| 160 |
+
|
| 161 |
+
---
|
| 162 |
+
### 命令行参考
|
| 163 |
+
所有 `__call__` 参数都可作为命令行参数使用:
|
| 164 |
+
|
| 165 |
+
```bash
|
| 166 |
+
python -m hyworld2.worldrecon.pipeline \
|
| 167 |
+
--input_path path/to/images \
|
| 168 |
+
--output_path inference_output \
|
| 169 |
+
--target_size 952 \
|
| 170 |
+
--prior_cam_path path/to/camera_params.json \
|
| 171 |
+
--prior_depth_path path/to/depth_dir/
|
| 172 |
+
```
|
| 173 |
+
|
| 174 |
+
**布尔标志约定:**
|
| 175 |
+
|
| 176 |
+
| 启用 | 禁用 |
|
| 177 |
+
|------|------|
|
| 178 |
+
| `--save_colmap` | *(省略)* |
|
| 179 |
+
| `--save_conf` | *(省略)* |
|
| 180 |
+
| `--save_sky_mask` | *(省略)* |
|
| 181 |
+
| `--apply_sky_mask`(默认开启) | `--no_sky_mask` |
|
| 182 |
+
| `--apply_edge_mask`(默认开启) | `--no_edge_mask` |
|
| 183 |
+
| `--apply_confidence_mask` | *(省略)* |
|
| 184 |
+
| `--compress_pts`(默认开启) | `--no_compress_pts` |
|
| 185 |
+
| `--log_time`(默认开启) | `--no_log_time` |
|
| 186 |
+
| *(默认开启)* `save_depth` | `--no_save_depth` |
|
| 187 |
+
| *(默认开启)* `save_normal` | `--no_save_normal` |
|
| 188 |
+
| *(默认开启)* `save_gs` | `--no_save_gs` |
|
| 189 |
+
| *(默认开启)* `save_camera` | `--no_save_camera` |
|
| 190 |
+
| *(默认开启)* `save_points` | `--no_save_points` |
|
| 191 |
+
| `--save_rendered` | *(省略)* |
|
| 192 |
+
| `--render_depth` | *(省略)* |
|
| 193 |
+
|
| 194 |
+
**仅命令行参数:**
|
| 195 |
+
|
| 196 |
+
| 参数 | 描述 |
|
| 197 |
+
|------|------|
|
| 198 |
+
| `--config_path` | 用于自定义检查点加载的训练配置 YAML |
|
| 199 |
+
| `--ckpt_path` | 本地检查点文件路径 |
|
| 200 |
+
| `--use_fsdp` | 启用 FSDP 多卡分片 |
|
| 201 |
+
| `--enable_bf16` | 启用 bfloat16 混合精度 |
|
| 202 |
+
| `--fsdp_cpu_offload` | 将 FSDP 参数卸载到 CPU |
|
| 203 |
+
| `--disable_heads` | 以空格分隔要禁用的预测头(例如 `--disable_heads camera normal`) |
|
| 204 |
+
| `--no_interactive` | 首次推理后退出(跳过交互式提示循环) |
|
| 205 |
+
|
| 206 |
+
---
|
| 207 |
+
### 输出格式
|
| 208 |
+
#### 文件结构
|
| 209 |
+
|
| 210 |
+
```
|
| 211 |
+
inference_output/
|
| 212 |
+
└── <case_name>/
|
| 213 |
+
└── <timestamp>/
|
| 214 |
+
├── depth/
|
| 215 |
+
│ ├── depth_0000.png # 归一化深度可视化
|
| 216 |
+
│ ├── depth_0000.npy # 原始 float32 深度值 [H, W]
|
| 217 |
+
│ └── ...
|
| 218 |
+
├── normal/
|
| 219 |
+
│ ├── normal_0000.png # 法线图可视化(RGB)
|
| 220 |
+
│ └── ...
|
| 221 |
+
├── camera_params.json # 相机外参和内参
|
| 222 |
+
├── gaussians.ply # 3D 高斯点云(标准格式)
|
| 223 |
+
├── points.ply # 带颜色的点云
|
| 224 |
+
├── sparse/ # COLMAP 格式(使用 --save_colmap 时)
|
| 225 |
+
│ └── 0/
|
| 226 |
+
│ ├── cameras.bin
|
| 227 |
+
│ ├── images.bin
|
| 228 |
+
│ └── points3D.bin
|
| 229 |
+
├── rendered/ # 渲染视频(使用 --save_rendered 时)
|
| 230 |
+
│ ├── rendered_rgb.mp4
|
| 231 |
+
│ └── rendered_depth.mp4 # (使用 --render_depth 时)
|
| 232 |
+
└── pipeline_timing.json # 性能计时报告
|
| 233 |
+
```
|
| 234 |
+
|
| 235 |
+
#### 预测字典
|
| 236 |
+
使用 Python API 时,`pipeline(...)` 内部生成一个 `predictions` 字典,包含以下键:
|
| 237 |
+
|
| 238 |
+
```python
|
| 239 |
+
# 几何
|
| 240 |
+
predictions["depth"] # [B, S, H, W, 1] — 相机坐标系中的 Z 深度
|
| 241 |
+
predictions["depth_conf"] # [B, S, H, W] — 深度置信度
|
| 242 |
+
predictions["normals"] # [B, S, H, W, 3] — 相机坐标系中的表面法线
|
| 243 |
+
predictions["normals_conf"] # [B, S, H, W] — 法线置信度
|
| 244 |
+
predictions["pts3d"] # [B, S, H, W, 3] — 世界坐标系中的 3D 点图
|
| 245 |
+
predictions["pts3d_conf"] # [B, S, H, W] — 点云置信度
|
| 246 |
+
# 相机
|
| 247 |
+
predictions["camera_poses"] # [B, S, 4, 4] — 相机到世界(c2w),OpenCV 约定
|
| 248 |
+
predictions["camera_intrs"] # [B, S, 3, 3] — 相机内参矩阵
|
| 249 |
+
predictions["camera_params"] # [B, S, 9] — 紧凑相机向量(平移、四元数、fov_v、fov_u)
|
| 250 |
+
# 3D 高斯点云
|
| 251 |
+
predictions["splats"]["means"] # [B, N, 3] — 高斯中心
|
| 252 |
+
predictions["splats"]["scales"] # [B, N, 3] — 高斯尺度
|
| 253 |
+
predictions["splats"]["quats"] # [B, N, 4] — 高斯旋转(四元数)
|
| 254 |
+
predictions["splats"]["opacities"] # [B, N] — 高斯不透明度
|
| 255 |
+
predictions["splats"]["sh"] # [B, N, 1, 3] — 球谐函数(0 阶)
|
| 256 |
+
predictions["splats"]["weights"] # [B, N] — 逐高斯置信度权重
|
| 257 |
+
```
|
| 258 |
+
|
| 259 |
+
其中 `B` = 批大小(推理时始终为 1),`S` = 输入视图数,`H, W` = 图像尺寸,`N` = 总高斯数(`S × H × W`)。
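
作为上述张量约定的示意(非 Pipeline 自带 API,函数名与数据均为假设的演示用例),下面用 NumPy 演示如何按 OpenCV 约定,用 `camera_intrs` 风格的内参和 `camera_poses` 风格的 c2w 矩阵把单视图深度图反投影为世界坐标系点:

```python
import numpy as np

def unproject_depth(depth, K, c2w):
    """depth: [H, W];K: [3, 3] 内参;c2w: [4, 4] 相机到世界 -> 世界坐标点 [H, W, 3]。"""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # 像素 -> 相机坐标(OpenCV 约定,z 即深度)
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    pts_cam = np.stack([x, y, depth], axis=-1)          # [H, W, 3]
    # 相机 -> 世界:R @ p + t
    return pts_cam @ c2w[:3, :3].T + c2w[:3, 3]

# 随机占位数据,仅演示形状约定
depth = np.full((4, 6), 2.0, dtype=np.float32)
K = np.array([[525.0, 0.0, 3.0], [0.0, 525.0, 2.0], [0.0, 0.0, 1.0]])
c2w = np.eye(4)
pts = unproject_depth(depth, K, c2w)
print(pts.shape)  # (4, 6, 3)
```

对 `predictions["depth"]` 的每个视图套用对应的 `camera_intrs[b, s]` 与 `camera_poses[b, s]`,即可得到与 `pts3d` 同一世界坐标系下的点。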
|
| 260 |
+
|
| 261 |
+
---
|
| 262 |
+
### 先验注入
|
| 263 |
+
WorldMirror 2.0 接受三种几何先验作为条件输入。先验会从提供的文件中自动检测。
|
| 264 |
+
|
| 265 |
+
| 先验类型 | 条件标志 | 输入格式 |
|
| 266 |
+
|----------|----------|----------|
|
| 267 |
+
| 相机位姿 | `cond_flags[0]` | c2w 4×4 矩阵(OpenCV 约定) |
|
| 268 |
+
| 深度图 | `cond_flags[1]` | 逐视图浮点深度图 |
|
| 269 |
+
| 相机内参 | `cond_flags[2]` | 3×3 内参矩阵 |
|
| 270 |
+
|
| 271 |
+
#### 相机参数(JSON)
|
| 272 |
+
相机参数文件格式与 Pipeline 输出的 `camera_params.json` 一致:
|
| 273 |
+
|
| 274 |
+
```json
|
| 275 |
+
{
|
| 276 |
+
"num_cameras": 2,
|
| 277 |
+
"extrinsics": [
|
| 278 |
+
{
|
| 279 |
+
"camera_id": 0,
|
| 280 |
+
"matrix": [
|
| 281 |
+
[0.98, 0.01, -0.17, 0.52],
|
| 282 |
+
[-0.01, 0.99, 0.01, -0.03],
|
| 283 |
+
[0.17, -0.01, 0.98, 1.20],
|
| 284 |
+
[0.0, 0.0, 0.0, 1.0]
|
| 285 |
+
]
|
| 286 |
+
}
|
| 287 |
+
],
|
| 288 |
+
"intrinsics": [
|
| 289 |
+
{
|
| 290 |
+
"camera_id": 0,
|
| 291 |
+
"matrix": [
|
| 292 |
+
[525.0, 0.0, 320.0],
|
| 293 |
+
[0.0, 525.0, 240.0],
|
| 294 |
+
[0.0, 0.0, 1.0]
|
| 295 |
+
]
|
| 296 |
+
}
|
| 297 |
+
]
|
| 298 |
+
}
|
| 299 |
+
```
|
| 300 |
+
|
| 301 |
+
**字段说明:**
|
| 302 |
+
|
| 303 |
+
| 字段 | 描述 |
|
| 304 |
+
|------|------|
|
| 305 |
+
| `camera_id` | 整数索引(`0`、`1`、`2` ...)或图像文件名(不含扩展名,如 `"image_0001"`) |
|
| 306 |
+
| `extrinsics.matrix` | 4×4 相机到世界(c2w)变换矩阵,OpenCV 坐标约定 |
|
| 307 |
+
| `intrinsics.matrix` | 3×3 相机内参矩阵(像素单位):`fx, fy` 为焦距,`cx, cy` 为主点坐标 |
|
| 308 |
+
|
| 309 |
+
**重要说明:**
|
| 310 |
+
- `extrinsics` 和 `intrinsics` 列表可以独立提供或一起提供。列表为空 `[]` 或缺失字段表示该先验不可用。
|
| 311 |
+
- **内参分辨率:** 值应对应**原始图像分辨率**。Pipeline 会根据推理时的 resize + center-crop 自动调整。
|
| 312 |
+
- **外参对齐:** Pipeline 会自动将所有外参相对于第一帧归一化,与训练行为一致。
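
按上文 schema 构造先验文件的一个最小示意(数值均为占位示例,路径为临时目录),生成后可通过 `--prior_cam_path` 传入:

```python
import json
import os
import tempfile

# 按文档化的 camera_params.json schema 组装单相机先验
params = {
    "num_cameras": 1,
    "extrinsics": [{
        "camera_id": 0,  # 也可用不含扩展名的图像文件名,如 "image_0001"
        "matrix": [[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0]],   # c2w,OpenCV 约定
    }],
    "intrinsics": [{
        "camera_id": 0,
        "matrix": [[525.0, 0.0, 320.0],
                   [0.0, 525.0, 240.0],
                   [0.0, 0.0, 1.0]],        # 对应原始图像分辨率的像素单位
    }],
}
path = os.path.join(tempfile.mkdtemp(), "camera_params.json")
with open(path, "w") as f:
    json.dump(params, f, indent=2)
```

仅提供内参时,把 `extrinsics` 置为空列表 `[]` 即可。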
|
| 313 |
+
#### 深度图(文件夹)
|
| 314 |
+
深度图以独立文件存储在一个文件夹中。文件名应与输入图像文件名对应。支持格式:`.npy`、`.exr`、`.png`(16-bit)。
|
| 315 |
+
|
| 316 |
+
```
|
| 317 |
+
prior_depth/
|
| 318 |
+
├── image_0001.npy # float32, shape [H, W]
|
| 319 |
+
├── image_0002.npy
|
| 320 |
+
└── ...
|
| 321 |
+
```
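
生成上述先验深度文件夹的一个示意(分辨率与深度数值均为占位示例):

```python
import os
import tempfile
import numpy as np

# 按输入图像同名保存 float32 深度图,构成 --prior_depth_path 指向的文件夹
depth_dir = os.path.join(tempfile.mkdtemp(), "prior_depth")
os.makedirs(depth_dir)
image_names = ["image_0001", "image_0002"]  # 与输入图像文件名一一对应
for name in image_names:
    depth = np.full((480, 640), 2.5, dtype=np.float32)  # [H, W]
    np.save(os.path.join(depth_dir, f"{name}.npy"), depth)
```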
|
| 322 |
+
|
| 323 |
+
#### 组合先验
|
| 324 |
+
先验可以自由组合。示例:
|
| 325 |
+
|
| 326 |
+
```bash
|
| 327 |
+
# 仅内参
|
| 328 |
+
python -m hyworld2.worldrecon.pipeline --input_path images/ \
|
| 329 |
+
--prior_cam_path camera_intrinsics_only.json
|
| 330 |
+
# 仅深度
|
| 331 |
+
python -m hyworld2.worldrecon.pipeline --input_path images/ \
|
| 332 |
+
--prior_depth_path depth_maps/
|
| 333 |
+
# 相机位姿 + 内参 + 深度
|
| 334 |
+
python -m hyworld2.worldrecon.pipeline --input_path images/ \
|
| 335 |
+
--prior_cam_path camera_params.json \
|
| 336 |
+
--prior_depth_path depth_maps/
|
| 337 |
+
```
|
| 338 |
+
|
| 339 |
+
---
|
| 340 |
+
### 多卡推理
|
| 341 |
+
WorldMirror 2.0 支持跨多卡的**序列并行(SP)**推理,其中 token 序列在 ViT 骨干网络中跨 rank 分片,DPT 预测头并行处理帧。
|
| 342 |
+
|
| 343 |
+
> **要求:** 输入图像数量必须 **>= GPU 数量**(`nproc_per_node`)。例如,使用 8 卡时需要提供至少 8 张输入图像。如果不满足此条件,Pipeline 将报错。
|
| 344 |
+
|
| 345 |
+
```bash
|
| 346 |
+
# 2 卡 + FSDP + bf16
|
| 347 |
+
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.pipeline \
|
| 348 |
+
--input_path path/to/images \
|
| 349 |
+
--use_fsdp --enable_bf16
|
| 350 |
+
# 4 卡
|
| 351 |
+
torchrun --nproc_per_node=4 -m hyworld2.worldrecon.pipeline \
|
| 352 |
+
--input_path path/to/images \
|
| 353 |
+
--use_fsdp --enable_bf16
|
| 354 |
+
# Python API(在 torchrun 脚本内)
|
| 355 |
+
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline
|
| 356 |
+
pipeline = WorldMirrorPipeline.from_pretrained(
|
| 357 |
+
'tencent/HY-World-2.0',
|
| 358 |
+
use_fsdp=True,
|
| 359 |
+
enable_bf16=True,
|
| 360 |
+
)
|
| 361 |
+
pipeline('path/to/images')
|
| 362 |
+
```
|
| 363 |
+
|
| 364 |
+
**内部工作原理:**
|
| 365 |
+
1. `from_pretrained` 自动检测 `WORLD_SIZE > 1` 并初始化 `torch.distributed`。
|
| 366 |
+
2. 模型在 rank 0 上加载,并通过 `sync_module_states=True` 广播。
|
| 367 |
+
3. FSDP 将参数跨 SP 进程组分片。
|
| 368 |
+
4. DPT 预测头将帧分配到各 rank 并通过 `AllGather` 汇总结果。
|
| 369 |
+
5. 后处理(掩码计算、保存)仅在 rank 0 上运行。
|
| 370 |
+
|
| 371 |
+
---
|
| 372 |
+
### 高级选项
|
| 373 |
+
#### 禁用预测头
|
| 374 |
+
当只需要特定输出时,可以禁用不需要的预测头以节省显存:
|
| 375 |
+
|
| 376 |
+
```python
|
| 377 |
+
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline
|
| 378 |
+
|
| 379 |
+
pipeline = WorldMirrorPipeline.from_pretrained(
|
| 380 |
+
'tencent/HY-World-2.0',
|
| 381 |
+
disable_heads=["normal", "points"], # 释放约 200M 参数
|
| 382 |
+
)
|
| 383 |
+
```
|
| 384 |
+
|
| 385 |
+
可禁用的预测头:`"camera"`、`"depth"`、`"normal"`、`"points"`、`"gs"`。
|
| 386 |
+
#### 掩码过滤
|
| 387 |
+
Pipeline 支持三种输出过滤方式,以提高点云和高斯质量:
|
| 388 |
+
1. **天空掩码**(`apply_sky_mask=True`):使用基于 ONNX 的分割模型移除天空区域,可选与模型预测的深度掩码融合。
|
| 389 |
+
2. **边缘掩码**(`apply_edge_mask=True`):移除深度/法线不连续处(物体边界)的点。
|
| 390 |
+
3. **置信度掩码**(`apply_confidence_mask=False`):移除预测置信度最低的 N% 的点。
|
| 391 |
+
这些掩码独立应用于 `points.ply`(基于深度)和 `gaussians.ply`(基于 GS)输出。GS 输出在可用时使用其自身的深度预测进行边缘检测。
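
深度边缘检测的思路可以用一个简化示意来说明(非 Pipeline 内部实现,函数名为演示用):相邻像素深度的相对差超过类似 `edge_depth_threshold` 的阈值时,即标记为边缘并剔除。

```python
import numpy as np

def depth_edge_mask(depth, rel_threshold=0.03):
    """depth: [H, W] -> bool 掩码 [H, W],True 表示保留(非边缘)。"""
    # 与左侧/上侧相邻像素的深度差(首行首列补自身,差为 0)
    dx = np.abs(np.diff(depth, axis=1, prepend=depth[:, :1]))
    dy = np.abs(np.diff(depth, axis=0, prepend=depth[:1, :]))
    rel = np.maximum(dx, dy) / np.maximum(depth, 1e-6)
    return rel < rel_threshold

depth = np.ones((4, 6), dtype=np.float32)
depth[:, 3:] = 2.0            # 在第 3 列制造深度跳变
mask = depth_edge_mask(depth)
print(mask[:, 3])             # 跳变列被标记为边缘(False)
```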
|
| 392 |
+
#### 点云压缩
|
| 393 |
+
当 `compress_pts=True`(默认)时,基于深度的点云会经过以下处理:
|
| 394 |
+
1. **体素合并**:每个体素内的点(大小由 `compress_pts_voxel_size` 控制)通过加权平均进行合并。
|
| 395 |
+
2. **随机下采样**:如果结果超过 `compress_pts_max_points`,则进行均匀下采样。
|
| 396 |
+
类似地,高斯也会经过体素剪枝(均值、尺度、四元数、颜色、不透明度的加权平均),并可选下采样至 `compress_gs_max_points`。
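
体素合并 + 随机下采样的一个最小示意(非 Pipeline 内部代码,函数名为演示用),分别对应 `compress_pts_voxel_size` 与 `compress_pts_max_points` 的作用:

```python
import numpy as np

def compress_points(pts, voxel_size=0.002, max_points=2_000_000, seed=0):
    """pts: [N, 3] -> 压缩后的 [M, 3]。同一体素内的点取平均。"""
    keys = np.floor(pts / voxel_size).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)  # 每点所属体素
    num_voxels = inv.max() + 1
    counts = np.bincount(inv, minlength=num_voxels)
    merged = np.zeros((num_voxels, 3))
    for d in range(3):  # 逐坐标轴求体素内平均
        merged[:, d] = np.bincount(inv, weights=pts[:, d],
                                   minlength=num_voxels) / counts
    if len(merged) > max_points:  # 超出上限时均匀随机下采样
        idx = np.random.default_rng(seed).choice(len(merged),
                                                 max_points, replace=False)
        merged = merged[idx]
    return merged

pts = np.array([[0.0, 0.0, 0.0], [0.0005, 0.0, 0.0], [0.01, 0.0, 0.0]])
out = compress_points(pts, voxel_size=0.002)
print(out.shape)  # 前两点落入同一体素 -> (2, 3)
```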
|
| 397 |
+
|
| 398 |
+
---
|
| 399 |
+
### Gradio 应用
|
| 400 |
+
WorldMirror 2.0 的交互式 Web 演示。上传图像或视频,即可在浏览器中可视化 3DGS、点云、深度图、法线图和相机参数。
|
| 401 |
+
**快速开始:**
|
| 402 |
+
|
| 403 |
+
```bash
|
| 404 |
+
# 单卡
|
| 405 |
+
python -m hyworld2.worldrecon.gradio_app
|
| 406 |
+
|
| 407 |
+
# 多卡
|
| 408 |
+
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.gradio_app \
|
| 409 |
+
--use_fsdp --enable_bf16
|
| 410 |
+
```
|
| 411 |
+
|
| 412 |
+
**使用本地检查点:**
|
| 413 |
+
|
| 414 |
+
```bash
|
| 415 |
+
python -m hyworld2.worldrecon.gradio_app \
|
| 416 |
+
--config_path /path/to/config.yaml \
|
| 417 |
+
--ckpt_path /path/to/checkpoint.safetensors
|
| 418 |
+
```
|
| 419 |
+
|
| 420 |
+
**创建公开链接(如 Colab 或远程服务器):**
|
| 421 |
+
|
| 422 |
+
```bash
|
| 423 |
+
python -m hyworld2.worldrecon.gradio_app --share
|
| 424 |
+
```
|
| 425 |
+
|
| 426 |
+
**参数:**
|
| 427 |
+
|
| 428 |
+
| 参数 | 默认值 | 描述 |
|
| 429 |
+
|------|--------|------|
|
| 430 |
+
| `--port` | `8081` | 服务端口 |
|
| 431 |
+
| `--host` | `0.0.0.0` | 服务主机 |
|
| 432 |
+
| `--share` | `False` | 创建公开的 Gradio 链接 |
|
| 433 |
+
| `--examples_dir` | `./examples/worldrecon` | 示例场景目录路径 |
|
| 434 |
+
| `--config_path` | `None` | 训练配置 YAML(与 `--ckpt_path` 配合使用) |
|
| 435 |
+
| `--ckpt_path` | `None` | 本地检查点文件(`.ckpt` / `.safetensors`) |
|
| 436 |
+
| `--use_fsdp` | `False` | 启用 FSDP 多卡分片 |
|
| 437 |
+
| `--enable_bf16` | `False` | 启用 bfloat16 混合精度 |
|
| 438 |
+
| `--fsdp_cpu_offload` | `False` | 将 FSDP 参数卸载到 CPU(节省显存) |
|
| 439 |
+
|
| 440 |
+
> **重要提示:** 在多卡模式下,输入图像数量必须 **>= GPU 数量**。
|
| 441 |
+
|
| 442 |
+
---
|
| 443 |
+
## 全景生成
|
| 444 |
+
*即将发布。*
|
| 445 |
+
本节将记录全景生成模型,包括:
|
| 446 |
+
- 文本到全景和图像到全景的 API
|
| 447 |
+
- 模型架构(基于 MMDiT 的隐式透视到 ERP 映射)
|
| 448 |
+
- 配置参数
|
| 449 |
+
- 输出格式
|
| 450 |
+
|
| 451 |
+
---
|
| 452 |
+
## 世界生成
|
| 453 |
+
*即将发布。*
|
| 454 |
+
本节将记录世界生成流水线,包括:
|
| 455 |
+
- 轨迹规划配置
|
| 456 |
+
- 通过记忆驱动的视频生成进行世界扩展
|
| 457 |
+
- 世界组合(点云扩展 + 3DGS 优化)
|
| 458 |
+
- 从文本/图像到可导航3D世界的端到端生成
|
HY-WorldMirror-2.0/README.md
ADDED
|
@@ -0,0 +1,10 @@
| 1 |
+
---
|
| 2 |
+
tags:
|
| 3 |
+
- model_hub_mixin
|
| 4 |
+
- pytorch_model_hub_mixin
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
|
| 8 |
+
- Code: [More Information Needed]
|
| 9 |
+
- Paper: [More Information Needed]
|
| 10 |
+
- Docs: [More Information Needed]
|
HY-WorldMirror-2.0/config.json
ADDED
|
@@ -0,0 +1,37 @@
| 1 |
+
{
|
| 2 |
+
"condition_strategy": [
|
| 3 |
+
"token",
|
| 4 |
+
"pow3r",
|
| 5 |
+
"token"
|
| 6 |
+
],
|
| 7 |
+
"depth": 24,
|
| 8 |
+
"disable_gs_depth": false,
|
| 9 |
+
"dpt_gradient_checkpoint": false,
|
| 10 |
+
"embed_dim": 1024,
|
| 11 |
+
"enable_bf16": false,
|
| 12 |
+
"enable_cam": true,
|
| 13 |
+
"enable_cond": true,
|
| 14 |
+
"enable_depth": true,
|
| 15 |
+
"enable_depth_mask": true,
|
| 16 |
+
"enable_gs": true,
|
| 17 |
+
"enable_norm": true,
|
| 18 |
+
"enable_pts": true,
|
| 19 |
+
"fixed_patch_embed": true,
|
| 20 |
+
"gs_dim": 256,
|
| 21 |
+
"img_size": 518,
|
| 22 |
+
"mlp_ratio": 4.0,
|
| 23 |
+
"model_size": "large",
|
| 24 |
+
"normalized_rope": true,
|
| 25 |
+
"num_heads": 16,
|
| 26 |
+
"num_register_tokens": 4,
|
| 27 |
+
"patch_embed": "dinov2_vitl14_reg",
|
| 28 |
+
"patch_size": 14,
|
| 29 |
+
"rope_base": 100.0,
|
| 30 |
+
"rope_jitter_coords": null,
|
| 31 |
+
"rope_normalize_coords": "separate",
|
| 32 |
+
"rope_rescale_coords": null,
|
| 33 |
+
"rope_shift_coords": null,
|
| 34 |
+
"sampling_strategy": "uniform",
|
| 35 |
+
"set_sky_region_to_maxdepth": false,
|
| 36 |
+
"sp_size": 1
|
| 37 |
+
}
|
HY-WorldMirror-2.0/model.safetensors
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:9fff06539d3d9e85338d7de1ffb5afffb7739fa5bf4d62b4b7c319b5ecdde54f
|
| 3 |
+
size 5053553272
|
License.txt
ADDED
|
@@ -0,0 +1,82 @@
| 1 |
+
TENCENT HY-WORLD 2.0 COMMUNITY LICENSE AGREEMENT
|
| 2 |
+
Tencent HY-WORLD 2.0 Release Date: April 15, 2026
|
| 3 |
+
THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
|
| 4 |
+
By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Tencent HY-WORLD 2.0 Works, including via any Hosted Service, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
|
| 5 |
+
1. DEFINITIONS.
|
| 6 |
+
a. “Acceptable Use Policy” shall mean the policy made available by Tencent as set forth in the Exhibit A.
|
| 7 |
+
b. “Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of Tencent HY-WORLD 2.0 Works or any portion or element thereof set forth herein.
|
| 8 |
+
c. “Documentation” shall mean the specifications, manuals and documentation for Tencent HY-WORLD 2.0 made publicly available by Tencent.
|
| 9 |
+
d. “Hosted Service” shall mean a hosted service offered via an application programming interface (API), web access, or any other electronic or remote means.
|
| 10 |
+
e. “Licensee,” “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Tencent HY-WORLD 2.0 Works for any purpose and in any field of use.
|
| 11 |
+
f. “Materials” shall mean, collectively, Tencent’s proprietary Tencent HY-WORLD 2.0 and Documentation (and any portion thereof) as made available by Tencent under this Agreement.
|
| 12 |
+
g. “Model Derivatives” shall mean all: (i) modifications to Tencent HY-WORLD 2.0 or any Model Derivative of Tencent HY-WORLD 2.0; (ii) works based on Tencent HY-WORLD 2.0 or any Model Derivative of Tencent HY-WORLD 2.0; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Tencent HY-WORLD 2.0 or any Model Derivative of Tencent HY-WORLD 2.0, to that model in order to cause that model to perform similarly to Tencent HY-WORLD 2.0 or a Model Derivative of Tencent HY-WORLD 2.0, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs by Tencent HY-WORLD 2.0 or a Model Derivative of Tencent HY-WORLD 2.0 for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives.
|
| 13 |
+
h. “Output” shall mean the information and/or content output of Tencent HY-WORLD 2.0 or a Model Derivative that results from operating or otherwise using Tencent HY-WORLD 2.0 or a Model Derivative, including via a Hosted Service.
|
| 14 |
+
i. “Tencent,” “We” or “Us” shall mean the applicable entity or entities in the Tencent corporate family that own(s) intellectual property or other rights embodied in or utilized by the Materials.
|
| 15 |
+
j. “Tencent HY-WORLD 2.0” shall mean the 3D generation models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us at [https://github.com/Tencent-Hunyuan/HY-World-2.0].
|
| 16 |
+
k. “Tencent HY-WORLD 2.0 Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof.
|
| 17 |
+
l. “Territory” shall mean the worldwide territory, excluding the territory of the European Union, United Kingdom and South Korea.
|
| 18 |
+
m. “Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You.
|
| 19 |
+
n. “including” shall mean including but not limited to.
|
| 20 |
+
2. GRANT OF RIGHTS.
|
| 21 |
+
We grant You, for the Territory only, a non-exclusive, non-transferable and royalty-free limited license under Tencent’s intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy.
|
| 22 |
+
3. DISTRIBUTION.
|
| 23 |
+
You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Tencent HY-WORLD 2.0 Works, exclusively in the Territory, provided that You meet all of the following conditions:
|
| 24 |
+
a. You must provide all such Third Party recipients of the Tencent HY-WORLD 2.0 Works or products or services using them a copy of this Agreement;
|
| 25 |
+
b. You must cause any modified files to carry prominent notices stating that You changed the files;
|
| 26 |
+
c. You are encouraged to: (i) publish at least one technology introduction blogpost or one public statement expressing Your experience of using the Tencent HY-WORLD 2.0 Works; and (ii) mark the products or services developed by using the Tencent HY-WORLD 2.0 Works to indicate that the product/service is “Powered by Tencent HY”; and
|
| 27 |
+
d. All distributions to Third Parties (other than through a Hosted Service) must be accompanied by a “Notice” text file that contains the following notice: “Tencent HY-WORLD 2.0 is licensed under the Tencent HY-WORLD 2.0 Community License Agreement, Copyright © 2026 Tencent. All Rights Reserved. The trademark rights of “Tencent HY” are owned by Tencent or its affiliate.”
|
| 28 |
+
e. In the event that You use, integrate, implement, or otherwise deploy the Tencent HY Works, in whole or in part, to provide, enable, or support any service, product, or functionality to third parties, You shall clearly, accurately, and prominently disclose to all end users the full legal name and entity of the actual provider of such service, product, or functionality. You shall expressly and conspicuously state that Tencent is not affiliated with, associated with, sponsoring, or endorsing any such service, product, or functionality. You shall not use or display any name, logo, trademark, trade name, or other indicia of Tencent in any manner that could be construed as, or be likely to create, confusion, deception, or a false impression regarding any relationship, affiliation, sponsorship, or endorsement by Tencent.
|
| 29 |
+
You may add Your own copyright statement to Your modifications and, except as set forth in this Section and in Section 5, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement (including as regards the Territory). If You receive Tencent HY-WORLD 2.0 Works from a Licensee as part of an integrated end user product, then this Section 3 of this Agreement will not apply to You.

4. ADDITIONAL COMMERCIAL TERMS.

If, on the Tencent HY-WORLD 2.0 version release date, the monthly active users of all products or services made available by or for Licensee is greater than 1 million monthly active users in the preceding calendar month, You must request a license from Tencent, which Tencent may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until Tencent otherwise expressly grants You such rights.

Subject to Tencent's written approval, you may request a license for the use of Tencent HY-WORLD 2.0 by submitting the following information to hunyuan3d@tencent.com:

a. Your company’s name and associated business sector that plans to use Tencent HY-WORLD 2.0.

b. Your intended use case and the purpose of using Tencent HY-WORLD 2.0.

c. Your plans to modify Tencent HY-WORLD 2.0 or create Model Derivatives.

5. RULES OF USE.

a. Your use of the Tencent HY-WORLD 2.0 Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Tencent HY-WORLD 2.0 Works, which is hereby incorporated by reference into this Agreement. You must include the use restrictions referenced in these Sections 5(a) and 5(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Tencent HY-WORLD 2.0 Works and You must provide notice to subsequent users to whom You distribute that Tencent HY-WORLD 2.0 Works are subject to the use restrictions in these Sections 5(a) and 5(b).

b. You must not use the Tencent HY-WORLD 2.0 Works or any Output or results of the Tencent HY-WORLD 2.0 Works to improve any other AI model (other than Tencent HY-WORLD 2.0 or Model Derivatives thereof).

c. You must not use, reproduce, modify, distribute, or display the Tencent HY-WORLD 2.0 Works, Output or results of the Tencent HY-WORLD 2.0 Works outside the Territory. Any such use outside the Territory is unlicensed and unauthorized under this Agreement.

6. INTELLECTUAL PROPERTY.

a. Subject to Tencent’s ownership of Tencent HY-WORLD 2.0 Works made by or for Tencent and intellectual property rights therein, conditioned upon Your compliance with the terms and conditions of this Agreement, as between You and Tencent, You will be the owner of any derivative works and modifications of the Materials and any Model Derivatives that are made by or for You.

b. No trademark licenses are granted under this Agreement, and in connection with the Tencent HY-WORLD 2.0 Works, Licensee may not use any name or mark owned by or associated with Tencent or any of its affiliates, except as required for reasonable and customary use in describing and distributing the Tencent HY-WORLD 2.0 Works. Tencent hereby grants You a license to use “Tencent HY” (the “Mark”) in the Territory solely as required to comply with the provisions of Section 3(c), provided that You comply with any applicable laws related to trademark protection. All goodwill arising out of Your use of the Mark will inure to the benefit of Tencent.

c. If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed. You will defend, indemnify and hold harmless Us from and against any claim by any Third Party arising out of or related to Your or the Third Party’s use or distribution of the Tencent HY-WORLD 2.0 Works.

d. Tencent claims no rights in Outputs You generate. You and Your users are solely responsible for Outputs and their subsequent uses.

7. DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY.

a. We are not obligated to support, update, provide training for, or develop any further version of the Tencent HY-WORLD 2.0 Works or to grant any license thereto.

b. UNLESS AND ONLY TO THE EXTENT REQUIRED BY APPLICABLE LAW, THE TENCENT HY-WORLD 2.0 WORKS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OF ANY KIND INCLUDING ANY WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, COURSE OF DEALING, USAGE OF TRADE, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE TENCENT HY-WORLD 2.0 WORKS OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR OR A THIRD PARTY’S USE OR DISTRIBUTION OF ANY OF THE TENCENT HY-WORLD 2.0 WORKS OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT.

c. TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL TENCENT OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, FOR ANY DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO ANY OF THE TENCENT HY-WORLD 2.0 WORKS OR OUTPUTS, EVEN IF TENCENT OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.

8. SURVIVAL AND TERMINATION.

a. The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.

b. We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Tencent HY-WORLD 2.0 Works. Sections 6(a), 6(c), 7 and 9 shall survive the termination of this Agreement.

9. GOVERNING LAW AND JURISDICTION.

a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of the Hong Kong Special Administrative Region of the People’s Republic of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.

b. Exclusive jurisdiction and venue for any dispute arising out of or relating to this Agreement will be a court of competent jurisdiction in the Hong Kong Special Administrative Region of the People’s Republic of China, and Tencent and Licensee consent to the exclusive jurisdiction of such court with respect to any such dispute.

EXHIBIT A

ACCEPTABLE USE POLICY

Tencent reserves the right to update this Acceptable Use Policy from time to time.
Last modified: December 30, 2025

Tencent endeavors to promote safe and fair use of its tools and features, including Tencent HY-WORLD 2.0. You agree not to use Tencent HY-WORLD 2.0 or Model Derivatives:

1. Outside the Territory;

2. In any way that violates any applicable national, federal, state, local, international or any other law or regulation;

3. To harm Yourself or others;

4. To repurpose or distribute output from Tencent HY-WORLD 2.0 or any Model Derivatives to harm Yourself or others;

5. To override or circumvent the safety guardrails and safeguards We have put in place;

6. For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;

7. To generate or disseminate verifiably false information and/or content with the purpose of harming others or influencing elections;

8. To generate or facilitate false online engagement, including fake reviews and other means of fake online engagement;

9. To intentionally defame, disparage or otherwise harass others;

10. To generate and/or disseminate malware (including ransomware) or any other content to be used for the purpose of harming electronic systems;

11. To generate or disseminate personal identifiable information with the purpose of harming others;

12. To generate or disseminate information (including images, code, posts, articles), and place the information in any public context (including – through the use of bot generated tweets), without expressly and conspicuously identifying that the information and/or content is machine generated;

13. To impersonate another individual without consent, authorization, or legal right;

14. To make high-stakes automated decisions in domains that affect an individual’s safety, rights or wellbeing (e.g., law enforcement, migration, medicine/health, management of critical infrastructure, safety components of products, essential services, credit, employment, housing, education, social scoring, or insurance);

15. In a manner that violates or disrespects the social ethics and moral standards of other countries or regions;

16. To perform, facilitate, threaten, incite, plan, promote or encourage violent extremism or terrorism;

17. For any use intended to discriminate against or harm individuals or groups based on protected characteristics or categories, online or offline social behavior or known or predicted personal or personality characteristics;

18. To intentionally exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;

19. For military purposes;

20. To engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or other professional practices.
README.md
ADDED
---
language:
- en
- zh
license: other
license_name: tencent-hy-world-2.0-community
license_link: https://github.com/Tencent-Hunyuan/HY-World-2.0/blob/main/License.txt
pipeline_tag: image-to-3d
library_name: hy-world-2
tags:
- worldmodel
- 3d
- hy-world
extra_gated_eu_disallowed: true
---

<h1>HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds</h1>

[English](README.md) | [简体中文](README_zh.md)

<p align="center">
  <img src="assets/teaser.png" width="95%" alt="HY-World-2.0 Teaser">
</p>

<div align="center">
  <a href=https://3d.hunyuan.tencent.com/sceneTo3D target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
  <a href=https://huggingface.co/tencent/HY-World-2.0 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
  <a href=https://3d-models.hunyuan.tencent.com/world/ target="_blank"><img src=https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
  <a href=https://3d-models.hunyuan.tencent.com/world/world2_0/HY_World_2_0.pdf target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
  <a href=https://modelscope.cn/models/Tencent-Hunyuan/HY-World-2.0 target="_blank"><img src=https://img.shields.io/badge/ModelScope-Models-624aff.svg height=22px></a>
  <a href=https://discord.gg/dNBrdrGGMa target="_blank"><img src=https://img.shields.io/badge/Discord-white.svg?logo=discord height=22px></a>
  <a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Tencent%20HY-black.svg?logo=x height=22px></a>
  <a href="#community-resources" target="_blank"><img src=https://img.shields.io/badge/Community-lavender.svg?logo=homeassistantcommunitystore height=22px></a>
</div>

<br>
<p align="center">
  <i>"What Is Now Proved Was Once Only Imagined"</i>
</p>

## 🎥 Video

<video width="100%" controls><source src="https://github.com/user-attachments/assets/b56f4750-25c9-48fb-83ff-d58526711463" type="video/mp4"></video>

## 🔥 News

- **[April 15, 2026]**: 🚀 Release the HY-World 2.0 technical report & partial code!
- **[April 15, 2026]**: 🤗 Open-source WorldMirror 2.0 inference code and model weights!
- **[Coming Soon]**: Release the full HY-World 2.0 (World Generation) inference code.
- **[Coming Soon]**: Release Panorama Generation (HY-Pano 2.0) model weights & code.
- **[Coming Soon]**: Release Trajectory Planning (WorldNav) code.
- **[Coming Soon]**: Release World Expansion (WorldStereo 2.0) model weights & inference code.

## 📋 Table of Contents

- [📖 Introduction](#-introduction)
- [✨ Highlights](#-highlights)
- [🧩 Architecture](#-architecture)
- [📝 Open-Source Plan](#-open-source-plan)
- [🎁 Model Zoo](#-model-zoo)
- [🤗 Get Started](#-get-started)
- [🔮 Performance](#-performance)
- [🎬 More Examples](#-more-examples)
- [📚 Citation](#-citation)

## 📖 Introduction

**HY-World 2.0** is a multi-modal world model framework for **world generation** and **world reconstruction**. It accepts diverse input modalities — text, single-view images, multi-view images, and videos — and produces 3D world representations (meshes / Gaussian Splattings). It offers two core capabilities:

- **World Generation** (text / single image → 3D world): synthesizes high-fidelity, navigable 3D scenes through a four-stage pipeline: a) panorama generation with HY-Pano 2.0, b) trajectory planning with WorldNav, c) world expansion with WorldStereo 2.0, and d) world composition with WorldMirror 2.0 & 3DGS learning.
- **World Reconstruction** (multi-view images / video → 3D): powered by WorldMirror 2.0, a unified feed-forward model that simultaneously predicts depth, surface normals, camera parameters, 3D point clouds, and 3DGS attributes in a single forward pass.

HY-World 2.0 is the **first open-source state-of-the-art** 3D world model, delivering results comparable to closed-source methods such as Marble. We will release all model weights, code, and technical details to facilitate reproducibility and advance research in this field.

### Why 3D World Models?

Existing world models, such as Genie 3, Cosmos, and HY-World 1.5 (WorldPlay+WorldCompass), generate pixel-level videos — essentially "watching a movie" that vanishes once playback ends. **HY-World 2.0 takes a fundamentally different approach**: it directly produces editable, persistent 3D assets (meshes / 3DGS) that can be imported into engines like Blender/Unity/Unreal Engine/Isaac Sim — more like "building a playable game" than recording a clip. This paradigm shift natively resolves many long-standing pain points of video world models:

| | Video World Models | 3D World Model (HY-World 2.0) |
|--|---|---|
| **Output** | Pixel videos (non-editable) | Real 3D assets — meshes / 3DGS (fully editable) |
| **Playable Duration** | Limited (typically < 1 min) | Unlimited — assets persist permanently |
| **3D Consistency** | Poor (flickering, artifacts across views) | Native — inherently consistent in 3D |
| **Real-Time Rendering** | Requires per-frame inference; high latency | Consumer GPUs can render in real time |
| **Controllability** | Weak (imprecise character control, no real physics) | Precise — zero-error control, real physics collision, accurate lighting |
| **Inference Cost** | Accumulates with every interaction | One-time generation; rendering cost ≈ 0 |
| **Engine Compatibility** | ✗ Video files only | ✓ Directly importable into Blender / UE / Isaac Engine |
| | $\color{IndianRed}{\textsf{Watch a video, then it's gone}}$ | $\color{RoyalBlue}{\textbf{Build a world, keep it forever}}$ |

<table align="center" style="border: none;">
  <tr>
    <td align="center" width="50%"><img src="assets/screenshot_1.gif" width="100%"></td>
    <td align="center" width="50%"><img src="assets/screenshot_2.gif" width="100%"></td>
  </tr>
  <tr>
    <td align="center" width="50%"><img src="assets/screenshot_7.gif" width="100%"></td>
    <td align="center" width="50%"><img src="assets/screenshot_8.gif" width="100%"></td>
  </tr>
</table>

<p align="center"><em>All of the above are <strong>real 3D assets</strong> (not generated videos), created entirely by HY-World 2.0 and captured from live real-time interaction.</em></p>

## ✨ Highlights

- **Real 3D Worlds, Not Just Videos**

Unlike video-only world models (e.g., Genie 3, HY-World 1.5), HY-World 2.0 generates **real 3D assets** — 3DGS, meshes, and point clouds — that are freely explorable, editable, and directly importable into **Unity / Unreal Engine / Isaac**. From a single text prompt or image, create navigable 3D worlds with diverse styles: realistic, cartoon, game, and more.

<p align="center">
  <img src="assets/mesh_en.gif" width="95%">
</p>

- **Instant 3D Reconstruction from Photos & Videos**

Powered by **WorldMirror 2.0**, a unified feed-forward model that predicts dense point clouds, depth maps, surface normals, camera parameters, and 3DGS from multi-view images or casual videos in a single forward pass. Supports flexible-resolution inference (50K–500K pixels) with SOTA accuracy. Capture a video, get a digital twin.

<p align="center">
  <img src="assets/recon_en.gif" width="95%">
</p>
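
Flexible-resolution inference implies picking an input resolution inside a pixel budget. As a purely illustrative sketch — the actual resizing rule, budget, and patch size used by WorldMirror 2.0 are assumptions, not taken from this README — the helper below chooses a width/height that roughly preserves aspect ratio, stays within a target pixel count, and is divisible by a hypothetical patch size:

```python
def fit_resolution(width, height, target_pixels=250_000, patch=14):
    """Pick an inference resolution near `target_pixels`.

    Preserves the input aspect ratio and rounds each side down to a
    multiple of `patch` (a hypothetical ViT patch size, not a documented
    WorldMirror 2.0 constant).
    """
    scale = (target_pixels / (width * height)) ** 0.5
    new_w = max(patch, int(width * scale) // patch * patch)
    new_h = max(patch, int(height * scale) // patch * patch)
    return new_w, new_h

# e.g. resize a 1920x1080 frame into a ~250K-pixel budget
w, h = fit_resolution(1920, 1080)
print(w, h)
```

The same helper can target the low end of the stated range by passing `target_pixels=50_000`.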

- **Interactive Character Exploration**

Go beyond viewing — **play inside your generated worlds**. HY-World 2.0 supports first-person navigation and third-person character mode, enabling users to freely explore AI-generated streets, buildings, and landscapes with physics-based collision. Visit [our product page]() for a free trial.

<p align="center">
  <img src="assets/interactive.gif" width="95%">
</p>

## 🧩 Architecture

- **Refer to our technical report for more details.**

HY-World 2.0 follows a systematic pipeline — *Panorama Generation* (HY-Pano 2.0) → *Trajectory Planning* (WorldNav) → *World Expansion* (WorldStereo 2.0) → *World Composition* (WorldMirror 2.0 + 3DGS) — that automatically transforms text or a single image into a high-fidelity, navigable 3D world (3DGS/mesh outputs).

<p align="center">
  <img src="assets/overview.png" width="95%">
</p>

## 📝 Open-Source Plan

- ✅ Technical Report
- ✅ WorldMirror 2.0 Code & Model Checkpoints
- ⬜ Full Inference Code for World Generation (WorldNav + World Composition)
- ⬜ Panorama Generation (HY-Pano 2.0) Model & Code — [HunyuanWorld 1.0](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0) available as an interim alternative
- ⬜ World Expansion (WorldStereo 2.0) Model & Code — [WorldStereo](https://github.com/FuchengSu/WorldStereo) available as an interim alternative

## 🎁 Model Zoo

### World Reconstruction — WorldMirror Series

| Model | Description | Params | Date | Hugging Face |
|-------|-------------|--------|------|--------------|
| WorldMirror 2.0 | Multi-view / video → 3D reconstruction | ~1.2B | 2026 | [Download](https://huggingface.co/tencent/HY-World-2.0/tree/main/HY-WorldMirror-2.0) |
| WorldMirror 1.0 | Multi-view / video → 3D reconstruction (legacy) | ~1.2B | 2025 | [Download](https://huggingface.co/tencent/HunyuanWorld-Mirror/tree/main) |

### Panorama Generation

| Model | Description | Params | Date | Hugging Face |
|-------|-------------|--------|------|--------------|
| HY-PanoGen | Text / image → 360° panorama | — | Coming Soon | — |

### World Generation

| Model | Description | Params | Date | Hugging Face |
|-----------------|-------------|-----|------|--------------|
| WorldStereo 2.0 | Panorama → navigable 3DGS world | — | Coming Soon | — |

We recommend referring to our previous works, [WorldStereo](https://github.com/FuchengSu/WorldStereo) and [WorldMirror](https://github.com/Tencent-Hunyuan/HunyuanWorld-Mirror), for background knowledge on world generation and reconstruction.

## 🤗 Get Started

### Install Requirements

We recommend CUDA 12.4 for installation.

```bash
# 1. Clone the repository
git clone https://github.com/Tencent-Hunyuan/HY-World-2.0
cd HY-World-2.0

# 2. Create conda environment
conda create -n hyworld2 python=3.10
conda activate hyworld2

# 3. Install PyTorch (CUDA 12.4)
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124

# 4. Install dependencies
pip install -r requirements.txt

# 5. Install FlashAttention
# (Recommended) Install FlashAttention-3
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/hopper
python setup.py install
cd ../../
rm -rf flash-attention

# For simpler installation, you can also use FlashAttention-2
pip install flash-attn --no-build-isolation
```

### Code Usage — Panorama Generation (HY-Pano 2.0)

*Coming soon.*

### Code Usage — World Generation (WorldNav, WorldStereo 2.0, and 3DGS)

*Coming soon.*

**We recommend referring to our previous work, [WorldStereo](https://github.com/FuchengSu/WorldStereo), for the open-source preview version of WorldStereo 2.0.**

### Code Usage — WorldMirror 2.0

WorldMirror 2.0 supports the following usage modes:

- [Code Usage](#code-usage--worldmirror-20)
- [Gradio App](#gradio-app--worldmirror-20)

We provide a `diffusers`-like Python API for WorldMirror 2.0. Model weights are automatically downloaded from Hugging Face on first run.

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained('tencent/HY-World-2.0')
result = pipeline('path/to/images')
```

**With Prior Injection (Camera & Depth):**

```python
result = pipeline(
    'path/to/images',
    prior_cam_path='path/to/prior_camera.json',
    prior_depth_path='path/to/prior_depth/',
)
```
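
The exact prior format is documented in the Prior Preparation Guide; as a purely illustrative sketch (the file name comes from the example above, but the field names and per-view layout here are assumptions, not the documented schema), a camera prior file could be assembled with the standard library like this:

```python
import json
import os
import tempfile

# Hypothetical per-view camera prior: a 4x4 extrinsics matrix and a 3x3
# intrinsics matrix per image. Field names are illustrative assumptions --
# consult DOCUMENTATION.md#prior-injection for the real schema.
prior = {
    "frame_000.png": {
        "extrinsics": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
        "intrinsics": [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]],
    },
}

# Write prior_camera.json next to the images (here: a temp dir for the demo)
path = os.path.join(tempfile.mkdtemp(), "prior_camera.json")
with open(path, "w") as f:
    json.dump(prior, f, indent=2)

# Round-trip check before handing the file to the pipeline
with open(path) as f:
    loaded = json.load(f)
assert loaded["frame_000.png"]["intrinsics"][0][0] == 500.0
```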

> For the detailed structure of camera/depth priors and how to prepare them, see [Prior Preparation Guide](DOCUMENTATION.md#prior-injection).

**CLI:**

```bash
# Single GPU
python -m hyworld2.worldrecon.pipeline --input_path path/to/images

# Multi-GPU
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --use_fsdp --enable_bf16
```

> **Important:** In multi-GPU mode, the number of input images must be **>= the number of GPUs**. For example, with `--nproc_per_node=8`, provide at least 8 images.
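
This constraint can be checked before launching `torchrun`. A minimal sketch, assuming a flat image folder and a conventional set of image extensions (both assumptions, not requirements stated by this README):

```python
import os
import tempfile

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}  # assumed extension list

def check_multigpu_inputs(image_dir, nproc_per_node):
    """Verify there are at least as many input images as GPU processes."""
    images = [f for f in os.listdir(image_dir)
              if os.path.splitext(f)[1].lower() in IMAGE_EXTS]
    if len(images) < nproc_per_node:
        raise ValueError(
            f"Multi-GPU mode needs >= {nproc_per_node} images, "
            f"found {len(images)} in {image_dir}")
    return len(images)

# demo: 3 placeholder images are enough for --nproc_per_node=2
d = tempfile.mkdtemp()
for i in range(3):
    open(os.path.join(d, f"view_{i}.png"), "w").close()
print(check_multigpu_inputs(d, 2))
```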
|
| 256 |
+
|
| 257 |
+
### Gradio App — WorldMirror 2.0
|
| 258 |
+
|
| 259 |
+
We provide an interactive [Gradio](https://www.gradio.app/) web demo for WorldMirror 2.0. Upload images or videos and visualize 3DGS, point clouds, depth maps, normal maps, and camera parameters in your browser.
|
| 260 |
+
|
| 261 |
+
```bash
|
| 262 |
+
# Single GPU
|
| 263 |
+
python -m hyworld2.worldrecon.gradio_app
|
| 264 |
+
|
| 265 |
+
# Multi-GPU
|
| 266 |
+
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.gradio_app \
|
| 267 |
+
--use_fsdp --enable_bf16
|
| 268 |
+
```
|
| 269 |
+
|
| 270 |
+
For the full list of Gradio app arguments (port, share, local checkpoints, etc.), see [DOCUMENTATION.md](DOCUMENTATION.md#gradio-app).
|
| 271 |
+
|
| 272 |
+
|
| 273 |
+
|
| 274 |
+
## 🔮 Performance
|
| 275 |
+
|
| 276 |
+
For full benchmark results, please refer to the [technical report](https://3d-models.hunyuan.tencent.com/world/).
|
| 277 |
+
|
| 278 |
+
### WorldStereo 2.0 — Camera Control
|
| 279 |
+
|
| 280 |
+
<table>
|
| 281 |
+
<thead>
|
| 282 |
+
<tr>
|
| 283 |
+
<th rowspan="2">Methods</th>
|
| 284 |
+
<th colspan="3" align="center">Camera Metrics</th>
|
| 285 |
+
<th colspan="4" align="center">Visual Quality</th>
|
| 286 |
+
</tr>
|
| 287 |
+
<tr>
|
| 288 |
+
<th>RotErr ↓</th><th>TransErr ↓</th><th>ATE ↓</th>
|
| 289 |
+
<th>Q-Align ↑</th><th>CLIP-IQA+ ↑</th><th>Laion-Aes ↑</th><th>CLIP-I ↑</th>
|
| 290 |
+
</tr>
|
| 291 |
+
</thead>
|
| 292 |
+
<tbody>
|
| 293 |
+
<tr><td>SEVA</td><td>1.690</td><td>1.578</td><td>2.879</td><td>3.232</td><td>0.479</td><td>4.623</td><td>77.16</td></tr>
|
| 294 |
+
<tr><td>Gen3C</td><td>0.944</td><td>1.580</td><td>2.789</td><td>3.353</td><td>0.489</td><td>4.863</td><td>82.33</td></tr>
|
| 295 |
+
<tr><td>WorldStereo</td><td>0.762</td><td>1.245</td><td>2.141</td><td>4.149</td><td><b>0.547</b></td><td>5.257</td><td>89.05</td></tr>
|
| 296 |
+
<tr><td><b>WorldStereo 2.0</b></td><td><b>0.492</b></td><td><b>0.968</b></td><td><b>1.768</b></td><td><b>4.205</b></td><td>0.544</td><td><b>5.266</b></td><td><b>89.43</b></td></tr>
|
| 297 |
+
</tbody>
|
| 298 |
+
</table>
|
| 299 |
+
|
| 300 |
+
### WorldStereo 2.0 — Single-View-Generated Reconstruction
|
| 301 |
+
|
| 302 |
+
<table>
|
| 303 |
+
<thead>
|
| 304 |
+
<tr>
|
| 305 |
+
<th rowspan="2">Methods</th>
|
| 306 |
+
<th colspan="4">Tanks-and-Temples</th>
|
| 307 |
+
<th colspan="4">MipNeRF360</th>
|
| 308 |
+
</tr>
|
| 309 |
+
<tr>
|
| 310 |
+
<th>Precision ↑</th>
|
| 311 |
+
<th>Recall ↑</th>
|
| 312 |
+
<th>F1-Score ↑</th>
|
| 313 |
+
<th>AUC ↑</th>
|
| 314 |
+
<th>Precision ↑</th>
|
| 315 |
+
<th>Recall ↑</th>
|
| 316 |
+
<th>F1-Score ↑</th>
|
| 317 |
+
<th>AUC ↑</th>
|
| 318 |
+
</tr>
|
| 319 |
+
</thead>
|
| 320 |
+
<tbody align="center">
|
| 321 |
+
<tr>
|
| 322 |
+
<td align="left">SEVA</td>
|
| 323 |
+
<td>33.59</td>
|
| 324 |
+
<td>35.34</td>
|
| 325 |
+
<td>36.73</td>
|
| 326 |
+
<td>51.03</td>
|
| 327 |
+
<td>22.38</td>
|
| 328 |
+
<td>55.63</td>
|
| 329 |
+
<td>28.75</td>
|
| 330 |
+
<td>46.81</td>
|
| 331 |
+
</tr>
|
| 332 |
+
<tr>
|
| 333 |
+
<td align="left">Gen3C</td>
|
| 334 |
+
<td><u>46.73</u></td>
|
| 335 |
+
<td>25.51</td>
|
| 336 |
+
<td>31.24</td>
|
| 337 |
+
<td>42.44</td>
|
| 338 |
+
<td>23.28</td>
|
| 339 |
+
<td><strong>75.37</strong></td>
|
| 340 |
+
<td>35.26</td>
|
| 341 |
+
<td>52.10</td>
|
| 342 |
+
</tr>
|
| 343 |
+
<tr>
|
| 344 |
+
<td align="left">Lyra</td>
|
| 345 |
+
<td><strong>50.38</strong></td>
|
| 346 |
+
<td>28.67</td>
|
| 347 |
+
<td>32.54</td>
|
| 348 |
+
<td>43.05</td>
|
| 349 |
+
<td>30.02</td>
|
| 350 |
+
<td>58.60</td>
|
| 351 |
+
<td>36.05</td>
|
| 352 |
+
<td>49.89</td>
|
| 353 |
+
</tr>
|
| 354 |
+
<tr>
|
| 355 |
+
<td align="left">FlashWorld</td>
|
| 356 |
+
<td>26.58</td>
|
| 357 |
+
<td>20.72</td>
|
| 358 |
+
<td>22.29</td>
|
| 359 |
+
<td>30.45</td>
|
| 360 |
+
<td>35.97</td>
|
| 361 |
+
<td>53.77</td>
|
| 362 |
+
<td>42.60</td>
|
| 363 |
+
<td>53.86</td>
|
| 364 |
+
</tr>
|
| 365 |
+
<tr>
|
| 366 |
+
<td align="left">WorldStereo 2.0</td>
|
| 367 |
+
<td>43.62</td>
|
| 368 |
+
<td><u>41.02</u></td>
|
| 369 |
+
<td><u>41.43</u></td>
|
| 370 |
+
<td><u>58.19</u></td>
|
| 371 |
+
<td><strong>43.19</strong></td>
|
| 372 |
+
<td><u>65.32</u></td>
|
| 373 |
+
<td><strong>51.27</strong></td>
|
| 374 |
+
<td><strong>65.79</strong></td>
|
| 375 |
+
</tr>
|
| 376 |
+
<tr>
|
| 377 |
+
<td align="left">WorldStereo 2.0 (DMD)</td>
|
| 378 |
+
<td>40.41</td>
|
| 379 |
+
<td><strong>44.41</strong></td>
|
| 380 |
+
<td><strong>43.16</strong></td>
|
| 381 |
+
<td><strong>60.09</strong></td>
|
| 382 |
+
<td><u>42.34</u></td>
|
| 383 |
+
<td>64.83</td>
|
| 384 |
+
<td><u>50.52</u></td>
|
| 385 |
+
<td><u>65.64</u></td>
|
| 386 |
+
</tr>
|
| 387 |
+
</tbody>
|
| 388 |
+
</table>
|
| 389 |
+
|
| 390 |
+
### WorldMirror 2.0 — Point Map Reconstruction

**Point Map Reconstruction on 7-Scenes, NRGBD, and DTU.** We report the mean Accuracy and Completeness of WorldMirror under different input configurations. **Bold** results are best. "L / M / H" denote low / medium / high inference resolution. "+ all priors" denotes injection of camera extrinsics, camera intrinsics, and depth priors.

<table>
  <thead>
    <tr>
      <th rowspan="2">Method</th>
      <th colspan="2" align="center">7-Scenes <sub>(scene)</sub></th>
      <th colspan="2" align="center">NRGBD <sub>(scene)</sub></th>
      <th colspan="2" align="center">DTU <sub>(object)</sub></th>
    </tr>
    <tr>
      <th>Acc. ↓</th><th>Comp. ↓</th>
      <th>Acc. ↓</th><th>Comp. ↓</th>
      <th>Acc. ↓</th><th>Comp. ↓</th>
    </tr>
  </thead>
  <tbody>
    <tr><td colspan="7"><em>WorldMirror 1.0</em></td></tr>
    <tr><td> L</td><td>0.043</td><td>0.055</td><td>0.046</td><td>0.049</td><td>1.476</td><td>1.768</td></tr>
    <tr><td> L + all priors</td><td>0.021</td><td>0.026</td><td>0.022</td><td>0.020</td><td>1.347</td><td>1.392</td></tr>
    <tr><td> M</td><td>0.043</td><td>0.049</td><td>0.041</td><td>0.045</td><td>1.017</td><td>1.780</td></tr>
    <tr><td> M + all priors</td><td>0.018</td><td>0.023</td><td>0.016</td><td>0.014</td><td>0.735</td><td>0.935</td></tr>
    <tr><td> H</td><td>0.079</td><td>0.087</td><td>0.077</td><td>0.093</td><td>2.271</td><td>2.113</td></tr>
    <tr><td> H + all priors</td><td>0.042</td><td>0.041</td><td>0.078</td><td>0.082</td><td>1.773</td><td>1.478</td></tr>
    <tr><td colspan="7"></td></tr>
    <tr><td colspan="7"><em>WorldMirror 2.0</em></td></tr>
    <tr><td> L</td><td>0.041</td><td>0.052</td><td>0.047</td><td>0.058</td><td>1.352</td><td>2.009</td></tr>
    <tr><td> L + all priors</td><td>0.019</td><td>0.024</td><td>0.017</td><td>0.015</td><td>1.100</td><td>1.201</td></tr>
    <tr><td> M</td><td>0.033</td><td>0.046</td><td>0.039</td><td>0.047</td><td>1.005</td><td>1.892</td></tr>
    <tr><td> M + all priors</td><td>0.013</td><td>0.017</td><td><b>0.013</b></td><td><b>0.013</b></td><td>0.690</td><td>0.876</td></tr>
    <tr><td> H</td><td>0.037</td><td>0.040</td><td>0.046</td><td>0.053</td><td>0.845</td><td>1.904</td></tr>
    <tr><td> <b>H + all priors</b></td><td><b>0.012</b></td><td><b>0.016</b></td><td>0.015</td><td>0.016</td><td><b>0.554</b></td><td><b>0.771</b></td></tr>
  </tbody>
</table>

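The Accuracy and Completeness numbers above are the standard point-map metrics: the mean distance from each predicted point to its nearest ground-truth point (Accuracy), and the reverse direction (Completeness). A minimal brute-force sketch of these two quantities (illustrative only, not the official evaluation code, which typically uses a KD-tree and aligns the clouds to a common scale first):

```python
import math

def _nn_dist(p, cloud):
    # Distance from point p to its nearest neighbor in cloud (brute force).
    return min(math.dist(p, q) for q in cloud)

def accuracy_completeness(pred, gt):
    """Mean nearest-neighbor distances between two point clouds.

    Accuracy: pred -> gt (how close predicted points lie to the surface).
    Completeness: gt -> pred (how well the prediction covers the surface).
    Brute-force O(N*M); real evaluations use a KD-tree.
    """
    acc = sum(_nn_dist(p, gt) for p in pred) / len(pred)
    comp = sum(_nn_dist(q, pred) for q in gt) / len(gt)
    return acc, comp

pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
acc, comp = accuracy_completeness(pred, gt)  # acc = 0.05, comp ≈ 0.367
```

Lower is better for both; note that a prediction can be accurate yet incomplete if it covers only part of the scene.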
### WorldMirror 2.0 — Prior Comparison

**Comparison with Pow3R and MapAnything under Different Prior Conditions.** Results are averaged over the 7-Scenes, NRGBD, and DTU datasets. Pow3R (pro) refers to the original Pow3R with Procrustes alignment.

<p align="center">
  <img src="assets/prior_comparison2_wm2.png" width="85%">
</p>

## 🎬 More Examples

<table align="center" style="border: none;">
  <tr>
    <td align="center" width="50%"><img src="assets/screenshot_3.gif" width="100%"></td>
    <td align="center" width="50%"><img src="assets/screenshot_4.gif" width="100%"></td>
  </tr>
  <tr>
    <td align="center" width="50%"><img src="assets/screenshot_5.gif" width="100%"></td>
    <td align="center" width="50%"><img src="assets/screenshot_6.gif" width="100%"></td>
  </tr>
  <tr>
    <td align="center" width="50%"><img src="assets/screenshot_9.gif" width="100%"></td>
    <td align="center" width="50%"><img src="assets/screenshot_10.gif" width="100%"></td>
  </tr>
</table>

## 📖 Documentation

For detailed usage guides, parameter references, output format specifications, and prior injection instructions, see **[DOCUMENTATION.md](DOCUMENTATION.md)**.

## 📚 Citation

If you find HY-World 2.0 useful for your research, please cite:

```bibtex
@article{hyworld22026,
  title={HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds},
  author={Tencent HY-World Team},
  journal={arXiv preprint},
  year={2026}
}

@article{hunyuanworld2025tencent,
  title={HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels},
  author={Team HunyuanWorld},
  year={2025},
  journal={arXiv preprint}
}
```

## 📧 Contact

Please email tengfeiwang12@gmail.com with questions or feedback.

## 🙏 Acknowledgements

We would like to thank [HunyuanWorld 1.0](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0), [WorldMirror](https://github.com/Tencent-Hunyuan/HunyuanWorld-Mirror), [WorldPlay](https://github.com/Tencent-Hunyuan/HY-WorldPlay), [WorldStereo](https://github.com/FuchengSu/WorldStereo), and [HunyuanImage](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0) for their great work.
README_zh.md
ADDED
<h1>HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds</h1>

[English](README.md) | [简体中文](README_zh.md)

<p align="center">
  <img src="assets/teaser.png" width="95%" alt="HY-World-2.0 Teaser">
</p>

<div align="center">
  <a href=https://3d.hunyuan.tencent.com/sceneTo3D target="_blank"><img src=https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage height=22px></a>
  <a href=https://huggingface.co/tencent/HY-World-2.0 target="_blank"><img src=https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg height=22px></a>
  <a href=https://3d-models.hunyuan.tencent.com/world/ target="_blank"><img src=https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
  <a href=https://3d-models.hunyuan.tencent.com/world/world2_0/HY_World_2_0.pdf target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
  <a href=https://discord.gg/dNBrdrGGMa target="_blank"><img src=https://img.shields.io/badge/Discord-white.svg?logo=discord height=22px></a>
  <a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Tencent%20HY-black.svg?logo=x height=22px></a>
  <a href="#community-resources" target="_blank"><img src=https://img.shields.io/badge/Community-lavender.svg?logo=homeassistantcommunitystore height=22px></a>
</div>

<br>
<p align="center">
  <i>"What Is Now Proved Was Once Only Imagined"</i>
</p>

## 🎥 Video
<video width="100%" controls><source src="https://github.com/user-attachments/assets/b56f4750-25c9-48fb-83ff-d58526711463" type="video/mp4"></video>

## 🔥 News

- **[April 15, 2026]**: 🚀 Released the HY-World 2.0 technical report and partial code!
- **[April 15, 2026]**: 🤗 Open-sourced the WorldMirror 2.0 inference code and model weights!
- **[Coming soon]**: Full HY-World 2.0 (World Generation) inference code.
- **[Coming soon]**: HY-Pano 2.0 model weights and code.
- **[Coming soon]**: WorldNav code.
- **[Coming soon]**: WorldStereo 2.0 model weights and inference code.

## 📋 Table of Contents
- [📖 Introduction](#-introduction)
- [✨ Highlights](#-highlights)
- [🧩 Architecture](#-architecture)
- [📝 Open-Source Plan](#-open-source-plan)
- [🎁 Model Zoo](#-model-zoo)
- [🤗 Quick Start](#-quick-start)
- [🔮 Performance](#-performance)
- [🎬 More Examples](#-more-examples)
- [📚 Citation](#-citation)

## 📖 Introduction

**HY-World 2.0** is a multi-modal world model framework for **world generation** and **world reconstruction**. It accepts diverse input modalities (text, single-view images, multi-view images, and video) and produces 3D world representations (meshes / 3D Gaussian splats). It offers two core capabilities:

- **World Generation** (text / single image → 3D world): synthesizes high-fidelity, navigable 3D scenes through a four-stage pipeline: a) HY-Pano 2.0, b) WorldNav, c) WorldStereo 2.0, d) WorldMirror 2.0 + 3DGS learning.
- **World Reconstruction** (multi-view images / video → 3D): powered by WorldMirror 2.0, a unified feed-forward model that jointly predicts depth, surface normals, camera parameters, 3D point clouds, and 3DGS attributes in a single forward pass.

HY-World 2.0 is the **first open-source, state-of-the-art** 3D world model, with quality comparable to closed-source systems such as Marble. We release all model weights, code, and technical details to support reproducibility and accelerate research in this field.

### Why a 3D World Model?

Existing world models (e.g., Genie 3, Cosmos, HY-World 1.5 (WorldPlay+WorldCompass)) generate pixel-level video: essentially "watching a movie" that vanishes once playback ends. **HY-World 2.0 takes a fundamentally different approach**: it directly generates editable, persistent 3D assets (meshes / 3DGS) that can be imported straight into game engines such as Blender / Unity / Unreal Engine / Isaac Sim. It is less like recording a video and more like "building a playable game". This paradigm shift fundamentally resolves many long-standing pain points of video world models:

| | Video World Models | 3D World Model (HY-World 2.0) |
|--|---|---|
| **Output** | Pixel video (not editable) | Real 3D assets: meshes / 3DGS (fully editable) |
| **Interaction length** | Limited (typically < 1 minute) | Unlimited: assets persist forever |
| **3D consistency** | Poor (flicker, cross-view artifacts) | Natively consistent, with intrinsic 3D consistency |
| **Real-time rendering** | Requires per-frame inference; high latency | Real-time rendering on consumer GPUs |
| **Controllability** | Weak (imprecise character control, no real physics) | Precise: zero-error control, physical collisions, accurate lighting |
| **Inference cost** | Accumulates with every interaction | Generate once; rendering cost ≈ 0 |
| **Engine compatibility** | ✗ Video files only | ✓ Directly importable into Blender / UE / Isaac Engine |
| | $\color{IndianRed}{\textsf{Watch the video, and it is gone}}$ | $\color{RoyalBlue}{\textbf{Build the world, and keep it forever}}$ |

<table align="center" style="border: none;">
  <tr>
    <td align="center" width="50%"><img src="assets/screenshot_1.gif" width="100%"></td>
    <td align="center" width="50%"><img src="assets/screenshot_2.gif" width="100%"></td>
  </tr>
  <tr>
    <td align="center" width="50%"><img src="assets/screenshot_7.gif" width="100%"></td>
    <td align="center" width="50%"><img src="assets/screenshot_8.gif" width="100%"></td>
  </tr>
</table>

<p align="center"><em>All of the above are <strong>real 3D assets</strong> (not generated videos), created entirely by HY-World 2.0 and captured from real-time interaction.</em></p>

## ✨ Highlights

- **Real 3D worlds, not just videos**

Unlike video-only world models (e.g., Genie 3, HY World 1.5), HY-World 2.0 produces **real 3D assets** (3DGS, meshes, and point clouds) that can be freely navigated, edited, and imported directly into **Unity / Unreal Engine / Isaac**. From a text prompt or a single image, it creates navigable 3D worlds in many styles: photorealistic, cartoon, game-like, and more.

<p align="center">
  <img src="assets/mesh.gif" width="95%">
</p>

- **Instant 3D reconstruction from photos and videos**

Powered by **WorldMirror 2.0**, a unified feed-forward model that predicts dense point clouds, depth maps, surface normals, camera parameters, and 3DGS from multi-view images or casually captured videos in a single forward pass. It supports flexible-resolution inference (50K–500K pixels) with state-of-the-art accuracy. Shoot a video, get a digital twin.

<p align="center">
  <img src="assets/recon.gif" width="95%">
</p>

- **Interactive character exploration**

Don't just watch: **roam freely inside the generated worlds**. HY-World 2.0 supports first-person navigation and third-person character modes, letting users explore AI-generated streets, buildings, and landscapes with physics-based collisions. Try it for free on [our product page]().

<p align="center">
  <img src="assets/interactive.gif" width="95%">
</p>

## 🧩 Architecture
- **See our technical report for details**

HY-World 2.0's systematic pipeline, *panorama generation* (HY-Pano-2.0) → *trajectory planning* (WorldNav) → *world expansion* (WorldStereo 2.0) → *world composition* (WorldMirror 2.0 + 3DGS), automatically turns text or a single image into a high-fidelity, navigable 3D world (3DGS/mesh output).

<p align="center">
  <img src="assets/overview.png" width="95%">
</p>

## 📝 Open-Source Plan

- ✅ Technical report
- ✅ WorldMirror 2.0 code and model weights
- ⬜ Full world-generation inference code (WorldNav + World Composition)
- ⬜ Panorama generation (HY-Pano 2.0) model and code — [HunyuanWorld 1.0](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0) can serve as an interim alternative
- ⬜ World expansion (WorldStereo 2.0) model and code — [WorldStereo](https://github.com/FuchengSu/WorldStereo) can serve as an interim alternative

## 🎁 Model Zoo

### World Reconstruction — WorldMirror Series

| Model | Description | Params | Date | Hugging Face |
|-------|-------------|--------|------|--------------|
| WorldMirror 2.0 | Multi-view / video → 3D reconstruction | ~1.2B | 2026 | [Download](https://huggingface.co/tencent/HY-World-2.0/tree/main/HY-WorldMirror-2.0) |
| WorldMirror 1.0 | Multi-view / video → 3D reconstruction (legacy) | ~1.2B | 2025 | [Download](https://huggingface.co/tencent/HunyuanWorld-Mirror/tree/main) |

### Panorama Generation

| Model | Description | Params | Date | Hugging Face |
|-------|-------------|--------|------|--------------|
| HY-PanoGen | Text / image → 360° panorama | — | Coming soon | — |

### World Generation

| Model | Description | Params | Date | Hugging Face |
|-----------------|------|--------|------|--------------|
| WorldStereo 2.0 | Panorama → navigable 3DGS world | — | Coming soon | — |

We recommend our earlier works [WorldStereo](https://github.com/FuchengSu/WorldStereo) and [WorldMirror](https://github.com/Tencent-Hunyuan/HunyuanWorld-Mirror) for background on world generation and reconstruction.

## 🤗 Quick Start

### Installation

We recommend installing with CUDA 12.4.

```bash
# 1. Clone the repository
git clone https://github.com/Tencent-Hunyuan/HY-World-2.0
cd HY-World-2.0

# 2. Create a conda environment
conda create -n hyworld2 python=3.10
conda activate hyworld2

# 3. Install PyTorch (CUDA 12.4)
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124

# 4. Install dependencies
pip install -r requirements.txt

# 5. Install FlashAttention
# (Recommended) Install FlashAttention-3
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/hopper
python setup.py install
cd ../../
rm -rf flash-attention

# Alternatively, FlashAttention-2 offers a simpler install
pip install flash-attn --no-build-isolation
```

### Code Usage — Panorama Generation (HY-Pano-2)

*Coming soon.*

### Code Usage — World Generation (WorldNav, WorldStereo-2, and 3DGS)

*Coming soon.*

**We recommend our earlier work [WorldStereo](https://github.com/FuchengSu/WorldStereo) as an open-source preview of WorldStereo-2.**

### Code Usage — WorldMirror 2.0
WorldMirror 2.0 can be used in the following ways:

- [Code Usage](#code-usage--worldmirror-20)
- [Gradio App](#gradio-app--worldmirror-20)

We provide a `diffusers`-style Python API. Model weights are downloaded automatically from Hugging Face on first run.

```python
from hyworld2.worldrecon.pipeline import WorldMirrorPipeline

pipeline = WorldMirrorPipeline.from_pretrained('tencent/HY-World-2.0')
result = pipeline('path/to/images')
```

**With prior injection (camera poses and depth):**

```python
result = pipeline(
    'path/to/images',
    prior_cam_path='path/to/prior_camera.json',
    prior_depth_path='path/to/prior_depth/',
)
```

> For the exact format and preparation of camera/depth priors, see the prior preparation guide in [DOCUMENTATION.md](DOCUMENTATION.md).

**Command line:**

```bash
# Single-GPU inference
python -m hyworld2.worldrecon.pipeline --input_path path/to/images

# Multi-GPU inference
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.pipeline \
    --input_path path/to/images \
    --use_fsdp --enable_bf16
```

> **Important:** In multi-GPU mode, the number of input images must be **>= the number of GPUs**. For example, with `--nproc_per_node=8`, provide at least 8 images.

### Gradio App — WorldMirror 2.0

We provide an interactive [Gradio](https://www.gradio.app/) web demo. Upload images or a video to visualize 3DGS, point clouds, depth maps, normal maps, and camera parameters in your browser.

```bash
# Single GPU
python -m hyworld2.worldrecon.gradio_app

# Multi-GPU
torchrun --nproc_per_node=2 -m hyworld2.worldrecon.gradio_app \
    --use_fsdp --enable_bf16
```

For the full list of Gradio app arguments (port, sharing, local checkpoints, etc.), see [DOCUMENTATION.md](DOCUMENTATION.md).

## 🔮 Performance

See the [technical report](https://3d-models.hunyuan.tencent.com/world/) for complete benchmark results.

### WorldStereo 2.0 — Camera Control

<table>
  <thead>
    <tr>
      <th rowspan="2">Method</th>
      <th colspan="3" align="center">Camera Metrics</th>
      <th colspan="4" align="center">Visual Quality</th>
    </tr>
    <tr>
      <th>RotErr ↓</th><th>TransErr ↓</th><th>ATE ↓</th>
      <th>Q-Align ↑</th><th>CLIP-IQA+ ↑</th><th>Laion-Aes ↑</th><th>CLIP-I ↑</th>
    </tr>
  </thead>
  <tbody>
    <tr><td>SEVA</td><td>1.690</td><td>1.578</td><td>2.879</td><td>3.232</td><td>0.479</td><td>4.623</td><td>77.16</td></tr>
    <tr><td>Gen3C</td><td>0.944</td><td>1.580</td><td>2.789</td><td>3.353</td><td>0.489</td><td>4.863</td><td>82.33</td></tr>
    <tr><td>WorldStereo</td><td>0.762</td><td>1.245</td><td>2.141</td><td>4.149</td><td><b>0.547</b></td><td>5.257</td><td>89.05</td></tr>
    <tr><td><b>WorldStereo 2.0</b></td><td><b>0.492</b></td><td><b>0.968</b></td><td><b>1.768</b></td><td><b>4.205</b></td><td>0.544</td><td><b>5.266</b></td><td><b>89.43</b></td></tr>
  </tbody>
</table>

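ATE in the table above is the absolute trajectory error of the recovered camera path. A minimal sketch of a simplified variant (centroid-aligned RMSE of camera positions; the standard metric additionally solves for rotation and scale, e.g. via Umeyama alignment, so this is illustrative only and not the official evaluation code):

```python
import math

def ate_rmse(pred_pos, gt_pos):
    """Simplified Absolute Trajectory Error: RMSE between predicted and
    ground-truth camera centers after removing the global translation
    offset. pred_pos, gt_pos: equal-length lists of (x, y, z) tuples."""
    n = len(pred_pos)
    pm = [sum(p[i] for p in pred_pos) / n for i in range(3)]  # pred centroid
    gm = [sum(g[i] for g in gt_pos) / n for i in range(3)]    # gt centroid
    sq = 0.0
    for p, g in zip(pred_pos, gt_pos):
        # Squared distance between centered corresponding positions.
        sq += sum((p[i] - pm[i] - (g[i] - gm[i])) ** 2 for i in range(3))
    return math.sqrt(sq / n)

gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pred = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (2.5, 0.0, 0.0)]  # gt + constant offset
print(ate_rmse(pred, gt))  # 0.0: a constant offset is removed by centering
```

RotErr and TransErr, by contrast, are computed on relative poses between frame pairs, so the three numbers capture complementary aspects of camera controllability.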
### WorldStereo 2.0 — Generative Reconstruction from a Single Input Frame

<table>
  <thead>
    <tr>
      <th rowspan="2">Method</th>
      <th colspan="4">Tanks-and-Temples</th>
      <th colspan="4">MipNeRF360</th>
    </tr>
    <tr>
      <th>Precision ↑</th>
      <th>Recall ↑</th>
      <th>F1-Score ↑</th>
      <th>AUC ↑</th>
      <th>Precision ↑</th>
      <th>Recall ↑</th>
      <th>F1-Score ↑</th>
      <th>AUC ↑</th>
    </tr>
  </thead>
  <tbody align="center">
    <tr>
      <td align="left">SEVA</td>
      <td>33.59</td><td>35.34</td><td>36.73</td><td>51.03</td>
      <td>22.38</td><td>55.63</td><td>28.75</td><td>46.81</td>
    </tr>
    <tr>
      <td align="left">Gen3C</td>
      <td><u>46.73</u></td><td>25.51</td><td>31.24</td><td>42.44</td>
      <td>23.28</td><td><strong>75.37</strong></td><td>35.26</td><td>52.10</td>
    </tr>
    <tr>
      <td align="left">Lyra</td>
      <td><strong>50.38</strong></td><td>28.67</td><td>32.54</td><td>43.05</td>
      <td>30.02</td><td>58.60</td><td>36.05</td><td>49.89</td>
    </tr>
    <tr>
      <td align="left">FlashWorld</td>
      <td>26.58</td><td>20.72</td><td>22.29</td><td>30.45</td>
      <td>35.97</td><td>53.77</td><td>42.60</td><td>53.86</td>
    </tr>
    <tr>
      <td align="left">WorldStereo 2.0</td>
      <td>43.62</td><td><u>41.02</u></td><td><u>41.43</u></td><td><u>58.19</u></td>
      <td><strong>43.19</strong></td><td><u>65.32</u></td><td><strong>51.27</strong></td><td><strong>65.79</strong></td>
    </tr>
    <tr>
      <td align="left">WorldStereo 2.0 (DMD)</td>
      <td>40.41</td><td><strong>44.41</strong></td><td><strong>43.16</strong></td><td><strong>60.09</strong></td>
      <td><u>42.34</u></td><td>64.83</td><td><u>50.52</u></td><td><u>65.64</u></td>
    </tr>
  </tbody>
</table>

### WorldMirror 2.0 — Point Map Reconstruction

**Point Map Reconstruction on 7-Scenes, NRGBD, and DTU.** We report the mean Accuracy and Completeness of WorldMirror under different input configurations. **Bold** results are best. "L / M / H" denote low / medium / high inference resolution. "+ all priors" denotes injection of camera extrinsics, camera intrinsics, and depth priors.

<table>
  <thead>
    <tr>
      <th rowspan="2">Method</th>
      <th colspan="2" align="center">7-Scenes <sub>(scene)</sub></th>
      <th colspan="2" align="center">NRGBD <sub>(scene)</sub></th>
      <th colspan="2" align="center">DTU <sub>(object)</sub></th>
    </tr>
    <tr>
      <th>Acc. ↓</th><th>Comp. ↓</th>
      <th>Acc. ↓</th><th>Comp. ↓</th>
      <th>Acc. ↓</th><th>Comp. ↓</th>
    </tr>
  </thead>
  <tbody>
    <tr><td colspan="7"><em>WorldMirror 1.0</em></td></tr>
    <tr><td> L</td><td>0.043</td><td>0.055</td><td>0.046</td><td>0.049</td><td>1.476</td><td>1.768</td></tr>
    <tr><td> L + all priors</td><td>0.021</td><td>0.026</td><td>0.022</td><td>0.020</td><td>1.347</td><td>1.392</td></tr>
    <tr><td> M</td><td>0.043</td><td>0.049</td><td>0.041</td><td>0.045</td><td>1.017</td><td>1.780</td></tr>
    <tr><td> M + all priors</td><td>0.018</td><td>0.023</td><td>0.016</td><td>0.014</td><td>0.735</td><td>0.935</td></tr>
    <tr><td> H</td><td>0.079</td><td>0.087</td><td>0.077</td><td>0.093</td><td>2.271</td><td>2.113</td></tr>
    <tr><td> H + all priors</td><td>0.042</td><td>0.041</td><td>0.078</td><td>0.082</td><td>1.773</td><td>1.478</td></tr>
    <tr><td colspan="7"></td></tr>
    <tr><td colspan="7"><em>WorldMirror 2.0</em></td></tr>
    <tr><td> L</td><td>0.041</td><td>0.052</td><td>0.047</td><td>0.058</td><td>1.352</td><td>2.009</td></tr>
    <tr><td> L + all priors</td><td>0.019</td><td>0.024</td><td>0.017</td><td>0.015</td><td>1.100</td><td>1.201</td></tr>
    <tr><td> M</td><td>0.033</td><td>0.046</td><td>0.039</td><td>0.047</td><td>1.005</td><td>1.892</td></tr>
    <tr><td> M + all priors</td><td>0.013</td><td>0.017</td><td><b>0.013</b></td><td><b>0.013</b></td><td>0.690</td><td>0.876</td></tr>
    <tr><td> H</td><td>0.037</td><td>0.040</td><td>0.046</td><td>0.053</td><td>0.845</td><td>1.904</td></tr>
    <tr><td> <b>H + all priors</b></td><td><b>0.012</b></td><td><b>0.016</b></td><td>0.015</td><td>0.016</td><td><b>0.554</b></td><td><b>0.771</b></td></tr>
  </tbody>
</table>

### WorldMirror 2.0 — Prior Comparison

**Comparison with Pow3R and MapAnything under Different Prior Conditions.** Results are averaged over the 7-Scenes, NRGBD, and DTU datasets. Pow3R (pro) refers to the original Pow3R with Procrustes alignment.

<p align="center">
  <img src="assets/prior_comparison2_wm2.png" width="85%">
</p>

## 🎬 More Examples

<table align="center" style="border: none;">
  <tr>
    <td align="center" width="50%"><img src="assets/screenshot_3.gif" width="100%"></td>
    <td align="center" width="50%"><img src="assets/screenshot_4.gif" width="100%"></td>
  </tr>
  <tr>
    <td align="center" width="50%"><img src="assets/screenshot_5.gif" width="100%"></td>
    <td align="center" width="50%"><img src="assets/screenshot_6.gif" width="100%"></td>
  </tr>
  <tr>
    <td align="center" width="50%"><img src="assets/screenshot_9.gif" width="100%"></td>
    <td align="center" width="50%"><img src="assets/screenshot_10.gif" width="100%"></td>
  </tr>
</table>

## 📖 Documentation

For detailed usage guides, parameter references, output format specifications, and prior injection instructions, see **[DOCUMENTATION.md](DOCUMENTATION.md)**.

## 📚 Citation

If you find HY-World 2.0 useful for your research, please cite:

```bibtex
@article{hyworld22026,
  title={HY-World 2.0: A Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds},
  author={Tencent HY-World Team},
  journal={arXiv preprint},
  year={2026}
}

@article{hunyuanworld2025tencent,
  title={HunyuanWorld 1.0: Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels},
  author={Team HunyuanWorld},
  year={2025},
  journal={arXiv preprint}
}
```

## 📧 Contact

Please email tengfeiwang12@gmail.com with questions or feedback.

## 🙏 Acknowledgements

We would like to thank [HunyuanWorld 1.0](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0), [WorldMirror](https://github.com/Tencent-Hunyuan/HunyuanWorld-Mirror), [WorldPlay](https://github.com/Tencent-Hunyuan/HY-WorldPlay), [WorldStereo](https://github.com/FuchengSu/WorldStereo), and [HunyuanImage](https://github.com/Tencent-Hunyuan/HunyuanImage-3.0) for their great work.
assets/hyworld2_en.mp4
ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:bd227cfaf618716dd4d4c8a8a9d032947ac3cd60c1adc6771bd7a066069b94fd
size 20726000