Commit 8625d6a
0
Parent(s):
Super-squash branch 'main' using huggingface_hub
- .gitattributes +35 -0
- README.md +79 -0
- model.safetensors +3 -0
.gitattributes
ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
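
The patterns above decide which files in the repository are stored through Git LFS. As a rough illustration, the sketch below checks a few file names against a subset of those patterns. It uses Python's `fnmatch`, which only approximates gitattributes matching (directory patterns such as `saved_model/**/*` are left out), and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# A subset of the patterns listed in the .gitattributes file above.
LFS_PATTERNS = ["*.bin", "*.ckpt", "*.pt", "*.safetensors", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(filename, patterns=LFS_PATTERNS):
    """Approximate check: does a top-level file name match any LFS pattern?

    fnmatch is a stand-in here; Git's own matching rules differ in details.
    """
    return any(fnmatch(filename, pattern) for pattern in patterns)


# The three files touched by this commit.
for name in ["model.safetensors", "README.md", ".gitattributes"]:
    print(name, is_lfs_tracked(name))
```

Under this approximation only `model.safetensors` is routed through LFS, which agrees with the pointer file shown for it in this commit.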
README.md
ADDED
@@ -0,0 +1,79 @@
---
license: apache-2.0
tags:
- video-generation
- game-rendering
- game-editing
- diffusion
- g-buffer
- relighting
- text-to-video
- wan2.1
pipeline_tag: text-to-video
base_model: Wan-AI/Wan2.1-T2V-1.3B
datasets:
- custom
library_name: diffusers
---

# Game Editing

**Game Editing** is a fine-tuned video diffusion model for controllable game video synthesis. It enables users to manipulate lighting and environmental effects in game footage via text prompts, conditioned on G-buffer inputs.

## Model Details

| Attribute | Detail |
|-----------|--------|
| **Base Model** | [Wan 2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) |
| **Parameters** | 1.42B (BF16) |
| **Resolution** | 832 × 480 (480p) |
| **Frame Rate** | 16 FPS |
| **Clip Length** | 81 frames |
| **Format** | SafeTensors |

## Inputs

The model takes the following inputs:

- **G-buffers** as conditional inputs, providing dense geometric and material priors:
  - **Basecolor** (albedo)
  - **Normal** (surface normals)
  - **Depth**
  - **Roughness**
  - **Metallic**
- **Text prompt** describing the desired lighting and environmental effects

The G-buffers encode the scene's geometry and materials, while the text prompt controls lighting conditions, atmospheric effects, and overall visual style. This decoupled design allows users to edit the visual appearance of game footage without altering the underlying scene structure.
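
To make the split between structural and textual inputs concrete, here is a minimal sketch of how one request could be described in code. It is illustrative only: the class names `GBufferClip` and `EditRequest`, the per-buffer channel counts, and the example prompt are assumptions, not part of the released model or its API.

```python
from dataclasses import dataclass

# Hypothetical channel counts per G-buffer; this README does not specify them.
ASSUMED_CHANNELS = {
    "basecolor": 3,
    "normal": 3,
    "depth": 1,
    "roughness": 1,
    "metallic": 1,
}


@dataclass
class GBufferClip:
    """Shape-only description of one conditioning clip (no pixel data)."""

    frames: int = 81
    height: int = 480
    width: int = 832

    def shape(self, name):
        # (frames, height, width, channels) for the named G-buffer.
        return (self.frames, self.height, self.width, ASSUMED_CHANNELS[name])


@dataclass
class EditRequest:
    """Scene structure comes from the G-buffers, appearance from the prompt."""

    gbuffers: GBufferClip
    prompt: str  # lighting and environmental effects only

    def validate(self):
        if not self.prompt.strip():
            raise ValueError("prompt must describe lighting or environmental effects")


# Illustrative prompt; the wording is made up for this example.
request = EditRequest(GBufferClip(), "warm sunset light with thin fog")
request.validate()
print(request.gbuffers.shape("normal"))
```

Changing only `prompt` while keeping `gbuffers` fixed corresponds to the editing workflow described above: the scene stays the same and only its lighting and atmosphere are re-described.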

## Training

### Architecture

We adapt [Wan 2.1-T2V-1.3B](https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B) by incorporating G-buffers (dense geometric and material priors) as conditional inputs. The model is fully fine-tuned following the original training configuration of the base model.

### Data

The model is trained on video clips from the [**Black Myth: Wukong** dataset](https://github.com/ShandaAI/AlayaRenderer). Descriptive captions for each clip are generated using [Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct). Since G-buffers already provide dense geometric and material information, the captions focus exclusively on **lighting and environmental effects**, enabling fine-grained text-based control over these attributes during inference.

### Procedure

- **Full fine-tuning** on the [Black Myth: Wukong dataset](https://github.com/ShandaAI/AlayaRenderer)
- Spatial resolution: **832 × 480** (480p)
- Frame rate: **16 FPS**
- Clip length: **81 frames**
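
The clip settings listed above imply a fixed duration and, under an assumed latent compression, a fixed latent grid. The sketch below works through that arithmetic. The 4× temporal and 8× spatial strides are assumptions based on what is commonly reported for the Wan 2.1 VAE; they are not stated in this README.

```python
# Values taken from the Procedure list above.
FRAMES, FPS = 81, 16
WIDTH, HEIGHT = 832, 480

# Assumed VAE compression factors (not stated in this README).
TEMPORAL_STRIDE, SPATIAL_STRIDE = 4, 8

# Clip duration in seconds.
duration_s = FRAMES / FPS

# With a causal video VAE the first frame is typically encoded on its own,
# so the latent frame count is (frames - 1) / stride + 1.
latent_frames = (FRAMES - 1) // TEMPORAL_STRIDE + 1
latent_width = WIDTH // SPATIAL_STRIDE
latent_height = HEIGHT // SPATIAL_STRIDE

print(f"duration: {duration_s} s")
print(f"latent grid: {latent_frames} x {latent_height} x {latent_width}")
```

The 81-frame length fits the `4k + 1` pattern that this kind of temporal compression expects, which is one plausible reason for the otherwise odd frame count.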

## Evaluation & Generalization

In the absence of directly comparable methods, we establish a baseline by adapting DiffusionRenderer's forward renderer with DiffusionLight-extracted environment maps as lighting conditions.

- A held-out subset of Black Myth: Wukong is used for testing.
- **Cross-dataset evaluation** on **Cyberpunk 2077** demonstrates strong generalization to unseen game environments, maintaining high-fidelity and controllable video synthesis.

## Intended Use

- **Game video editing**: Manipulate lighting and environmental effects in game footage through text descriptions.
- **Controllable video synthesis**: Generate stylized game video conditioned on G-buffers and text prompts.

## Citation

If you find this model useful, please consider citing our work.
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6ad3f92058110417e68e56c53fc11ce8f077a4dea2689c3a0fef43251d8da853
size 2839060368
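
The three lines above are a Git LFS pointer rather than the weights themselves. As a quick sanity check, the sketch below parses the pointer and estimates the parameter count from the file size, assuming every tensor is stored as BF16 (2 bytes per value) and ignoring the small safetensors header. The helper name `parse_lfs_pointer` is hypothetical.

```python
# Pointer text copied from the diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:6ad3f92058110417e68e56c53fc11ce8f077a4dea2689c3a0fef43251d8da853
size 2839060368
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)

# Assumption: all weights are BF16, so 2 bytes per parameter.
BYTES_PER_BF16 = 2
approx_params = info["size"] / BYTES_PER_BF16

print(f"{info['size'] / 1e9:.2f} GB, ~{approx_params / 1e9:.2f}B parameters if all BF16")
```

Under that assumption the estimate comes out at roughly 1.42B parameters, consistent with the figure in the Model Details table.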