Dhanidjulian and htdong committed

Commit 94b1279 · 0 parent(s)

Duplicate from htdong/Wan-Alpha

Co-authored-by: Dong Haotian <htdong@users.noreply.huggingface.co>
.gitattributes ADDED
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
assets/girl_pha.gif filter=lfs diff=lfs merge=lfs -text
assets/girl.gif filter=lfs diff=lfs merge=lfs -text
assets/teaser.png filter=lfs diff=lfs merge=lfs -text
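These attribute patterns route matching files through Git LFS instead of storing them directly in Git. A rough Python sketch of how a path matches a few of the patterns above (gitattributes matching is only approximated here with `fnmatch`; real Git rules differ in edge cases such as path separators):

```python
import fnmatch
import os

# A few of the patterns from the .gitattributes above.
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.zip", "*tfevents*", "assets/girl.gif"]

def is_lfs_tracked(path):
    """Approximate check: does any LFS pattern match the path or its basename?"""
    name = os.path.basename(path)
    return any(
        fnmatch.fnmatch(path, pat) or fnmatch.fnmatch(name, pat)
        for pat in LFS_PATTERNS
    )

print(is_lfs_tracked("decoder.bin"))  # → True
print(is_lfs_tracked("README.md"))    # → False
```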
README.md ADDED
---
base_model:
- Wan-AI/Wan2.1-T2V-14B
license: apache-2.0
pipeline_tag: text-to-video
tags:
- rgba
- transparency
---

<div align="center">

<h1>
Wan-Alpha
</h1>

<h3>Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel</h3>

[![arXiv](https://img.shields.io/badge/arXiv-2509.24979-b31b1b)](https://arxiv.org/pdf/2509.24979)
[![HF Paper](https://img.shields.io/badge/HuggingFace-Paper-blue)](https://huggingface.co/papers/2509.24979)
[![Project Page](https://img.shields.io/badge/Project_Page-Link-green)](https://donghaotian123.github.io/Wan-Alpha/)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-black?logo=github)](https://github.com/WeChatCV/Wan-Alpha)
[![🤗 HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-orange)](https://huggingface.co/htdong/Wan-Alpha)
[![ComfyUI](https://img.shields.io/badge/ComfyUI-Version-blue)](https://huggingface.co/htdong/Wan-Alpha_ComfyUI)
[![🤗 HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model_v2.0-yellow)](https://huggingface.co/htdong/Wan-Alpha-v2.0)

</div>

<img src="assets/teaser.png" alt="Wan-Alpha Qualitative Results" style="max-width: 100%; height: auto;">

> Qualitative results of video generation using **Wan-Alpha**. Our model generates varied scenes with accurate, clearly rendered transparency. Notably, it can synthesize diverse semi-transparent objects, glowing effects, and fine-grained details such as hair.

---

## Abstract
RGBA video generation, which includes an alpha channel to represent transparency, is gaining increasing attention across a wide range of applications. However, existing methods often neglect visual quality, limiting their practical usability. In this paper, we propose Wan-Alpha, a new framework that generates transparent videos by learning both RGB and alpha channels jointly. We design an effective variational autoencoder (VAE) that encodes the alpha channel into the RGB latent space. Then, to support the training of our diffusion transformer, we construct a high-quality and diverse RGBA video dataset. Compared with state-of-the-art methods, our model demonstrates superior performance in visual quality, motion realism, and transparency rendering. Notably, our model can generate a wide variety of semi-transparent objects, glowing effects, and fine-grained details such as hair strands. The released model is available on our project page: https://donghaotian123.github.io/Wan-Alpha/

---

## 🔥 News
* **[2025.09.30]** Released Wan-Alpha v1.0: the Wan2.1-T2V-14B–adapted weights and inference code are now open-source.

---
## 🌟 Showcase

### Text-to-Video Generation with Alpha Channel

| Prompt | Preview Video | Alpha Video |
| :---: | :---: | :---: |
| "Medium shot. A little girl holds a bubble wand and blows out colorful bubbles that float and pop in the air. The background of this video is transparent. Realistic style." | <img src="assets/girl.gif" width="320" height="180" style="object-fit:contain; display:block; margin:auto;"/> | <img src="assets/girl_pha.gif" width="335" height="180" style="object-fit:contain; display:block; margin:auto;"/> |

### For more results, please visit [Our Website](https://donghaotian123.github.io/Wan-Alpha/)

## 🚀 Quick Start

### 1. Environment Setup
```bash
# Clone the project repository
git clone https://github.com/WeChatCV/Wan-Alpha.git
cd Wan-Alpha

# Create and activate a Conda environment
conda create -n Wan-Alpha python=3.11 -y
conda activate Wan-Alpha

# Install dependencies
pip install -r requirements.txt
```
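Before downloading weights, it can help to confirm that the environment resolved correctly. A minimal sketch (the package names below are assumptions for a typical Wan 2.1 setup, not read from the repo's `requirements.txt`):

```python
import importlib.util

def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Hypothetical dependency list; consult the repo's requirements.txt for the real one.
deps = ["torch", "torchvision", "numpy", "safetensors"]
print(missing_packages(deps))  # empty list means all assumed deps are importable
```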

### 2. Model Download
Download [Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B)

Download [LightX2V-T2V-14B](https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Lightx2v/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors)

Download [Wan-Alpha VAE](https://huggingface.co/htdong/Wan-Alpha)

### 🧪 Usage
You can test our model with:
```bash
torchrun --nproc_per_node=8 --master_port=29501 generate_dora_lightx2v.py --size 832*480 \
--ckpt_dir "path/to/your/Wan-2.1/Wan2.1-T2V-14B" \
--dit_fsdp --t5_fsdp --ulysses_size 8 \
--vae_lora_checkpoint "path/to/your/decoder.bin" \
--lora_path "path/to/your/epoch-13-1500.safetensors" \
--lightx2v_path "path/to/your/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors" \
--sample_guide_scale 1.0 \
--frame_num 81 \
--sample_steps 4 \
--lora_ratio 1.0 \
--lora_prefix "" \
--prompt_file ./data/prompt.txt \
--output_dir ./output
```
You can specify the `Wan2.1-T2V-14B` weights with `--ckpt_dir`, `LightX2V-T2V-14B` with `--lightx2v_path`, `Wan-Alpha-VAE` with `--vae_lora_checkpoint`, and `Wan-Alpha-T2V` with `--lora_path`. The rendered RGBA videos, composited over a checkerboard background, and the PNG frames are written to `--output_dir`.

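The checkerboard preview mentioned above can be approximated with plain NumPy: alpha-composite each RGBA frame over a repeating gray/white tile. This is an illustrative sketch, not the repo's actual rendering code:

```python
import numpy as np

def checkerboard(h, w, tile=16):
    """Gray/white checkerboard of shape (h, w, 3), values in [0, 1]."""
    ys, xs = np.indices((h, w))
    cells = ((ys // tile + xs // tile) % 2).astype(np.float32)
    board = 0.8 + 0.2 * cells  # light-gray vs. white squares
    return np.repeat(board[:, :, None], 3, axis=2)

def composite_over_checkerboard(rgba):
    """Alpha-composite an (h, w, 4) float frame over a checkerboard background."""
    rgb, alpha = rgba[..., :3], rgba[..., 3:4]
    bg = checkerboard(*rgba.shape[:2])
    return rgb * alpha + bg * (1.0 - alpha)
```

A fully transparent frame yields the bare checkerboard; a fully opaque frame passes its RGB through unchanged.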
**Prompt Writing Tip:** Specify that the background of the video is transparent, the visual style, the shot type (such as close-up, medium shot, wide shot, or extreme close-up), and a description of the main subject. Prompts support both Chinese and English input.

```text
# An example prompt.
This video has a transparent background. Close-up shot. A colorful parrot flying. Realistic style.
```

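Following the tip above, prompts can also be assembled programmatically, e.g. when batching many subjects into `--prompt_file`. A small sketch (the helper and its argument names are made up for illustration):

```python
def build_prompt(subject, shot="Medium shot", style="Realistic style"):
    """Compose a Wan-Alpha prompt: transparency statement, shot type,
    subject description, and visual style, in that order."""
    return (
        "This video has a transparent background. "
        f"{shot}. {subject}. {style}."
    )

print(build_prompt("A colorful parrot flying", shot="Close-up shot"))
# → This video has a transparent background. Close-up shot. A colorful parrot flying. Realistic style.
```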
### 🔨 Official ComfyUI Version

Note: We have reorganized our models so that they load directly into ComfyUI. Please note that these models differ from the ones mentioned above.

1. Download the models:
   - The Wan DiT base model: [wan2.1_t2v_14B_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/diffusion_models/wan2.1_t2v_14B_fp16.safetensors)
   - The Wan text encoder: [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)
   - The LightX2V model: [lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors](https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Lightx2v/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors)
   - Our RGBA DoRA: [epoch-13-1500_changed.safetensors](https://huggingface.co/htdong/Wan-Alpha_ComfyUI/blob/main/epoch-13-1500_changed.safetensors)
   - Our RGB VAE decoder: [wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors](https://huggingface.co/htdong/Wan-Alpha_ComfyUI/blob/main/wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors)
   - Our alpha VAE decoder: [wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors](https://huggingface.co/htdong/Wan-Alpha_ComfyUI/blob/main/wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors)

2. Copy the files into the `ComfyUI/models` folder and organize them as follows:

```
ComfyUI/models
├── diffusion_models
│   └── wan2.1_t2v_14B_fp16.safetensors
├── loras
│   ├── epoch-13-1500_changed.safetensors
│   └── lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors
├── text_encoders
│   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
└── vae
    ├── wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors
    └── wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors
```
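After copying, a short script can verify that the layout matches the tree above (a hypothetical helper; the file names are taken from step 1):

```python
import os

# Expected files, relative to ComfyUI/models, per the tree above.
EXPECTED = [
    "diffusion_models/wan2.1_t2v_14B_fp16.safetensors",
    "loras/epoch-13-1500_changed.safetensors",
    "loras/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors",
    "text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
    "vae/wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors",
    "vae/wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors",
]

def missing_files(models_dir):
    """Return the expected files that are absent under `models_dir`."""
    return [p for p in EXPECTED if not os.path.isfile(os.path.join(models_dir, p))]

if __name__ == "__main__":
    print(missing_files("ComfyUI/models"))  # empty list means the layout is complete
```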

3. Install our custom RGBA video previewer and PNG-frame zip packer: copy the file [RGBA_save_tools.py](comfyui/RGBA_save_tools.py) into the `ComfyUI/custom_nodes` folder.

   - Thanks to @mr-lab for an improved WebP version! You can find it in this [issue](https://github.com/WeChatCV/Wan-Alpha/issues/4).

4. Example workflow: [wan_alpha_t2v_14B.json](comfyui/wan_alpha_t2v_14B.json)

<img src="comfyui/comfyui.jpg" style="margin:auto;"/>

## 🤝 Acknowledgements

This project is built upon the following excellent open-source projects:
* [DiffSynth-Studio](https://github.com/modelscope/DiffSynth-Studio) (training/inference framework)
* [Wan2.1](https://github.com/Wan-Video/Wan2.1) (base video generation model)
* [LightX2V](https://github.com/ModelTC/LightX2V) (inference acceleration)
* [WanVideo_comfy](https://huggingface.co/Kijai/WanVideo_comfy) (inference acceleration)

We sincerely thank the authors and contributors of these projects.

---

## ✏ Citation

If you find our work helpful for your research, please consider citing our paper:

```bibtex
@misc{dong2025wanalpha,
  title={Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel},
  author={Haotian Dong and Wenjing Wang and Chen Li and Di Lin},
  year={2025},
  eprint={2509.24979},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2509.24979},
}
```

---

## 📬 Contact Us

If you have any questions or suggestions, feel free to reach out via [GitHub Issues](https://github.com/WeChatCV/Wan-Alpha/issues). We look forward to your feedback!
assets/girl.gif ADDED

Git LFS Details

  • SHA256: ed6b35e2d7b2ec3f0e101e964ae4d1c584f71b0d68212a726c18d6cc66f62475
  • Pointer size: 133 Bytes
  • Size of remote file: 10.9 MB
assets/girl_pha.gif ADDED

Git LFS Details

  • SHA256: e324db73abaf756161485fdd126792cd3e4e136f1210f8b0934e2bd328cd362b
  • Pointer size: 132 Bytes
  • Size of remote file: 1.24 MB
assets/teaser.png ADDED

Git LFS Details

  • SHA256: 6f232e91d40ba58df07131587c04a914dfb481e503ff43ccd84fd5b88adf9fef
  • Pointer size: 133 Bytes
  • Size of remote file: 23.3 MB
decoder.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:4ec6b6107d8033c0d1b40a7d64e7c09e1594725f56ba60cb40ffd8be8b926e89
size 635538952
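The three lines above form a standard Git LFS pointer file, which stands in for the real weights until `git lfs pull` fetches them. Parsing one takes only a few lines (an illustrative sketch):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The decoder.bin pointer shown above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4ec6b6107d8033c0d1b40a7d64e7c09e1594725f56ba60cb40ffd8be8b926e89
size 635538952
"""
info = parse_lfs_pointer(pointer)
print(int(info["size"]) / 1e6)  # → ~635.5 (MB)
```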
epoch-13-1500.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:9d0f54c4655589eb0862c617221902c53123447690ef4d4fac703030ccf23644
size 311645240