INSTADOC ZHANGYUXUAN-zR committed on
Commit f6fdb20 · 0 Parent(s)

Duplicate from zai-org/SSVAE

Co-authored-by: zR <ZHANGYUXUAN-zR@users.noreply.huggingface.co>

Files changed (3)
  1. .gitattributes +35 -0
  2. README.md +53 -0
  3. ch48_256p_15w_512p_5w.ckpt +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
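
The patterns above decide which files in this commit are stored through Git LFS. As a rough illustration, the sketch below checks the three changed files against a subset of those patterns. It is an approximation only: `fnmatchcase` on the file's base name is an assumption standing in for Git's own attribute matching (it does not handle path patterns such as `saved_model/**/*`), and `LFS_PATTERNS` and `is_lfs_tracked` are hypothetical names introduced for this example.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Subset of the patterns added in .gitattributes (not the full list of 35).
LFS_PATTERNS = ["*.bin", "*.ckpt", "*.pt", "*.pth", "*.safetensors", "*tfevents*"]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate check: does the file's base name match any LFS pattern?"""
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# The three files changed in this commit.
for changed_file in [".gitattributes", "README.md", "ch48_256p_15w_512p_5w.ckpt"]:
    print(changed_file, is_lfs_tracked(changed_file))
```

Under this approximation only the `.ckpt` file matches, which is consistent with it appearing in the diff as a three-line LFS pointer rather than as binary content.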
README.md ADDED
@@ -0,0 +1,53 @@
+ ---
+ license: mit
+ ---
+
+ # Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability
+
+ [![Website](https://img.shields.io/badge/Website-Project%20Page-blue)](https://zhazhan.github.io/ssvae.github.io)
+ [![arXiv](https://img.shields.io/badge/arXiv-2512.05394-b31b1b)](https://arxiv.org/abs/2512.05394)
+
+
+ Most existing video VAEs prioritize reconstruction fidelity, often overlooking the latent structure's impact on
+ downstream diffusion training. Our research identifies properties of video VAE latent spaces that facilitate diffusion
+ training through statistical analysis of VAE latents. Our key finding is that biased, rather than uniform, spectra lead
+ to improved diffusability. Motivated by this, we introduce **SSVAE (Spectral-Structured VAE)**, which optimizes the
+ **spectral properties** of the latent space to enhance its **"Diffusability"**.
+
+ <div align="center">
+ <img src="https://raw.githubusercontent.com/zai-org/SSVAE/refs/heads/main/assets/figs/teaser.png" alt="Figure 1" width="400">
+ </div>
+
+ ## 🔥 Key Highlights
+
+ * **Spectral Analysis of Latents**: We identify two statistical properties essential for efficient diffusion training: a
+   **low-frequency biased spatio-temporal spectrum** and a **few-mode biased channel eigenspectrum**.
+ * **Local Correlation Regularization (LCR)**: A lightweight regularizer that explicitly enhances local spatio-temporal
+   correlations to induce low-frequency bias.
+ * **Latent Masked Reconstruction (LMR)**: A mechanism that simultaneously promotes few-mode bias and improves decoder
+   robustness against noise.
+ * **Superior Performance**:
+   * 🚀 **3× Faster Convergence**: Accelerates text-to-video generation convergence by 3× compared to strong baselines.
+   * 📈 **Higher Quality**: Achieves a **10% gain** in video reward scores (UnifiedReward).
+   * 🏆 **Outperforms SOTA**: Surpasses open-source VAEs (e.g., Wan 2.2, CogVideoX) in generation quality with fewer
+     parameters.
+
+ ## Using the Model
+
+ Please see our [GitHub](https://github.com/zai-org/SSVAE) repository.
+
+ ## Citation
+
+ If you find this work useful in your research, please consider citing:
+
+ ```bibtex
+ @misc{liu2025delvinglatentspectralbiasing,
+ title={Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability},
+ author={Shizhan Liu and Xinran Deng and Zhuoyi Yang and Jiayan Teng and Xiaotao Gu and Jie Tang},
+ year={2025},
+ eprint={2512.05394},
+ archivePrefix={arXiv},
+ primaryClass={cs.CV},
+ url={https://arxiv.org/abs/2512.05394},
+ }
+ ```
ch48_256p_15w_512p_5w.ckpt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:49a354e836ac6124f7a1564a29def48bc7b938368aad53a52cc63ca45decba57
+ size 1382929206
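
The three lines above are a Git LFS pointer, not the checkpoint itself: they record the spec version, the SHA-256 digest of the real file, and its size in bytes (about 1.38 GB). A minimal sketch of how a downloaded copy could be checked against this pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and introduced only for this example; they are not part of Git LFS or of the SSVAE repository.

```python
import hashlib
import os

# Pointer content copied from the diff above.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:49a354e836ac6124f7a1564a29def48bc7b938368aad53a52cc63ca45decba57
size 1382929206
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the size and digest the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap size check first, avoids hashing a wrong file
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Hash in chunks so a checkpoint of this size is never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

Assuming the checkpoint has been downloaded to the working directory under its original name, `matches_pointer("ch48_256p_15w_512p_5w.ckpt", pointer)` would report whether the local copy is intact.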