HaoyiZhu committed on
Commit 4b2d932 · verified · 1 Parent(s): f6ea8dc

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +4 -9
README.md CHANGED
@@ -6,7 +6,6 @@ tags:
  - camera-control
  - world-model
  - diffusion
- library_name: NVlabs-Sana
  ---

  # SANA-WM (Bidirectional)
@@ -33,9 +32,9 @@ Four core designs drive the architecture:
  Paper: <https://arxiv.org/abs/2605.15178>

  ```bibtex
- @article{zhu2026sanawm,
  title = {{SANA-WM}: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer},
- author = {Zhu, Haoyi and Liu, Haozhe and Zhao, Yuyang and Ye, Tian and Chen, Junsong and Yu, Jincheng and He, Tong and Han, Song and Xie, Enze},
  journal = {arXiv preprint arXiv:2605.15178},
  year = {2026},
  }
@@ -52,13 +51,10 @@ Paper: <https://arxiv.org/abs/2605.15178>
  | Inference config | `config.yaml` | — |

  The Sana text encoder (`gemma-2-2b-it`) is **not** bundled here — it is
- fetched on demand from `Efficient-Large-Model/gemma-2-2b-it`.

  ## Usage

- Install the inference repo (see [environment_setup_sana_wm.sh](https://github.com/NVlabs/Sana/blob/main/environment_setup_sana_wm.sh))
- and run:
-
  ```bash
  python inference_video_scripts/inference_sana_wm.py \
  --image asset/sana_wm/demo_0.png \
@@ -91,5 +87,4 @@ aspect-preserving resized + center-cropped to that resolution.
  ## License

  Released under the Apache 2.0 license. The bundled LTX-2 refiner and VAE
- inherit the LTX-2 upstream license; see the parent NVlabs-Sana
- repository for details.
 
  - camera-control
  - world-model
  - diffusion
  ---

  # SANA-WM (Bidirectional)
 
  Paper: <https://arxiv.org/abs/2605.15178>

  ```bibtex
+ @article{sanawm2026,
  title = {{SANA-WM}: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer},
+ author = {Anonymous},
  journal = {arXiv preprint arXiv:2605.15178},
  year = {2026},
  }
 
  | Inference config | `config.yaml` | — |

  The Sana text encoder (`gemma-2-2b-it`) is **not** bundled here — it is
+ fetched on demand from the public Hugging Face mirror.

  ## Usage

  ```bash
  python inference_video_scripts/inference_sana_wm.py \
  --image asset/sana_wm/demo_0.png \
 
  ## License

  Released under the Apache 2.0 license. The bundled LTX-2 refiner and VAE
+ inherit the LTX-2 upstream license.
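
As a sanity check on the `+4 -9` summary, the four hunk headers in this diff can be reconciled with it: each `@@ -a,b +c,d @@` header gives the old hunk length `b` and the new hunk length `d`, so the summed difference should equal 4 - 9. The sketch below is only an illustration; the helper name `net_line_change` is hypothetical, and the regex assumes every header carries explicit lengths (unified diff allows omitting a length of 1, which is not handled here).

```python
import re

# Hunk headers copied verbatim from this commit's diff of README.md.
HUNK_HEADERS = [
    "@@ -6,7 +6,6 @@",
    "@@ -33,9 +32,9 @@",
    "@@ -52,13 +51,10 @@",
    "@@ -91,5 +87,4 @@",
]

# Matches "@@ -old_start,old_len +new_start,new_len @@" (explicit lengths only).
HEADER_RE = re.compile(r"@@ -(\d+),(\d+) \+(\d+),(\d+) @@")


def net_line_change(headers):
    """Sum (new_len - old_len) over all hunks; hypothetical helper."""
    delta = 0
    for header in headers:
        match = HEADER_RE.match(header)
        if match is None:
            raise ValueError(f"not a hunk header: {header!r}")
        _old_start, old_len, _new_start, new_len = map(int, match.groups())
        delta += new_len - old_len
    return delta


if __name__ == "__main__":
    # The page summary reports +4 -9, i.e. a net change of 4 - 9 lines.
    print(net_line_change(HUNK_HEADERS) == 4 - 9)
```

Context lines appear in both the old and new lengths of a hunk, so they cancel out and only added and removed lines contribute to the total.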