How to use Efficient-Large-Model/SANA-WM_bidirectional with Diffusers:
```bash
pip install -U diffusers transformers accelerate
```
```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image, export_to_video

# switch to "mps" for apple devices
# device_map already places the pipeline on the GPU, so no separate pipe.to("cuda") call is needed
pipe = DiffusionPipeline.from_pretrained(
    "Efficient-Large-Model/SANA-WM_bidirectional",
    dtype=torch.bfloat16,
    device_map="cuda",
)

prompt = "A man with short gray hair plays a red electric guitar."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/guitar-man.png"
)

# Generate frames conditioned on the input image and text prompt, then save them as a video
output = pipe(image=image, prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
```
Upload README.md with huggingface_hub
````diff
--- a/README.md
+++ b/README.md
@@ -6,7 +6,6 @@ tags:
 - camera-control
 - world-model
 - diffusion
-library_name: NVlabs-Sana
 ---
 
 # SANA-WM (Bidirectional)
@@ -33,9 +32,9 @@ Four core designs drive the architecture:
 Paper: <https://arxiv.org/abs/2605.15178>
 
 ```bibtex
-@article{
+@article{sanawm2026,
 title = {{SANA-WM}: Efficient Minute-Scale World Modeling with Hybrid Linear Diffusion Transformer},
-author = {
+author = {Anonymous},
 journal = {arXiv preprint arXiv:2605.15178},
 year = {2026},
 }
@@ -52,13 +51,10 @@ Paper: <https://arxiv.org/abs/2605.15178>
 | Inference config | `config.yaml` | — |
 
 The Sana text encoder (`gemma-2-2b-it`) is **not** bundled here — it is
-fetched on demand from
+fetched on demand from the public Hugging Face mirror.
 
 ## Usage
 
-Install the inference repo (see [environment_setup_sana_wm.sh](https://github.com/NVlabs/Sana/blob/main/environment_setup_sana_wm.sh))
-and run:
-
 ```bash
 python inference_video_scripts/inference_sana_wm.py \
 --image asset/sana_wm/demo_0.png \
@@ -91,5 +87,4 @@ aspect-preserving resized + center-cropped to that resolution.
 ## License
 
 Released under the Apache 2.0 license. The bundled LTX-2 refiner and VAE
-inherit the LTX-2 upstream license
-repository for details.
+inherit the LTX-2 upstream license.
````
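The context line of the last hunk says input images are "aspect-preserving resized + center-cropped to that resolution." The sketch below illustrates that geometry only. It assumes the image is scaled just enough to cover the target on both sides, with ceiling rounding on the overflowing side and floor rounding on the crop offsets. The helper name `resize_and_center_crop_box` is hypothetical, and the rounding and resampling used by `inference_sana_wm.py` may differ.

```python
def resize_and_center_crop_box(src_w: int, src_h: int, dst_w: int, dst_h: int):
    """Hypothetical helper (not from the SANA-WM code).

    Scale (src_w, src_h) with its aspect ratio preserved so it fully covers
    (dst_w, dst_h), then return the resized size and a centered crop box
    (left, top, right, bottom) in resized-pixel coordinates.
    Rounding choices are assumptions made for this illustration.
    """
    # Compare aspect ratios exactly via integer cross-multiplication (no float error).
    if src_w * dst_h >= src_h * dst_w:
        # Source is relatively wider: match heights; width overflows and is cropped.
        new_h = dst_h
        new_w = -(-src_w * dst_h // src_h)  # ceiling division for positive ints
    else:
        # Source is relatively taller: match widths; height overflows and is cropped.
        new_w = dst_w
        new_h = -(-src_h * dst_w // src_w)
    # Center the target-sized window inside the resized image.
    left = (new_w - dst_w) // 2
    top = (new_h - dst_h) // 2
    return (new_w, new_h), (left, top, left + dst_w, top + dst_h)


if __name__ == "__main__":
    # Illustrative sizes only; they are not SANA-WM's configured resolution.
    print(resize_and_center_crop_box(1920, 1080, 1280, 704))
```

With Pillow, the returned values could be applied as `img.resize(size).crop(box)`, because `Image.resize` takes `(width, height)` and `Image.crop` takes `(left, upper, right, lower)`. Scaling to cover rather than to fit keeps the whole target filled without padding, at the cost of discarding the overflowing border.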