Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,59 @@
---
license: apache-2.0
tags:
- text-to-video
- image-to-video
- camera-control
- diffusion
library_name: NVlabs-Sana
---

# SANA-WM (Bidirectional)

A 2.6B-parameter image-to-video diffusion model conditioned on a per-frame
camera trajectory, paired with the LTX-2 sink-bidirectional Euler refiner
for high-fidelity decoding.

| Component                          | Path in repo                         | Size  |
|------------------------------------|--------------------------------------|-------|
| Sana DiT (Stage 1)                 | `dit/sana_wm_1600m_720p.safetensors` | 10 GB |
| LTX-2 VAE (diffusers)              | `vae/`                               | 2 GB  |
| LTX-2 refiner (Stage 2)            | `refiner/refiner.safetensors`        | 41 GB |
| Gemma text encoder for the refiner | `refiner/text_encoder/`              | 46 GB |
| Inference config                   | `config.yaml`                        |       |

The Sana text encoder (`gemma-2-2b-it`) is **not** bundled here; it is
fetched on demand from `Efficient-Large-Model/gemma-2-2b-it`.
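
As a rough planning aid, the sizes in the table above can be summed to estimate how much will be downloaded. The sketch below is illustrative only: the helper names are hypothetical, the sizes are the rounded figures from the table, and the split into "Stage 1 only" versus "with refiner" assumes that everything under `refiner/` is needed only when the refiner is enabled, which the inference script may handle differently. The repo id is taken from the model page, and the separately fetched `gemma-2-2b-it` encoder is not counted.

```python
# Rounded sizes in GB, copied from the component table (config.yaml is negligible).
COMPONENT_SIZES_GB = {
    "dit/sana_wm_1600m_720p.safetensors": 10,
    "vae/": 2,
    "refiner/refiner.safetensors": 41,
    "refiner/text_encoder/": 46,
}


def estimate_download_gb(use_refiner: bool) -> int:
    """Sum the table sizes; ASSUMES refiner/ is only needed with the refiner."""
    total = 0
    for path, size_gb in COMPONENT_SIZES_GB.items():
        if path.startswith("refiner/") and not use_refiner:
            continue
        total += size_gb
    return total


def allow_patterns(use_refiner: bool) -> list:
    """Glob patterns for a selective download (hypothetical helper)."""
    patterns = ["config.yaml", "dit/*", "vae/*"]
    if use_refiner:
        patterns.append("refiner/*")
    return patterns


def prefetch(local_dir: str, use_refiner: bool = False) -> str:
    """Optionally pre-download the selected files with huggingface_hub."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(
        repo_id="Efficient-Large-Model/SANA-WM_bidirectional",
        allow_patterns=allow_patterns(use_refiner),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print("Stage 1 only:", estimate_download_gb(False), "GB")
    print("With refiner:", estimate_download_gb(True), "GB")
```

Under these assumptions, most of the download is the refiner and its text encoder, so it is worth checking free disk space before passing `--use_refiner`.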

## Usage

Install the inference repo and run:

```bash
python inference_video_scripts/inference_sana_wm.py \
    --image examples/scene/first_frame.png \
    --prompt examples/scene/prompt.txt \
    --camera examples/scene/camera.npy \
    --intrinsics examples/scene/intrinsics.npy \
    --output_dir results/demo
```

Weights are fetched from this repository on first use. Pass `--use_refiner`
to enable the Stage-2 LTX-2 refiner; without it, the Sana VAE decodes the
Stage-1 latents directly. To run entirely offline, override any of
`--config` / `--model_path` / `--refiner_checkpoint` / `--refiner_gemma_root`
with local paths.
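
For scripted or batch runs, the same invocation can be assembled programmatically. The following is a minimal sketch, not part of the inference repo: `build_command` and `run` are hypothetical helpers, the `/models/...` paths are placeholders, and `--use_refiner` is assumed to be a plain on/off switch. Only the flag names themselves come from the text above.

```python
import subprocess
import sys

SCRIPT = "inference_video_scripts/inference_sana_wm.py"


def build_command(scene_dir, output_dir, use_refiner=False, overrides=None):
    """Assemble the argument list for one scene (hypothetical helper)."""
    cmd = [
        sys.executable, SCRIPT,
        "--image", f"{scene_dir}/first_frame.png",
        "--prompt", f"{scene_dir}/prompt.txt",
        "--camera", f"{scene_dir}/camera.npy",
        "--intrinsics", f"{scene_dir}/intrinsics.npy",
        "--output_dir", output_dir,
    ]
    if use_refiner:
        cmd.append("--use_refiner")  # assumed to be a boolean switch
    # Optional local-path overrides, e.g. for offline runs.
    for flag, path in (overrides or {}).items():
        cmd += [flag, str(path)]
    return cmd


def run(cmd):
    """Execute the command; must be called from inside the inference repo."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_command(
        "examples/scene",
        "results/demo",
        use_refiner=True,
        overrides={
            # Placeholder local paths; substitute your own.
            "--config": "/models/sana_wm/config.yaml",
            "--model_path": "/models/sana_wm/dit/sana_wm_1600m_720p.safetensors",
            "--refiner_checkpoint": "/models/sana_wm/refiner/refiner.safetensors",
            "--refiner_gemma_root": "/models/sana_wm/refiner/text_encoder",
        },
    )
    print(" ".join(cmd))  # inspect first; call run(cmd) to execute
```

Building the argument list as a Python list rather than a shell string avoids quoting problems when paths contain spaces.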

## Inputs

| Argument       | Format |
|----------------|--------|
| `--image`      | RGB image (any PIL-readable format), used as the first frame. |
| `--prompt`     | UTF-8 text file containing the conditioning prompt. |
| `--camera`     | NumPy `.npy`, shape `(F, 4, 4)`, camera-to-world matrices for `F = --num_frames`. |
| `--intrinsics` | NumPy `.npy`, shape `(3, 3)`, `(F, 3, 3)`, or `(4,) = (fx, fy, cx, cy)` in input pixels. |
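
To make the array formats in the table concrete, here is a minimal sketch that writes a `camera.npy` and an `intrinsics.npy` with the stated shapes. Everything beyond the shapes is an assumption for illustration: the helper names, the frame count, the image size and focal length, the `float32` dtype, and the choice of `+z` as the direction of motion (the axis convention is not specified here, so check the inference repo before relying on it).

```python
import numpy as np


def forward_dolly(num_frames: int, step: float = 0.05) -> np.ndarray:
    """Camera-to-world poses, shape (F, 4, 4): identity rotation, moving along +z.

    The +z direction is an ASSUMPTION; the axis convention is not documented here.
    """
    poses = np.tile(np.eye(4, dtype=np.float32), (num_frames, 1, 1))
    poses[:, 2, 3] = step * np.arange(num_frames, dtype=np.float32)
    return poses


def intrinsics_matrix(fx, fy, cx, cy) -> np.ndarray:
    """Standard 3x3 pinhole matrix built from (fx, fy, cx, cy) in pixels."""
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]], dtype=np.float32)


def to_per_frame_intrinsics(k, num_frames: int) -> np.ndarray:
    """Normalize any of the three accepted shapes to (F, 3, 3)."""
    k = np.asarray(k, dtype=np.float32)
    if k.shape == (4,):
        k = intrinsics_matrix(*k)
    if k.shape == (3, 3):
        k = np.broadcast_to(k, (num_frames, 3, 3)).copy()
    if k.shape != (num_frames, 3, 3):
        raise ValueError(f"unsupported intrinsics shape: {k.shape}")
    return k


if __name__ == "__main__":
    num_frames = 81            # hypothetical; must match --num_frames
    width, height = 1280, 720  # hypothetical input image size
    focal = 1000.0             # hypothetical focal length in pixels

    np.save("camera.npy", forward_dolly(num_frames))
    np.save(
        "intrinsics.npy",
        np.array([focal, focal, width / 2, height / 2], dtype=np.float32),
    )
```

`to_per_frame_intrinsics` is not needed to produce valid inputs; it only shows how the `(4,)`, `(3, 3)`, and `(F, 3, 3)` forms relate to one another.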

## License

Released under the Apache 2.0 license. The refiner inherits the LTX-2
upstream license; see the parent NVlabs-Sana repository for details.