title: SANA-WM Camera-Controlled World Model
emoji: 🌍
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.14.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Image-to-video with 6-DoF camera control.
models:
- Efficient-Large-Model/SANA-WM_bidirectional
- yyfz233/Pi3X
- google/gemma-2-2b-it
suggested_hardware: zero-a10g
header: default
SANA-WM — Camera-Controlled World Model (ZeroGPU)
Demo of Efficient-Large-Model/SANA-WM_bidirectional
from the NVlabs/Sana project (feat/sana-wm PR branch).
- Upload a first frame + write a prompt.
- Build a camera trajectory with the W A S D / I J K L action queue (each tap appends a <keys>-<frames> segment to the DSL).
- The Sana DiT samples a (704, 1280) latent video conditioned on your rolled-out 6-DoF camera trajectory, then the Sana VAE decodes it.
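To make the <keys>-<frames> segment format concrete, here is a minimal Python sketch that parses an action string into segments and expands it to one key set per frame. It is an illustration, not the parser used by this Space: the comma separator, the lowercase normalization, the support for multi-key segments such as wa, and the function names parse_actions and per_frame_keys are all assumptions. How each key maps to camera motion is left out on purpose, since that is defined in the Sana code.

```python
import re
from typing import List, Tuple

# Keys named in the README: W A S D / I J K L.
ALLOWED_KEYS = set("wasdijkl")
_SEGMENT = re.compile(r"^([a-z]+)-(\d+)$")


def parse_actions(dsl: str, sep: str = ",") -> List[Tuple[str, int]]:
    """Parse '<keys>-<frames>' segments.

    Assumption: segments are joined by `sep` (comma here); the real
    DSL may use a different separator.
    """
    segments = []
    for raw in dsl.split(sep):
        token = raw.strip().lower()
        if not token:
            continue  # tolerate trailing separators
        match = _SEGMENT.match(token)
        if match is None:
            raise ValueError(f"malformed segment: {raw!r}")
        keys, frames = match.group(1), int(match.group(2))
        if not set(keys) <= ALLOWED_KEYS:
            raise ValueError(f"unknown key in segment: {raw!r}")
        if frames <= 0:
            raise ValueError(f"frame count must be positive: {raw!r}")
        segments.append((keys, frames))
    return segments


def per_frame_keys(segments: List[Tuple[str, int]]) -> List[str]:
    """Expand segments so each output entry is the key set held for one frame."""
    frames: List[str] = []
    for keys, count in segments:
        frames.extend([keys] * count)
    return frames
```

For example, per_frame_keys(parse_actions("w-3, wa-2")) yields five entries, three for w followed by two for wa, which is the per-frame form a trajectory rollout would consume.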
The full pipeline ships an LTX-2 sink-bidirectional Euler refiner that
adds ~87 GB of weights. This Space runs Stage-1 only (--no_refiner)
to fit ZeroGPU; for refined output, run the CLI offline.
Build notes
- The Sana repo is vendored under ./Sana/ and prepended to sys.path.
- flash_attn is stubbed at startup: SANA-WM only uses the Triton GDN path, but a few Sana modules do a top-level from flash_attn import ….
- Camera intrinsics are estimated with Pi3X from the input image; pass --intrinsics in the CLI for accurate values.
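The first two build notes can be sketched in Python as follows. This is one possible way to do it, not necessarily what app.py does: the helper names prepend_vendored_repo and install_stub are hypothetical, and the stub only covers top-level imports of the form from flash_attn import name. Any submodule imports would need their own sys.modules entries.

```python
import os
import sys
import types


def prepend_vendored_repo(repo_dir: str = "./Sana") -> str:
    """Put the vendored repo first on sys.path so its packages win lookups."""
    path = os.path.abspath(repo_dir)
    if path in sys.path:
        sys.path.remove(path)
    sys.path.insert(0, path)
    return path


def install_stub(name: str = "flash_attn") -> types.ModuleType:
    """Register a placeholder module so top-level imports of `name` succeed.

    Any attribute pulled from the stub is a function that raises when
    called, so an accidental use of the stubbed package fails loudly.
    """
    if name in sys.modules:
        return sys.modules[name]  # a real or earlier stub module is kept

    stub = types.ModuleType(name)

    def _missing(attr: str):
        # Leave dunder lookups (e.g. __path__) failing as usual so the
        # import machinery does not mistake the stub for a package.
        if attr.startswith("__"):
            raise AttributeError(attr)

        def _unavailable(*args, **kwargs):
            raise RuntimeError(f"{name}.{attr} is a stub and cannot be called")

        return _unavailable

    # Module-level __getattr__ (PEP 562) handles names the stub lacks.
    stub.__getattr__ = _missing
    sys.modules[name] = stub
    return stub
```

Both helpers would need to run before the first import of any Sana module, since the top-level flash_attn imports execute at import time.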