sana-wm / README.md

multimodalart HF Staff

Update README.md

3e38a22 verified 7 days ago

metadata

title: SANA-WM Camera-Controlled World Model
emoji: 🌍
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.14.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Image-to-video with 6-DoF camera control.
models:
  - Efficient-Large-Model/SANA-WM_bidirectional
  - yyfz233/Pi3X
  - google/gemma-2-2b-it
suggested_hardware: zero-a10g
header: default

SANA-WM — Camera-Controlled World Model (ZeroGPU)

Demo of Efficient-Large-Model/SANA-WM_bidirectional from the NVlabs/Sana project (feat/sana-wm PR branch).

Upload a first frame and write a prompt.
Build a camera trajectory with the W A S D / I J K L action queue; each tap appends a <keys>-<frames> segment to the trajectory DSL.
The Sana DiT samples a (704, 1280) latent video conditioned on your rolled-out 6-DoF camera trajectory, then the Sana VAE decodes it.
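
To make the <keys>-<frames> segment format concrete, here is a minimal parsing sketch. It is not the Space's actual parser: the comma separator, the `Segment` type, and the `parse_actions` and `total_frames` helpers are hypothetical names chosen for illustration, and the real DSL may differ in separator and validation rules.

```python
from dataclasses import dataclass

# Keys named in the README: W A S D and I J K L.
VALID_KEYS = set("wasdijkl")


@dataclass(frozen=True)
class Segment:
    keys: str    # keys held together, e.g. "w" or "wa"
    frames: int  # number of frames the keys are held for


def parse_actions(dsl: str, sep: str = ",") -> list[Segment]:
    """Parse a string of <keys>-<frames> segments.

    ASSUMPTION: segments are separated by `sep` (a comma here);
    the README does not state the separator.
    """
    segments = []
    for raw in dsl.split(sep):
        raw = raw.strip()
        if not raw:
            continue  # tolerate trailing separators
        keys, dash, frames = raw.rpartition("-")
        if not dash or not keys or not frames.isdigit() or int(frames) == 0:
            raise ValueError(f"malformed segment: {raw!r}")
        keys = keys.lower()
        unknown = set(keys) - VALID_KEYS
        if unknown:
            raise ValueError(f"unknown keys {sorted(unknown)} in {raw!r}")
        segments.append(Segment(keys, int(frames)))
    return segments


def total_frames(segments: list[Segment]) -> int:
    """Total length of the trajectory in frames."""
    return sum(s.frames for s in segments)
```

For example, `parse_actions("w-8, wa-4")` yields two segments covering 12 frames in total. How each key maps to a translation or rotation of the camera is left out, since the README does not specify it.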

The full pipeline ships an LTX-2 sink-bidirectional Euler refiner that adds ~87 GB of weights. This Space runs Stage-1 only (--no_refiner) to fit ZeroGPU; for refined output, run the CLI offline.

Build notes

The Sana repo is vendored under ./Sana/ and prepended to sys.path.
flash_attn is stubbed at startup: SANA-WM only uses the Triton GDN path, but a few Sana modules do a top-level from flash_attn import …, so the import must resolve even though the package is never called.
Camera intrinsics are estimated with Pi3X from the input image; pass --intrinsics in the CLI for accurate values.
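
The first two build notes can be sketched as follows. This is an illustration of the general technique, not the code in app.py: `vendor_repo` and `stub_module` are hypothetical helpers, and the attribute name passed to the stub (`flash_attn_func`) is an assumption about which names the vendored modules import.

```python
import importlib.machinery
import sys
import types
from pathlib import Path


def vendor_repo(repo_dir: str = "Sana") -> str:
    """Prepend a vendored checkout to sys.path so its packages win lookup."""
    path = str(Path(repo_dir).resolve())
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


def stub_module(name: str, attrs: tuple[str, ...] = ()) -> types.ModuleType:
    """Register a placeholder module so `from <name> import ...` succeeds."""
    if name in sys.modules:
        return sys.modules[name]  # already imported: leave it alone
    module = types.ModuleType(name)
    # Give the stub a spec; importlib.util.find_spec raises ValueError
    # for modules in sys.modules whose __spec__ is None.
    module.__spec__ = importlib.machinery.ModuleSpec(name, loader=None)

    def _unavailable(*args, **kwargs):
        raise RuntimeError(f"{name} is stubbed; this code path is unsupported")

    for attr in attrs:
        setattr(module, attr, _unavailable)
    sys.modules[name] = module
    return module


# Must run before anything from the vendored repo is imported.
vendor_repo("Sana")
# ASSUMPTION: flash_attn_func is among the names imported at module top level.
stub_module("flash_attn", attrs=("flash_attn_func",))
```

The stub only satisfies the import statement. Any attempt to call a stubbed function raises immediately, which surfaces an accidental use of the flash-attention path instead of failing silently.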