metadata
title: SANA-WM Camera-Controlled World Model
emoji: 🌍
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 6.14.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: Image-to-video with 6-DoF camera control.
models:
  - Efficient-Large-Model/SANA-WM_bidirectional
  - yyfz233/Pi3X
  - google/gemma-2-2b-it
suggested_hardware: zero-a10g
header: default

SANA-WM — Camera-Controlled World Model (ZeroGPU)

Demo of Efficient-Large-Model/SANA-WM_bidirectional from the NVlabs/Sana project (feat/sana-wm PR branch).

  • Upload a first frame and write a prompt.
  • Build a camera trajectory with the W A S D / I J K L action queue (each tap appends a <keys>-<frames> segment to the trajectory DSL string).
  • The Sana DiT samples a (704, 1280) latent video conditioned on your rolled-out 6-DoF camera trajectory, then the Sana VAE decodes it.
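The <keys>-<frames> action queue described above can be sketched as a small parser. This is a minimal illustration, not the Space's actual code: the comma separator between segments, the Segment class, and the function names are assumptions made for the example, and the mapping from keys to camera motion is not modeled because the README does not specify it.

```python
from dataclasses import dataclass

# Keys offered by the action queue in the UI (W A S D / I J K L).
VALID_KEYS = set("wasdijkl")


@dataclass(frozen=True)
class Segment:
    keys: str    # keys held together during this segment, e.g. "wa"
    frames: int  # number of frames the keys are held for


def parse_actions(dsl: str, sep: str = ",") -> list[Segment]:
    """Parse a string of <keys>-<frames> segments.

    The separator between segments is an assumption of this sketch.
    """
    segments = []
    for token in dsl.split(sep):
        token = token.strip()
        if not token:
            continue  # tolerate empty entries such as a trailing separator
        keys, _, count = token.rpartition("-")
        keys = keys.lower()
        if not keys or not set(keys) <= VALID_KEYS:
            raise ValueError(f"bad keys in segment {token!r}")
        if not (count.isascii() and count.isdigit()) or int(count) == 0:
            raise ValueError(f"bad frame count in segment {token!r}")
        segments.append(Segment(keys, int(count)))
    return segments


def per_frame_keys(segments: list[Segment]) -> list[str]:
    """Expand segments into one key string per frame."""
    return [seg.keys for seg in segments for _ in range(seg.frames)]


if __name__ == "__main__":
    queue = parse_actions("w-8, wa-4")
    print(queue)
    print(len(per_frame_keys(queue)), "frames")
```

Expanding to one entry per frame is a convenient intermediate form: a rollout step can then integrate a pose update frame by frame, whatever key-to-motion mapping the real pipeline uses.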

The full pipeline ships an LTX-2 sink-bidirectional Euler refiner that adds ~87 GB of weights. This Space runs Stage-1 only (--no_refiner) to fit ZeroGPU; for refined output, run the CLI offline.

Build notes

  • The Sana repo is vendored under ./Sana/ and prepended to sys.path.
  • flash_attn is stubbed at startup: SANA-WM only uses the Triton GDN path, but a few Sana modules do a top-level from flash_attn import …, which must still resolve at import time.
  • Camera intrinsics are estimated with Pi3X from the input image; pass --intrinsics in the CLI for accurate values.
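The first two build notes can be sketched as a startup shim. This is a hedged illustration of the general technique, not the Space's app.py: the helper names are hypothetical, and the stub covers only top-level names of flash_attn (imports from submodules would need their own sys.modules entries).

```python
import importlib.machinery
import sys
import types
from pathlib import Path


def prepend_vendored_repo(repo_dir: Path) -> None:
    """Put the vendored repo first on sys.path so its packages win lookups."""
    path = str(repo_dir.resolve())
    if path not in sys.path:
        sys.path.insert(0, path)


def stub_module(name: str) -> types.ModuleType:
    """Register a placeholder module so `from <name> import X` resolves."""
    if name in sys.modules:
        return sys.modules[name]  # real package already imported; keep it

    stub = types.ModuleType(name)
    # Some libraries probe __spec__ (via importlib.util.find_spec), so set one.
    stub.__spec__ = importlib.machinery.ModuleSpec(name, loader=None)

    def _placeholder(attr: str):
        # Module-level __getattr__ (PEP 562): called for any missing attribute.
        if attr.startswith("__"):
            raise AttributeError(attr)  # keep dunder probes honest

        def _unavailable(*args, **kwargs):
            raise RuntimeError(f"{name}.{attr} is a stub; the real package is not installed")

        return _unavailable

    stub.__getattr__ = _placeholder
    sys.modules[name] = stub
    return stub


# Run both steps before importing anything from the vendored repo.
prepend_vendored_repo(Path("Sana"))
stub_module("flash_attn")
```

With the stub registered, a top-level import of any name from flash_attn succeeds, and a call into it fails loudly with a RuntimeError instead of silently doing nothing, which makes an unexpected use of the non-Triton path easy to spot.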