# Deploying daVinci-MagiHuman to Hugging Face Spaces

## Overview

The deployment uses four files:

- **`app.py`** — Gradio frontend + model download + inference pipeline
- **`Dockerfile`** — based on `sandai/magi-compiler:latest` (includes MagiCompiler + Flash Attention)
- **`requirements.txt`** / **`requirements-nodeps.txt`** — Python dependencies

All model weights are downloaded automatically from HF Hub at startup:

| HF Repo | Contents | ~Size |
|---------|----------|-------|
| `GAIR-NLP/daVinci-MagiHuman` | `distill/`, `turbo_vae/` | ~30GB |
| `stabilityai/stable-audio-open-1.0` | Audio VAE | ~2GB |
| `google/t5gemma-9b-9b-ul2` | Text encoder | ~18GB |
| `Wan-AI/Wan2.2-TI2V-5B` | Video VAE | ~10GB |

## Step-by-step

### 1. Create HF Space

Via CLI:

```bash
pip install "huggingface_hub[cli]"
huggingface-cli login
huggingface-cli repo create SII-GAIR/daVinci-MagiHuman \
    --type space --space-sdk docker --space-hardware a100-large
```

Or via the HF web UI:

- Go to huggingface.co → New Space
- SDK: **Docker**
- Hardware: **A100 Large (80GB)**

### 2. Enable persistent storage

In Space Settings → **Persistent storage** → Enable. This stores downloaded models in `/data/` so they survive container restarts. Without it, every restart re-downloads ~60GB of weights.

### 3. Add secrets (if needed)

In Space Settings → **Repository secrets**, add:

- `HF_TOKEN` — your HF access token (required if any model repo is gated or private)

### 4. Push code to the Space

```bash
cd /path/to/daVinci-MagiHuman

# Add the Space as a git remote
git remote add space https://huggingface.co/spaces/SII-GAIR/daVinci-MagiHuman

# Push the needed files
git add app.py Dockerfile requirements.txt requirements-nodeps.txt inference/ example/
git commit -m "Add Gradio app for HF Spaces deployment"
git push space main
```

### 5. Monitor build & startup

- Go to your Space page → **Logs** tab
- **Build phase** (~5–10 min): Docker image build, pip install
- **Startup phase** (~10–20 min the first time): model downloads from HF Hub
- **Subsequent restarts** (~2–5 min): models are cached in persistent storage, so only pipeline init runs

## What happens at startup

```
Container starts
    ↓
app.py runs download_models()
    ├─ GAIR-NLP/daVinci-MagiHuman        → /data/models/distill/, /data/models/turbo_vae/
    ├─ stabilityai/stable-audio-open-1.0 → /data/models/audio/
    ├─ google/t5gemma-9b-9b-ul2          → /data/models/t5/t5gemma-9b-9b-ul2/
    └─ Wan-AI/Wan2.2-TI2V-5B             → /data/models/wan_vae/Wan2.2-TI2V-5B/
    ↓
Simulates a single-GPU distributed env (RANK=0, WORLD_SIZE=1)
    ↓
initialize_infra() → loads the DiT model to GPU
    ↓
MagiPipeline() → loads VAE, Audio VAE, T5-Gemma, TurboVAED
    ↓
Gradio server starts on :7860
```

## Architecture notes

- **Distilled model**: 8 denoising steps (vs 32 for the base model), no CFG → ~4x faster
- **Resolution**: 448×256 base
- **Inference speed**: ~2s for a 5s video on H100
- **Audio**: generated jointly with video via the single-stream Transformer

## Cost

- HF Spaces A100-80GB: ~$4.13/hr
- Enable "Sleep after N minutes of inactivity" in Space settings to reduce costs
- Persistent storage: $0.10/GB/month (a small cost for a big time saving)

## Local testing

```bash
# Models are downloaded to /data/models by default.
# Override with MODEL_ROOT if you already have them locally:
export MODEL_ROOT=/path/to/your/checkpoints
python app.py
# Open http://localhost:7860
```

## Troubleshooting

| Issue | Fix |
|-------|-----|
| OOM on A100-40GB | Use A100-80GB; the model needs ~60GB peak |
| Slow first start | Enable persistent storage to cache weights |
| `magi_compiler` import error | Ensure the Dockerfile uses `sandai/magi-compiler:latest` |
| `flash_attn` import error | Same — included in the base image |
| Download fails for gated repo | Add the `HF_TOKEN` secret and accept the model license on HF |
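## Appendix: download step, sketched

The `download_models()` step from the startup diagram can be sketched as below. This is a minimal illustration, not the actual `app.py` code: the `REPOS` mapping and `target_dir` helper are hypothetical names, and it assumes `huggingface_hub`'s `snapshot_download` for fetching repos.

```python
import os

# Hypothetical sketch of app.py's download_models() step.
# The repo -> directory mapping follows the startup diagram above.
MODEL_ROOT = os.environ.get("MODEL_ROOT", "/data/models")

REPOS = {
    "GAIR-NLP/daVinci-MagiHuman": "",  # contains distill/ and turbo_vae/
    "stabilityai/stable-audio-open-1.0": "audio",
    "google/t5gemma-9b-9b-ul2": "t5/t5gemma-9b-9b-ul2",
    "Wan-AI/Wan2.2-TI2V-5B": "wan_vae/Wan2.2-TI2V-5B",
}


def target_dir(repo_id: str) -> str:
    """Local directory a repo is downloaded into (hypothetical helper)."""
    return os.path.join(MODEL_ROOT, REPOS[repo_id])


def download_models() -> None:
    """Fetch any repo whose target directory is missing or empty."""
    from huggingface_hub import snapshot_download  # assumed present in the image

    token = os.environ.get("HF_TOKEN")  # only needed for gated/private repos
    for repo_id in REPOS:
        dest = target_dir(repo_id)
        if os.path.isdir(dest) and os.listdir(dest):
            continue  # already cached in persistent storage
        snapshot_download(repo_id=repo_id, local_dir=dest, token=token)
```

Skipping repos whose target directory is already non-empty is what makes persistent storage pay off: on a restart every repo is found under `/data/models` and the ~60GB download is avoided.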
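The "simulates a single-GPU distributed env" step can be as simple as setting the usual `torch.distributed` environment variables before initialization. A hedged sketch — `RANK=0` and `WORLD_SIZE=1` come from the startup diagram; the other variable names are standard `torch.distributed` conventions assumed here, not confirmed from `app.py`:

```python
import os

# Fake a one-process "distributed" environment so code written for
# torchrun also runs in a single-GPU Space container.
for key, value in {
    "RANK": "0",          # from the startup diagram
    "LOCAL_RANK": "0",    # assumed
    "WORLD_SIZE": "1",    # from the startup diagram
    "MASTER_ADDR": "127.0.0.1",  # standard torch.distributed variable
    "MASTER_PORT": "29500",      # standard torch.distributed variable
}.items():
    os.environ.setdefault(key, value)
```

Using `setdefault` keeps any values injected by the container (or a real launcher) intact instead of overwriting them.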