# Deploying daVinci-MagiHuman to Hugging Face Spaces

## Overview

The deployment uses four files:

- **`app.py`** — Gradio frontend + model download + inference pipeline
- **`Dockerfile`** — based on `sandai/magi-compiler:latest` (includes MagiCompiler + Flash Attention)
- **`requirements.txt`** / **`requirements-nodeps.txt`** — Python dependencies

All model weights are downloaded automatically from HF Hub at startup:

| HF Repo | Contents | ~Size |
|---------|----------|-------|
| `GAIR-NLP/daVinci-MagiHuman` | `distill/`, `turbo_vae/` | ~30GB |
| `stabilityai/stable-audio-open-1.0` | Audio VAE | ~2GB |
| `google/t5gemma-9b-9b-ul2` | Text encoder | ~18GB |
| `Wan-AI/Wan2.2-TI2V-5B` | Video VAE | ~10GB |

## Step-by-step

### 1. Create HF Space

Via CLI:

```bash
pip install "huggingface_hub[cli]"
huggingface-cli login
huggingface-cli repo create SII-GAIR/daVinci-MagiHuman \
    --type space --space-sdk docker --space-hardware a100-large
```

Or via the HF web UI:

- Go to huggingface.co → New Space
- SDK: **Docker**
- Hardware: **A100 Large (80GB)**

### 2. Enable persistent storage

In Space Settings → **Persistent storage** → Enable. This stores downloaded models in `/data/` so they survive container restarts. Without it, every restart re-downloads ~60GB of weights.

### 3. Add secrets (if needed)

In Space Settings → **Repository secrets**, add:

- `HF_TOKEN` — your HF access token (required if any model repo is gated or private)

### 4. Push code to the Space

```bash
cd /path/to/daVinci-MagiHuman

# Add the Space as a git remote
git remote add space https://huggingface.co/spaces/SII-GAIR/daVinci-MagiHuman

# Push the needed files
git add app.py Dockerfile requirements.txt requirements-nodeps.txt inference/ example/
git commit -m "Add Gradio app for HF Spaces deployment"
git push space main
```

### 5. Monitor build & startup

- Go to your Space page → **Logs** tab
- **Build phase** (~5–10 min): Docker image build, pip install
- **Startup phase** (~10–20 min the first time): model downloads from HF Hub
- **Subsequent restarts** (~2–5 min): models are cached in persistent storage, so only pipeline init runs

## What happens at startup

```
Container starts
    ↓
app.py runs download_models()
    ├─ GAIR-NLP/daVinci-MagiHuman        → /data/models/distill/, /data/models/turbo_vae/
    ├─ stabilityai/stable-audio-open-1.0 → /data/models/audio/
    ├─ google/t5gemma-9b-9b-ul2          → /data/models/t5/t5gemma-9b-9b-ul2/
    └─ Wan-AI/Wan2.2-TI2V-5B             → /data/models/wan_vae/Wan2.2-TI2V-5B/
    ↓
Simulates a single-GPU distributed env (RANK=0, WORLD_SIZE=1)
    ↓
initialize_infra() → loads the DiT model to GPU
    ↓
MagiPipeline() → loads VAE, Audio VAE, T5-Gemma, TurboVAED
    ↓
Gradio server starts on :7860
```

## Architecture notes

- **Distilled model**: 8 denoising steps (vs 32 for the base model), no CFG → ~4x faster
- **Resolution**: 448×256 base
- **Inference speed**: ~2s for a 5s video on H100
- **Audio**: generated jointly with video via the single-stream Transformer

## Cost

- HF Spaces A100-80GB: ~$4.13/hr
- Enable "Sleep after N minutes of inactivity" in Space settings to reduce costs
- Persistent storage: $0.10/GB/month (a small cost for a big time saving)

## Local testing

```bash
# Models are downloaded to /data/models by default.
# Override with MODEL_ROOT if you already have them locally:
export MODEL_ROOT=/path/to/your/checkpoints
python app.py
# Open http://localhost:7860
```

## Troubleshooting

| Issue | Fix |
|-------|-----|
| OOM on A100-40GB | Use A100-80GB; the model needs ~60GB peak |
| Slow first start | Enable persistent storage to cache weights |
| `magi_compiler` import error | Ensure the Dockerfile uses `sandai/magi-compiler:latest` |
| `flash_attn` import error | Same — included in the base image |
| Download fails for gated repo | Add the `HF_TOKEN` secret and accept the model license on HF |
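## Appendix: download step, sketched

The `download_models()` step from the startup diagram can be sketched as below. This is a minimal illustration, not the actual `app.py` code: the `REPOS` mapping and `target_dir` helper are hypothetical names, and it assumes `huggingface_hub`'s `snapshot_download` for fetching repos.

```python
import os

# Hypothetical sketch of app.py's download_models() step.
# The repo -> directory mapping follows the startup diagram above.
MODEL_ROOT = os.environ.get("MODEL_ROOT", "/data/models")

REPOS = {
    "GAIR-NLP/daVinci-MagiHuman": "",  # contains distill/ and turbo_vae/
    "stabilityai/stable-audio-open-1.0": "audio",
    "google/t5gemma-9b-9b-ul2": "t5/t5gemma-9b-9b-ul2",
    "Wan-AI/Wan2.2-TI2V-5B": "wan_vae/Wan2.2-TI2V-5B",
}


def target_dir(repo_id: str) -> str:
    """Local directory a repo is downloaded into (hypothetical helper)."""
    return os.path.join(MODEL_ROOT, REPOS[repo_id])


def download_models() -> None:
    """Fetch any repo whose target directory is missing or empty."""
    from huggingface_hub import snapshot_download  # assumed present in the image

    token = os.environ.get("HF_TOKEN")  # only needed for gated/private repos
    for repo_id in REPOS:
        dest = target_dir(repo_id)
        if os.path.isdir(dest) and os.listdir(dest):
            continue  # already cached in persistent storage
        snapshot_download(repo_id=repo_id, local_dir=dest, token=token)
```

Skipping repos whose target directory is already non-empty is what makes persistent storage pay off: on a restart every repo is found under `/data/models` and the ~60GB download is avoided.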
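The "simulates a single-GPU distributed env" step can be as simple as setting the usual `torch.distributed` environment variables before initialization. A hedged sketch — `RANK=0` and `WORLD_SIZE=1` come from the startup diagram; the other variable names are standard `torch.distributed` conventions assumed here, not confirmed from `app.py`:

```python
import os

# Fake a one-process "distributed" environment so code written for
# torchrun also runs in a single-GPU Space container.
for key, value in {
    "RANK": "0",          # from the startup diagram
    "LOCAL_RANK": "0",    # assumed
    "WORLD_SIZE": "1",    # from the startup diagram
    "MASTER_ADDR": "127.0.0.1",  # standard torch.distributed variable
    "MASTER_PORT": "29500",      # standard torch.distributed variable
}.items():
    os.environ.setdefault(key, value)
```

Using `setdefault` keeps any values injected by the container (or a real launcher) intact instead of overwriting them.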