jiadisu committed on
Commit febacca · 1 Parent(s): 873b6ec

init space

Files changed (4):
  1. .gitignore +1 -1
  2. Dockerfile +51 -0
  3. README_DEPLOY.md +122 -0
  4. app.py +251 -0
.gitignore CHANGED
@@ -1,7 +1,7 @@
 tmp*
 depyf
 torch_compile_cache
-
+venv/
 __pycache__
 *.so
 build
Dockerfile ADDED
@@ -0,0 +1,51 @@
+# =============================================================================
+# HF Spaces Docker image for daVinci-MagiHuman
+# Hardware: A100-80GB (or H100)
+# =============================================================================
+# Based on the official MagiCompiler image, which includes:
+#   - CUDA 12.4, cuDNN, Python 3.12, PyTorch 2.9
+#   - MagiCompiler (pre-installed)
+#   - Flash Attention 3 (Hopper) (pre-installed)
+# =============================================================================
+FROM sandai/magi-compiler:latest
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV PYTHONUNBUFFERED=1
+ENV GRADIO_SERVER_NAME=0.0.0.0
+ENV GRADIO_SERVER_PORT=7860
+
+# System deps needed for audio/video processing
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        ffmpeg libsndfile1 && \
+    rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+
+# ---------------------------------------------------------------------------
+# Python dependencies
+# ---------------------------------------------------------------------------
+COPY requirements.txt requirements-nodeps.txt ./
+RUN pip install --no-cache-dir -r requirements.txt && \
+    pip install --no-cache-dir --no-deps -r requirements-nodeps.txt && \
+    pip install --no-cache-dir gradio huggingface_hub soundfile
+
+# ---------------------------------------------------------------------------
+# Project code
+# ---------------------------------------------------------------------------
+COPY inference/ inference/
+COPY example/ example/
+COPY app.py .
+
+# ---------------------------------------------------------------------------
+# Model weights are downloaded at runtime from HF Hub.
+# Set HF_TOKEN as a Space secret if any repos are gated/private.
+#
+# Persistent storage (/data) is recommended on HF Spaces so weights survive
+# container restarts. Enable it in Space settings → "Persistent storage".
+# ---------------------------------------------------------------------------
+ENV MODEL_ROOT=/data/models
+
+# HF Spaces requires the app to listen on port 7860
+EXPOSE 7860
+
+CMD ["python", "app.py"]
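The `ENV MODEL_ROOT=/data/models` default above can be overridden at runtime, and the comments in `app.py` describe a `/tmp/models` fallback when persistent storage is not mounted. That resolution order can be sketched as a pure function for clarity; `resolve_model_root` and the `data_mounted` flag are my own names, for illustration only:

```python
def resolve_model_root(env: dict, data_mounted: bool) -> str:
    """Pick the weights directory: an explicit MODEL_ROOT wins, then
    HF Spaces persistent storage (/data), then an ephemeral fallback."""
    if env.get("MODEL_ROOT"):
        return env["MODEL_ROOT"]
    if data_mounted:
        return "/data/models"
    return "/tmp/models"


print(resolve_model_root({"MODEL_ROOT": "/ckpts"}, data_mounted=True))  # /ckpts
print(resolve_model_root({}, data_mounted=True))   # /data/models
print(resolve_model_root({}, data_mounted=False))  # /tmp/models
```

Keeping this as a pure function (environment passed in, no `os` calls) makes the fallback behavior trivially testable without a Spaces container.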
README_DEPLOY.md ADDED
@@ -0,0 +1,122 @@
+# Deploying daVinci-MagiHuman to Hugging Face Spaces
+
+## Overview
+
+The deployment uses 3 files:
+- **`app.py`** — Gradio frontend + model download + inference pipeline
+- **`Dockerfile`** — Based on `sandai/magi-compiler:latest` (includes MagiCompiler + Flash Attention)
+- **`requirements.txt`** / **`requirements-nodeps.txt`** — Python dependencies
+
+All model weights are downloaded automatically from HF Hub at startup:
+
+| HF Repo | Contents | ~Size |
+|---------|----------|-------|
+| `GAIR-NLP/daVinci-MagiHuman` | `distill/`, `turbo_vae/` | ~30GB |
+| `stabilityai/stable-audio-open-1.0` | Audio VAE | ~2GB |
+| `google/t5gemma-9b-9b-ul2` | Text encoder | ~18GB |
+| `Wan-AI/Wan2.2-TI2V-5B` | Video VAE | ~10GB |
+
+## Step-by-step
+
+### 1. Create HF Space
+
+Via CLI:
+```bash
+pip install huggingface_hub[cli]
+huggingface-cli login
+
+huggingface-cli repo create SII-GAIR/daVinci-MagiHuman \
+    --type space --space-sdk docker --space-hardware a100-large
+```
+
+Or via HF web UI:
+- Go to huggingface.co → New Space
+- SDK: **Docker**
+- Hardware: **A100 Large (80GB)**
+
+### 2. Enable persistent storage
+
+In Space Settings → **Persistent storage** → Enable.
+
+This stores downloaded models in `/data/` so they survive container restarts.
+Without it, every restart re-downloads ~60GB of weights.
+
+### 3. Add secrets (if needed)
+
+In Space Settings → **Repository secrets**, add:
+- `HF_TOKEN` — your HF access token (required if any model repo is gated/private)
+
+### 4. Push code to the Space
+
+```bash
+cd /path/to/daVinci-MagiHuman
+
+# Add the Space as a git remote
+git remote add space https://huggingface.co/spaces/SII-GAIR/daVinci-MagiHuman
+
+# Push needed files
+git add app.py Dockerfile requirements.txt requirements-nodeps.txt inference/ example/
+git commit -m "Add Gradio app for HF Spaces deployment"
+git push space main
+```
+
+### 5. Monitor build & startup
+
+- Go to your Space page → **Logs** tab
+- **Build phase** (~5–10 min): Docker image build, pip install
+- **Startup phase** (~10–20 min first time): model downloads from HF Hub
+- **Subsequent restarts** (~2–5 min): models cached in persistent storage, only pipeline init
+
+## What happens at startup
+
+```
+Container starts
+
+app.py runs download_models()
+├─ GAIR-NLP/daVinci-MagiHuman → /data/models/distill/, /data/models/turbo_vae/
+├─ stabilityai/stable-audio-open-1.0 → /data/models/audio/
+├─ google/t5gemma-9b-9b-ul2 → /data/models/t5/t5gemma-9b-9b-ul2/
+└─ Wan-AI/Wan2.2-TI2V-5B → /data/models/wan_vae/Wan2.2-TI2V-5B/
+
+Simulates single-GPU distributed env (RANK=0, WORLD_SIZE=1)
+
+initialize_infra() → loads DiT model to GPU
+
+MagiPipeline() → loads VAE, Audio VAE, T5-Gemma, TurboVAED
+
+Gradio server starts on :7860
+```
+
+## Architecture notes
+
+- **Distilled model**: 8 denoising steps (vs 32 for base), no CFG → ~4x faster
+- **Resolution**: 448×256 base
+- **Inference speed**: ~2s for 5s video on H100
+- **Audio**: generated jointly with video via the single-stream Transformer
+
+## Cost
+
+- HF Spaces A100-80GB: ~$4.13/hr
+- Enable "Sleep after N minutes of inactivity" in Space settings to reduce costs
+- Persistent storage: $0.10/GB/month (small cost, big time saving)
+
+## Local testing
+
+```bash
+# Models will be downloaded to /data/models by default.
+# Override with MODEL_ROOT if you have them locally:
+export MODEL_ROOT=/path/to/your/checkpoints
+
+python app.py
+# Open http://localhost:7860
+```
+
+## Troubleshooting
+
+| Issue | Fix |
+|-------|-----|
+| OOM on A100-40GB | Use A100-80GB; model needs ~60GB peak |
+| Slow first start | Enable persistent storage to cache weights |
+| `magi_compiler` import error | Ensure Dockerfile uses `sandai/magi-compiler:latest` |
+| `flash_attn` import error | Same — included in the base image |
+| Download fails for gated repo | Add `HF_TOKEN` secret, accept model license on HF |
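The "already cached, skipping" behavior the README relies on (fast restarts with persistent storage) comes down to a cheap emptiness check on each target directory, mirroring the check in `app.py`. A minimal sketch; `needs_download` is a hypothetical helper name, not part of the repo:

```python
import os
import tempfile


def needs_download(local_dir: str) -> bool:
    """True if the model directory is missing or empty -- the same
    cheap cache check app.py performs before snapshot_download()."""
    return not (os.path.isdir(local_dir) and os.listdir(local_dir))


# Demo against a throwaway directory layout standing in for /data/models.
root = tempfile.mkdtemp()
cached = os.path.join(root, "audio")
os.makedirs(cached)
open(os.path.join(cached, "model.safetensors"), "w").close()

print(needs_download(cached))                    # False: non-empty, skip
print(needs_download(os.path.join(root, "t5")))  # True: never downloaded
```

Note this check cannot distinguish a complete download from a partially interrupted one; a directory with any file in it is treated as cached.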
app.py ADDED
@@ -0,0 +1,251 @@
+#!/usr/bin/env python3
+"""
+Gradio frontend for daVinci-MagiHuman distilled model.
+
+Designed for Hugging Face Spaces (A100-80GB GPU).
+Accepts an image + text prompt + duration, generates audio-video output.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import uuid
+
+# ---------------------------------------------------------------------------
+# 1. Download all model weights from HF Hub (runs once, cached afterwards)
+# ---------------------------------------------------------------------------
+# HF Spaces persistent storage: /data (survives restarts if enabled).
+# Fall back to /tmp/models if /data is not available.
+MODEL_ROOT = os.environ.get("MODEL_ROOT")
+if MODEL_ROOT is None:
+    MODEL_ROOT = "/data/models" if os.path.isdir("/data") else "/tmp/models"
+os.makedirs(MODEL_ROOT, exist_ok=True)
+
+# HF repo → local sub-directory mapping
+HF_REPOS = {
+    # Project's own weights
+    "GAIR-NLP/daVinci-MagiHuman": {
+        "subdir": ".",  # download to MODEL_ROOT root
+        "allow_patterns": [
+            "distill/**",
+            "turbo_vae/**",
+        ],
+    },
+    # Third-party open-source models
+    "stabilityai/stable-audio-open-1.0": {
+        "subdir": "audio",
+    },
+    "google/t5gemma-9b-9b-ul2": {
+        "subdir": "t5/t5gemma-9b-9b-ul2",
+    },
+    "Wan-AI/Wan2.2-TI2V-5B": {
+        "subdir": "wan_vae/Wan2.2-TI2V-5B",
+    },
+}
+
+
+def download_models():
+    """Download all required model weights from HF Hub."""
+    from huggingface_hub import snapshot_download
+
+    hf_token = os.environ.get("HF_TOKEN")
+
+    for repo_id, spec in HF_REPOS.items():
+        local_dir = os.path.join(MODEL_ROOT, spec["subdir"])
+        # Simple check: if directory already has files, skip download
+        if os.path.isdir(local_dir) and os.listdir(local_dir):
+            print(f"[download] {repo_id} → {local_dir} (already cached, skipping)")
+            continue
+
+        print(f"[download] {repo_id} → {local_dir} (downloading …)")
+        os.makedirs(local_dir, exist_ok=True)
+
+        kwargs = {
+            "repo_id": repo_id,
+            "local_dir": local_dir,
+            "token": hf_token,
+        }
+        if "allow_patterns" in spec:
+            kwargs["allow_patterns"] = spec["allow_patterns"]
+
+        snapshot_download(**kwargs)
+        print(f"[download] {repo_id} done.")
+
+    print("[download] All models ready.")
+
+
+print("[app] Checking / downloading model weights …")
+download_models()
+
+# ---------------------------------------------------------------------------
+# 2. Environment bootstrap – must happen BEFORE any inference imports
+# ---------------------------------------------------------------------------
+# HF Spaces launches a single process; we simulate the minimal distributed
+# environment that the pipeline expects (world_size=1, rank=0).
+os.environ.setdefault("MASTER_ADDR", "localhost")
+os.environ.setdefault("MASTER_PORT", "29500")
+os.environ.setdefault("RANK", "0")
+os.environ.setdefault("WORLD_SIZE", "1")
+os.environ.setdefault("LOCAL_RANK", "0")
+os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
+
+# Project root must be on sys.path so `inference.*` imports resolve.
+PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))
+if PROJECT_ROOT not in sys.path:
+    sys.path.insert(0, PROJECT_ROOT)
+
+# Build the config JSON that maps to the downloaded paths.
+CONFIG_OVERRIDES = {
+    "engine_config": {
+        "load": os.path.join(MODEL_ROOT, "distill"),
+        "distill": True,
+        "cp_size": 1,
+    },
+    "evaluation_config": {
+        "cfg_number": 1,
+        "num_inference_steps": 8,
+        "audio_model_path": os.path.join(MODEL_ROOT, "audio"),
+        "txt_model_path": os.path.join(MODEL_ROOT, "t5/t5gemma-9b-9b-ul2"),
+        "vae_model_path": os.path.join(MODEL_ROOT, "wan_vae/Wan2.2-TI2V-5B"),
+        "use_turbo_vae": True,
+        "student_config_path": os.path.join(MODEL_ROOT, "turbo_vae/TurboV3-Wan22-TinyShallow_7_7.json"),
+        "student_ckpt_path": os.path.join(MODEL_ROOT, "turbo_vae/checkpoint-340000.ckpt"),
+    },
+}
+
+# Write a temporary config JSON that parse_config() can pick up via CLI args.
+_tmp_config = os.path.join(tempfile.gettempdir(), "magihuman_config.json")
+with open(_tmp_config, "w") as f:
+    json.dump(CONFIG_OVERRIDES, f)
+
+# Inject the config path into sys.argv so that parse_config() finds it.
+sys.argv = [sys.argv[0], "--config-load-path", _tmp_config]
+
+# ---------------------------------------------------------------------------
+# 3. Initialize infrastructure & build pipeline (runs once at startup)
+# ---------------------------------------------------------------------------
+import gradio as gr  # noqa: E402
+import torch  # noqa: E402,F401
+
+from inference.infra import initialize_infra  # noqa: E402
+from inference.common import parse_config  # noqa: E402
+from inference.model.dit import get_dit  # noqa: E402
+from inference.pipeline.pipeline import MagiPipeline  # noqa: E402
+
+print("[app] Initializing infrastructure …")
+initialize_infra()
+
+print("[app] Loading model …")
+config = parse_config()
+model = get_dit(config.arch_config, config.engine_config)
+pipeline = MagiPipeline(model, config.evaluation_config)
+print("[app] Pipeline ready.")
+
+
+# ---------------------------------------------------------------------------
+# 4. Inference wrapper
+# ---------------------------------------------------------------------------
+def generate_video(
+    image,
+    prompt: str,
+    seconds: int,
+    seed: int,
+):
+    """Called by Gradio – returns path to the output .mp4 file."""
+    if image is None:
+        raise gr.Error("Please upload a reference image.")
+    if not prompt or not prompt.strip():
+        raise gr.Error("Please enter a text prompt.")
+
+    # Gradio passes a filepath (str) for gr.Image(type="filepath")
+    image_path = image
+
+    output_dir = tempfile.mkdtemp(prefix="magihuman_")
+    save_prefix = os.path.join(output_dir, f"output_{uuid.uuid4().hex[:8]}")
+
+    result_path = pipeline.run_offline(
+        prompt=prompt,
+        image=image_path,
+        audio=None,
+        save_path_prefix=save_prefix,
+        seed=int(seed),
+        seconds=int(seconds),
+        br_width=448,
+        br_height=256,
+    )
+
+    return result_path
+
+
+# ---------------------------------------------------------------------------
+# 5. Gradio UI
+# ---------------------------------------------------------------------------
+TITLE = "daVinci-MagiHuman – Audio-Video Generation"
+DESCRIPTION = (
+    "Upload a reference image, enter a descriptive prompt, choose the video "
+    "duration (4–10 s), and click **Generate**. The model produces a video "
+    "with synchronized audio.\n\n"
+    "**Model**: 15B single-stream Transformer (distilled, 8-step inference) "
+    "| **Resolution**: 448×256 | **FPS**: 25"
+)
+
+with gr.Blocks(title=TITLE, theme=gr.themes.Soft()) as demo:
+    gr.Markdown(f"# {TITLE}")
+    gr.Markdown(DESCRIPTION)
+
+    with gr.Row():
+        with gr.Column(scale=1):
+            image_input = gr.Image(
+                label="Reference Image",
+                type="filepath",
+                height=300,
+            )
+            prompt_input = gr.Textbox(
+                label="Prompt",
+                placeholder="Describe the scene you want to generate …",
+                lines=4,
+            )
+            with gr.Row():
+                seconds_slider = gr.Slider(
+                    minimum=4,
+                    maximum=10,
+                    step=1,
+                    value=4,
+                    label="Duration (seconds)",
+                )
+                seed_input = gr.Number(
+                    value=42,
+                    label="Seed",
+                    precision=0,
+                )
+            generate_btn = gr.Button("Generate", variant="primary")
+
+        with gr.Column(scale=1):
+            video_output = gr.Video(label="Generated Video")
+
+    generate_btn.click(
+        fn=generate_video,
+        inputs=[image_input, prompt_input, seconds_slider, seed_input],
+        outputs=[video_output],
+    )
+
+    # Pre-loaded example (uses bundled assets from the repo)
+    example_prompt_path = os.path.join(PROJECT_ROOT, "example/assets/prompt.txt")
+    example_prompt = "A person talking in a living room."
+    if os.path.exists(example_prompt_path):
+        with open(example_prompt_path) as f:
+            example_prompt = f.read().strip()
+
+    example_image_path = os.path.join(PROJECT_ROOT, "example/assets/image.png")
+    if os.path.exists(example_image_path):
+        gr.Examples(
+            examples=[
+                [example_image_path, example_prompt, 10, 42],
+            ],
+            inputs=[image_input, prompt_input, seconds_slider, seed_input],
+            outputs=[video_output],
+            cache_examples=False,
+        )
+
+
+if __name__ == "__main__":
+    demo.queue(max_size=2).launch(server_name="0.0.0.0", server_port=7860)
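The bootstrap in section 2 of app.py smuggles the path overrides into `parse_config()` by writing a temp JSON and rewriting `sys.argv`. That pattern can be factored into a small reusable helper; this is a sketch under the assumption that `parse_config()` only reads `--config-load-path` from argv, and `inject_config` is my own name, not part of the repo:

```python
import json
import os
import sys
import tempfile


def inject_config(overrides: dict, argv=None) -> str:
    """Serialize `overrides` to a temp JSON file and splice a
    --config-load-path flag into argv (defaults to sys.argv),
    mirroring what app.py does before importing the inference code."""
    argv = sys.argv if argv is None else argv
    fd, path = tempfile.mkstemp(suffix=".json", prefix="magihuman_config_")
    with os.fdopen(fd, "w") as f:
        json.dump(overrides, f)
    # In-place mutation so code holding a reference to argv sees the flag.
    argv[:] = [argv[0], "--config-load-path", path]
    return path


# Usage: same overrides shape app.py builds, against a local argv.
argv = ["app.py"]
cfg_path = inject_config({"engine_config": {"distill": True, "cp_size": 1}}, argv=argv)
print(argv)  # ['app.py', '--config-load-path', <temp json path>]
```

One caveat of the approach, kept from app.py: overwriting argv discards any real CLI flags the process was started with, which is fine for a Space but surprising for local runs.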