jiadisu committed on
Commit febacca · 1 Parent(s): 873b6ec

init space

Files changed (4):
  1. .gitignore +1 -1
  2. Dockerfile +51 -0
  3. README_DEPLOY.md +122 -0
  4. app.py +251 -0
.gitignore CHANGED
@@ -1,7 +1,7 @@
 tmp*
 depyf
 torch_compile_cache
-
+venv/
 __pycache__
 *.so
 build
Dockerfile ADDED
@@ -0,0 +1,51 @@
+# =============================================================================
+# HF Spaces Docker image for daVinci-MagiHuman
+# Hardware: A100-80GB (or H100)
+# =============================================================================
+# Based on the official MagiCompiler image, which includes:
+#   - CUDA 12.4, cuDNN, Python 3.12, PyTorch 2.9
+#   - MagiCompiler (pre-installed)
+#   - Flash Attention 3 (Hopper) (pre-installed)
+# =============================================================================
+FROM sandai/magi-compiler:latest
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV PYTHONUNBUFFERED=1
+ENV GRADIO_SERVER_NAME=0.0.0.0
+ENV GRADIO_SERVER_PORT=7860
+
+# System deps needed for audio/video processing
+RUN apt-get update && apt-get install -y --no-install-recommends \
+        ffmpeg libsndfile1 && \
+    rm -rf /var/lib/apt/lists/*
+
+WORKDIR /app
+
+# ---------------------------------------------------------------------------
+# Python dependencies
+# ---------------------------------------------------------------------------
+COPY requirements.txt requirements-nodeps.txt ./
+RUN pip install --no-cache-dir -r requirements.txt && \
+    pip install --no-cache-dir --no-deps -r requirements-nodeps.txt && \
+    pip install --no-cache-dir gradio huggingface_hub soundfile
+
+# ---------------------------------------------------------------------------
+# Project code
+# ---------------------------------------------------------------------------
+COPY inference/ inference/
+COPY example/ example/
+COPY app.py .
+
+# ---------------------------------------------------------------------------
+# Model weights are downloaded at runtime from HF Hub.
+# Set HF_TOKEN as a Space secret if any repos are gated/private.
+#
+# Persistent storage (/data) is recommended on HF Spaces so weights survive
+# container restarts. Enable it in Space settings → "Persistent storage".
+# ---------------------------------------------------------------------------
+ENV MODEL_ROOT=/data/models
+
+# HF Spaces requires the app to listen on port 7860
+EXPOSE 7860
+
+CMD ["python", "app.py"]
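The `ENV MODEL_ROOT=/data/models` default above can be overridden at runtime, and the comments in `app.py` describe a `/tmp/models` fallback when persistent storage is not mounted. That resolution order can be sketched as a pure function for clarity; `resolve_model_root` and the `data_mounted` flag are my own names, for illustration only:

```python
def resolve_model_root(env: dict, data_mounted: bool) -> str:
    """Pick the weights directory: an explicit MODEL_ROOT wins, then
    HF Spaces persistent storage (/data), then an ephemeral fallback."""
    if env.get("MODEL_ROOT"):
        return env["MODEL_ROOT"]
    if data_mounted:
        return "/data/models"
    return "/tmp/models"


print(resolve_model_root({"MODEL_ROOT": "/ckpts"}, data_mounted=True))  # /ckpts
print(resolve_model_root({}, data_mounted=True))   # /data/models
print(resolve_model_root({}, data_mounted=False))  # /tmp/models
```

Keeping this as a pure function (environment passed in, no `os` calls) makes the fallback behavior trivially testable without a Spaces container.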
README_DEPLOY.md ADDED
@@ -0,0 +1,122 @@
+# Deploying daVinci-MagiHuman to Hugging Face Spaces
+
+## Overview
+
+The deployment uses 3 files:
+- **`app.py`** — Gradio frontend + model download + inference pipeline
+- **`Dockerfile`** — Based on `sandai/magi-compiler:latest` (includes MagiCompiler + Flash Attention)
+- **`requirements.txt`** / **`requirements-nodeps.txt`** — Python dependencies
+
+All model weights are downloaded automatically from HF Hub at startup:
+
+| HF Repo | Contents | ~Size |
+|---------|----------|-------|
+| `GAIR-NLP/daVinci-MagiHuman` | `distill/`, `turbo_vae/` | ~30GB |
+| `stabilityai/stable-audio-open-1.0` | Audio VAE | ~2GB |
+| `google/t5gemma-9b-9b-ul2` | Text encoder | ~18GB |
+| `Wan-AI/Wan2.2-TI2V-5B` | Video VAE | ~10GB |
+
+## Step-by-step
+
+### 1. Create HF Space
+
+Via CLI:
+```bash
+pip install huggingface_hub[cli]
+huggingface-cli login
+
+huggingface-cli repo create SII-GAIR/daVinci-MagiHuman \
+    --type space --space-sdk docker --space-hardware a100-large
+```
+
+Or via HF web UI:
+- Go to huggingface.co → New Space
+- SDK: **Docker**
+- Hardware: **A100 Large (80GB)**
+
+### 2. Enable persistent storage
+
+In Space Settings → **Persistent storage** → Enable.
+
+This stores downloaded models in `/data/` so they survive container restarts.
+Without it, every restart re-downloads ~60GB of weights.
+
+### 3. Add secrets (if needed)
+
+In Space Settings → **Repository secrets**, add:
+- `HF_TOKEN` — your HF access token (required if any model repo is gated/private)
+
+### 4. Push code to the Space
+
+```bash
+cd /path/to/daVinci-MagiHuman
+
+# Add the Space as a git remote
+git remote add space https://huggingface.co/spaces/SII-GAIR/daVinci-MagiHuman
+
+# Push needed files
+git add app.py Dockerfile requirements.txt requirements-nodeps.txt inference/ example/
+git commit -m "Add Gradio app for HF Spaces deployment"
+git push space main
+```
+
+### 5. Monitor build & startup
+
+- Go to your Space page → **Logs** tab
+- **Build phase** (~5–10 min): Docker image build, pip install
+- **Startup phase** (~10–20 min first time): model downloads from HF Hub
+- **Subsequent restarts** (~2–5 min): models cached in persistent storage, only pipeline init
+
+## What happens at startup
+
+```
+Container starts
+
+app.py runs download_models()
+├─ GAIR-NLP/daVinci-MagiHuman → /data/models/distill/, /data/models/turbo_vae/
+├─ stabilityai/stable-audio-open-1.0 → /data/models/audio/
+├─ google/t5gemma-9b-9b-ul2 → /data/models/t5/t5gemma-9b-9b-ul2/
+└─ Wan-AI/Wan2.2-TI2V-5B → /data/models/wan_vae/Wan2.2-TI2V-5B/
+
+Simulates single-GPU distributed env (RANK=0, WORLD_SIZE=1)
+
+initialize_infra() → loads DiT model to GPU
+
+MagiPipeline() → loads VAE, Audio VAE, T5-Gemma, TurboVAED
+
+Gradio server starts on :7860
+```
+
+## Architecture notes
+
+- **Distilled model**: 8 denoising steps (vs 32 for base), no CFG → ~4x faster
+- **Resolution**: 448×256 base
+- **Inference speed**: ~2s for 5s video on H100
+- **Audio**: generated jointly with video via the single-stream Transformer
+
+## Cost
+
+- HF Spaces A100-80GB: ~$4.13/hr
+- Enable "Sleep after N minutes of inactivity" in Space settings to reduce costs
+- Persistent storage: $0.10/GB/month (small cost, big time saving)
+
+## Local testing
+
+```bash
+# Models will be downloaded to /data/models by default.
+# Override with MODEL_ROOT if you have them locally:
+export MODEL_ROOT=/path/to/your/checkpoints
+
+python app.py
+# Open http://localhost:7860
+```
+
+## Troubleshooting
+
+| Issue | Fix |
+|-------|-----|
+| OOM on A100-40GB | Use A100-80GB; model needs ~60GB peak |
+| Slow first start | Enable persistent storage to cache weights |
+| `magi_compiler` import error | Ensure Dockerfile uses `sandai/magi-compiler:latest` |
+| `flash_attn` import error | Same — included in the base image |
+| Download fails for gated repo | Add `HF_TOKEN` secret, accept model license on HF |
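The "already cached, skipping" behavior the README relies on (fast restarts with persistent storage) comes down to a cheap emptiness check on each target directory, mirroring the check in `app.py`. A minimal sketch; `needs_download` is a hypothetical helper name, not part of the repo:

```python
import os
import tempfile


def needs_download(local_dir: str) -> bool:
    """True if the model directory is missing or empty -- the same
    cheap cache check app.py performs before snapshot_download()."""
    return not (os.path.isdir(local_dir) and os.listdir(local_dir))


# Demo against a throwaway directory layout standing in for /data/models.
root = tempfile.mkdtemp()
cached = os.path.join(root, "audio")
os.makedirs(cached)
open(os.path.join(cached, "model.safetensors"), "w").close()

print(needs_download(cached))                    # False: non-empty, skip
print(needs_download(os.path.join(root, "t5")))  # True: never downloaded
```

Note this check cannot distinguish a complete download from a partially interrupted one; a directory with any file in it is treated as cached.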
app.py ADDED
@@ -0,0 +1,251 @@
+#!/usr/bin/env python3
+"""
+Gradio frontend for daVinci-MagiHuman distilled model.
+
+Designed for Hugging Face Spaces (A100-80GB GPU).
+Accepts an image + text prompt + duration, generates audio-video output.
+"""
+
+import json
+import os
+import sys
+import tempfile
+import uuid
+
+# ---------------------------------------------------------------------------
+# 1. Download all model weights from HF Hub (runs once, cached afterwards)
+# ---------------------------------------------------------------------------
+# HF Spaces persistent storage: /data (survives restarts if enabled).
+# Fall back to /tmp/models if /data is not available.
+MODEL_ROOT = os.environ.get("MODEL_ROOT")
+if MODEL_ROOT is None:
+    MODEL_ROOT = "/data/models" if os.path.isdir("/data") else "/tmp/models"
+os.makedirs(MODEL_ROOT, exist_ok=True)
+
+# HF repo → local sub-directory mapping
+HF_REPOS = {
+    # Project's own weights
+    "GAIR-NLP/daVinci-MagiHuman": {
+        "subdir": ".",  # download to MODEL_ROOT root
+        "allow_patterns": [
+            "distill/**",
+            "turbo_vae/**",
+        ],
+    },
+    # Third-party open-source models
+    "stabilityai/stable-audio-open-1.0": {
+        "subdir": "audio",
+    },
+    "google/t5gemma-9b-9b-ul2": {
+        "subdir": "t5/t5gemma-9b-9b-ul2",
+    },
+    "Wan-AI/Wan2.2-TI2V-5B": {
+        "subdir": "wan_vae/Wan2.2-TI2V-5B",
+    },
+}
+
+
+def download_models():
+    """Download all required model weights from HF Hub."""
+    from huggingface_hub import snapshot_download
+
+    hf_token = os.environ.get("HF_TOKEN")
+
+    for repo_id, spec in HF_REPOS.items():
+        local_dir = os.path.join(MODEL_ROOT, spec["subdir"])
+        # Simple check: if directory already has files, skip download
+        if os.path.isdir(local_dir) and os.listdir(local_dir):
+            print(f"[download] {repo_id} → {local_dir} (already cached, skipping)")
+            continue
+
+        print(f"[download] {repo_id} → {local_dir} (downloading …)")
+        os.makedirs(local_dir, exist_ok=True)
+
+        kwargs = {
+            "repo_id": repo_id,
+            "local_dir": local_dir,
+            "token": hf_token,
+        }
+        if "allow_patterns" in spec:
+            kwargs["allow_patterns"] = spec["allow_patterns"]
+
+        snapshot_download(**kwargs)
+        print(f"[download] {repo_id} done.")
+
+    print("[download] All models ready.")
+
+
+print("[app] Checking / downloading model weights …")
+download_models()
+
+# ---------------------------------------------------------------------------
+# 2. Environment bootstrap – must happen BEFORE any inference imports
+# ---------------------------------------------------------------------------
+# HF Spaces launches a single process; we simulate the minimal distributed
+# environment that the pipeline expects (world_size=1, rank=0).
+os.environ.setdefault("MASTER_ADDR", "localhost")
+os.environ.setdefault("MASTER_PORT", "29500")
+os.environ.setdefault("RANK", "0")
+os.environ.setdefault("WORLD_SIZE", "1")
+os.environ.setdefault("LOCAL_RANK", "0")
+os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
+
+# Project root must be on sys.path so `inference.*` imports resolve.
+PROJECT_ROOT = os.path.dirname(os.path.abspath(__file__))
+if PROJECT_ROOT not in sys.path:
+    sys.path.insert(0, PROJECT_ROOT)
+
+# Build the config JSON that maps to the downloaded paths.
+CONFIG_OVERRIDES = {
+    "engine_config": {
+        "load": os.path.join(MODEL_ROOT, "distill"),
+        "distill": True,
+        "cp_size": 1,
+    },
+    "evaluation_config": {
+        "cfg_number": 1,
+        "num_inference_steps": 8,
+        "audio_model_path": os.path.join(MODEL_ROOT, "audio"),
+        "txt_model_path": os.path.join(MODEL_ROOT, "t5/t5gemma-9b-9b-ul2"),
+        "vae_model_path": os.path.join(MODEL_ROOT, "wan_vae/Wan2.2-TI2V-5B"),
+        "use_turbo_vae": True,
+        "student_config_path": os.path.join(MODEL_ROOT, "turbo_vae/TurboV3-Wan22-TinyShallow_7_7.json"),
+        "student_ckpt_path": os.path.join(MODEL_ROOT, "turbo_vae/checkpoint-340000.ckpt"),
+    },
+}
+
+# Write a temporary config JSON that parse_config() can pick up via CLI args.
+_tmp_config = os.path.join(tempfile.gettempdir(), "magihuman_config.json")
+with open(_tmp_config, "w") as f:
+    json.dump(CONFIG_OVERRIDES, f)
+
+# Inject the config path into sys.argv so that parse_config() finds it.
+sys.argv = [sys.argv[0], "--config-load-path", _tmp_config]
+
+# ---------------------------------------------------------------------------
+# 3. Initialize infrastructure & build pipeline (runs once at startup)
+# ---------------------------------------------------------------------------
+import gradio as gr  # noqa: E402
+import torch  # noqa: E402,F401
+
+from inference.infra import initialize_infra  # noqa: E402
+from inference.common import parse_config  # noqa: E402
+from inference.model.dit import get_dit  # noqa: E402
+from inference.pipeline.pipeline import MagiPipeline  # noqa: E402
+
+print("[app] Initializing infrastructure …")
+initialize_infra()
+
+print("[app] Loading model …")
+config = parse_config()
+model = get_dit(config.arch_config, config.engine_config)
+pipeline = MagiPipeline(model, config.evaluation_config)
+print("[app] Pipeline ready.")
+
+
+# ---------------------------------------------------------------------------
+# 4. Inference wrapper
+# ---------------------------------------------------------------------------
+def generate_video(
+    image,
+    prompt: str,
+    seconds: int,
+    seed: int,
+):
+    """Called by Gradio – returns path to the output .mp4 file."""
+    if image is None:
+        raise gr.Error("Please upload a reference image.")
+    if not prompt or not prompt.strip():
+        raise gr.Error("Please enter a text prompt.")
+
+    # Gradio passes a filepath (str) for gr.Image(type="filepath")
+    image_path = image
+
+    output_dir = tempfile.mkdtemp(prefix="magihuman_")
+    save_prefix = os.path.join(output_dir, f"output_{uuid.uuid4().hex[:8]}")
+
+    result_path = pipeline.run_offline(
+        prompt=prompt,
+        image=image_path,
+        audio=None,
+        save_path_prefix=save_prefix,
+        seed=int(seed),
+        seconds=int(seconds),
+        br_width=448,
+        br_height=256,
+    )
+
+    return result_path
+
+
+# ---------------------------------------------------------------------------
+# 5. Gradio UI
+# ---------------------------------------------------------------------------
+TITLE = "daVinci-MagiHuman – Audio-Video Generation"
+DESCRIPTION = (
+    "Upload a reference image, enter a descriptive prompt, choose the video "
+    "duration (4–10 s), and click **Generate**. The model produces a video "
+    "with synchronized audio.\n\n"
+    "**Model**: 15B single-stream Transformer (distilled, 8-step inference) "
+    "| **Resolution**: 448×256 | **FPS**: 25"
+)
+
+with gr.Blocks(title=TITLE, theme=gr.themes.Soft()) as demo:
+    gr.Markdown(f"# {TITLE}")
+    gr.Markdown(DESCRIPTION)
+
+    with gr.Row():
+        with gr.Column(scale=1):
+            image_input = gr.Image(
+                label="Reference Image",
+                type="filepath",
+                height=300,
+            )
+            prompt_input = gr.Textbox(
+                label="Prompt",
+                placeholder="Describe the scene you want to generate …",
+                lines=4,
+            )
+            with gr.Row():
+                seconds_slider = gr.Slider(
+                    minimum=4,
+                    maximum=10,
+                    step=1,
+                    value=4,
+                    label="Duration (seconds)",
+                )
+                seed_input = gr.Number(
+                    value=42,
+                    label="Seed",
+                    precision=0,
+                )
+            generate_btn = gr.Button("Generate", variant="primary")
+
+        with gr.Column(scale=1):
+            video_output = gr.Video(label="Generated Video")
+
+    generate_btn.click(
+        fn=generate_video,
+        inputs=[image_input, prompt_input, seconds_slider, seed_input],
+        outputs=[video_output],
+    )
+
+    # Pre-loaded example (uses bundled assets from the repo)
+    example_prompt_path = os.path.join(PROJECT_ROOT, "example/assets/prompt.txt")
+    example_prompt = "A person talking in a living room."
+    if os.path.exists(example_prompt_path):
+        with open(example_prompt_path) as f:
+            example_prompt = f.read().strip()
+
+    example_image_path = os.path.join(PROJECT_ROOT, "example/assets/image.png")
+    if os.path.exists(example_image_path):
+        gr.Examples(
+            examples=[
+                [example_image_path, example_prompt, 10, 42],
+            ],
+            inputs=[image_input, prompt_input, seconds_slider, seed_input],
+            outputs=[video_output],
+            cache_examples=False,
+        )
+
+
+if __name__ == "__main__":
+    demo.queue(max_size=2).launch(server_name="0.0.0.0", server_port=7860)
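The bootstrap in section 2 of app.py smuggles the path overrides into `parse_config()` by writing a temp JSON and rewriting `sys.argv`. That pattern can be factored into a small reusable helper; this is a sketch under the assumption that `parse_config()` only reads `--config-load-path` from argv, and `inject_config` is my own name, not part of the repo:

```python
import json
import os
import sys
import tempfile


def inject_config(overrides: dict, argv=None) -> str:
    """Serialize `overrides` to a temp JSON file and splice a
    --config-load-path flag into argv (defaults to sys.argv),
    mirroring what app.py does before importing the inference code."""
    argv = sys.argv if argv is None else argv
    fd, path = tempfile.mkstemp(suffix=".json", prefix="magihuman_config_")
    with os.fdopen(fd, "w") as f:
        json.dump(overrides, f)
    # In-place mutation so code holding a reference to argv sees the flag.
    argv[:] = [argv[0], "--config-load-path", path]
    return path


# Usage: same overrides shape app.py builds, against a local argv.
argv = ["app.py"]
cfg_path = inject_config({"engine_config": {"distill": True, "cp_size": 1}}, argv=argv)
print(argv)  # ['app.py', '--config-load-path', <temp json path>]
```

One caveat of the approach, kept from app.py: overwriting argv discards any real CLI flags the process was started with, which is fine for a Space but surprising for local runs.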