pollix committed
Commit 4f7e639 · verified · 1 Parent(s): fdf7b33

restore space from local (app.py + showcase + thumbnail)

Files changed (7)
  1. .gitattributes +4 -35
  2. LICENSE +21 -0
  3. README.md +100 -9
  4. app.py +1509 -0
  5. requirements.txt +2 -0
  6. showcase/sf_walk.mp4 +3 -0
  7. thumbnail.png +3 -0
.gitattributes CHANGED
@@ -1,35 +1,4 @@
1
- *.7z filter=lfs diff=lfs merge=lfs -text
2
- *.arrow filter=lfs diff=lfs merge=lfs -text
3
- *.bin filter=lfs diff=lfs merge=lfs -text
4
- *.bz2 filter=lfs diff=lfs merge=lfs -text
5
- *.ckpt filter=lfs diff=lfs merge=lfs -text
6
- *.ftz filter=lfs diff=lfs merge=lfs -text
7
- *.gz filter=lfs diff=lfs merge=lfs -text
8
- *.h5 filter=lfs diff=lfs merge=lfs -text
9
- *.joblib filter=lfs diff=lfs merge=lfs -text
10
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
- *.model filter=lfs diff=lfs merge=lfs -text
13
- *.msgpack filter=lfs diff=lfs merge=lfs -text
14
- *.npy filter=lfs diff=lfs merge=lfs -text
15
- *.npz filter=lfs diff=lfs merge=lfs -text
16
- *.onnx filter=lfs diff=lfs merge=lfs -text
17
- *.ot filter=lfs diff=lfs merge=lfs -text
18
- *.parquet filter=lfs diff=lfs merge=lfs -text
19
- *.pb filter=lfs diff=lfs merge=lfs -text
20
- *.pickle filter=lfs diff=lfs merge=lfs -text
21
- *.pkl filter=lfs diff=lfs merge=lfs -text
22
- *.pt filter=lfs diff=lfs merge=lfs -text
23
- *.pth filter=lfs diff=lfs merge=lfs -text
24
- *.rar filter=lfs diff=lfs merge=lfs -text
25
- *.safetensors filter=lfs diff=lfs merge=lfs -text
26
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
- *.tar.* filter=lfs diff=lfs merge=lfs -text
28
- *.tar filter=lfs diff=lfs merge=lfs -text
29
- *.tflite filter=lfs diff=lfs merge=lfs -text
30
- *.tgz filter=lfs diff=lfs merge=lfs -text
31
- *.wasm filter=lfs diff=lfs merge=lfs -text
32
- *.xz filter=lfs diff=lfs merge=lfs -text
33
- *.zip filter=lfs diff=lfs merge=lfs -text
34
- *.zst filter=lfs diff=lfs merge=lfs -text
35
- *tfevents* filter=lfs diff=lfs merge=lfs -text
 
1
+ *.mp4 filter=lfs diff=lfs merge=lfs -text
2
+ *.png filter=lfs diff=lfs merge=lfs -text
3
+ *.jpg filter=lfs diff=lfs merge=lfs -text
4
+ *.wav filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 StudioMI300
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
README.md CHANGED
@@ -1,15 +1,106 @@
1
  ---
2
- title: Studiomi300
3
- emoji: 💻
4
- colorFrom: yellow
5
- colorTo: purple
6
  sdk: gradio
7
- sdk_version: 6.14.0
8
- python_version: '3.13'
9
  app_file: app.py
10
- pinned: false
11
  license: mit
12
- short_description: One prompt → 30s cinematic reel. End-to-end on a single AMD.
13
  ---
14
 
15
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: StudioMI300
3
+ emoji: 🎬
4
+ colorFrom: indigo
5
+ colorTo: pink
6
  sdk: gradio
7
+ sdk_version: 5.29.0
8
  app_file: app.py
9
+ pinned: true
10
  license: mit
11
+ short_description: One prompt → 30s cinematic reel on a single AMD MI300X
12
+ thumbnail: thumbnail.png
13
+ tags:
14
+ - amd
15
+ - amd-hackathon-2026
16
+ - mi300x
17
+ - rocm
18
+ - video-generation
19
+ - wan2.2
20
+ - flux
21
+ - qwen
22
+ - text-to-video
23
+ - text-to-film
24
+ - cinematic
25
+ - gradio
26
  ---
27
 
28
+ # StudioMI300
29
+
30
+ **One prompt → 30-second cinematic reel.** Built for the AMD Developer Hackathon 2026
31
+ on a single AMD Instinct MI300X (192 GB HBM3, ROCm 7.2).
32
+
33
+ ## What it does
34
+
35
+ You write one sentence. The pipeline plans a six-shot story, paints character
36
+ keyframes, animates them, scores the music, narrates the voice-over, and stitches
37
+ everything into a 30-second `mp4`. No setup. No LoRA training. No per-shot prompting.
38
+
39
+ ```
40
+ "A young woman walks through neon-lit Tokyo at night and meets two friends."
41
+ ↓
42
+ [ ~45 minutes on a single MI300X ]
43
+ ↓
44
+ 30s cinematic reel.mp4 + audio
45
+ ```
46
+
47
+ ## How it works (single MI300X, sequential)
48
+
49
+ 1. **Director Agent** - Qwen3.5-35B-A3B (BF16, vLLM, AITER MoE) plans 6 shots,
50
+ character portraits, music brief, VO script, language tag.
51
+ 2. **Per-shot keyframes** - FLUX.2 [klein] 4B reference editing seeds each
52
+ shot from a single canonical character master, pinning identity.
53
+ 3. **Animation** - Wan2.2-I2V-A14B with ParaAttention FBCache (2× lossless)
54
+ and selective `torch.compile` on `transformer_2` (1.2× compile win).
55
+ 4. **Vision Critic** - the same Qwen3.5 looks at four sampled frames per clip,
56
+ labels failure modes (`STYLIZED_AI_LOOK`, `CHARACTER_DRIFT`, `EXTRAS_INVADE_FRAME`...)
57
+ and triggers a re-render with a bumped seed if the score is below threshold.
58
+ 5. **Music** - ACE-Step v1 3.5B generates a 30-second instrumental from the
59
+ Director's music brief.
60
+ 6. **Voice-over** - Kokoro-82M narrates the Director's script in any of 9
61
+ languages (Director picks the language to match the setting).
62
+ 7. **Mix** - `ffmpeg` concat-and-loudnorm into the final `mp4`.
63
+
64
+ ## The full open-source stack (Apache 2.0 / MIT throughout)
65
+
66
+ | Stage | Model | License |
67
+ |---|---|---|
68
+ | Planner / Critic | Qwen3.5-35B-A3B | Apache 2.0 |
69
+ | Image | FLUX.2 [klein] 4B | Apache 2.0 |
70
+ | Video | Wan2.2-I2V-A14B | Apache 2.0 |
71
+ | Music | ACE-Step v1 3.5B | Apache 2.0 |
72
+ | TTS | Kokoro-82M | Apache 2.0 |
73
+ | Serving | vLLM 0.17 | Apache 2.0 |
74
+ | Caching | ParaAttention FBCache | Apache 2.0 |
75
+ | AMD kernels | AITER 0.1.13 | MIT |
76
+ | Project code | StudioMI300 | MIT |
77
+
78
+ Every output you generate from this stack is yours to use commercially.
79
+
80
+ ## Why a single MI300X
81
+
82
+ Most cinematic generation pipelines assume you have a multi-GPU cluster: one GPU
83
+ for the planner, one for the image model, one for the video model, etc. On 192 GB
84
+ HBM3 the pipeline runs them all sequentially on the same card. That's the project's central
85
+ constraint and also its main flex:
86
+
87
+ - Qwen3.5-35B planner loads / unloads cleanly between Director and Critic phases.
88
+ - Wan2.2-I2V-A14B (~80 GB BF16) leaves headroom for FLUX.2 [klein] 4B (~8 GB)
89
+ and ACE-Step (~12 GB) to live alongside in subprocess scope.
90
+ - AITER MoE for the planner. AITER FA / FP8 was evaluated for Wan2.2; results are
91
+ documented in `incidents.md` of the GitHub repo (FP8 path crashes mid-pipeline
92
+ on ROCm 7.2, AITER/issues#2187, so BF16 ships).
93
+
94
+ ## Live demo
95
+
96
+ This Space hosts the showcase. Live generation requires an MI300X (45 minutes
97
+ per reel is too long for a casual visitor anyway). The full pipeline is on
98
+ GitHub: clone it, point it at your MI300X, and it generates.
99
+
100
+ ## Credits
101
+
102
+ AMD Developer Hackathon 2026 entry. Built solo over six days on one AMD
103
+ Developer Cloud MI300X droplet.
104
+
105
+ Made with the open-source ecosystem: Black Forest Labs, Wan-AI, Alibaba Qwen,
106
+ StepFun, hexgrad/Kokoro, vLLM, ParaAttention, diffusers, AMD ROCm + AITER.
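Review note on step 4 of the README above (Vision Critic): the "re-render with a bumped seed if the score is below threshold" rule can be sketched as a small retry loop. This is a minimal sketch under stated assumptions, not the project's code: `render_clip`, `score_clip`, the `0.7` threshold, the `+1` seed bump, and the retry cap are all hypothetical stand-ins, since the README gives no concrete values.

```python
from typing import Callable, Tuple


def render_with_critic(
    render_clip: Callable[[int], str],   # hypothetical: seed -> path of the rendered clip
    score_clip: Callable[[str], float],  # hypothetical: clip path -> critic score in [0, 1]
    seed: int,
    threshold: float = 0.7,              # assumed value; the README gives no number
    max_retries: int = 2,                # assumed cap so the loop always terminates
) -> Tuple[str, int, float]:
    """Render a clip, re-rendering with a bumped seed while the critic score is low.

    Returns (clip, seed_used, score) for the best-scoring attempt.
    """
    best = None
    for attempt in range(max_retries + 1):
        current_seed = seed + attempt    # "bumped seed": assumed to mean +1 per retry
        clip = render_clip(current_seed)
        score = score_clip(clip)
        if best is None or score > best[2]:
            best = (clip, current_seed, score)
        if score >= threshold:
            break                        # critic is satisfied, stop re-rendering
    return best
```

Keeping the best attempt rather than the last one means a run that never clears the threshold still returns its strongest clip.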
app.py ADDED
@@ -0,0 +1,1509 @@
1
+ import os
2
+ import time
3
+ from pathlib import Path
4
+
5
+ import gradio as gr
6
+ import requests
7
+
8
+ APP_ROOT = Path(__file__).parent
9
+ SHOWCASE_DIR = APP_ROOT / "showcase"
10
+
11
+ GITHUB_URL = "https://github.com/bladedevoff/studiomi300"
12
+
13
+ API_URL = (os.environ.get("STUDIO_API_URL", "") or "").rstrip("/")
14
+ API_TOKEN = os.environ.get("STUDIO_API_TOKEN", "")
15
+ API_HEADERS = {"X-API-Token": API_TOKEN} if API_TOKEN else {}
16
+
17
+ # local mp4 cache - Space downloads from droplet over HTTP (server-side, no
18
+ # mixed-content), then serves to browser over Gradio's HTTPS file route.
19
+ DEMO_CACHE = APP_ROOT / "demo_cache"
20
+ DEMO_CACHE.mkdir(exist_ok=True)
21
+
22
+
23
+ def cache_demo_mp4(job_id):
24
+     """Fetch demo mp4 from droplet API into the Space's local cache. Returns Path or None."""
25
+     p = DEMO_CACHE / f"{job_id}.mp4"
26
+     if p.exists() and p.stat().st_size > 1024:
27
+         return p
28
+     if not API_URL:
29
+         return None
30
+     try:
31
+         r = requests.get(f"{API_URL}/demos/{job_id}.mp4", timeout=120, stream=True)
32
+         if r.status_code != 200:
33
+             return None
34
+         with open(p, "wb") as f:
35
+             for chunk in r.iter_content(64 * 1024):
36
+                 f.write(chunk)
37
+         return p
38
+     except requests.RequestException:
39
+         return None
40
+
41
+
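Review note on `cache_demo_mp4` above: the cache check treats any file larger than 1024 bytes as complete, so a download interrupted partway could leave a truncated mp4 that is then served as if cached. One common way to avoid that, sketched here under assumptions, is to stream into a temporary file and rename it into place only after the last chunk. `write_chunks_atomically` is a hypothetical helper, not part of app.py; it accepts any iterable of byte chunks, for example `r.iter_content(64 * 1024)`.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_chunks_atomically(chunks: Iterable[bytes], dest: Path) -> Path:
    """Write chunks to a temp file next to dest, then rename it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=str(dest.parent), suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        # Atomic when both paths are on the same filesystem.
        os.replace(tmp_name, dest)
    except BaseException:
        # Any failure, including one raised by the chunk iterator,
        # removes the partial file so it is never mistaken for a cache hit.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return dest
```

With this shape, `dest` either does not exist or holds a complete file, so a size check on it stays meaningful.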
42
+ SHOWCASE_REELS = [
43
+     {
44
+         "title": "San Francisco walk - golden hour to blue hour",
45
+         "video": "sf_walk.mp4",
46
+         "logline": "A young woman walks alone down a steep Pacific Heights street, past painted Victorians and rolling fog, to a quiet overlook of the Golden Gate Bridge as the light shifts to blue hour.",
47
+         "prompt": (
48
+             "30-second cinematic reel: a young woman walks alone through San Francisco "
49
+             "at golden hour - down a steep Pacific Heights street with bay views, past "
50
+             "painted Victorian houses, fog rolling in over the Pacific, ending at a "
51
+             "quiet overlook of the Golden Gate Bridge as the light shifts to blue hour"
52
+         ),
53
+         "music_style": "intimate ambient piano with a soft synth pad, 75 BPM, contemplative",
54
+         "vo_lang": "American English (Director picked from setting)",
55
+         "render_time_min": 81,
56
+         "shots": 6,
57
+         "stack_used": [
58
+             "Director Agent: Qwen3.5-35B-A3B (vLLM, AITER MoE)",
59
+             "Vision Critic: same Qwen3.5 reload, 4 frames per clip",
60
+             "Image: FLUX.2 [klein] 4B reference editing",
61
+             "Video: Wan2.2-I2V-A14B + FBCache + torch.compile + FLF2V on cut:false arcs",
62
+             "Music: ACE-Step v1 3.5B",
63
+             "Voice-over: Kokoro-82M, per-shot wavs, ffmpeg adelay sync",
64
+         ],
65
+     },
66
+ ]
67
+
68
+
69
+ HACKATHON_BADGE = "amd-hackathon-2026"
70
+
71
+
72
+ def fetch_demos(limit=50):
73
+     if not API_URL:
74
+         return []
75
+     try:
76
+         r = requests.get(f"{API_URL}/demos", params={"limit": limit}, timeout=10)
77
+         if r.status_code == 200:
78
+             return r.json()
79
+     except requests.RequestException:
80
+         pass
81
+     return []
82
+
83
+
84
+ def backend_health():
85
+     if not API_URL:
86
+         return "not configured"
87
+     try:
88
+         r = requests.get(f"{API_URL}/health", timeout=5)
89
+         if r.status_code == 200:
90
+             j = r.json()
91
+             return "busy (rendering)" if j.get("gpu_busy") else "idle"
92
+     except requests.RequestException:
93
+         pass
94
+     return "offline"
95
+
96
+
97
+ def render_demo_card(d):
98
+     prompt = (d.get("prompt") or "")[:240]
99
+     duration = d.get("duration_s") or 0
100
+     p = cache_demo_mp4(d["id"])
101
+     if p is None:
102
+         return ""
103
+     src = f"/gradio_api/file={p}"  # Gradio HTTPS file route
104
+     return (
105
+         f'<div class="demo-card">'
106
+         f'<video src="{src}" controls preload="metadata" loop muted playsinline></video>'
107
+         f'<div class="demo-prompt">{prompt}</div>'
108
+         f'<div class="demo-meta">{int(duration)}s render</div>'
109
+         f'</div>'
110
+     )
111
+
112
+
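Review note on `render_demo_card` above: `prompt` originates from user submissions and is interpolated into the HTML as-is, so a prompt containing markup would be interpreted by the browser. A minimal sketch of escaping it first with the standard library's `html.escape`; `render_card_html` is a hypothetical standalone variant that takes an already-resolved `src` instead of calling `cache_demo_mp4`.

```python
import html


def render_card_html(prompt: str, src: str, duration_s: float) -> str:
    """Build one demo card, escaping every value that ends up inside the markup."""
    # Truncate first, then escape, so an entity is never cut in half.
    safe_prompt = html.escape((prompt or "")[:240])
    # Attribute value: quotes must be escaped too (quote=True is the default).
    safe_src = html.escape(src, quote=True)
    return (
        '<div class="demo-card">'
        f'<video src="{safe_src}" controls preload="metadata" loop muted playsinline></video>'
        f'<div class="demo-prompt">{safe_prompt}</div>'
        f'<div class="demo-meta">{int(duration_s)}s render</div>'
        '</div>'
    )
```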
113
+ def render_demo_grid(demos, top_n=10):
114
+     if not demos:
115
+         if not API_URL:
116
+             msg = "Live demo backend not configured."
117
+         else:
118
+             msg = "No live generations yet. Be the first."
119
+         return f'<div class="demo-empty">{msg}</div>'
120
+     head = demos[:top_n]
121
+     tail = demos[top_n:]
122
+     cards = "".join(render_demo_card(d) for d in head)
123
+     out = f'<div class="demo-grid">{cards}</div>'
124
+     if tail:
125
+         more = "".join(render_demo_card(d) for d in tail)
126
+         out += (
127
+             f'<details class="demo-more"><summary>Show {len(tail)} older'
128
+             f'</summary><div class="demo-grid">{more}</div></details>'
129
+         )
130
+     return out
131
+
132
+
133
+ STAGE_LABELS = {
134
+     "queued": "queued",
135
+     "starting": "starting up",
136
+     "klein_loading": "loading FLUX.2 klein 4B",
137
+     "keyframe_starting": "painting keyframe",
138
+     "keyframe_ready": "keyframe ready",
139
+     "wan_loading": "loading Wan2.2-I2V-A14B",
140
+     "wan_rendering": "animating with Wan2.2",
141
+     "rendered": "video rendered",
142
+     "music_starting": "generating music (ACE-Step)",
143
+     "music_ready": "music ready",
144
+     "music_skipped": "music skipped",
145
+     "music_failed": "music failed (silent video)",
146
+     "mix_starting": "mixing audio onto video",
147
+     "mix_done": "final mp4 ready",
148
+     "completed": "done",
149
+     "done": "done",
150
+ }
151
+
152
+ STAGE_PROGRESS = {
153
+     "queued": 0.02, "starting": 0.04,
154
+     "klein_loading": 0.08, "keyframe_starting": 0.12, "keyframe_ready": 0.18,
155
+     "wan_loading": 0.24,
156
+     "wan_rendering": 0.80,
157
+     "rendered": 0.86,
158
+     "music_starting": 0.88,
159
+     "music_ready": 0.95,
160
+     "music_skipped": 0.95, "music_failed": 0.95,
161
+     "mix_starting": 0.97,
162
+     "mix_done": 1.0,
163
+     "completed": 1.0, "done": 1.0,
164
+ }
165
+
166
+
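Review note on the two tables above: they map backend stage names to a display label and a progress fraction. A minimal sketch of how a caller might read them defensively, assuming the backend can report stage names that are missing from the tables. `describe_stage` and the trimmed table copies are illustrative only; echoing the raw stage name as a fallback mirrors the `STAGE_LABELS.get(stage, stage)` call in `submit_demo`.

```python
from typing import Tuple

# Trimmed copies of the tables from app.py, just enough to run the example.
STAGE_LABELS = {
    "queued": "queued",
    "wan_rendering": "animating with Wan2.2",
    "mix_done": "final mp4 ready",
}
STAGE_PROGRESS = {"queued": 0.02, "wan_rendering": 0.80, "mix_done": 1.0}


def describe_stage(stage: str, last_progress: float = 0.0) -> Tuple[str, float]:
    """Return (label, progress); unknown stages keep the previous progress value."""
    label = STAGE_LABELS.get(stage, stage)
    progress = STAGE_PROGRESS.get(stage, last_progress)
    # Never move the bar backwards if stages arrive out of order.
    return label, max(progress, last_progress)
```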
167
+ def submit_demo(prompt, request: gr.Request = None):
168
+     if not API_URL:
169
+         raise gr.Error("Live demo backend not configured. Visit later.")
170
+     p = (prompt or "").strip()
171
+     if len(p) < 20:
172
+         raise gr.Error("Prompt must be at least 20 characters.")
173
+     if len(p) > 1500:
174
+         raise gr.Error("Prompt too long (1500 char max).")
175
+
176
+     headers = dict(API_HEADERS)
177
+     if request is not None:
178
+         try:
179
+             fwd = request.headers.get("x-forwarded-for", "") if request.headers else ""
180
+             user_ip = fwd.split(",")[0].strip() if fwd else (request.client.host if request.client else "")
181
+             if user_ip:
182
+                 headers["X-Forwarded-For"] = user_ip
183
+             ua = request.headers.get("user-agent", "") if request.headers else ""
184
+             if ua:
185
+                 headers["X-Original-User-Agent"] = ua[:200]
186
+         except (AttributeError, KeyError):
187
+             pass
188
+
189
+     try:
190
+         r = requests.post(f"{API_URL}/jobs", headers=headers, json={
191
+             "prompt": p, "mode": "demo", "use_critic": False,
192
+         }, timeout=15)
193
+     except requests.RequestException as e:
194
+         raise gr.Error(f"backend unreachable: {e}")
195
+     if r.status_code == 401:
196
+         raise gr.Error("backend rejected token (Space secret out of sync)")
197
+     if r.status_code == 422:
198
+         raise gr.Error("Prompt rejected by content policy. Please rephrase.")
199
+     if r.status_code != 200:
200
+         raise gr.Error(f"submit failed: {r.text[:200]}")
201
+     job_id = r.json()["job_id"]
202
+
203
+     yield f"**Job {job_id}** · submitted, waiting for GPU\n\n> {p}", None, gr.update()
204
+
205
+     deadline = time.time() + 900
206
+     last_render = ""
207
+     while time.time() < deadline:
208
+         time.sleep(2)
209
+         try:
210
+             meta = requests.get(f"{API_URL}/jobs/{job_id}", headers=API_HEADERS, timeout=10).json()
211
+         except requests.RequestException:
212
+             continue
213
+         stage = meta.get("stage", "queued")
214
+         status = meta.get("status", "queued")
215
+
216
+         elapsed = int(time.time() - meta.get("started", time.time())) if meta.get("started") else 0
217
+         if status == "queued":
218
+             pos = meta.get("queue_position", 0)
219
+             qsize = meta.get("queue_size", 1)
220
+             if pos:
221
+                 status_md = f"**Job {job_id}** · queued at **position {pos} of {qsize}**, waiting for GPU\n\n> {p}"
222
+             else:
223
+                 status_md = f"**Job {job_id}** · queued\n\n> {p}"
224
+         else:
225
+             label = STAGE_LABELS.get(stage, stage)
226
+             status_md = f"**Job {job_id}** · {label} · {elapsed}s elapsed\n\n> {p}"
227
+
228
+         if status == "done":
229
+             duration = int((meta.get("finished") or 0) - (meta.get("started") or 0))
230
+             local = cache_demo_mp4(job_id)  # download mp4 to Space's local fs
231
+             done_md = f"### Done in {duration}s\n\n**Job {job_id}** · saved to server, added to gallery below.\n\n> {p}"
232
+             yield done_md, str(local) if local else None, gr.update(value=render_demo_grid(fetch_demos()))
233
+             return
234
+         if status == "failed":
235
+             raise gr.Error(f"job failed at stage `{stage}`. Check droplet logs.")
236
+
237
+         if status_md != last_render:
238
+             last_render = status_md
239
+             yield status_md, None, gr.update()
240
+
241
+     raise gr.Error("timeout (>15 min). The droplet may be stuck or queue too long.")
242
+
243
+
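Review note on the polling loop in `submit_demo` above: it follows a common pattern of polling on a fixed interval until a terminal status or a deadline. A minimal sketch of that pattern as a standalone generator, with the clock and sleep functions injected so it can be exercised without waiting. `poll_until_done`, `fetch_status`, and the default values are hypothetical names for illustration, not functions in app.py.

```python
import time
from typing import Callable, Dict, Iterator


def poll_until_done(
    fetch_status: Callable[[], Dict],   # hypothetical: returns e.g. {"status": "running"}
    timeout_s: float = 900.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[Dict]:
    """Yield each status dict that differs from the previous one; stop on done/failed."""
    deadline = clock() + timeout_s
    last = None
    while clock() < deadline:
        sleep(interval_s)
        meta = fetch_status()
        if meta != last:                # skip duplicates, like `last_render` in app.py
            last = meta
            yield meta
        if meta.get("status") in ("done", "failed"):
            return
    raise TimeoutError(f"no terminal status within {timeout_s} seconds")
```

Injecting `clock` and `sleep` keeps the timeout path testable in milliseconds instead of the full 15 minutes.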
244
+ def refresh_gallery():
245
+     return render_demo_grid(fetch_demos())
246
+
247
+
248
+ CUSTOM_CSS = r"""
249
+ :root {
250
+ --grad-a: #a78bfa;
251
+ --grad-b: #f472b6;
252
+ --grad-c: #fbbf24;
253
+ --bg-card: #0f172a;
254
+ --bg-deep: #020617;
255
+ --border-card: rgba(167, 139, 250, 0.32);
256
+ --text-main: #f1f5f9;
257
+ --text-mute: #94a3b8;
258
+ }
259
+
260
+ .gradio-container { max-width: 1100px !important; margin: 0 auto !important; padding-left: 1rem !important; padding-right: 1rem !important; }
261
+ .app, .main, footer { margin: 0 auto !important; }
262
+
263
+ /* hero - always dark backdrop so the gradient text stays vivid in light/dark themes alike */
264
+ .hero {
265
+ text-align: center;
266
+ padding: 3rem 1.2rem 2rem 1.2rem;
267
+ background:
268
+ radial-gradient(ellipse 70% 90% at 50% 0%, rgba(244, 114, 182, .35), transparent 65%),
269
+ radial-gradient(ellipse 70% 90% at 50% 100%, rgba(167, 139, 250, .30), transparent 65%),
270
+ linear-gradient(180deg, #0b1120 0%, #050816 100%);
271
+ border-radius: 22px;
272
+ margin-bottom: 1rem;
273
+ border: 1px solid rgba(167, 139, 250, .25);
274
+ box-shadow: 0 14px 50px rgba(124, 58, 237, .18);
275
+ }
276
+ .hero-title {
277
+ font-size: clamp(2.6rem, 6vw, 4.6rem);
278
+ font-weight: 900;
279
+ line-height: 1;
280
+ letter-spacing: -0.03em;
281
+ background: linear-gradient(95deg, #c4b5fd 0%, #f9a8d4 50%, #fde68a 100%);
282
+ -webkit-background-clip: text;
283
+ background-clip: text;
284
+ color: transparent;
285
+ -webkit-text-fill-color: transparent;
286
+ text-shadow: 0 4px 36px rgba(244, 114, 182, .25);
287
+ margin: 0;
288
+ }
289
+ .hero-tagline {
290
+ font-size: clamp(1.05rem, 2vw, 1.35rem);
291
+ color: #e2e8f0;
292
+ margin-top: 0.85rem;
293
+ font-weight: 500;
294
+ max-width: 720px;
295
+ margin-left: auto;
296
+ margin-right: auto;
297
+ line-height: 1.5;
298
+ }
299
+ .badge-row { display: flex; justify-content: center; gap: 0.5rem; flex-wrap: wrap; margin-top: 1.4rem; }
300
+ .badge {
301
+ background: rgba(15, 23, 42, 0.85);
302
+ border: 1px solid rgba(148, 163, 184, .25);
303
+ padding: 0.4rem 0.95rem;
304
+ border-radius: 999px;
305
+ font-size: 0.83rem;
306
+ font-weight: 700;
307
+ letter-spacing: 0.01em;
308
+ backdrop-filter: blur(4px);
309
+ }
310
+ .badge-amd { color: #fca5a5; }
311
+ .badge-rocm { color: #fde68a; }
312
+ .badge-license { color: #6ee7b7; }
313
+ .badge-tag { color: #c4b5fd; }
314
+
315
+ /* stats strip - always dark tiles with bright gradient numbers */
316
+ .stat-strip { display: grid; grid-template-columns: repeat(4, 1fr); gap: 0.75rem; margin: 1.2rem 0 1.8rem 0; }
317
+ .stat-tile {
318
+ background: linear-gradient(160deg, #131c33 0%, #0a1023 100%);
319
+ border: 1px solid var(--border-card);
320
+ border-radius: 14px;
321
+ padding: 1.1rem 0.8rem;
322
+ text-align: center;
323
+ box-shadow: 0 6px 22px rgba(124, 58, 237, .08);
324
+ }
325
+ .stat-num {
326
+ font-size: 2.2rem;
327
+ font-weight: 900;
328
+ background: linear-gradient(95deg, #c4b5fd 0%, #f9a8d4 100%);
329
+ -webkit-background-clip: text;
330
+ background-clip: text;
331
+ color: transparent;
332
+ -webkit-text-fill-color: transparent;
333
+ line-height: 1.05;
334
+ text-shadow: 0 2px 18px rgba(244, 114, 182, .28);
335
+ }
336
+ .stat-lbl { font-size: 0.76rem; color: #cbd5e1; margin-top: 0.4rem; text-transform: uppercase; letter-spacing: 0.06em; font-weight: 600; }
337
+ @media (max-width: 720px) { .stat-strip { grid-template-columns: repeat(2, 1fr); } }
338
+
339
+ /* pipeline diagram */
340
+ .pipeline {
341
+ display: grid;
342
+ grid-template-columns: repeat(2, 1fr);
343
+ gap: 0.85rem;
344
+ margin: 1.5rem 0;
345
+ }
346
+ @media (max-width: 720px) { .pipeline { grid-template-columns: 1fr; } }
347
+
348
+ .stage {
349
+ position: relative;
350
+ background: linear-gradient(160deg, rgba(124, 58, 237, .07), rgba(15, 23, 42, .72));
351
+ border: 1px solid var(--border-card);
352
+ border-radius: 14px;
353
+ padding: 1.05rem 1.15rem;
354
+ display: flex;
355
+ gap: 0.85rem;
356
+ align-items: flex-start;
357
+ transition: transform .12s ease, border-color .12s ease;
358
+ }
359
+ .stage:hover { transform: translateY(-2px); border-color: rgba(236, 72, 153, .45); }
360
+ .stage-num {
361
+ flex: 0 0 2.4rem;
362
+ height: 2.4rem;
363
+ border-radius: 12px;
364
+ background: linear-gradient(135deg, var(--grad-a), var(--grad-b));
365
+ color: white;
366
+ font-weight: 800;
367
+ font-size: 1.05rem;
368
+ display: flex;
369
+ align-items: center;
370
+ justify-content: center;
371
+ }
372
+ .stage-body { flex: 1; }
373
+ .stage-title { font-weight: 700; font-size: 1.02rem; margin: 0 0 0.2rem 0; color: #e2e8f0; }
374
+ .stage-meta { font-size: 0.78rem; color: #fbbf24; font-weight: 600; margin-bottom: 0.35rem; letter-spacing: 0.02em; }
375
+ .stage-desc { font-size: 0.88rem; color: #cbd5e1; line-height: 1.5; margin: 0; }
376
+
377
+ /* failure label table */
378
+ .label-grid {
379
+ display: grid;
380
+ grid-template-columns: repeat(2, 1fr);
381
+ gap: 0.65rem;
382
+ margin: 1rem 0 0.5rem 0;
383
+ }
384
+ @media (max-width: 720px) { .label-grid { grid-template-columns: 1fr; } }
385
+ .label-card {
386
+ background: var(--bg-card);
387
+ border: 1px solid var(--border-card);
388
+ border-radius: 12px;
389
+ padding: 0.85rem 1rem;
390
+ }
391
+ .label-name {
392
+ font-family: 'JetBrains Mono', ui-monospace, monospace;
393
+ font-size: 0.78rem;
394
+ color: #f87171;
395
+ font-weight: 700;
396
+ letter-spacing: 0.02em;
397
+ background: rgba(248, 113, 113, .08);
398
+ padding: 0.2rem 0.45rem;
399
+ border-radius: 6px;
400
+ display: inline-block;
401
+ }
402
+ .label-desc { font-size: 0.85rem; color: #cbd5e1; margin-top: 0.5rem; line-height: 1.45; }
403
+ .label-fix { font-size: 0.8rem; color: #34d399; margin-top: 0.4rem; }
404
+
405
+ /* incident card */
406
+ .incident {
407
+ border-left: 3px solid var(--grad-b);
408
+ background: linear-gradient(95deg, rgba(236, 72, 153, .08), transparent 70%);
409
+ padding: 1rem 1.2rem;
410
+ border-radius: 10px;
411
+ margin: 0.85rem 0;
412
+ }
413
+ .incident-date { font-size: 0.75rem; color: #f87171; font-weight: 700; letter-spacing: 0.03em; text-transform: uppercase; }
414
+ .incident-title { font-weight: 700; font-size: 1.05rem; color: #e2e8f0; margin: 0.25rem 0 0.4rem 0; }
415
+ .incident-body { font-size: 0.9rem; color: #cbd5e1; line-height: 1.55; }
416
+ .incident-fix { font-size: 0.85rem; color: #86efac; margin-top: 0.4rem; }
417
+
418
+ /* perf bar */
419
+ .perf {
420
+ background: var(--bg-card);
421
+ border-radius: 10px;
422
+ padding: 0.65rem 0.85rem;
423
+ margin: 0.4rem 0;
424
+ display: grid;
425
+ grid-template-columns: 1fr 5rem;
426
+ gap: 0.75rem;
427
+ align-items: center;
428
+ }
429
+ .perf-label { font-size: 0.88rem; color: #e2e8f0; }
430
+ .perf-val { font-weight: 700; color: #6ee7b7; text-align: right; font-size: 0.9rem; }
431
+ .perf-bar { grid-column: 1 / -1; height: 6px; background: rgba(148, 163, 184, .14); border-radius: 4px; overflow: hidden; }
432
+ .perf-fill { height: 100%; background: linear-gradient(90deg, var(--grad-a), var(--grad-b)); border-radius: 4px; }
433
+
434
+ /* chart blocks */
435
+ .chart-card {
436
+ background: linear-gradient(160deg, #0f172a 0%, #060b1c 100%);
437
+ border: 1px solid var(--border-card);
438
+ border-radius: 14px;
439
+ padding: 1.1rem 1.2rem;
440
+ margin: 1rem 0;
441
+ }
442
+ .chart-title {
443
+ font-weight: 700;
444
+ font-size: 1rem;
445
+ color: #e2e8f0;
446
+ margin: 0 0 0.2rem 0;
447
+ letter-spacing: 0.01em;
448
+ }
449
+ .chart-sub { color: var(--text-mute); font-size: 0.82rem; margin: 0 0 0.85rem 0; }
450
+
451
+ /* horizontal bar chart - one row */
452
+ .hbar-row {
453
+ display: grid;
454
+ grid-template-columns: 12rem 1fr 4.5rem;
455
+ gap: 0.7rem;
456
+ align-items: center;
457
+ padding: 0.32rem 0;
458
+ font-size: 0.84rem;
459
+ }
460
+ .hbar-label { color: #e2e8f0; }
461
+ .hbar-track { background: rgba(148, 163, 184, .12); height: 12px; border-radius: 4px; overflow: hidden; }
462
+ .hbar-fill {
463
+ height: 100%;
464
+ border-radius: 4px;
465
+ background: linear-gradient(90deg, #c4b5fd, #f472b6);
466
+ display: flex; align-items: center; justify-content: flex-end;
467
+ padding-right: 0.4rem;
468
+ box-shadow: 0 0 12px rgba(244, 114, 182, .35);
469
+ }
470
+ .hbar-val { color: #6ee7b7; font-weight: 700; text-align: right; font-feature-settings: "tnum"; }
471
+ .hbar-val.muted { color: #fbbf24; }
472
+ .hbar-fill.warm { background: linear-gradient(90deg, #fde68a, #f97316); box-shadow: 0 0 12px rgba(249, 115, 22, .35); }
473
+ .hbar-fill.cold { background: linear-gradient(90deg, #67e8f9, #818cf8); box-shadow: 0 0 12px rgba(129, 140, 248, .35); }
474
+ @media (max-width: 720px) { .hbar-row { grid-template-columns: 8rem 1fr 3.5rem; font-size: 0.78rem; } }
475
+
476
+ /* stacked bar (one row, multiple segments) */
477
+ .stack-bar {
478
+ width: 100%;
479
+ height: 26px;
480
+ border-radius: 6px;
481
+ display: flex;
482
+ overflow: hidden;
483
+ margin: 0.4rem 0;
484
+ border: 1px solid rgba(148, 163, 184, .15);
485
+ }
486
+ .stack-seg {
487
+ height: 100%;
488
+ display: flex; align-items: center; justify-content: center;
489
+ font-size: 0.72rem;
490
+ color: white;
491
+ font-weight: 700;
492
+ text-shadow: 0 1px 2px rgba(0,0,0,.4);
493
+ white-space: nowrap;
494
+ overflow: hidden;
495
+ }
496
+ .stack-legend { display: flex; flex-wrap: wrap; gap: 0.6rem; margin-top: 0.5rem; font-size: 0.78rem; color: #cbd5e1; }
497
+ .stack-dot { display: inline-block; width: 0.65rem; height: 0.65rem; border-radius: 50%; margin-right: 0.3rem; vertical-align: middle; }
498
+
499
+ /* placeholder card while reel renders */
500
+ .placeholder {
501
+ border: 1.5px dashed rgba(148, 163, 184, .3);
502
+ border-radius: 14px;
503
+ padding: 2.2rem 1.5rem;
504
+ text-align: center;
505
+ background: linear-gradient(160deg, rgba(124, 58, 237, .06), transparent);
506
+ }
507
+ .placeholder-emoji { font-size: 2.4rem; margin-bottom: 0.6rem; }
508
+ .placeholder-title { font-weight: 700; font-size: 1.1rem; color: #e2e8f0; margin-bottom: 0.4rem; }
509
+ .placeholder-body { font-size: 0.92rem; color: var(--text-mute); max-width: 520px; margin: 0 auto; line-height: 1.5; }
510
+
511
+ /* live demo */
512
+ .demo-grid {
513
+ display: grid;
514
+ grid-template-columns: repeat(auto-fill, minmax(280px, 1fr));
515
+ gap: 0.85rem;
516
+ margin: 0.6rem 0 0.3rem 0;
517
+ }
518
+ .demo-card {
519
+ background: linear-gradient(160deg, #131c33 0%, #0a1023 100%);
520
+ border: 1px solid var(--border-card);
521
+ border-radius: 12px;
522
+ padding: 0.6rem;
523
+ display: flex; flex-direction: column; gap: 0.4rem;
524
+ }
525
+ .demo-card video { width: 100%; border-radius: 8px; background: #000; aspect-ratio: 16/9; object-fit: cover; }
526
+ .demo-prompt { font-size: 0.82rem; color: #cbd5e1; line-height: 1.35; }
527
+ .demo-meta { font-size: 0.72rem; color: var(--text-mute); letter-spacing: 0.04em; }
528
+ .demo-empty { padding: 1.5rem 1rem; text-align: center; color: var(--text-mute); border: 1.5px dashed rgba(148,163,184,.25); border-radius: 12px; }
529
+ .demo-more { margin-top: 0.8rem; }
530
+ .demo-more summary { cursor: pointer; color: #c4b5fd; font-weight: 600; padding: 0.6rem 1rem; background: rgba(124,58,237,.08); border-radius: 8px; user-select: none; }
531
+ .demo-more summary:hover { background: rgba(124,58,237,.16); }
532
+ .demo-more[open] summary { margin-bottom: 0.8rem; }
533
+
534
+ /* footer */
535
+ .footer {
536
+ text-align: center;
537
+ color: var(--text-mute);
538
+ font-size: 0.85rem;
539
+ padding: 1.5rem 0 0.5rem 0;
540
+ border-top: 1px solid rgba(148, 163, 184, .12);
541
+ margin-top: 2rem;
542
+ }
543
+ .footer a { color: #a78bfa; text-decoration: none; }
544
+ .footer a:hover { color: #ec4899; }
545
+
546
+ /* mobile - tighten everything for <=720px */
547
+ @media (max-width: 720px) {
548
+ .gradio-container { padding-left: 0.5rem !important; padding-right: 0.5rem !important; }
549
+ .hero { padding: 1.5rem 0.7rem 1.1rem 0.7rem; border-radius: 14px; margin-bottom: 0.6rem; }
550
+ .hero-title { font-size: 2.4rem !important; line-height: 1.05; }
551
+ .hero-tagline { font-size: 0.98rem; margin-top: 0.6rem; }
552
+ .badge-row { gap: 0.35rem; margin-top: 1rem; }
553
+ .badge { font-size: 0.72rem; padding: 0.3rem 0.7rem; }
554
+ .stat-strip { gap: 0.5rem; margin: 0.7rem 0 1.1rem 0; }
555
+ .stat-tile { padding: 0.75rem 0.4rem; }
556
+ .stat-num { font-size: 1.55rem; }
557
+ .stat-lbl { font-size: 0.62rem; letter-spacing: 0.04em; }
558
+ .stage { padding: 0.85rem 0.95rem; gap: 0.6rem; }
559
+ .stage-num { flex: 0 0 2rem; height: 2rem; font-size: 0.95rem; border-radius: 9px; }
560
+ .stage-title { font-size: 0.96rem; }
561
+ .stage-meta { font-size: 0.7rem; }
562
+ .stage-desc { font-size: 0.82rem; line-height: 1.45; }
563
+ .label-card { padding: 0.7rem 0.85rem; }
564
+ .label-name { font-size: 0.7rem; padding: 0.18rem 0.4rem; }
565
+ .label-desc { font-size: 0.78rem; }
566
+ .label-fix { font-size: 0.74rem; }
567
+ .demo-grid { gap: 0.6rem; }
568
+ .demo-card { padding: 0.45rem; }
569
+ .demo-prompt { font-size: 0.78rem; }
570
+ .demo-meta { font-size: 0.66rem; }
571
+ .incident { padding: 0.75rem 0.9rem; margin: 0.6rem 0; }
572
+ .incident-title { font-size: 0.96rem; }
573
+ .incident-body { font-size: 0.82rem; line-height: 1.5; }
574
+ .incident-fix { font-size: 0.78rem; }
575
+ .perf { padding: 0.55rem 0.7rem; grid-template-columns: 1fr 4rem; }
576
+ .perf-label { font-size: 0.78rem; }
577
+ .perf-val { font-size: 0.78rem; }
578
+ .chart-card { padding: 0.85rem 0.9rem; }
579
+ .chart-title { font-size: 0.94rem; }
580
+ .chart-sub { font-size: 0.74rem; }
581
+ .stack-bar { height: 22px; }
582
+ .stack-seg { font-size: 0.62rem; }
583
+ .stack-legend { font-size: 0.72rem; gap: 0.4rem; }
584
+ .footer { font-size: 0.78rem; padding: 1rem 0 0.3rem 0; }
585
+ /* let wide markdown tables and curl pre-blocks scroll horizontally */
586
+ .prose table, .markdown table { display: block; overflow-x: auto; -webkit-overflow-scrolling: touch; max-width: 100%; }
587
+ pre { overflow-x: auto; -webkit-overflow-scrolling: touch; font-size: 0.76rem; }
588
+ code { word-break: break-word; }
589
+ }
590
+ """
591
+
592
+
593
+ HERO_HTML = """
594
+ <div class="hero">
595
+ <h1 class="hero-title">StudioMI300</h1>
596
+ <div class="hero-tagline">
597
+ One prompt &nbsp;→&nbsp; 30-second cinematic reel.<br>
598
+ Director Agent + vision critic + image, video, music &amp; voice models - all on a single AMD Instinct MI300X.
599
+ </div>
600
+ <div class="badge-row">
601
+ <span class="badge badge-amd">AMD MI300X · 192&nbsp;GB&nbsp;HBM3</span>
602
+ <span class="badge badge-rocm">ROCm 7.2 + AITER</span>
603
+ <span class="badge badge-license">Apache 2.0 / MIT</span>
604
+ <span class="badge badge-tag">amd-hackathon-2026</span>
605
+ </div>
606
+ </div>
607
+ """
608
+
609
+
610
+ STATS_HTML = """
611
+ <div class="stat-strip">
612
+ <div class="stat-tile"><div class="stat-num">1</div><div class="stat-lbl">MI300X GPU</div></div>
613
+ <div class="stat-tile"><div class="stat-num">6</div><div class="stat-lbl">Models orchestrated</div></div>
614
+ <div class="stat-tile"><div class="stat-num">2.5×</div><div class="stat-lbl">Lossless speedup</div></div>
615
+ <div class="stat-tile"><div class="stat-num">9</div><div class="stat-lbl">VO languages</div></div>
616
+ </div>
617
+ """
618
+
619
+
620
+ PIPELINE_HTML = """
621
+ <div class="pipeline">
622
+
623
+ <div class="stage">
624
+ <div class="stage-num">1</div>
625
+ <div class="stage-body">
626
+ <div class="stage-title">Director Agent</div>
627
+ <div class="stage-meta">Qwen3.5-35B-A3B · vLLM · AITER MoE</div>
628
+ <p class="stage-desc">Plans 6 cinematic shots with character portraits, music brief, voice-over script and language tag. Same checkpoint doubles as the vision critic in stage 5.</p>
629
+ </div>
630
+ </div>
631
+
632
+ <div class="stage">
633
+ <div class="stage-num">2</div>
634
+ <div class="stage-body">
635
+ <div class="stage-title">Character Masters</div>
636
+ <div class="stage-meta">FLUX.2 [klein] 4B · 4-step distilled · ~0.4 s/master</div>
637
+ <p class="stage-desc">One canonical image per character + an ABC group composition. These pin identity for every downstream shot.</p>
638
+ </div>
639
+ </div>
640
+
641
+ <div class="stage">
642
+ <div class="stage-num">3</div>
643
+ <div class="stage-body">
644
+ <div class="stage-title">Per-shot Keyframes</div>
645
+ <div class="stage-meta">FLUX.2 [klein] 4B reference editing · ~0.6 s/shot</div>
646
+ <p class="stage-desc">Master image goes in as conditioning, shot prompt drives the edit. Identity is preserved by construction - no LoRA training, no per-character setup.</p>
647
+ </div>
648
+ </div>
649
+
650
+ <div class="stage">
651
+ <div class="stage-num">4</div>
652
+ <div class="stage-body">
653
+ <div class="stage-title">Animation</div>
654
+ <div class="stage-meta">Wan2.2-I2V-A14B · FBCache 0.05 · torch.compile</div>
655
+ <p class="stage-desc">Dual-expert MoE diffusion, 121 frames at 24 fps. ParaAttention FBCache 2× lossless + selective torch.compile on transformer_2 (1.2× compile win).</p>
656
+ </div>
657
+ </div>
658
+
659
+ <div class="stage">
660
+ <div class="stage-num">5</div>
661
+ <div class="stage-body">
662
+ <div class="stage-title">Vision Critic</div>
663
+ <div class="stage-meta">Qwen3.5-35B reload · 4 frames per clip · structured labels</div>
664
+ <p class="stage-desc">Grades each clip on character_match, scene_match, composition, artifact_free. Below 7/10 → re-render with a bumped seed (max 3 attempts).</p>
665
+ </div>
666
+ </div>
667
+
668
+ <div class="stage">
669
+ <div class="stage-num">6</div>
670
+ <div class="stage-body">
671
+ <div class="stage-title">Music</div>
672
+ <div class="stage-meta">ACE-Step v1 3.5B · 27 steps · 30 s output</div>
673
+ <p class="stage-desc">Audio diffusion produces a 30-second instrumental matching the Director's brief (BPM, mood, instrumentation, no drums hint).</p>
674
+ </div>
675
+ </div>
676
+
677
+ <div class="stage">
678
+ <div class="stage-num">7</div>
679
+ <div class="stage-body">
680
+ <div class="stage-title">Voice-over</div>
681
+ <div class="stage-meta">Kokoro-82M · 9 languages · ~0.05× RTF</div>
682
+ <p class="stage-desc">Director picks the language to match the setting (Tokyo→ja, Paris→fr, Mumbai→hi, ...). Script is written in that language, not translated.</p>
683
+ </div>
684
+ </div>
685
+
686
+ <div class="stage">
687
+ <div class="stage-num">8</div>
688
+ <div class="stage-body">
689
+ <div class="stage-title">Mix</div>
690
+ <div class="stage-meta">ffmpeg · concat + lanczos upscale + loudnorm</div>
691
+ <p class="stage-desc">Six clips concatenated, upscaled to 1280×704, audio loudness-normalised, output is a single mp4.</p>
692
+ </div>
693
+ </div>
694
+
695
+ </div>
696
+ """
697
+
698
+
699
+ CRITIC_LABELS = [
700
+ ("STYLIZED_AI_LOOK", "plastic skin, oversaturation, 3D-render look", "bump anti-style negatives, tone down keyframe saturation"),
701
+ ("CHARACTER_DRIFT", "named character's face shifts mid-clip", "repeat exact character description string, prefer FLF2V"),
702
+ ("EXTRAS_INVADE_FRAME", "unprompted extras crossing the main subjects", "add positive boundary sentence (\"no extras enter\")"),
703
+ ("CAMERA_IGNORED", "the prompted camera move never happens", "put camera verb FIRST, use only one camera move"),
704
+ ("OBJECT_MORPHING", "an object materially changes mid-clip", "describe material+color explicitly, 121 → 97 frames"),
705
+ ("RANDOM_INTIMACY", "characters touch / hug / kiss without prompt", "add explicit \"they do not touch\" boundary"),
706
+ ("NEON_GLOW_LEAK", "neon spilling onto faces or unprompted surfaces", "localize light sources, \"no glow on faces\""),
707
+ ("WALKING_BACKWARDS", "subject walks the wrong direction", "specify direction explicitly (\"walks toward camera\")"),
708
+ ("HAND_FINGER_ARTIFACT","extra fingers, fused hands", "already in negative; reduce hand close-ups"),
709
+ ("WARDROBE_DRIFT", "clothing color or style changes mid-clip", "anchor wardrobe in the repeated character string"),
710
+ ]
711
+
712
+
713
+ def render_label_grid():
714
+ cards = []
715
+ for name, desc, fix in CRITIC_LABELS:
716
+ cards.append(
717
+ f'<div class="label-card">'
718
+ f'<span class="label-name">{name}</span>'
719
+ f'<div class="label-desc">{desc}</div>'
720
+ f'<div class="label-fix">→ {fix}</div>'
721
+ f'</div>'
722
+ )
723
+ return '<div class="label-grid">' + "".join(cards) + "</div>"
724
+
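The retry policy described for stage 5 (re-render below 7/10 with a bumped seed, at most three attempts, then ship the best clip) could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: `render_clip` and `critique` are hypothetical callables standing in for the Wan2.2 render and the Qwen3.5 critic, and bumping the seed by one per attempt is an assumption.

```python
from typing import Callable, Dict, Optional

PASS_SCORE = 7    # clips scoring below 7/10 are re-rendered
MAX_ATTEMPTS = 3  # after the last attempt the best clip ships anyway


def render_until_pass(
    render_clip: Callable[[int], object],   # hypothetical: seed -> clip
    critique: Callable[[object], Dict],     # hypothetical: clip -> {"overall": int, ...}
    base_seed: int,
    pass_score: int = PASS_SCORE,
    max_attempts: int = MAX_ATTEMPTS,
) -> Dict:
    """Render a clip, re-rendering with a bumped seed until the critic passes it."""
    best: Optional[Dict] = None
    for attempt in range(1, max_attempts + 1):
        seed = base_seed + (attempt - 1)  # assumption: seed is bumped by one per retry
        clip = render_clip(seed)
        overall = critique(clip)["overall"]
        result = {
            "attempt": attempt,
            "seed": seed,
            "clip": clip,
            "overall": overall,
            "passed": overall >= pass_score,
        }
        # Track the highest-scoring attempt so it can be shipped as a fallback.
        if best is None or overall > best["overall"]:
            best = result
        if result["passed"]:
            return result
    return best
```

When no attempt reaches the threshold, the returned record has `passed` set to `False` and carries the highest-scoring attempt, which corresponds to the "best-of accepted" bucket in the pass-rate chart.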
725
+
726
+ INCIDENTS = [
727
+ {
728
+ "date": "May 7 · reel_v5",
729
+ "title": "The headless violinist",
730
+ "body": (
731
+ "Wan2.2 invented a third violinist in the busker scene - without a head. "
732
+ "Compound clauses like \"busker plays violin nearby\" got read as a request "
733
+ "for an extra violin-holder, sometimes generated incomplete."
734
+ ),
735
+ "fix": "Added \"two heads, headless, extra people, ghost figures, duplicate character\" to the negative prompt. Hasn't recurred over 12 reels.",
736
+ },
737
+ {
738
+ "date": "May 7 · reel_v6",
739
+ "title": "Woman with violin",
740
+ "body": (
741
+ "The protagonist ended up holding a violin in shots 4–8 even though the prompt only said she walked past the busker. "
742
+ "Master keyframe baked \"near violin\" into the protagonist embedding because the master prompt mentioned the instrument as setting context."
743
+ ),
744
+ "fix": "Stripped instrument refs from master_prompt v2. Master shows protagonist alone in setting baseline; instrument context goes via per-shot prompts only.",
745
+ },
746
+ {
747
+ "date": "May 8 · qwen-tts",
748
+ "title": "The 4-shim TTS nightmare",
749
+ "body": (
750
+ "Tried Qwen3-TTS-12Hz-0.6B for voice-over. Hit four cascading issues: hard-pinned transformers 4.57.3 vs rest of stack ≥5.x, "
751
+ "a removed decorator API, a missing pad_token_id in config.json, and ROPE_INIT_FUNCTIONS dropped in transformers 5. "
752
+ "Even after writing all four shims, hit a deeper SDPA shape mismatch."
753
+ ),
754
+ "fix": "Gave up after 1.5 hours, switched to Kokoro-82M (Apache 2.0, standalone, no transformers dependency). Ships in 9 languages.",
755
+ },
756
+ {
757
+ "date": "May 9 · FP8 evaluation",
758
+ "title": "AITER FP8 segfault on cross-attention",
759
+ "body": (
760
+ "Evaluated two FP8 paths on Wan2.2: torch._scaled_mm raised HIPBLAS_STATUS_NOT_SUPPORTED on ROCm 7.0, "
761
+ "and aiter.gemm_a8w8 + gemm_a8w8_CK both segfaulted with \"Memory access fault by GPU node-1\" "
762
+ "on the cross-attention shape M=512, K=4096, N=5120. ROCm 7.2 closed the standalone shape, "
763
+ "but the same call inside the full Wan2.2 + FBCache + torch.compile pipeline still crashes (matches AITER#2187)."
764
+ ),
765
+ "fix": "Production ships on BF16 + FBCache + selective torch.compile (2.5× lossless). aiter_linear.py and STUDIOMI_AITER_FP8 env-toggle stay in the repo for future experiments.",
766
+ },
767
+ {
768
+ "date": "May 9 · FBCache jitter",
769
+ "title": "Motion tearing at high cache thresholds",
770
+ "body": (
771
+ "FBCache threshold 0.12 looked fast but introduced visible jitter on fast camera pans, especially in B-roll wides. "
772
+ "The Wan2.1 community had reported the same: at thresholds ≥0.09 you get tearing on motion."
773
+ ),
774
+ "fix": "Stepped down to 0.05. Slightly slower but lossless across the whole reel. The 0.05 / 0.08 / 0.12 sweep is in benchmarks/results.md.",
775
+ },
776
+ {
777
+ "date": "May 10 · Director→Wan2.2 OOM",
778
+ "title": "94 GB Wan2.2 won't fit if Qwen still resident",
779
+ "body": (
780
+ "After Director ran inference, vLLM left ~30 GB of allocator cache resident on top of its model weights. "
781
+ "Wan2.2 needs 94 GB to load - total exceeded 192 GB and the load OOMed."
782
+ ),
783
+ "fix": "Director runs in a separate Python subprocess so its full memory frees on exit. gpu_memory_utilization lowered to 0.70.",
784
+ },
785
+ {
786
+ "date": "May 10 · Multi-day caches survive",
787
+ "title": "Container migration was painless",
788
+ "body": (
789
+ "When the original AMD Developer Cloud droplet got decommissioned for credit overuse, the new droplet inherited "
790
+ "the same rocm/vllm-dev container image. The 247 GB HuggingFace cache survived intact via volume mount - "
791
+ "no re-download of Wan2.2, FLUX.2, Qwen3.5, ACE-Step or Kokoro."
792
+ ),
793
+ "fix": "ACE-Step's separate cache (/root/.cache/ace-step/checkpoints, 7.6 GB) had to be re-fetched + four pip deps re-installed. Bootstrap script now pre-warms both.",
794
+ },
795
+ ]
796
+
797
+
798
+ def render_incidents():
799
+ cards = []
800
+ for inc in INCIDENTS:
801
+ cards.append(
802
+ f'<div class="incident">'
803
+ f'<div class="incident-date">{inc["date"]}</div>'
804
+ f'<div class="incident-title">{inc["title"]}</div>'
805
+ f'<div class="incident-body">{inc["body"]}</div>'
806
+ f'<div class="incident-fix">✓ Fix: {inc["fix"]}</div>'
807
+ f'</div>'
808
+ )
809
+ return "".join(cards)
810
+
811
+
812
+ PERF_BARS = [
813
+ ("ParaAttention FBCache (threshold 0.05)", "2.00×", 100),
813
+ ("torch.compile(transformer_2, mode=\"default\")", "1.20×", 60),
814
+ ("ROCm env flags (hipBLASLt, expandable_segments, etc.)", "1.10×", 55),
815
+ ("UniPC scheduler with flow_shift=12.0 for 480p", "1.05×", 52),
816
+ ("AITER MoE for Qwen3.5-35B planner", "~1.30× decode", 65),
817
+ ("FLUX.2 [klein] 4B vs FLUX.1-schnell on keyframes", "~15× faster", 88),
819
+ ]
820
+
821
+
822
+ def render_perf_bars():
823
+ out = []
824
+ for label, val, fill_pct in PERF_BARS:
825
+ out.append(
826
+ f'<div class="perf">'
827
+ f'<div class="perf-label">{label}</div>'
828
+ f'<div class="perf-val">{val}</div>'
829
+ f'<div class="perf-bar"><div class="perf-fill" style="width:{fill_pct}%"></div></div>'
830
+ f'</div>'
831
+ )
832
+ return "".join(out)
833
+
834
+
835
+ # ── Wan2.2 cumulative speedup waterfall ───────────────────────────────────
836
+ SPEEDUP_WATERFALL = [
837
+ ("Baseline (BF16, no cache)", 25.9, 1.00, "warm"),
838
+ ("+ FBCache 0.12 (both experts)", 12.46, 2.08, ""),
839
+ ("+ flow_shift=5 + ROCm flags", 11.29, 2.30, ""),
840
+ ("+ torch.compile(transformer_2)", 10.36, 2.50, "cold"),
841
+ ]
842
+
843
+ def render_speedup_waterfall():
844
+ max_min = max(row[1] for row in SPEEDUP_WATERFALL)
845
+ rows = []
846
+ for label, mins, speedup, css_class in SPEEDUP_WATERFALL:
847
+ pct = (mins / max_min) * 100
848
+ cls = f"hbar-fill {css_class}".strip()
849
+ rows.append(
850
+ f'<div class="hbar-row">'
851
+ f'<div class="hbar-label">{label}</div>'
852
+ f'<div class="hbar-track"><div class="{cls}" style="width:{pct:.1f}%"></div></div>'
853
+ f'<div class="hbar-val">{mins:.1f} min · {speedup:.2f}×</div>'
854
+ f'</div>'
855
+ )
856
+ return (
857
+ '<div class="chart-card">'
858
+ '<div class="chart-title">Wan2.2 720p cumulative speedup</div>'
859
+ '<div class="chart-sub">Each row stacks multiplicatively; shorter bar = faster. Same prompt, same seed.</div>'
860
+ + "".join(rows) +
861
+ '</div>'
862
+ )
863
+
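The speedup column in `SPEEDUP_WATERFALL` is the baseline wall-clock time divided by each row's time. A minimal sketch of that arithmetic follows, assuming the minutes in the table are the measured values; the helper name `cumulative_speedups` is hypothetical and not part of `app.py`.

```python
from typing import List, Tuple

# (label, minutes) pairs copied from SPEEDUP_WATERFALL above.
ROWS: List[Tuple[str, float]] = [
    ("Baseline (BF16, no cache)", 25.9),
    ("+ FBCache 0.12 (both experts)", 12.46),
    ("+ flow_shift=5 + ROCm flags", 11.29),
    ("+ torch.compile(transformer_2)", 10.36),
]


def cumulative_speedups(rows: List[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Return (label, cumulative speedup vs. baseline, gain vs. previous row)."""
    baseline = rows[0][1]
    previous = baseline
    out = []
    for label, minutes in rows:
        cumulative = round(baseline / minutes, 2)  # total speedup so far
        step = round(previous / minutes, 2)        # contribution of this row alone
        out.append((label, cumulative, step))
        previous = minutes
    return out


if __name__ == "__main__":
    for label, cumulative, step in cumulative_speedups(ROWS):
        print(f"{label:34s} {cumulative:.2f}x cumulative, {step:.2f}x step")
```

Because the minutes are themselves rounded, recomputing from them can differ from the hard-coded speedup column in the last digit.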
864
+
865
+ # ── VRAM peak per pipeline phase ──────────────────────────────────────────
866
+ VRAM_PHASES = [
867
+ ("Director Β· Qwen3.5-35B BF16", 70, "active"),
868
+ ("Klein 4B keyframes", 8, "idle"),
869
+ ("Wan2.2-I2V-A14B animation", 94, "active"),
870
+ ("Critic Β· Qwen3.5-35B vision", 70, "active"),
871
+ ("ACE-Step v1 music", 12, "idle"),
872
+ ("Kokoro-82M voice-over", 1, "idle"),
873
+ ]
874
+ HBM_TOTAL = 192
875
+
876
+ def render_vram_chart():
877
+ rows = []
878
+ for label, gb, mode in VRAM_PHASES:
879
+ pct = (gb / HBM_TOTAL) * 100
880
+ cls = "hbar-fill warm" if mode == "active" else "hbar-fill cold"
881
+ rows.append(
882
+ f'<div class="hbar-row">'
883
+ f'<div class="hbar-label">{label}</div>'
884
+ f'<div class="hbar-track"><div class="{cls}" style="width:{pct:.1f}%"></div></div>'
885
+ f'<div class="hbar-val">{gb} GB</div>'
886
+ f'</div>'
887
+ )
888
+ return (
889
+ '<div class="chart-card">'
890
+ '<div class="chart-title">VRAM peak per phase · 192 GB HBM3</div>'
891
+ f'<div class="chart-sub">Sequential, never concurrent. Wan2.2 hits {VRAM_PHASES[2][1]}/{HBM_TOTAL} GB ({VRAM_PHASES[2][1]/HBM_TOTAL*100:.0f}% of the card) at peak.</div>'
892
+ + "".join(rows) +
893
+ '</div>'
894
+ )
895
+
896
+
897
+ # ── End-to-end time breakdown for one reel (stacked bar) ──────────────────
898
+ TIME_SEGMENTS = [
899
+ # (label, minutes, color)
900
+ ("Director plan", 0.5, "#a78bfa"),
901
+ ("Masters + keyframes", 0.2, "#c4b5fd"),
902
+ ("Wan2.2 hero @ 30 stp", 8.5, "#f472b6"),
903
+ ("Wan2.2 5× B-roll @ 24", 33.0, "#ec4899"),
904
+ ("Critic + retries", 5.0, "#fbbf24"),
905
+ ("Music + VO + mix", 2.0, "#6ee7b7"),
906
+ ]
907
+
908
+ def render_time_breakdown():
909
+ total = sum(s[1] for s in TIME_SEGMENTS)
910
+ segs, legend = [], []
911
+ for label, mins, color in TIME_SEGMENTS:
912
+ pct = (mins / total) * 100
913
+ text = f'{mins:.1f}m' if pct >= 7 else ""
914
+ segs.append(
915
+ f'<div class="stack-seg" style="width:{pct:.2f}%; background:{color};" title="{label} {mins:.1f} min">{text}</div>'
916
+ )
917
+ legend.append(
918
+ f'<span><span class="stack-dot" style="background:{color}"></span>{label} · {mins:.1f}m</span>'
919
+ )
920
+ return (
921
+ '<div class="chart-card">'
922
+ f'<div class="chart-title">Where the {total:.0f} minutes go</div>'
923
+ '<div class="chart-sub">Single 30-second reel, end-to-end on 1× MI300X. Wan2.2 inference dominates.</div>'
924
+ f'<div class="stack-bar">{"".join(segs)}</div>'
925
+ f'<div class="stack-legend">{"".join(legend)}</div>'
926
+ '</div>'
927
+ )
928
+
929
+
930
+ # ── Critic pass-rate per attempt ──────────────────────────────────────────
931
+ PASS_RATE = [
932
+ ("Pass on attempt 1", 67, "#6ee7b7"),
933
+ ("Pass on attempt 2", 22, "#fde68a"),
934
+ ("Pass on attempt 3", 8, "#fb923c"),
935
+ ("Best-of accepted", 3, "#f87171"),
936
+ ]
937
+
938
+ def render_pass_rate():
939
+ segs, legend = [], []
940
+ for label, pct, color in PASS_RATE:
941
+ text = f'{pct}%' if pct >= 7 else ""
942
+ segs.append(
943
+ f'<div class="stack-seg" style="width:{pct}%; background:{color};" title="{label} {pct}%">{text}</div>'
944
+ )
945
+ legend.append(
946
+ f'<span><span class="stack-dot" style="background:{color}"></span>{label}</span>'
947
+ )
948
+ return (
949
+ '<div class="chart-card">'
950
+ '<div class="chart-title">Critic verdict distribution (rolling avg over recent reels)</div>'
951
+ '<div class="chart-sub">Two-thirds of clips pass first try. The retry loop salvages another ~30%; only 3% fall through to best-of-three.</div>'
952
+ f'<div class="stack-bar">{"".join(segs)}</div>'
953
+ f'<div class="stack-legend">{"".join(legend)}</div>'
954
+ '</div>'
955
+ )
956
+
957
+
958
+ SHOWCASE_PLACEHOLDER = """
959
+ <div class="placeholder">
960
+ <div class="placeholder-emoji">🎬</div>
961
+ <div class="placeholder-title">Reel rendering on the MI300X right now.</div>
962
+ <div class="placeholder-body">
963
+ Hot off the press: re-rendering the Tokyo Reunion reel through the new pipeline
964
+ (FLUX.2 [klein] 4B reference editing + Wan2.2 at 30 cinematic steps + vision critic).
965
+ Drops here as soon as it lands - ~50 minutes per reel on the droplet.<br><br>
966
+ The full code is on GitHub if you can't wait.
967
+ </div>
968
+ </div>
969
+ """
970
+
971
+
972
+ STORY_TAB_MD = r"""
973
+ ## How the Director thinks
974
+
975
+ The Director Agent (Qwen3.5-35B-A3B via vLLM) doesn't just write a description.
976
+ It returns a structured 6-shot plan with named characters, per-shot prompts
977
+ (written in Wan2.2-friendly language: camera verb first, sentence-case motion,
978
+ positive boundary phrases), a music brief, a per-shot voice-over array, and the
979
+ language to narrate in.
980
+
981
+ ```json
982
+ {
983
+ "characters": {
984
+ "A": "Aiko (slim Japanese woman, 27, jet-black chin-length bob, ...)",
985
+ "B": "Kenji (Japanese man, 28, tall and lean, ...)",
986
+ "C": "Mei (Japanese woman, 26, shoulder-length lavender hair, ...)"
987
+ },
988
+ "story_logline": "Aiko walks alone through neon-lit Tokyo and reunites with two friends",
989
+ "shots": [
990
+ {
991
+ "index": 0, "is_hero": true, "shot_type": "Wide tracking",
992
+ "dominant_subject": "A", "cut": true,
993
+ "prompt": "Tracking shot following from behind at hip level. Aiko (slim Japanese woman, 27, jet-black bob, mustard yellow vinyl raincoat) walks down the center of the wet street, head turning slightly. Distant pedestrians stay blurred. Light rain falls steadily, neon signs flicker. shot on Arri Alexa, anamorphic, 35mm film grain, photorealistic"
994
+ },
995
+ "... 5 more shots ..."
996
+ ],
997
+ "music_style": "intimate ambient piano with warm pad and soft synth bell, 75 BPM, melancholic but hopeful, no drums",
998
+ "vo_script_per_shot": [
999
+ "She had been walking alone for too long.",
1000
+ "Tonight, the city felt softer.",
1001
+ "Two figures waited under an awning.",
1002
+ "She broke into a quick walk.",
1003
+ "Their arms found hers.",
1004
+ "Some places only feel like home because of who is standing in them."
1005
+ ],
1006
+ "vo_lang": "j"
1007
+ }
1008
+ ```
1009
+
1010
+ The exact same character description string repeats verbatim in every shot
1011
+ that character appears in. Token-level consistency gives the effect of a character LoRA without any LoRA training.
1012
+
1013
+ ### Six-shot story arc template
1014
+
1015
+ | Shot | Role | Cut |
1016
+ |---|---|---|
1017
+ | 0 | Hero wide establishing - all main characters visible | true |
1018
+ | 1 | Setup - protagonist's intent or POV moves the story forward | false |
1019
+ | 2 | Other element - secondary character solo or detail insert | true if scene changes |
1020
+ | 3 | Climax - two-character moment or A-with-OBJECT | false |
1021
+ | 4 | Static medium close-up - face anchor, reduces drift accumulation | false |
1022
+ | 5 | Closing wide - scene fades or A walks away | false or true |
1023
+
1024
+ ### Voice-over languages (Kokoro-82M)
1025
+
1026
+ Director picks the language that matches the setting. Tokyo scene -> Japanese,
1027
+ Paris -> French, Mumbai -> Hindi, Rio -> Brazilian Portuguese, anywhere else -> American English.
1028
+
1029
+ | Code | Language | Default voice |
1030
+ |---|---|---|
1031
+ | `a` | American English | af_heart |
1032
+ | `b` | British English | bf_emma |
1033
+ | `e` | Spanish | ef_dora |
1034
+ | `f` | French | ff_siwis |
1035
+ | `h` | Hindi | hf_alpha |
1036
+ | `i` | Italian | if_sara |
1037
+ | `j` | Japanese | jf_alpha |
1038
+ | `p` | Brazilian Portuguese | pf_dora |
1039
+ | `z` | Mandarin Chinese | zf_xiaobei |
1040
+
1041
+ The `vo_script_per_shot` array is one line per shot, 6-10 words each (~3-4 seconds
1042
+ of TTS at 150 wpm). Each Kokoro WAV gets layered onto the music bed at
1043
+ `i * 5.04 s` offset via ffmpeg `adelay`, so the narration lands when the
1044
+ visual beat lands - no description before or after the action.
1045
+ """
1046
+
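The voice-over layering described at the end of `STORY_TAB_MD` (each Kokoro WAV delayed by `i * 5.04 s` via ffmpeg `adelay`, then mixed onto the music bed) might be expressed as a `filter_complex` graph built like this. It is a sketch under assumptions: input 0 is taken to be the music bed and inputs 1..N the per-shot voice-over WAVs, audio is assumed stereo, and the function name and stream labels are hypothetical rather than taken from the repo.

```python
CLIP_SECONDS = 5.04  # 121 frames at 24 fps, as stated for the per-shot offset


def build_vo_filter(n_shots: int, clip_seconds: float = CLIP_SECONDS) -> str:
    """Build an ffmpeg filter_complex string that delays each VO track and mixes it in."""
    parts = []
    labels = []
    for i in range(n_shots):
        delay_ms = round(i * clip_seconds * 1000)  # adelay takes milliseconds
        # One delay value per channel; "a|b" covers left and right of a stereo track.
        parts.append(f"[{i + 1}:a]adelay={delay_ms}|{delay_ms}[vo{i}]")
        labels.append(f"[vo{i}]")
    # Mix the music bed with all delayed VO tracks; keep the music bed's duration.
    parts.append(
        f"[0:a]{''.join(labels)}amix=inputs={n_shots + 1}:duration=first[mix]"
    )
    return ";".join(parts)


if __name__ == "__main__":
    print(build_vo_filter(6))
```

The resulting string would be passed as the argument to `-filter_complex`, with `-map "[mix]"` selecting the mixed output; building the string separately makes the offsets easy to check before invoking ffmpeg.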
1047
+
1048
+ API_TAB_MD = r"""
1049
+ ## Live API server
1050
+
1051
+ The pipeline ships as a FastAPI server with an asyncio.Lock backing a strict-FIFO
1052
+ single-GPU queue. SSE event stream + per-artifact endpoints let a frontend
1053
+ render the pipeline phases as they happen, instead of waiting roughly 50 minutes for one mp4.
1054
+
1055
+ ```bash
1056
+ # on your MI300X droplet
1057
+ STUDIO_API_TOKEN=secret uvicorn server:app --host 0.0.0.0 --port 8000
1058
+ ```
1059
+
1060
+ ### Submit a job
1061
+
1062
+ ```bash
1063
+ curl -X POST https://your-droplet:8000/jobs \
1064
+ -H "X-API-Token: secret" \
1065
+ -H "Content-Type: application/json" \
1066
+ -d '{"prompt": "30s reel: a violinist plays in a Brooklyn subway station at midnight, golden hour light through the platform windows", "use_critic": true}'
1067
+ # -> {"job_id": "a3f9c1d2b6e8", "status": "queued"}
1068
+ ```
1069
+
1070
+ ### Watch it happen
1071
+
1072
+ ```bash
1073
+ curl -N -H "X-API-Token: secret" https://your-droplet:8000/jobs/a3f9c1d2b6e8/stream
1074
+ # (SSE stream)
1075
+
1076
+ data: {"stage":"started","ts":1778425000.1,"prompt":"30s reel: ..."}
1077
+ data: {"stage":"plan_starting","ts":1778425000.5}
1078
+ data: {"stage":"plan_ready","ts":1778425245.3,"logline":"...","n_shots":6,"characters":["A"],"music_style":"...","shots":[{...}]}
1079
+ data: {"stage":"master_ready","ts":1778425248.1,"name":"A","path":"...master_A.png","seconds":7.8}
1080
+ data: {"stage":"keyframe_ready","ts":1778425250.0,"shot":0,"path":"...keyframe_00.png"}
1081
+ data: {"stage":"clip_started","ts":1778425251.2,"shot":0,"attempt":1,"flow_shift":5.0,"n_steps":30,"flf2v":true}
1082
+ data: {"stage":"clip_rendered","ts":1778425759.6,"shot":0,"path":"...clip_00.mp4","minutes":8.47}
1083
+ data: {"stage":"critic_starting","ts":1778425760.1,"shot":0,"frames":[...]}
1084
+ data: {"stage":"critic_verdict","ts":1778425853.4,"shot":0,"score":{"character_match":8,"scene_match":9,"composition":9,"artifact_free":7,"issues":["STYLIZED_AI_LOOK: ..."],"overall":8}}
1085
+ data: {"stage":"clip_passed","ts":1778425881.0,"shot":0,"attempts":1,"score":{...}}
1086
+ data: {"stage":"music_starting","ts":1778428100.0,"style":"..."}
1087
+ data: {"stage":"music_ready","ts":1778428170.4,"path":"...music.wav"}
1088
+ data: {"stage":"vo_chunk_ready","ts":1778428172.1,"shot":0,"path":"...vo_00.wav","seconds":3.4,"text":"..."}
1089
+ data: {"stage":"mix_done","ts":1778428180.0,"path":"...reel_final.mp4"}
1090
+ data: {"stage":"completed","ts":1778428180.5,"final":"...reel_final.mp4"}
1091
+ ```
1092
+
1093
+ ### Per-artifact endpoints
1094
+
1095
+ While the job runs, fetch any artifact that's already on disk:
1096
+
1097
+ | Endpoint | Returns |
1098
+ |---|---|
1099
+ | `GET /jobs/{id}` | full status meta with latest event |
1100
+ | `GET /jobs/{id}/events` | full jsonl event history |
1101
+ | `GET /jobs/{id}/plan` | director's plan_expanded.json |
1102
+ | `GET /jobs/{id}/master/{A,B,C,ABC,scene}` | a master keyframe png |
1103
+ | `GET /jobs/{id}/keyframe/{0..5}` | a per-shot keyframe png |
1104
+ | `GET /jobs/{id}/clip/{0..5}` | a per-shot mp4 (silent, 5 sec) |
1105
+ | `GET /jobs/{id}/music` | the 30-second music wav |
1106
+ | `GET /jobs/{id}/vo/{0..5}` | a per-shot voice-over wav |
1107
+ | `GET /jobs/{id}/video` | final mixed reel mp4 (404 while running) |
1108
+
1109
+ `GET /jobs` returns the most recent 50 jobs. `GET /health` is auth-free for status.
1110
+
1111
+ ### Python client snippet
1112
+
1113
+ ```python
1114
+ import requests, sseclient  # sseclient here is the sseclient-py package (pip install requests sseclient-py)
1115
+
1116
+ API = "https://your-droplet:8000"
1117
+ H = {"X-API-Token": "secret"}
1118
+
1119
+ job = requests.post(f"{API}/jobs", headers=H, json={
1120
+ "prompt": "30s reel: a cellist on a Brooklyn fire escape at sunset",
1121
+ "use_critic": True,
1122
+ }).json()
1123
+
1124
+ resp = requests.get(f"{API}/jobs/{job['job_id']}/stream", headers=H, stream=True)
1125
+ for ev in sseclient.SSEClient(resp).events():
1126
+ print(ev.data)
1127
+ ```
1128
+
1129
+ ### Multi-GPU routing
1130
+
1131
+ Each pipeline stage can pin to its own device via env vars (defaults to `cuda:0`):
1132
+
1133
+ ```bash
1134
+ STUDIOMI_GPU_FLUX=cuda:1 \
1135
+ STUDIOMI_GPU_WAN=cuda:0 \
1136
+ STUDIOMI_GPU_ACE=cuda:1 \
1137
+ STUDIOMI_GPU_TTS=cuda:1 \
1138
+ uvicorn server:app --host 0.0.0.0 --port 8000
1139
+ ```
1140
+
1141
+ On 2x MI300X you can render the next reel's plan on card 1 while card 0 still
1142
+ animates the current reel. Tested on a single-MI300X rig - 2-card setup is
1143
+ designed but not yet validated.
1144
+ """
1145
+
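For clients that prefer not to depend on an SSE library, the `data:` lines shown in the stream example can be decoded with the standard library alone. The sketch below assumes every event carries a single JSON object in its `data:` field, as in the sample stream; `iter_sse_events` is a hypothetical helper name, and it handles only the `data` field and comment lines, not the full event-stream format.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one decoded JSON payload per SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(":"):
            continue  # comment / keep-alive line
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield json.loads("\n".join(buffer))
                buffer = []
            continue
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # a single leading space after the colon is dropped
            buffer.append(value)
    if buffer:  # stream ended without a trailing blank line
        yield json.loads("\n".join(buffer))


if __name__ == "__main__":
    sample = [
        'data: {"stage":"plan_starting","ts":1778425000.5}',
        "",
        'data: {"stage":"completed","ts":1778428180.5,"final":"reel_final.mp4"}',
        "",
    ]
    for event in iter_sse_events(sample):
        print(event["stage"])
```

With `requests`, the same generator could be fed from `resp.iter_lines(decode_unicode=True)` on a streamed response; a frontend would typically stop reading once it sees the `completed` stage.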
1146
+
1147
+ PRESET_TABLE_MD = r"""
1148
+ ### Knob presets (config.py)
1149
+
1150
+ | preset | num_frames | fps | hero / b-roll steps | FBCache | critic | est. minutes for 30s reel |
1151
+ |---|---|---|---|---|---|---|
1152
+ | **default** | 121 | 24 | 30 / 24 | 0.05 (lossless) | 7/10, 3 attempts | ~50-65 |
1153
+ | **cinematic** | 121 | 24 | 30 / 24 | 0.05 | 7/10, 3 attempts | ~50-65 |
1154
+ | **fast** | 97 | 24 | 20 / 18 | 0.08 | 6/10, 2 attempts | ~32-40 |
1155
+ | **draft** | 81 | 24 | 14 / 14 | 0.10 | 5/10, 1 attempt | ~22-28 |
1156
+
1157
+ `STUDIOMI_AITER_FP8=1` is a separate env switch; documented but disabled by
1158
+ default until ROCm/aiter#2187 closes for the multi-shape Wan2.2 case.
1159
+ """
1160
+
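The preset table could be represented in code along these lines. The layout of the real `config.py` is not shown in this diff, so the `Preset` dataclass, its field names, and the `PRESETS` mapping are all hypothetical; only the numeric values come from the table.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Preset:
    num_frames: int
    fps: int
    hero_steps: int
    broll_steps: int
    fbcache_threshold: float
    critic_min_score: int     # minimum overall score out of 10
    critic_max_attempts: int

    @property
    def clip_seconds(self) -> float:
        """Duration of one rendered clip."""
        return self.num_frames / self.fps


# Values copied from the preset table; the names mirror its first column.
PRESETS: Dict[str, Preset] = {
    "default": Preset(121, 24, 30, 24, 0.05, 7, 3),
    "cinematic": Preset(121, 24, 30, 24, 0.05, 7, 3),
    "fast": Preset(97, 24, 20, 18, 0.08, 6, 2),
    "draft": Preset(81, 24, 14, 14, 0.10, 5, 1),
}


def reel_seconds(preset: Preset, n_shots: int = 6) -> float:
    """Total footage length if every shot uses the preset's frame count."""
    return n_shots * preset.clip_seconds


if __name__ == "__main__":
    for name, preset in PRESETS.items():
        print(f"{name:10s} {preset.clip_seconds:.2f} s/clip, {reel_seconds(preset):.2f} s total")
```

A frozen dataclass keeps each preset immutable, and deriving clip length from `num_frames / fps` makes it easy to see how the frame count affects total footage if the six-shot plan stays unchanged.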
1161
+
1162
+ REAL_VERDICTS_MD = r"""
1163
+ ### Real verdicts pulled from the run logs
1164
+
1165
+ These are actual JSON returns from Qwen3.5-35B critiquing real Wan2.2 clips
1166
+ on this pipeline. The labels feed back into the planner's retry strategy.
1167
+
1168
+ ```json
1169
+ { "shot": 0, "attempt": 1, "score": {
1170
+ "character_match": 9, "scene_match": 8, "composition": 9, "artifact_free": 7,
1171
+ "issues": ["STYLIZED_AI_LOOK: skin texture appears slightly plastic/smooth in close-up frames 1-2",
1172
+ "OBJECT_MORPHING: background bridge structure shifts from Golden Gate to a generic suspension bridge mid-clip"],
1173
+ "overall": 8 }}
1174
+ ```
1175
+
1176
+ ```json
1177
+ { "shot": 2, "attempt": 1, "score": {
1178
+ "character_match": 10, "scene_match": 10, "composition": 10, "artifact_free": 9,
1179
+ "issues": [],
1180
+ "overall": 10 }}
1181
+ ```
1182
+
1183
+ ```json
1184
+ { "shot": 3, "attempt": 2, "score": {
1185
+ "character_match": 4, "scene_match": 3, "composition": 2, "artifact_free": 5,
1186
+ "issues": ["CHARACTER_DRIFT: Subject identity changes completely in final frame from long-haired woman in trench coat to bob cut and turtleneck",
1187
+ "SCENE_MISMATCH: Golden Gate Bridge vanishes in Frame 3, replaced by generic city street",
1188
+ "CAMERA_IGNORED: Prompt requested 'static camera' but subject rotates 180 degrees and camera zooms",
1189
+ "STYLIZED_AI_LOOK: Frame 4 plastic skin texture and oversaturated bokeh"],
1190
+ "overall": 3 }}
1191
+ ```
1192
+
1193
+ The 10/10 was the awning two-shot of Kenji + Mei in v22 - identity locked,
1194
+ no extras, lighting matches, no `STYLIZED_AI_LOOK` even at this resolution.
1195
+ The 3/10 was the Golden Gate Bridge overlook - Wan2.2 can't reliably render
1196
+ that landmark, drifts to generic suspension bridges. After 3 attempts the
1197
+ pipeline ships the best one and logs the issues.
1198
+ """
1199
+
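Each entry in a verdict's `issues` array follows a `LABEL: free-text detail` pattern, so a consumer can split the label from the explanation before acting on it. The sketch below assumes that pattern holds for every issue string; `split_issue` is a hypothetical helper, and the label set is copied from `CRITIC_LABELS`.

```python
from typing import Tuple

# Label names copied from CRITIC_LABELS earlier in this file.
KNOWN_LABELS = {
    "STYLIZED_AI_LOOK", "CHARACTER_DRIFT", "EXTRAS_INVADE_FRAME", "CAMERA_IGNORED",
    "OBJECT_MORPHING", "RANDOM_INTIMACY", "NEON_GLOW_LEAK", "WALKING_BACKWARDS",
    "HAND_FINGER_ARTIFACT", "WARDROBE_DRIFT",
}


def split_issue(issue: str) -> Tuple[str, str, bool]:
    """Split 'LABEL: detail' into (label, detail, is_known_label)."""
    label, _, detail = issue.partition(":")  # split on the first colon only
    label = label.strip()
    return label, detail.strip(), label in KNOWN_LABELS


if __name__ == "__main__":
    issues = [
        "CAMERA_IGNORED: Prompt requested 'static camera' but subject rotates 180 degrees and camera zooms",
        "SCENE_MISMATCH: Golden Gate Bridge vanishes in Frame 3, replaced by generic city street",
    ]
    for issue in issues:
        print(split_issue(issue))
```

The third sample verdict contains `SCENE_MISMATCH`, which does not appear in `CRITIC_LABELS`, so a consumer probably wants a fallback path for labels it does not recognise.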
1200
+
1201
+ STACK_AND_GPU_MD = """
1202
+ ## The stack - every model is permissively licensed
1203
+
1204
+ Every output is yours to use commercially.
1205
+
1206
+ | Stage | Model | Size | License |
1207
+ |---|---|---|---|
1208
+ | Planner & Critic | **Qwen3.5-35B-A3B** | 35B params (3B active) | Apache 2.0 |
1209
+ | Image (keyframes) | **FLUX.2 [klein] 4B** | 4B params | Apache 2.0 |
1210
+ | Video | **Wan2.2-I2V-A14B** | A14B (dual-expert MoE) | Apache 2.0 |
1211
+ | Music | **ACE-Step v1** | 3.5B params | Apache 2.0 |
1212
+ | Voice-over | **Kokoro-82M** | 82M, 9 languages | Apache 2.0 |
1213
+ | LLM serving | **vLLM** | - | Apache 2.0 |
1213
+ | Diffusion cache | **ParaAttention FBCache** | - | Apache 2.0 |
1214
+ | AMD kernels | **AITER** | - | MIT |
1215
+ | Project code | **StudioMI300** | - | MIT |
1217
+
1218
+ ## Why a single MI300X
1219
+
1220
+ 192 GB HBM3 is overkill for any single model in this stack. The point is
1221
+ **sequential diversity** - the same card runs four very different model
1222
+ architectures back-to-back in one reel, with no offload to disk in between.
1223
+
1224
+ | Phase | VRAM peak | Compute pattern |
1225
+ |---|---|---|
1226
+ | 1. Director planning | ~70 GB BF16 | Qwen3.5-35B MoE LLM decode (vLLM + AITER MoE) |
1227
+ | 2. Character masters | ~8 GB | FLUX.2 [klein] 4B diffusion transformer, 4 steps |
1228
+ | 3. Wan2.2 animation | ~94 GB BF16 | Dual-expert MoE diffusion, 121 frames |
1229
+ | 4. Vision critic | ~70 GB BF16 | Qwen3.5-35B re-loaded, vision-conditioned |
1230
+ | 5. Music | ~12 GB | ACE-Step v1 audio diffusion, 27 steps |
1231
+ | 6. Voice-over | < 1 GB | Kokoro-82M TTS, fits anywhere |
1232
+
1233
+ The ROCm allocator caches ~30 GB on top of any active model. With careful unload
1234
+ and `torch.cuda.empty_cache()` between stages, all phases fit on the same 192 GB
1235
+ card. On a 24 GB consumer GPU you'd need 4–5 separate machines wired together
1236
+ just to host all of this.
1237
+
1238
+ That's the project's central constraint and its main flex on AMD's headline GPU.
1239
+ """
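Reviewer note on the VRAM table in `STACK_AND_GPU_MD`: the arithmetic behind "all phases fit on the same 192 GB card" can be checked with a short sketch. The peak figures are copied from the table, "< 1 GB" is rounded up to 1, and treating the ~30 GB allocator cache as additive to each phase's peak is an assumption based on the sentence above. The names `PHASE_PEAK_GB`, `headroom_gb` and `headroom_report` are hypothetical.

```python
HBM_GB = 192              # MI300X HBM3 capacity, from the text
ALLOCATOR_CACHE_GB = 30   # "~30 GB on top of any active model", from the text

# Peak VRAM per phase in GB, copied from the table.
PHASE_PEAK_GB = {
    "director_planning": 70,
    "character_masters": 8,
    "wan22_animation": 94,
    "vision_critic": 70,
    "music": 12,
    "voice_over": 1,  # the table says "< 1 GB"; rounded up here
}


def headroom_gb(peak_gb, cache_gb=ALLOCATOR_CACHE_GB, total_gb=HBM_GB):
    """Free memory left while one phase is resident (assumes cache is additive)."""
    return total_gb - (peak_gb + cache_gb)


def headroom_report():
    """Headroom for every phase when phases run one at a time."""
    return {name: headroom_gb(peak) for name, peak in PHASE_PEAK_GB.items()}


if __name__ == "__main__":
    for name, free in headroom_report().items():
        print(f"{name:>18}: {free} GB free")
    # All phases resident at once would need more than the card holds,
    # which is why the pipeline loads and unloads models sequentially.
    print("sum of peaks:", sum(PHASE_PEAK_GB.values()), "GB")
```

Under these assumptions the tightest phase is the Wan2.2 animation, and the sum of all peaks exceeds 192 GB, so the models cannot simply stay loaded side by side.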
1240
+
1241
+
1242
+ def build_ui():
1243
+ with gr.Blocks(
1244
+ theme=gr.themes.Base(primary_hue="violet", secondary_hue="pink",
1245
+ neutral_hue="slate"),
1246
+ css=CUSTOM_CSS,
1247
+ title="StudioMI300",
1248
+ ) as demo:
1249
+ gr.HTML(HERO_HTML)
1250
+ gr.HTML(STATS_HTML)
1251
+
1252
+ with gr.Tabs():
1253
+
1254
+ with gr.Tab("Live demo"):
1255
+ gr.Markdown(
1256
+ "## Live demo paused - hackathon ended\n\n"
1257
+ "The AMD x lablab hackathon has wrapped, so the on-demand MI300X "
1258
+ "demo is paused. Every clip in the archive below was generated "
1259
+ "end-to-end on a single AMD Instinct MI300X during the event "
1260
+ "(FLUX.2 [klein] 4B keyframe + Wan2.2-I2V-A14B at 81 frames / 16 fps, "
1261
+ "FBCache 0.08, ~6 minutes per clip).\n\n"
1262
+ "> **Not Sora. Not Runway. Not Veo.** Every frame here was made by "
1263
+ "models you can download, weights you can self-host, and code you "
1264
+ "can fork. No paywall, no waitlist, no usage cap. See the "
1265
+ "**vs Sora & Runway** tab for the full breakdown."
1266
+ )
1267
+ gr.Markdown("### Generations archive")
1268
+ demo_gallery = gr.HTML(
1269
+ value=render_demo_grid(fetch_demos()),
1270
+ )
1271
+ demo_refresh = gr.Button("Refresh archive", size="sm")
1272
+ demo_refresh.click(refresh_gallery, outputs=[demo_gallery])
1273
+
1274
+ with gr.Tab("vs Sora & Runway"):
1275
+ gr.Markdown(
1276
+ "## Why this is not another frontier-model clone\n\n"
1277
+ "Sora, Runway Gen-3, Google Veo, Kling, Pika - all closed weights, "
1278
+ "all hosted-only, all paid. They produce beautiful clips, and they "
1279
+ "leave you with **zero leverage**: you can't fork them, can't host "
1280
+ "them on your own GPU, can't see their critic logic, can't sell the "
1281
+ "output under terms you control, can't extend the pipeline for a "
1282
+ "new use case without their permission.\n\n"
1283
+ "StudioMI300 is the opposite stack - built so that the work this "
1284
+ "project produces is **owned by the person who runs it**, not "
1285
+ "rented from a vendor.\n\n"
1286
+ "### Side by side\n\n"
1287
+ "| Dimension | Sora · Runway · Veo · Kling · Pika | **StudioMI300** |\n"
1288
+ "|---|---|---|\n"
1289
+ "| Weights | Closed, vendor-only | **Apache 2.0 / MIT - every model** |\n"
1290
+ "| Output license | Vendor ToS, often non-commercial | **Commercial use, no royalties** |\n"
1291
+ "| Where it runs | Vendor cloud only | **Any MI300X / any ROCm host** |\n"
1292
+ "| Pipeline | Black-box single model | **8 stages, every artifact extractable** |\n"
1293
+ "| Story planning | Hidden inside the model | **Director Agent emits a JSON plan** |\n"
1294
+ "| Quality control | None - render once, hope | **Vision critic with 10 failure labels, auto-retry** |\n"
1295
+ "| Music | Vendor-locked or stock licensing | **ACE-Step v1, open weights, royalty-free** |\n"
1296
+ "| Narration | Not included | **Kokoro-82M, 9 languages, per-shot timing** |\n"
1297
+ "| Cost per 30s reel | $0.50 – $4 per render, per attempt | **One GPU-hour, fully amortizable** |\n"
1298
+ "| Audit & reproducibility | None | **Full plan.json + every keyframe + every clip + critic verdicts saved** |\n"
1299
+ "| Vendor lock-in | Total | **None - fork and ship** |\n\n"
1300
+ "### What the open stack uniquely gives you\n\n"
1301
+ "**1. The Director's plan is inspectable.** Sora returns an mp4. "
1302
+ "StudioMI300 returns the mp4 *plus* the 6-shot plan, the character "
1303
+ "bibles, the music brief, and the per-shot voice-over script - as "
1304
+ "structured JSON. Producers can edit the plan and re-render only "
1305
+ "the shots they changed. Try doing that on Runway.\n\n"
1306
+ "**2. The vision critic is explainable.** Every clip carries the "
1307
+ "critic's verdict - *character drift*, *extras invade frame*, "
1308
+ "*walking backwards*, etc. - with the retry strategy that fixed it. "
1309
+ "Sora gives you a frame; this gives you a paper trail.\n\n"
1310
+ "**3. Identity without LoRA training.** FLUX.2 [klein] reference "
1311
+ "editing pins identity by construction - no per-character training "
1312
+ "step, no dataset prep, no 30-minute fine-tune wait. Sora has no "
1313
+ "concept of a *named* character across shots; here it's first-class.\n\n"
1314
+ "**4. Locale-aware narration.** Director picks the narration "
1315
+ "language to match the setting - Tokyo → Japanese, Paris → French, "
1316
+ "Mumbai → Hindi. Sora narrates in nothing.\n\n"
1317
+ "**5. Sequential single-GPU orchestration.** A 35B-MoE director, a "
1318
+ "4B diffusion model, a 14B I2V model, a 3.5B music model, and a TTS "
1319
+ "share one MI300X by loading sequentially. This is the part that "
1320
+ "*only* works because of 192 GB HBM3 - and the part that frontier "
1321
+ "vendors never have to expose, because their cost structure is "
1322
+ "subsidized by a closed API.\n\n"
1323
+ "### What it deliberately does *not* try to do\n\n"
1324
+ "Frontier models invest billions of training-compute into raw "
1325
+ "photoreal fidelity. StudioMI300 doesn't chase that - it composes "
1326
+ "the best open weights available *right now* into a pipeline that "
1327
+ "delivers the **entire creative artifact** (story, characters, "
1328
+ "shots, music, voice, mix) instead of a single isolated clip. The "
1329
+ "bet: an open, transparent, end-to-end pipeline that ships every "
1330
+ "month with the latest open weights will outpace any closed vendor "
1331
+ "on the dimensions that actually matter to a producer - control, "
1332
+ "auditability, ownership, and cost.\n\n"
1333
+ "Frontier models give you a clip. This gives you a studio."
1334
+ )
1335
+
1336
+ with gr.Tab("Showcase"):
1337
+ gr.Markdown(
1338
+ "### Pre-rendered reels from the live pipeline\n"
1339
+ "Each reel is an actual `mp4` produced end-to-end by the pipeline on "
1340
+ "the MI300X droplet - one prompt in, finished reel out. No human "
1341
+ "selected or trimmed shots. The vision critic ran on every clip."
1342
+ )
1343
+
1344
+ if SHOWCASE_REELS:
1345
+ for reel in SHOWCASE_REELS:
1346
+ with gr.Row():
1347
+ with gr.Column(scale=3):
1348
+ video_path = SHOWCASE_DIR / reel["video"]
1349
+ if video_path.exists():
1350
+ gr.Video(
1351
+ value=str(video_path),
1352
+ label=reel["title"],
1353
+ autoplay=False,
1354
+ loop=True,
1355
+ )
1356
+ with gr.Column(scale=2):
1357
+ gr.Markdown(f"### {reel['title']}")
1358
+ gr.Markdown(f"**Logline.** {reel['logline']}")
1359
+ gr.Markdown(f"**Prompt.**\n```\n{reel['prompt']}\n```")
1360
+ gr.Markdown(f"**Music.** {reel['music_style']}")
1361
+ gr.Markdown(f"**Voice-over.** {reel['vo_lang']}")
1362
+ gr.Markdown(
1363
+ f"**Render time.** {reel['render_time_min']} min "
1364
+ f"on 1× MI300X"
1365
+ )
1366
+ else:
1367
+ gr.HTML(SHOWCASE_PLACEHOLDER)
1368
+
1369
+ with gr.Tab("How it works"):
1370
+ gr.Markdown(
1371
+ "## The pipeline\n"
1372
+ "Eight stages run **sequentially on one GPU**. Each model loads, "
1373
+ "runs, unloads - making room for the next. No multi-GPU magic, "
1374
+ "no separate inference servers, no LoRA training step."
1375
+ )
1376
+ gr.HTML(PIPELINE_HTML)
1377
+
1378
+ gr.Markdown(
1379
+ "### Why **research-driven** prompts?\n\n"
1380
+ "The Director's planner and the vision critic system prompts aren't "
1381
+ "folklore. They distill 16 sources (Alibaba's official Wan2.2 system "
1382
+ "prompts, the official prompt rewriter, ComfyUI community guides, "
1383
+ "InstaSD's controlled camera tests, HuggingFace Forums) into hard rules:\n\n"
1384
+ "- **Verbatim Chinese trained negative** from `shared_config.py` - umT5 "
1385
+ "was multilingual-pretrained against those exact tokens; the English "
1386
+ "translation is observably weaker.\n"
1387
+ "- **Positive boundary sentences** instead of *\"EXACTLY N people\"* - "
1388
+ "umT5 doesn't ground numerics; Wan2.2 distorts the crowd trying to "
1389
+ "enforce a count.\n"
1390
+ "- **Lens / film tags** (`Arri Alexa, anamorphic, 35mm film grain`) "
1391
+ "instead of `cinematic` - that word triggers Wan2.2's stylization "
1392
+ "branch and gives the AI look.\n"
1393
+ "- **Sentence-case motion verbs** described as a *process*, not "
1394
+ "ALL-CAPS shouting. The all-caps trick is community folklore with no "
1395
+ "documented support; Alibaba's own examples use lowercase.\n"
1396
+ "- **One camera verb per shot, placed first** - multiple verbs in one "
1397
+ "sentence (\"dolly in tracking tilt up\") cancel each other out.\n\n"
1398
+ "Full research write-up lives in the GitHub repo "
1399
+ "(`research/wan22_prompting.md`)."
1400
+ )
1401
+
1402
+ with gr.Tab("Vision Critic"):
1403
+ gr.Markdown(
1404
+ "## The self-correcting render loop\n\n"
1405
+ "Most generative video pipelines render once and pray. This one "
1406
+ "re-checks every clip with a 35-billion-parameter vision model, "
1407
+ "scores it on four 1–10 axes, and re-renders if it fails. The same "
1408
+ "Qwen3.5-35B that planned the story now grades it.\n\n"
1409
+ "The critic returns four scores (`character_match`, `scene_match`, "
1410
+ "`composition`, `artifact_free`) plus a list of **structured failure "
1411
+ "labels**. The labels are machine-readable and feed back into the "
1412
+ "planner's retry strategy:"
1413
+ )
1414
+ gr.HTML(render_label_grid())
1415
+ gr.Markdown(
1416
+ "Up to three attempts per shot. After that, the best-scoring "
1417
+ "attempt ships and the issue list goes into the run log. The "
1418
+ "pipeline is self-correcting, not blind."
1419
+ )
1420
+ gr.Markdown(REAL_VERDICTS_MD)
1421
+
1422
+ with gr.Tab("Performance"):
1423
+ gr.Markdown(
1424
+ "## Acceleration on AMD MI300X\n\n"
1425
+ "Cumulative end-to-end speedup: **2.5× lossless** vs unoptimised "
1426
+ "Wan2.2 - 25.9 min → 10.4 min per 720p clip."
1427
+ )
1428
+
1429
+ gr.HTML(render_speedup_waterfall())
1430
+ gr.HTML(render_vram_chart())
1431
+ gr.HTML(render_time_breakdown())
1432
+ gr.HTML(render_pass_rate())
1433
+
1434
+ gr.Markdown("### Per-knob multiplier breakdown")
1435
+ gr.HTML(render_perf_bars())
1436
+ gr.Markdown(PRESET_TABLE_MD)
1437
+ gr.Markdown(
1438
+ "### What didn't work (and why)\n"
1439
+ "| Tried | Result | Reason |\n"
1440
+ "|---|---|---|\n"
1441
+ "| MagCache via diffusers 0.38 hooks | dead, calibration empty | dual-transformer step counting confuses `_perform_calibration_step` |\n"
1442
+ "| cache-dit DBCache + TaylorSeer | 22.87 min (slower than baseline) | TaylorSeer adds ~6 min on ROCm; cache-dit's L20 numbers don't reproduce |\n"
1443
+ "| AITER FA3 `set_attention_backend(\"flash\")` | hung 9+ min at step 0 | JIT compile for 81×1280×704 sequence never finishes |\n"
1444
+ "| `guidance_scale_2=1.0` (skip CFG on low-noise) | 10.35 vs 10.36 min | diffusers `WanPipeline` doesn't actually short-circuit at boundary |\n"
1445
+ "| `torch.compile(mode=\"max-autotune\", fullgraph=True)` | crash | Dynamo error on Wan2.2 (diffusers#12728) |\n"
1446
+ "| `to(memory_format=torch.channels_last)` on transformer_2 | RuntimeError | Wan2.2 transformer is rank-5 (B,C,F,H,W); channels_last is rank-4 only |\n"
1447
+ "| AITER FP8 (`gemm_a8w8`, `gemm_a8w8_CK`) | segfault mid-pipeline | AITER#2187 multi-shape crash; standalone shape works on ROCm 7.2, pipeline composition does not |"
1448
+ )
1449
+
1450
+ with gr.Tab("Incidents"):
1451
+ gr.Markdown(
1452
+ "## Field journal\n\n"
1453
+ "A subset of failures, root causes and fixes from May 6–10, 2026. "
1454
+ "These are the stories that don't show up in commit messages - the "
1455
+ "ones where the Wan2.2 prompt did something genuinely surprising, "
1456
+ "or where a kernel decided to disagree with the docs."
1457
+ )
1458
+ gr.HTML(render_incidents())
1459
+ gr.Markdown(
1460
+ "Full incident log is in `incidents.md` in the GitHub repo."
1461
+ )
1462
+
1463
+ with gr.Tab("Story & languages"):
1464
+ gr.Markdown(STORY_TAB_MD)
1465
+
1466
+ with gr.Tab("Live API"):
1467
+ gr.Markdown(API_TAB_MD)
1468
+
1469
+ with gr.Tab("Stack & GPU"):
1470
+ gr.Markdown(STACK_AND_GPU_MD)
1471
+
1472
+ with gr.Tab("Self-host"):
1473
+ gr.Markdown(
1474
+ "## Run it on your own MI300X\n\n"
1475
+ "A 30-second reel takes ~45 minutes on one MI300X. That's too long "
1476
+ "for a casual visitor on a public Space, so this Space hosts only "
1477
+ "the showcase. To run the full pipeline yourself:\n\n"
1478
+ "1. Get an AMD MI300X (e.g. AMD Developer Cloud - $100 starting "
1479
+ "credits via the AMD AI Developer Program).\n"
1480
+ "2. Pull the `rocm/vllm-dev` container.\n"
1481
+ "3. Clone the repo and run:\n\n"
1482
+ "```bash\n"
1483
+ "python generate.py \\\n"
1484
+ " --prompt \"a cellist plays in a Brooklyn subway station at midnight\" \\\n"
1485
+ " --out outputs/my_reel \\\n"
1486
+ " --critic\n"
1487
+ "```\n\n"
1488
+ "Walk away for ~45 minutes. The pipeline plans, paints, animates, "
1489
+ "scores music, narrates and mixes - all autonomously. No prompt "
1490
+ "engineering per shot, no model swapping, no manual stitching.\n\n"
1491
+ f"### → [Full code on GitHub]({GITHUB_URL})"
1492
+ )
1493
+
1494
+ gr.HTML(
1495
+ f'<div class="footer">Built solo for the <b>AMD Developer Hackathon 2026</b>'
1496
+ f' on a single AMD Instinct MI300X. Apache 2.0 / MIT all the way down. '
1497
+ f'<a href="{GITHUB_URL}">GitHub</a> · '
1498
+ f'<code>{HACKATHON_BADGE}</code></div>'
1499
+ )
1500
+
1501
+ return demo
1502
+
1503
+
1504
+ if __name__ == "__main__":
1505
+ demo = build_ui()
1506
+ demo.queue(default_concurrency_limit=1, max_size=8).launch(
1507
+ server_name="0.0.0.0", server_port=7860, share=False,
1508
+ allowed_paths=[str(DEMO_CACHE)],
1509
+ )
requirements.txt ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ gradio>=5.29.0
2
+ requests>=2.31
showcase/sf_walk.mp4 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:95b8ad0412df55ec408cb53281cb58d2fbe89ae2ba1200cecce2114a87d7acbe
3
+ size 8304272
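Reviewer note: the three lines added for `showcase/sf_walk.mp4` are a Git LFS pointer, not the video itself. Each line is a key followed by a space and a value (`version`, `oid`, `size`). A minimal parsing sketch, assuming only the three keys shown in this diff, is below; `parse_lfs_pointer` is a hypothetical helper, not part of the repository.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its version, hash and size fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is written as "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


if __name__ == "__main__":
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:95b8ad0412df55ec408cb53281cb58d2fbe89ae2ba1200cecce2114a87d7acbe\n"
        "size 8304272\n"
    )
    info = parse_lfs_pointer(pointer)
    print(info["hash_algo"], info["size_bytes"])
```

The `size` field is the byte length of the real file held in LFS storage, so the pointer above refers to a video of roughly 8.3 MB.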
thumbnail.png ADDED

Git LFS Details

  • SHA256: 09ce43ce3b74a6d7c65ef5671da767d0197f144292a9712cad4ec717cbc646af
  • Pointer size: 131 Bytes
  • Size of remote file: 849 kB