| --- |
| title: StudioMI300 |
| emoji: π¬ |
| colorFrom: indigo |
| colorTo: pink |
| sdk: gradio |
| sdk_version: 5.29.0 |
| app_file: app.py |
| pinned: true |
| license: mit |
| short_description: One prompt β 30s cinematic reel on a single AMD MI300X |
| thumbnail: thumbnail.png |
| tags: |
| - amd |
| - amd-hackathon-2026 |
| - mi300x |
| - rocm |
| - video-generation |
| - wan2.2 |
| - flux |
| - qwen |
| - text-to-video |
| - text-to-film |
| - cinematic |
| - gradio |
| --- |
| |
| # StudioMI300 |
|
|
| **One prompt β 30-second cinematic reel.** Built for the AMD Developer Hackathon 2026 |
| on a single AMD Instinct MI300X (192 GB HBM3, ROCm 7.2). |
|
|
| ## What it does |
|
|
| You write one sentence. The pipeline plans a six-shot story, paints character |
| keyframes, animates them, scores the music, narrates the voice-over, and stitches |
| everything into a 30-second `mp4`. No setup. No LoRA training. No per-shot prompting. |
|
|
| ``` |
| "A young woman walks through neon-lit Tokyo at night and meets two friends." |
| β |
| [ ~45 minutes on a single MI300X ] |
| β |
| 30s cinematic reel.mp4 + audio |
| ``` |
|
|
| ## How it works (single MI300X, sequential) |
|
|
| 1. **Director Agent** β Qwen3.5-35B-A3B (BF16, vLLM, AITER MoE) plans 6 shots, |
| character portraits, music brief, VO script, language tag. |
| 2. **Per-shot keyframes** β FLUX.2 [klein] 4B reference editing seeds each |
| shot from a single canonical character master, pinning identity. |
| 3. **Animation** β Wan2.2-I2V-A14B with ParaAttention FBCache (2Γ lossless) |
| and selective `torch.compile` on `transformer_2` (1.2Γ compile win). |
| 4. **Vision Critic** β the same Qwen3.5 looks at four sampled frames per clip, |
| labels failure modes (`STYLIZED_AI_LOOK`, `CHARACTER_DRIFT`, `EXTRAS_INVADE_FRAME`...) |
| and triggers a re-render with a bumped seed if the score is below threshold. |
| 5. **Music** β ACE-Step v1 3.5B generates a 30-second instrumental from the |
| Director's music brief. |
| 6. **Voice-over** β Kokoro-82M narrates the Director's script in any of 9 |
| languages (Director picks the language to match the setting). |
| 7. **Mix** β `ffmpeg` concat-and-loudnorm into the final `mp4`. |
|
|
| ## The full open-source stack (Apache 2.0 / MIT throughout) |
|
|
| | Stage | Model | License | |
| |---|---|---| |
| | Planner / Critic | Qwen3.5-35B-A3B | Apache 2.0 | |
| | Image | FLUX.2 [klein] 4B | Apache 2.0 | |
| | Video | Wan2.2-I2V-A14B | Apache 2.0 | |
| | Music | ACE-Step v1 3.5B | Apache 2.0 | |
| | TTS | Kokoro-82M | Apache 2.0 | |
| | Serving | vLLM 0.17 | Apache 2.0 | |
| | Caching | ParaAttention FBCache | Apache 2.0 | |
| | AMD kernels | AITER 0.1.13 | MIT | |
| | Project code | StudioMI300 | MIT | |
|
|
| Every output you generate from this stack is yours to use commercially. |
|
|
| ## Why a single MI300X |
|
|
| Most cinematic generation pipelines assume you have a multi-GPU cluster: one GPU |
| for the planner, one for the image model, one for the video model, etc. On 192 GB |
| HBM3 the pipeline runs them all sequentially on the same card. That's the project's central |
| constraint and also its main flex: |
|
|
| - Qwen3.5-35B planner loads / unloads cleanly between Director and Critic phases. |
| - Wan2.2-I2V-A14B (~80 GB BF16) leaves headroom for FLUX.2 [klein] 4B (~8 GB) |
| and ACE-Step (~12 GB) to live alongside in subprocess scope. |
| - AITER MoE for the planner. AITER FA / FP8 was evaluated for Wan2.2 β results |
| documented in `incidents.md` of the GitHub repo (FP8 path crashes mid-pipeline |
| on ROCm 7.2, AITER/issues#2187, BF16 ships). |
|
|
| ## Live demo |
|
|
| This Space hosts the showcase. Live generation requires an MI300X (45 minutes |
| per reel is too long for a casual visitor anyway). The full pipeline is on |
| GitHub β clone, point it at your MI300X, and it generates. |
|
|
| ## Credits |
|
|
| AMD Developer Hackathon 2026 entry. Built solo over six days on one AMD |
| Developer Cloud MI300X droplet. |
|
|
| Made with the open-source ecosystem: Black Forest Labs, Wan-AI, Alibaba Qwen, |
| StepFun, hexgrad/Kokoro, vLLM, ParaAttention, diffusers, AMD ROCm + AITER. |
|
|