title: StudioMI300
emoji: 🎬
colorFrom: indigo
colorTo: pink
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: true
license: mit
short_description: One prompt → 30s cinematic reel on a single AMD MI300X
thumbnail: thumbnail.png
tags:
  - amd
  - amd-hackathon-2026
  - mi300x
  - rocm
  - video-generation
  - wan2.2
  - flux
  - qwen
  - text-to-video
  - text-to-film
  - cinematic
  - gradio

StudioMI300

One prompt → 30-second cinematic reel. Built for the AMD Developer Hackathon 2026 on a single AMD Instinct MI300X (192 GB HBM3, ROCm 7.2).

What it does

You write one sentence. The pipeline plans a six-shot story, paints character keyframes, animates them, scores the music, narrates the voice-over, and stitches everything into a 30-second mp4. No setup. No LoRA training. No per-shot prompting.

"A young woman walks through neon-lit Tokyo at night and meets two friends."
                                  ↓
                 [ ~45 minutes on a single MI300X ]
                                  ↓
                    30s cinematic reel.mp4 + audio

How it works (single MI300X, sequential)

Director Agent — Qwen3.5-35B-A3B (BF16, vLLM, AITER MoE) plans 6 shots, character portraits, music brief, VO script, language tag.
Per-shot keyframes — FLUX.2 [klein] 4B reference editing seeds each shot from a single canonical character master, pinning identity.
Animation — Wan2.2-I2V-A14B with ParaAttention FBCache (2× lossless) and selective torch.compile on transformer_2 (1.2× compile win).
Vision Critic — the same Qwen3.5 looks at four sampled frames per clip, labels failure modes (STYLIZED_AI_LOOK, CHARACTER_DRIFT, EXTRAS_INVADE_FRAME...) and triggers a re-render with a bumped seed if the score is below threshold.
Music — ACE-Step v1 3.5B generates a 30-second instrumental from the Director's music brief.
Voice-over — Kokoro-82M narrates the Director's script in any of 9 languages (Director picks the language to match the setting).
Mix — ffmpeg concat-and-loudnorm into the final mp4.
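
The Vision Critic step above can be sketched as a small retry loop. This is a minimal illustration, not the project's code: `Verdict`, `sample_frame_indices`, `render_with_critic`, the 0.7 threshold, the attempt cap, the even frame spacing, and the "seed + attempt" bump are all assumptions made for the example. The README only states that four frames are sampled and that a low score triggers a re-render with a bumped seed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Verdict:
    score: float  # critic score, assumed here to lie in [0, 1]
    labels: List[str] = field(default_factory=list)  # e.g. ["CHARACTER_DRIFT"]


def sample_frame_indices(n_frames: int, k: int = 4) -> List[int]:
    """Pick k evenly spaced frame indices, including the first and last frame."""
    if k < 2 or n_frames < k:
        raise ValueError("need k >= 2 and n_frames >= k")
    return [(i * (n_frames - 1)) // (k - 1) for i in range(k)]


def render_with_critic(
    render: Callable[[int], Any],        # hypothetical: seed -> clip
    critique: Callable[[Any], Verdict],  # hypothetical: clip -> Verdict
    base_seed: int,
    threshold: float = 0.7,              # assumed value, not from the README
    max_attempts: int = 3,               # assumed cap on re-renders
) -> Tuple[Any, Verdict, int]:
    """Re-render with a bumped seed until the critic score clears the threshold.

    Returns the best (clip, verdict, seed) seen, so a usable clip comes back
    even if no attempt passes.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best = None
    for attempt in range(max_attempts):
        seed = base_seed + attempt  # the "bump" is modelled as a simple increment
        clip = render(seed)
        verdict = critique(clip)
        if best is None or verdict.score > best[1].score:
            best = (clip, verdict, seed)
        if verdict.score >= threshold:
            break
    return best


# Stand-in renderer and critic so the sketch runs without a GPU.
fake_scores = {7: 0.35, 8: 0.55, 9: 0.85}
clip, verdict, seed = render_with_critic(
    render=lambda s: f"clip-seed{s}",
    critique=lambda c: Verdict(fake_scores[int(c.rsplit("seed", 1)[1])]),
    base_seed=7,
)
print(seed, verdict.score)
```

Keeping the best-scoring attempt rather than the last one means a shot that never clears the threshold still contributes its strongest take to the final reel.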

The full open-source stack (Apache 2.0 / MIT throughout)

Stage	Model	License
Planner / Critic	Qwen3.5-35B-A3B	Apache 2.0
Image	FLUX.2 [klein] 4B	Apache 2.0
Video	Wan2.2-I2V-A14B	Apache 2.0
Music	ACE-Step v1 3.5B	Apache 2.0
TTS	Kokoro-82M	Apache 2.0
Serving	vLLM 0.17	Apache 2.0
Caching	ParaAttention FBCache	Apache 2.0
AMD kernels	AITER 0.1.13	MIT
Project code	StudioMI300	MIT

Every output you generate from this stack is yours to use commercially.

Why a single MI300X

Most cinematic generation pipelines assume you have a multi-GPU cluster: one GPU for the planner, one for the image model, one for the video model, etc. On 192 GB HBM3 the pipeline runs them all sequentially on the same card. That's the project's central constraint and also its main flex:

Qwen3.5-35B planner loads / unloads cleanly between Director and Critic phases.
Wan2.2-I2V-A14B (~80 GB BF16) leaves headroom for FLUX.2 [klein] 4B (~8 GB) and ACE-Step (~12 GB) to live alongside in subprocess scope.
The planner runs on AITER MoE kernels. AITER FA / FP8 was also evaluated for Wan2.2; the results are documented in incidents.md of the GitHub repo. The FP8 path crashes mid-pipeline on ROCm 7.2 (AITER/issues#2187), so the BF16 path ships.
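
As a rough check of the headroom claim, the approximate footprints quoted above can be summed against the card's 192 GB. This is back-of-the-envelope arithmetic only: the figures are the README's own approximations, `headroom_gb` is a hypothetical helper, and activations, caches, and the planner are left out (the planner is described as loading and unloading between phases).

```python
HBM_GB = 192  # MI300X HBM3 capacity stated in the README

# Approximate resident sizes in GB, taken from the figures quoted above.
FOOTPRINT_GB = {
    "Wan2.2-I2V-A14B": 80,
    "FLUX.2 [klein] 4B": 8,
    "ACE-Step v1 3.5B": 12,
}


def headroom_gb(resident, capacity_gb=HBM_GB):
    """Return capacity minus the summed footprint of the co-resident models."""
    used = sum(FOOTPRINT_GB[name] for name in resident)
    return capacity_gb - used


all_resident = headroom_gb(FOOTPRINT_GB)        # all three models loaded at once
video_only = headroom_gb(["Wan2.2-I2V-A14B"])   # only the video model loaded
print(f"all three resident: {all_resident} GB free")
print(f"video model only:   {video_only} GB free")
```

With all three models resident the quoted figures sum to about 100 GB, which leaves roughly 92 GB for activations and caches under these assumptions.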

Live demo

This Space hosts the showcase. Live generation requires an MI300X (45 minutes per reel is too long for a casual visitor anyway). The full pipeline is on GitHub — clone, point it at your MI300X, and it generates.

Credits

AMD Developer Hackathon 2026 entry. Built solo over six days on one AMD Developer Cloud MI300X droplet.

Made with the open-source ecosystem: Black Forest Labs, Wan-AI, Alibaba Qwen, StepFun, hexgrad/Kokoro, vLLM, ParaAttention, diffusers, AMD ROCm + AITER.