
ElevenClip.AI Hackathon Submission Pack

Use this as the source text for the lablab.ai project submission.

Basic Information

Project Title

ElevenClip.AI

Short Description

AI clip studio that turns long-form videos into personalized short-form clips with Whisper, Qwen, ROCm, and AMD MI300X.

Track

Track 3 - Vision & Multimodal AI

Technology Tags

AMD Developer Cloud, AMD Instinct MI300X, ROCm, Hugging Face, Qwen, Whisper, FastAPI, React, ffmpeg, multimodal AI, video AI, Thai language, creator tools

Long Description

ElevenClip.AI helps creators convert long-form videos into short-form clips for TikTok, YouTube Shorts, and Instagram Reels. The app takes a YouTube URL or uploaded video, transcribes it with Whisper Large V3, uses Qwen2.5 to score the best highlight moments based on the creator's channel profile, optionally adds visual signals with Qwen2-VL, and renders vertical clips with subtitles through ffmpeg.

The core idea is human-AI collaborative editing. AI creates the first pass quickly, but creators still control the final result. After the pipeline generates clips, the user can trim start and end times, edit subtitle text, delete weak clips, regenerate specific clips, approve final clips, and download the outputs.
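
To make the editing loop concrete, here is a minimal sketch of the per-clip state it implies, assuming a FastAPI backend with Pydantic models; the field names are illustrative, not the repository's actual schema.

```python
# Illustrative per-clip editing state; field names are examples, not the repo's schema.
from pydantic import BaseModel

class ClipDraft(BaseModel):
    clip_id: str
    start_s: float          # trimmable start time within the source video (seconds)
    end_s: float            # trimmable end time within the source video (seconds)
    subtitle_text: str      # editable transcript burned in as subtitles
    score: float            # highlight score used to rank and prune weak clips
    approved: bool = False  # set by the human editor before download
```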

The project is built for Track 3 of the AMD Developer Hackathon because it combines audio transcription, transcript reasoning, video understanding, and rendered media outputs. The production target is AMD Developer Cloud with ROCm and AMD Instinct MI300X. MI300X acceleration is especially relevant because the pipeline needs to process long videos, run large multilingual models, and generate multiple clips fast enough to fit a creator's publishing workflow.

Problem

Creators often publish long-form content but still need short clips for discovery platforms. Manually finding the best moments in a two-hour video, trimming clips, writing subtitles, reframing to vertical, and exporting multiple MP4 files can take hours.

Generic clipping tools miss the creator's style. A funny gaming channel and an educational podcast do not choose highlights the same way. ElevenClip.AI uses a reusable channel profile so highlight detection can adapt to niche, style, language, target platform, and preferred clip length.
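
A minimal sketch of the channel profile these attributes suggest, assuming it is stored as a small structured object; field names and defaults are illustrative, not the repository's actual schema.

```python
# Illustrative channel profile; field names and defaults are examples only.
from pydantic import BaseModel

class ChannelProfile(BaseModel):
    niche: str = "educational podcast"      # e.g. gaming, finance, cooking
    style: str = "calm and explanatory"     # tone used to steer highlight scoring
    language: str = "th"                    # e.g. Thai or English content
    target_platform: str = "tiktok"         # tiktok, youtube_shorts, instagram_reels
    preferred_clip_length_s: int = 45       # preferred clip duration in seconds
```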

Solution

  1. Set a channel profile once.
  2. Paste a YouTube URL or upload a video.
  3. Transcribe speech with Whisper Large V3.
  4. Use Qwen2.5 to score transcript segments for engagement potential.
  5. Optionally use Qwen2-VL for visual highlights such as reactions, scene changes, and on-screen text.
  6. Render short-form clips with subtitles using ffmpeg.
  7. Let the human editor trim, edit subtitles, regenerate, approve, and download the outputs (steps 3-6 are sketched in code after this list).
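
The sketch below shows how steps 3-6 could be chained, assuming Whisper Large V3 is loaded through the Hugging Face transformers ASR pipeline, Qwen2.5 is served by vLLM's OpenAI-compatible endpoint on localhost, and ffmpeg is on PATH. Function boundaries, the prompt wording, the endpoint URL, and the Qwen model size are illustrative, not the repository's actual code.

```python
# Illustrative end-to-end sketch of steps 3-6; not the repository's actual code.
import json
import subprocess

import requests
from transformers import pipeline


def transcribe(video_path: str) -> list[dict]:
    """Step 3: transcribe speech with Whisper Large V3, keeping timestamps per segment."""
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3")
    return asr(video_path, return_timestamps=True)["chunks"]


def score_segments(chunks: list[dict], profile: dict) -> list[dict]:
    """Step 4: ask Qwen2.5 (served by vLLM's OpenAI-compatible API) to rate each segment."""
    prompt = (
        f"Channel profile: {json.dumps(profile)}\n"
        "Rate each transcript segment from 0 to 10 for short-form clip potential "
        "and reply with a JSON list of {start, end, score} objects.\n"
        f"Segments: {json.dumps(chunks, ensure_ascii=False)}"
    )
    resp = requests.post(
        "http://localhost:8000/v1/chat/completions",  # assumed vLLM server address
        json={
            "model": "Qwen/Qwen2.5-7B-Instruct",      # illustrative model choice
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=300,
    )
    return json.loads(resp.json()["choices"][0]["message"]["content"])


def render_clip(video_path: str, start: float, end: float, srt_path: str, out_path: str) -> None:
    """Step 6: cut the highlight, reframe to 9:16 vertical, and burn in subtitles."""
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-ss", str(start), "-to", str(end), "-i", video_path,
            # Crop to a centered 9:16 window, scale to 1080x1920, burn the clip-relative SRT.
            "-vf", f"crop=ih*9/16:ih,scale=1080:1920,subtitles={srt_path}",
            out_path,
        ],
        check=True,
    )
```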

AMD + ROCm Usage

The intended production pipeline runs on AMD Developer Cloud:

  • AMD Instinct MI300X for high-throughput inference.
  • ROCm 6.x as the GPU compute stack.
  • PyTorch ROCm for Whisper Large V3 transcription (see the device-check sketch after this list).
  • vLLM ROCm backend for Qwen2.5 highlight analysis.
  • Hugging Face model hub for Whisper and Qwen models.
  • ffmpeg hardware acceleration hooks where available.
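
A minimal device-check sketch for this stack, assuming a ROCm build of PyTorch, which exposes HIP devices such as the MI300X through the torch.cuda API; the model choice and dtype handling are illustrative.

```python
# Minimal ROCm/MI300X device check; assumes a ROCm build of PyTorch.
import torch
from transformers import pipeline

if torch.cuda.is_available():
    # On ROCm builds, HIP devices are exposed through the torch.cuda API.
    device = "cuda:0"
    print("GPU:", torch.cuda.get_device_name(0))   # expected to report an MI300X on AMD Developer Cloud
    print("HIP runtime:", torch.version.hip)       # populated only on ROCm builds
else:
    device = "cpu"
    print("No ROCm GPU detected; falling back to CPU demo mode.")

# Whisper Large V3 on the detected device (PyTorch ROCm path).
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device=device,
)
```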

The repo includes a local demo mode so the interface and API can be tested before cloud credits arrive. Once AMD credits are active, setting DEMO_MODE=false switches to the real model path and enables benchmark collection.
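
A sketch of how such a switch is commonly wired, assuming the flag is read from the environment at startup; the variable name comes from the text above, but the wiring shown here is illustrative rather than the repo's actual code.

```python
# Illustrative DEMO_MODE switch; the real wiring in the repo may differ.
import os

# Any value other than "false" keeps the local stub path enabled.
DEMO_MODE = os.getenv("DEMO_MODE", "true").strip().lower() != "false"

def load_backend() -> str:
    if DEMO_MODE:
        # Canned transcripts and pre-rendered clips so the UI and API run without a GPU.
        return "demo"
    # Real model path: Whisper Large V3 + Qwen2.5 on ROCm, with benchmark logging enabled.
    return "rocm"
```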

Hugging Face Space

Public Space:

https://huggingface.co/spaces/lablab-ai-amd-developer-hackathon/ElevenClip-AI

The Space is published under the event organization and acts as the public demo/landing page. The full app source code is available on GitHub.

GitHub Repository

https://github.com/JakgritB/ElevenClip.AI

Demo Video Plan

The demo video is planned as two recordings:

  1. Draft demo before AMD credits: show the local app, end-to-end UI, clip editor, and project concept.
  2. Final demo after AMD credits: show the same flow plus AMD Developer Cloud, ROCm/MI300X GPU detection, real model inference, and benchmark results.

Benchmark Placeholder

Replace this table after the AMD run.

| Run | Video Length | Clips | Hardware | Total Time | Notes |
| --- | --- | --- | --- | --- | --- |
| Draft demo | 2-3 min | 5 | Local CPU demo mode | TBD | UI and workflow validation |
| CPU baseline | 2 hr | 10 | CPU | TBD | Real model stack, GPU hidden |
| AMD GPU | 2 hr | 10 | AMD Instinct MI300X + ROCm | TBD | Target: under 10 minutes |

Judging Criteria Mapping

| Criterion | How ElevenClip.AI Addresses It |
| --- | --- |
| Application of Technology | Integrates Whisper, Qwen2.5, optional Qwen2-VL, Hugging Face, ROCm, and an AMD MI300X target deployment. |
| Business Value | Solves a real creator workflow: turning long videos into platform-ready clips with subtitles and human editing. |
| Originality | Uses creator profile personalization and multilingual support rather than generic highlight detection. |
| Presentation | Demo shows before/after clips, editing controls, and CPU vs AMD GPU benchmark logs. |