File size: 2,335 Bytes
00b7145
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
# Pitch Deck Outline

Use this as the slide plan for the required presentation deck.

## Slide 1 - Title

ElevenClip.AI

AI clip studio for turning long-form videos into personalized short-form clips.

Include:

- AMD Developer Hackathon
- Track 3 - Vision & Multimodal AI
- GitHub URL
- Hugging Face Space URL

## Slide 2 - Problem

Long-form creators need short-form distribution, but editing clips manually is slow.

Key points:

- Two-hour videos can take hours to review.
- Good clips depend on audience, niche, tone, and platform.
- Subtitles and vertical export add repetitive work.

## Slide 3 - Solution

ElevenClip.AI automates the first editing pass.

Workflow:

Video input -> Whisper transcript -> Qwen highlight scoring -> ffmpeg clip rendering -> human review/editor -> downloads

## Slide 4 - Product Demo

Show screenshots or short GIFs of:

- Channel profile
- Pipeline progress
- Transcript/highlights
- Clip editor
- Approved/downloaded clips

## Slide 5 - AI Architecture

Model roles:

- Whisper Large V3: multilingual transcription, including Thai.
- Qwen2.5-7B-Instruct: profile-aware highlight detection.
- Qwen2-VL-7B-Instruct: visual reactions, scene changes, and on-screen text.
- ffmpeg: subtitle burn-in and platform export.

## Slide 6 - AMD + ROCm

Why AMD matters:

- Long videos need high-throughput inference.
- MI300X memory helps with large models and long transcripts.
- ROCm + PyTorch enables Whisper inference.
- vLLM ROCm enables faster Qwen serving.

## Slide 7 - Benchmark

Replace placeholders after cloud credits arrive.

| Run | Hardware | Total Time | Clips |
| --- | --- | ---: | ---: |
| CPU baseline | CPU | TBD | 10 |
| AMD GPU | MI300X + ROCm | TBD | 10 |

Goal: 2-hour video -> 10 subtitled clips in under 10 minutes on MI300X.

## Slide 8 - Business Value

Target users:

- YouTubers
- Podcasters
- Educators
- Streamers
- Agencies
- Brand marketing teams

Value:

- Save editing time.
- Increase short-form output.
- Keep creator control.
- Support multilingual creators.

## Slide 9 - What We Built

Current MVP:

- FastAPI backend
- React editor
- YouTube/upload input
- Demo pipeline
- Clip rendering and subtitles
- Hugging Face Space
- AMD deployment plan

Next:

- Real Whisper + Qwen on MI300X
- Qwen2-VL frame analysis
- Benchmark table
- Better subtitle styling presets