---
title: ClipForge
sdk: docker
app_port: 7860
---


# ClipForge

Current default preset:

- `native_highlight` captions
- OpenRouter + `google/gemini-2.5-pro` for Gemini-like stages
- Replicate SAM speaker-lock when `REPLICATE_API_TOKEN` is available
- ElevenLabs Scribe v2 transcription when `ELEVENLABS_API_KEY` is set

ClipForge turns a long podcast or interview into vertical 9:16 shorts. The pipeline is: download, transcribe, Gemini stages (clip selection, hook detection, content pruning, layout vision), then ffmpeg render.

**Architecture (static HTML, GitHub Pages):**  
[https://bryanthelai.github.io/long-to-shorts/hive_architecture_visualization.html](https://bryanthelai.github.io/long-to-shorts/hive_architecture_visualization.html)

## Hugging Face Space

This repo includes a Hugging Face Docker Space entrypoint in `app.py` with the ClipForge upload/link UI.

- Paste a YouTube/video URL or upload one local video file
- Watch live pipeline progress in the ClipForge UI
- Preview and download rendered `short_*.mp4` clips from the UI
- Regenerate from the same source with a steering prompt

Required Space secrets:

- `GOOGLE_API_KEY` or `GEMINI_API_KEY`, or `OPENROUTER_API_KEY`
- `OPENAI_API_KEY` or `ELEVENLABS_API_KEY`
- Optional for YouTube links on cloud IPs: `YTDLP_COOKIES_B64`

If `HUMEO_TRANSCRIBE_PROVIDER` is not set, the Space uses ElevenLabs when
`ELEVENLABS_API_KEY` exists, otherwise OpenAI Whisper.

If YouTube blocks Hugging Face with a bot/sign-in challenge, export a Netscape
`cookies.txt` from a logged-in browser, base64 encode it, and set it as the
`YTDLP_COOKIES_B64` Space secret. Local file upload does not need this.
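
The two rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the repo: `pick_transcribe_provider` and `encode_cookies` are hypothetical helper names, and the `"elevenlabs"` label is an assumption (only `openai` appears as a provider value in this README).

```python
import base64
import os
from pathlib import Path


def pick_transcribe_provider(env=os.environ):
    """Apply the fallback described above (hypothetical helper, not repo code)."""
    explicit = env.get("HUMEO_TRANSCRIBE_PROVIDER")
    if explicit:
        return explicit
    # "elevenlabs" is an assumed label; "openai" is the value this README names.
    return "elevenlabs" if env.get("ELEVENLABS_API_KEY") else "openai"


def encode_cookies(cookies_path):
    """Return base64 text suitable for pasting into the YTDLP_COOKIES_B64 secret."""
    return base64.b64encode(Path(cookies_path).read_bytes()).decode("ascii")
```

Calling `encode_cookies("cookies.txt")` returns a single-line ASCII string, which is easier to store in a secret field than the multi-line cookie file itself.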

## Repo layout

| Path | Role |
|------|------|
| `src/humeo/` | CLI, pipeline, ingest, Gemini prompts, render adapters |
| `humeo-core/` | Schemas, ffmpeg compile, primitives, optional MCP server |

## Pipeline (actual order)

```text
YouTube URL
  → ingest (source.mp4, transcript.json)
  → clip selection (Gemini → clips.json)
  → hook detection (Gemini → hooks.json)
  → content pruning (Gemini → prune.json)
  → keyframes + layout vision (Gemini vision → layout_vision.json)
  → ASS subtitles + humeo-core ffmpeg render → short_<id>.mp4
```

Details: **`docs/PIPELINE.md`**.
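
To see how far a run got, the stage order above can be checked against the files on disk. The sketch below is an illustration only: the artifact names come from the diagram, but `next_pending_stage` is a hypothetical helper, and it assumes all artifacts sit directly in one work directory, which may not match the real cache layout.

```python
from pathlib import Path

# Stage name -> artifacts, copied from the pipeline diagram above.
STAGES = [
    ("ingest", ["source.mp4", "transcript.json"]),
    ("clip selection", ["clips.json"]),
    ("hook detection", ["hooks.json"]),
    ("content pruning", ["prune.json"]),
    ("layout vision", ["layout_vision.json"]),
]


def next_pending_stage(work_dir):
    """Return the first stage whose artifacts are missing, or None if all exist."""
    root = Path(work_dir)
    for name, artifacts in STAGES:
        # Assumption: artifacts live flat inside work_dir.
        if not all((root / artifact).exists() for artifact in artifacts):
            return name
    return None
```

For example, `next_pending_stage(".humeo_work")` returns `"ingest"` for an empty folder.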

## Five layouts

A short shows at most two on-screen items (`person` or `chart`). That yields five layout modes (see **`TERMINOLOGY.md`**).

## Requirements

- **Python** ≥ 3.10
- **`uv`**: install from [astral.sh/uv](https://docs.astral.sh/uv/)
- **`ffmpeg`**: on `PATH` for extract/render
- **API keys**: see **`docs/ENVIRONMENT.md`**
  - `GOOGLE_API_KEY` or `GEMINI_API_KEY`: preferred for Gemini stages
  - `OPENROUTER_API_KEY`: supported fallback for those same Gemini-like stages when Google keys are unavailable
  - `OPENAI_API_KEY`: if using OpenAI Whisper API (`HUMEO_TRANSCRIBE_PROVIDER=openai`)

Copy **`.env.example`** to **`.env`** (never commit `.env`).
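
A quick preflight check for the requirements above might look like the following. It is a sketch, not part of the repo: `preflight` is a hypothetical function, and it only inspects variables already exported in the environment, so values that exist only in `.env` must be loaded first.

```python
import os
import shutil
import sys


def preflight(env=os.environ):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    for tool in ("uv", "ffmpeg"):
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    gemini_keys = ("GOOGLE_API_KEY", "GEMINI_API_KEY", "OPENROUTER_API_KEY")
    if not any(env.get(key) for key in gemini_keys):
        problems.append("set one of: " + ", ".join(gemini_keys))
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```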

## Install

```bash
uv venv
uv sync
```

Optional: local WhisperX (a heavy install; on Windows the OpenAI API is often used instead):

```bash
uv sync --extra whisper
```

## Run

```bash
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID"
humeo --long-to-shorts "C:\path\to\video.mp4"
```

Use **`--work-dir`** or **`--no-video-cache`** to control where `source.mp4` and intermediates live (see **`docs/ENVIRONMENT.md`**).

## CLI guide (all flags)

Use `humeo --help` as the live source of truth. The tables below match `src/humeo/cli.py`.

### Required

| Flag | Meaning |
|------|---------|
| `--long-to-shorts SOURCE` | YouTube URL or local MP4 path to process (required). |

### Paths and cache behavior

| Flag | Meaning |
|------|---------|
| `--output`, `-o` | Output directory for final `short_*.mp4` (default: `./output`). |
| `--work-dir PATH` | Directory for intermediate artifacts (`source.mp4`, `transcript.json`, caches). |
| `--no-video-cache` | Disable per-video cache dirs; uses `./.humeo_work` unless `--work-dir` is set. |
| `--cache-root PATH` | Override cache root (env equivalent: `HUMEO_CACHE_ROOT`). |
| `--clean-run` | Fresh run: disables video cache, forces all model stages, overwrites outputs, and auto-creates a timestamped work dir if `--work-dir` is not provided. |

### Model selection and stage forcing

| Flag | Meaning |
|------|---------|
| `--gemini-model MODEL_ID` | Gemini model for clip selection / text stages (default from env/config). |
| `--gemini-vision-model MODEL_ID` | Gemini model for keyframe layout vision (defaults to `GEMINI_VISION_MODEL` or clip model). |
| `--force-clip-selection` | Re-run clip selection even if `clips.meta.json` cache matches. |
| `--force-hook-detection` | Re-run Stage 2.25 hook detection even if `hooks.meta.json` cache matches. |
| `--force-content-pruning` | Re-run Stage 2.5 pruning even if `prune.meta.json` cache matches. |
| `--force-layout-vision` | Re-run layout vision even if `layout_vision.meta.json` cache matches. |
| `--no-hook-detection` | Skip Stage 2.25 hook detection (pruning still runs with fallback behavior). |

### Pruning and subtitles

| Flag | Meaning |
|------|---------|
| `--prune-level {off,conservative,balanced,aggressive}` | Stage 2.5 aggressiveness (default: `balanced`). |
| `--subtitle-font-size INT` | Subtitle font size in output pixels (default: `48`). |
| `--subtitle-margin-v INT` | Bottom subtitle margin in output pixels (default: `160`). |
| `--subtitle-max-words INT` | Max words per subtitle cue (default: `4`). |
| `--subtitle-max-cue-sec FLOAT` | Max subtitle cue duration in seconds (default: `2.2`). |
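
To make the two cue limits concrete, here is one way word timings could be grouped under `--subtitle-max-words` and `--subtitle-max-cue-sec`. This is an illustrative sketch under assumptions, not the repo's subtitle builder: `group_cues` and the `(text, start_sec, end_sec)` word format are hypothetical.

```python
def group_cues(words, max_words=4, max_cue_sec=2.2):
    """Group (text, start_sec, end_sec) words into (text, start_sec, end_sec) cues."""
    cues, current = [], []
    for word in words:
        # Close the running cue if adding this word would break either limit.
        too_many = len(current) >= max_words
        too_long = bool(current) and word[2] - current[0][1] > max_cue_sec
        if current and (too_many or too_long):
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    return [(" ".join(w[0] for w in cue), cue[0][1], cue[-1][2]) for cue in cues]
```

With the defaults, five quick words yield a four-word cue followed by a one-word cue. A single word longer than `max_cue_sec` still becomes its own cue in this sketch.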

### Logging

| Flag | Meaning |
|------|---------|
| `--verbose`, `-v` | Enable debug logging. |

### Common command recipes

```bash
# Basic run
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID"

# Local MP4
humeo --long-to-shorts "C:\path\to\video.mp4"

# Full fresh run for debugging / prompt tuning
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID" --clean-run --verbose

# Re-run only clip selection after prompt edits
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID" --force-clip-selection

# Keep intermediates in a fixed local folder
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID" --work-dir .humeo_work

# Compare different prune levels on same source
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID" --prune-level conservative
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID" --prune-level aggressive
```
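
When comparing prune levels from a script, the commands can be assembled programmatically. The sketch below uses only flags documented in the tables above; `humeo_command` is a hypothetical helper, and writing each level to its own `--output` folder is an assumption made here so the two runs do not share a directory.

```python
import shlex

PRUNE_LEVELS = ("off", "conservative", "balanced", "aggressive")


def humeo_command(source, prune_level="balanced", output_dir=None):
    """Build an argv list for the humeo CLI using flags documented above."""
    if prune_level not in PRUNE_LEVELS:
        raise ValueError(f"unknown prune level: {prune_level!r}")
    cmd = ["humeo", "--long-to-shorts", source, "--prune-level", prune_level]
    if output_dir:
        cmd += ["--output", output_dir]
    return cmd


if __name__ == "__main__":
    url = "https://www.youtube.com/watch?v=VIDEO_ID"
    for level in ("conservative", "aggressive"):
        cmd = humeo_command(url, level, f"output/{level}")
        # Print only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```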

## Documentation

| Doc | Purpose |
|-----|---------|
| **`docs/README.md`** | Index of all files under `docs/` |
| **`docs/STUDY_ORDER.md`** | Read order for onboarding |
| **`docs/PIPELINE.md`** | Stages, caches, JSON contracts |
| **`docs/ENVIRONMENT.md`** | Keys, env vars, cache layout |
| **`docs/SHARING.md`** | How to share logs/docs/video without bloating git |
| **`docs/TARGET_VIDEO_ANALYSIS.md`** | Reference input analysis example |
| **`docs/full_run_output.txt`** | Example full run log (text) |
| **`docs/hive-paper/PAPER_BREAKDOWN.md`** | HIVE paper, file mapping §9 |
| **`docs/hive-paper/hive_paper_blunt_guide.md`** | Short HIVE recap |
| **`docs/TODO.md`** | Backlog |
| **`docs/KNOWN_LIMITATIONS_AND_PROMPT_CONTRACT_GAP.md`** | Prompt vs code (ranking, hooks, unused fields, scene detect) |
| **`docs/SOLUTIONS.md`** | Design rationale |
| **`TERMINOLOGY.md`** | Glossary |

## Tests

```bash
uv sync --extra dev
uv run pytest
```

## Sharing outputs

`output/`, `*.mp4`, and `keyframes/` are **gitignored**. Put rendered shorts on **YouTube** or **GitHub Releases**; keep the repo for source and docs. See **`docs/SHARING.md`**.

## License

See **`LICENSE`** (root) and **`humeo-core/LICENSE`**.