Spaces:
Sleeping
title: Humeo
sdk: docker
app_port: 7860
Humeo
Long podcast or interview β vertical 9:16 shorts. Pipeline: download, transcribe, Gemini (clip selection, hook detection, content pruning, layout vision), ffmpeg render.
Architecture (static HTML, GitHub Pages):
https://bryanthelai.github.io/long-to-shorts/hive_architecture_visualization.html
Hugging Face Space
This repo includes a Hugging Face Docker Space entrypoint in app.py.
- Upload one local MP4
- Watch live pipeline logs and stage progress
- Download rendered
short_*.mp4clips from the UI
Required Space secrets:
GOOGLE_API_KEYorGEMINI_API_KEY, orOPENROUTER_API_KEYOPENAI_API_KEY
The Docker image pins HUMEO_TRANSCRIBE_PROVIDER=openai for the Space demo.
Repo layout
| Path | Role |
|---|---|
src/humeo/ |
CLI, pipeline, ingest, Gemini prompts, render adapters |
humeo-core/ |
Schemas, ffmpeg compile, primitives, optional MCP server |
Pipeline (actual order)
YouTube URL
β ingest (source.mp4, transcript.json)
β clip selection (Gemini β clips.json)
β hook detection (Gemini β hooks.json)
β content pruning (Gemini β prune.json)
β keyframes + layout vision (Gemini vision β layout_vision.json)
β ASS subtitles + humeo-core ffmpeg render β short_<id>.mp4
Details: docs/PIPELINE.md.
Five layouts
A short shows at most two on-screen items (person or chart). That yields five layout modes (see TERMINOLOGY.md).
Requirements
- Python β₯ 3.10
uvβ install: astral.sh/uvffmpegβ onPATHfor extract/render- API keys β see
docs/ENVIRONMENT.mdGOOGLE_API_KEYorGEMINI_API_KEYβ preferred for Gemini stagesOPENROUTER_API_KEYβ supported fallback for those same Gemini-like stages when Google keys are unavailableOPENAI_API_KEYβ if using OpenAI Whisper API (HUMEO_TRANSCRIBE_PROVIDER=openai)
Copy .env.example β .env (never commit .env).
Install
uv venv
uv sync
Optional local WhisperX (heavy; Windows often uses OpenAI API instead):
uv sync --extra whisper
Run
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID"
humeo --long-to-shorts "C:\path\to\video.mp4"
Use --work-dir or --no-video-cache to control where source.mp4 and intermediates live (see docs/ENVIRONMENT.md).
CLI guide (all flags)
Use humeo --help for the live source of truth. This table matches src/humeo/cli.py.
Required
| Flag | Meaning |
|---|---|
--long-to-shorts SOURCE |
YouTube URL or local MP4 path to process (required). |
Paths and cache behavior
| Flag | Meaning |
|---|---|
--output, -o |
Output directory for final short_*.mp4 (default: ./output). |
--work-dir PATH |
Directory for intermediate artifacts (source.mp4, transcript.json, caches). |
--no-video-cache |
Disable per-video cache dirs; uses ./.humeo_work unless --work-dir is set. |
--cache-root PATH |
Override cache root (env equivalent: HUMEO_CACHE_ROOT). |
--clean-run |
Fresh run: disables video cache, forces all model stages, overwrites outputs, and auto-creates a timestamped work dir if --work-dir is not provided. |
Model selection and stage forcing
| Flag | Meaning |
|---|---|
--gemini-model MODEL_ID |
Gemini model for clip selection / text stages (default from env/config). |
--gemini-vision-model MODEL_ID |
Gemini model for keyframe layout vision (defaults to GEMINI_VISION_MODEL or clip model). |
--force-clip-selection |
Re-run clip selection even if clips.meta.json cache matches. |
--force-hook-detection |
Re-run Stage 2.25 hook detection even if hooks.meta.json cache matches. |
--force-content-pruning |
Re-run Stage 2.5 pruning even if prune.meta.json cache matches. |
--force-layout-vision |
Re-run layout vision even if layout_vision.meta.json cache matches. |
--no-hook-detection |
Skip Stage 2.25 hook detection (pruning still runs with fallback behavior). |
Pruning and subtitles
| Flag | Meaning |
|---|---|
--prune-level {off,conservative,balanced,aggressive} |
Stage 2.5 aggressiveness (default: balanced). |
--subtitle-font-size INT |
Subtitle font size in output pixels (default: 48). |
--subtitle-margin-v INT |
Bottom subtitle margin in output pixels (default: 160). |
--subtitle-max-words INT |
Max words per subtitle cue (default: 4). |
--subtitle-max-cue-sec FLOAT |
Max subtitle cue duration in seconds (default: 2.2). |
Logging
| Flag | Meaning |
|---|---|
--verbose, -v |
Enable debug logging. |
Common command recipes
# Basic run
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID"
# Local MP4
humeo --long-to-shorts "C:\path\to\video.mp4"
# Full fresh run for debugging / prompt tuning
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID" --clean-run --verbose
# Re-run only clip selection after prompt edits
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID" --force-clip-selection
# Keep intermediates in a fixed local folder
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID" --work-dir .humeo_work
# Compare different prune levels on same source
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID" --prune-level conservative
humeo --long-to-shorts "https://www.youtube.com/watch?v=VIDEO_ID" --prune-level aggressive
Documentation
| Doc | Purpose |
|---|---|
docs/README.md |
Index of all files under docs/ |
docs/STUDY_ORDER.md |
Read order for onboarding |
docs/PIPELINE.md |
Stages, caches, JSON contracts |
docs/ENVIRONMENT.md |
Keys, env vars, cache layout |
docs/SHARING.md |
How to share logs/docs/video without bloating git |
docs/TARGET_VIDEO_ANALYSIS.md |
Reference input analysis example |
docs/full_run_output.txt |
Example full run log (text) |
docs/hive-paper/PAPER_BREAKDOWN.md |
HIVE paper, file mapping Β§9 |
docs/hive-paper/hive_paper_blunt_guide.md |
Short HIVE recap |
docs/TODO.md |
Backlog |
docs/KNOWN_LIMITATIONS_AND_PROMPT_CONTRACT_GAP.md |
Prompt vs code (ranking, hooks, unused fields, scene detect) |
docs/SOLUTIONS.md |
Design rationale |
TERMINOLOGY.md |
Glossary |
Tests
uv sync --extra dev
uv run pytest
Sharing outputs
output/, *.mp4, and keyframes/ are gitignored. Put rendered shorts on YouTube or GitHub Releases; keep the repo for source and docs. See docs/SHARING.md.
License
See LICENSE (root) and humeo-core/LICENSE.