OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer
Shot Boundary Detection with OmniShotCut, ported to Apple MLX for native Mac inference.
Based on the paper: OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer.
Requirements: ffmpeg (for video I/O) plus the following Python packages:

```bash
pip install mlx mlx-metal numpy
```
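To confirm the environment is ready before running inference, a quick sanity check can help. This is a minimal sketch that only assumes the packages listed above and that ffmpeg is on the PATH:

```python
# Minimal environment check: ffmpeg reachable, MLX and NumPy import cleanly.
import shutil

import mlx.core as mx
import numpy as np

assert shutil.which("ffmpeg") is not None, "ffmpeg not found on PATH"
print("MLX default device:", mx.default_device())
print("NumPy version:", np.__version__)
```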
```bash
# 1. Clone and install
git clone https://github.com/eisneim/OmniShotCut_mlx.git
cd OmniShotCut_mlx

# 2. Download weights from HuggingFace
python omnishotcut_mlx/download_weights.py

# 3. Run on test videos
python run_inference.py
```
```bash
# Auto-download from the HuggingFace Hub (requires huggingface_hub)
pip install huggingface_hub
python omnishotcut_mlx/download_weights.py

# Or manually download from:
# https://huggingface.co/eisneim/OmniShotCut_mlx
# and place OmniShotCut.safetensors and config.json into ./weights/

# Alternative: download without huggingface_hub
mkdir -p weights
curl -L -o weights/OmniShotCut.safetensors https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/OmniShotCut.safetensors
curl -L -o weights/config.json https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/config.json
```
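If you prefer to script the download, the same two files can be fetched with huggingface_hub directly. This sketch is equivalent in effect to download_weights.py, not the script itself:

```python
# Fetch the model files from the Hub into ./weights/.
from huggingface_hub import hf_hub_download

for fname in ("OmniShotCut.safetensors", "config.json"):
    path = hf_hub_download(
        repo_id="eisneim/OmniShotCut_mlx",  # repo listed above
        filename=fname,
        local_dir="weights",                # place files under ./weights/
    )
    print("downloaded:", path)
```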
```bash
# Default: balanced detection
python run_inference.py

# Sensitive mode: more cuts, good for action/vlog videos
python run_inference.py --sensitive

# Conservative mode: fewer false positives, good for interviews/long takes
python run_inference.py --conservative

# Single video
python run_inference.py --video /path/to/video.mp4

# Custom output directory
python run_inference.py --output ./my_shots

# Fine-tuned control
python run_inference.py --context 12 --min-shot 0.8 --conf 0.1
```
| Parameter | Default | Range | Effect |
|---|---|---|---|
| `--context` | 10 | 0–20 | Overlap frames between windows. Higher = fewer missed boundaries, but slower |
| `--min-shot` | 0.5 | 0.1–5.0 | Minimum shot duration in seconds. Higher = fewer false positives |
| `--conf` | 0.0 | 0.0–1.0 | Intra-class confidence threshold, e.g. 0.3 keeps only predictions the model is >30% sure about |
| `--sensitive` | – | – | Shortcut: context=15, min-shot=0.3, conf=0 |
| `--conservative` | – | – | Shortcut: context=5, min-shot=1.5, conf=0.15 |
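For intuition on how `--min-shot` and `--conf` interact, the post-processing can be pictured roughly as below. This is an illustrative sketch only; the names and exact logic are hypothetical, and the real filtering lives in run_inference.py:

```python
# Illustrative sketch of boundary post-processing (hypothetical helper, not the shipped code).
def filter_boundaries(cut_frames, cut_scores, fps, min_shot=0.5, conf=0.0):
    """cut_frames: predicted cut positions (frame indices); cut_scores: per-cut confidences."""
    kept, last_cut = [], 0
    for frame, score in zip(cut_frames, cut_scores):
        if score < conf:
            continue  # --conf: drop predictions the model is not confident enough about
        if (frame - last_cut) / fps < min_shot:
            continue  # --min-shot: drop cuts that would create a shot shorter than min_shot seconds
        kept.append(frame)
        last_cut = frame
    return kept
```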
| Video Type | Recommended | Why |
|---|---|---|
| Action / Sports | `--sensitive` | Fast cuts, many short shots |
| Vlog / YouTube | default or `--context 15` | Moderate pace, varied editing |
| Interview / Podcast | `--conservative` | Long takes, few cuts |
| Film / Cinema | default | Balanced |
| Animation | `--sensitive` | Frequent scene changes |
| Screen Recording | `--conservative` or `--min-shot 2.0` | Mostly static |
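To apply a preset across a whole folder of videos, the CLI can be driven from a small batch script. This sketch uses only the flags documented above; the input path and chosen preset are example values:

```python
# Batch-run the CLI over every .mp4 in a folder (path and preset are examples).
import subprocess
from pathlib import Path

for video in sorted(Path("test_data").glob("*.mp4")):
    subprocess.run(
        ["python", "run_inference.py", "--video", str(video), "--conservative"],
        check=True,
    )
```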
```
OmniShotCut_mlx/
├── run_inference.py              # Main entry point
├── omnishotcut_mlx/
│   ├── model.py                  # OmniShotCut MLX model
│   ├── transformer.py            # Transformer encoder/decoder
│   ├── resnet.py                 # ResNet18 backbone
│   ├── position_encoding.py      # 3D sinusoidal position encoding
│   ├── load_weights.py           # Weight loader (from safetensors)
│   └── download_weights.py       # HuggingFace weight downloader
├── weights/
│   ├── OmniShotCut.safetensors   # MLX-native weights (~157MB)
│   └── config.json               # Model configuration
└── test_data/                    # Place test videos here
```
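The checkpoint is stored in MLX's native safetensors format, so it can be inspected directly with `mlx.core.load`. A minimal sketch (the parameter names inside the file are whatever load_weights.py expects and are not listed here):

```python
# Peek at the checkpoint: mx.load reads .safetensors into a dict of name -> mx.array.
import json
import mlx.core as mx

weights = mx.load("weights/OmniShotCut.safetensors")
with open("weights/config.json") as f:
    config = json.load(f)

print(f"{len(weights)} tensors, config keys: {sorted(config)}")
```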
Shots are saved as `shot_0000.mp4`, `shot_0001.mp4`, ... under `test_data/output/<video_name>/`.
Each shot file is a self-contained H.264/AAC MP4 clip with the detected shot boundary transitions removed.
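To sanity-check the exported clips, ffprobe can report each shot's duration. The output folder name below is just a placeholder for `<video_name>`:

```python
# Print the duration of each exported shot clip using ffprobe ("my_video" is a placeholder).
import subprocess
from pathlib import Path

out_dir = Path("test_data/output/my_video")
for clip in sorted(out_dir.glob("shot_*.mp4")):
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", str(clip)],
        capture_output=True, text=True, check=True,
    )
    print(f"{clip.name}: {float(result.stdout):.2f}s")
```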
Paper: OmniShotCut (arXiv 2604.24762) by Boyang Wang et al.
MLX port by @eisneim. Weights hosted at eisneim/OmniShotCut_mlx.