# OmniShotCut MLX

Shot Boundary Detection with OmniShotCut, ported to Apple MLX for native Mac inference.

Based on the paper: [OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer](https://arxiv.org/abs/2604.24762).

## Features

- Pure MLX inference: runs natively on Apple Silicon with zero PyTorch dependency at runtime
- Detects hard cuts, dissolves, fades, wipes, slides, zooms, doorways, and sudden jumps
- Tunable sensitivity for different video types (action, interview, vlog, film)

## Requirements

- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
- `ffmpeg` (for video I/O)

```bash
pip install mlx mlx-metal numpy
```

## Quick Start

```bash
# 1. Clone the repository (install the packages listed under Requirements first)
git clone https://github.com/eisneim/OmniShotCut_mlx.git
cd OmniShotCut_mlx

# 2. Download weights from HuggingFace
python omnishotcut_mlx/download_weights.py

# 3. Run on test videos
python run_inference.py
```

## Download Weights

```bash
# Auto-download from the HuggingFace Hub (requires huggingface_hub)
pip install huggingface_hub
python omnishotcut_mlx/download_weights.py

# Or download manually from:
#   https://huggingface.co/eisneim/OmniShotCut_mlx
# and place OmniShotCut.safetensors and config.json into ./weights/

# Alternative: download without huggingface_hub
mkdir -p weights
curl -L -o weights/OmniShotCut.safetensors https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/OmniShotCut.safetensors
curl -L -o weights/config.json https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/config.json
```

## Usage

```bash
# Default: balanced detection
python run_inference.py

# Sensitive mode: more cuts, good for action/vlog videos
python run_inference.py --sensitive

# Conservative mode: fewer false positives, good for interviews/long takes
python run_inference.py --conservative

# Single video
python run_inference.py --video /path/to/video.mp4

# Custom output directory
python run_inference.py --output ./my_shots

# Fine-grained control
python run_inference.py --context 12 --min-shot 0.8 --conf 0.1
```

### Tunable Parameters

| Parameter | Default | Range | Effect |
|-----------|---------|-------|--------|
| `--context` | 10 | 0–20 | Overlap frames between consecutive windows. Higher = fewer missed boundaries, but slower |
| `--min-shot` | 0.5 | 0.1–5.0 | Minimum shot duration in seconds. Higher = fewer false positives |
| `--conf` | 0.0 | 0.0–1.0 | Intra-class confidence threshold. E.g. 0.3 = keep only predictions the model is more than 30% confident about |
| `--sensitive` | — | — | Shortcut for `--context 15 --min-shot 0.3 --conf 0` |
| `--conservative` | — | — | Shortcut for `--context 5 --min-shot 1.5 --conf 0.15` |

### Parameter Guide by Video Type

| Video Type | Recommended | Why |
|------------|-------------|-----|
| Action / Sports | `--sensitive` | Fast cuts, many short shots |
| Vlog / YouTube | default or `--context 15` | Moderate pace, varied editing |
| Interview / Podcast | `--conservative` | Long takes, few cuts |
| Film / Cinema | default | Balanced |
| Animation | `--sensitive` | Frequent scene changes |
| Screen Recording | `--conservative` or `--min-shot 2.0` | Mostly static |

## Project Structure

```
OmniShotCut_mlx/
├── run_inference.py            # Main entry point
├── omnishotcut_mlx/
│   ├── model.py                # OmniShotCut MLX model
│   ├── transformer.py          # Transformer encoder/decoder
│   ├── resnet.py               # ResNet18 backbone
│   ├── position_encoding.py    # 3D sinusoidal position encoding
│   ├── load_weights.py         # Weight loader (from safetensors)
│   └── download_weights.py     # HuggingFace weight downloader
├── weights/
│   ├── OmniShotCut.safetensors # MLX-native weights (~157MB)
│   └── config.json             # Model configuration
└── test_data/                  # Place test videos here
```

## Output

Shots are saved as `shot_0000.mp4`, `shot_0001.mp4`, ... under `test_data/output//`. Each shot is a self-contained H.264/AAC MP4 clip with the detected shot-boundary transitions removed.
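The post-processing controlled by `--conf` and `--min-shot`, together with the per-shot export, can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in `run_inference.py`: `Boundary`, `boundaries_to_shots`, and `ffmpeg_cut_cmd` are made-up names. It assumes each boundary is a single frame index with one confidence score, and that shots shorter than the minimum are merged into a neighbour. It also ignores gradual-transition spans, which the real tool removes from the exported clips.

```python
import shlex
from dataclasses import dataclass


@dataclass
class Boundary:
    # Hypothetical format: first frame of a new shot plus the model's confidence.
    frame: int
    confidence: float


def boundaries_to_shots(boundaries, num_frames, fps, min_shot=0.5, conf=0.0):
    """Return (start_frame, end_frame) pairs, end exclusive."""
    # Mirror --conf: discard boundaries below the confidence threshold.
    cuts = sorted({b.frame for b in boundaries
                   if b.confidence >= conf and 0 < b.frame < num_frames})
    edges = [0] + cuts + [num_frames]
    shots = list(zip(edges[:-1], edges[1:]))

    # Mirror --min-shot: fold shots shorter than min_shot seconds into the previous one.
    min_frames = min_shot * fps
    merged = []
    for start, end in shots:
        if merged and end - start < min_frames:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    # A too-short first shot has no predecessor, so fold it into the next shot.
    if len(merged) > 1 and merged[0][1] - merged[0][0] < min_frames:
        merged[1] = (merged[0][0], merged[1][1])
        merged.pop(0)
    return merged


def ffmpeg_cut_cmd(video, start_frame, end_frame, fps, out_path):
    """Build an ffmpeg command that re-encodes one shot to H.264/AAC."""
    start = start_frame / fps
    duration = (end_frame - start_frame) / fps
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", video,
            "-t", f"{duration:.3f}", "-c:v", "libx264", "-c:a", "aac", out_path]


if __name__ == "__main__":
    preds = [Boundary(30, 0.9), Boundary(40, 0.8), Boundary(100, 0.1)]
    shots = boundaries_to_shots(preds, num_frames=150, fps=30.0, min_shot=0.5, conf=0.3)
    print(shots)  # [(0, 40), (40, 150)]
    for i, (s, e) in enumerate(shots):
        print(shlex.join(ffmpeg_cut_cmd("video.mp4", s, e, 30.0, f"shot_{i:04d}.mp4")))
```

Placing `-ss` before `-i` makes ffmpeg seek on the input. Because the clip is re-encoded rather than stream-copied, the cut is not snapped to the nearest keyframe.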
## Model

- **Architecture**: Shot-Query Transformer (DETR-style), 6 encoder + 6 decoder layers, ResNet18 backbone
- **Input**: 100-frame windows at 128×96, ImageNet normalization
- **Output**: Shot boundary frame indices + intra-shot relation (dissolve, wipe, fade, ...) + inter-shot relation (hard cut, sudden jump, ...)
- **Weights**: Converted from the official PyTorch checkpoint; 363 tensors, float32

## License & Credits

Paper: [OmniShotCut (arXiv 2604.24762)](https://arxiv.org/abs/2604.24762) by Boyang Wang et al.

MLX port by [@eisneim](https://github.com/eisneim). Weights hosted at [eisneim/OmniShotCut_mlx](https://huggingface.co/eisneim/OmniShotCut_mlx).
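For readers mapping the model's input spec (100-frame windows at 128×96 with ImageNet normalization) to code, here is a minimal NumPy sketch of one plausible preprocessing path. None of the following is confirmed by this README:

- frames are already decoded as RGB and resized to 128 wide × 96 high
- consecutive windows share `--context` frames (stride = 100 − context)
- the final window is aligned to the last frame
- clips shorter than 100 frames are padded by repeating the last frame

`normalize_frames`, `window_starts`, and `make_windows` are hypothetical names, not functions from `omnishotcut_mlx`.

```python
import numpy as np

# Standard ImageNet channel statistics (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_frames(frames_uint8):
    """(T, H, W, 3) RGB uint8 -> float32 ImageNet-normalized, channels-last."""
    x = frames_uint8.astype(np.float32) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD


def window_starts(num_frames, window=100, context=10):
    """Window start indices; neighbours overlap by `context` frames (assumed stride)."""
    if num_frames <= window:
        return [0]
    stride = window - context
    starts = list(range(0, num_frames - window, stride))
    starts.append(num_frames - window)  # last window ends exactly on the final frame
    return starts


def make_windows(frames, window=100, context=10):
    """Stack overlapping windows into shape (N, window, H, W, C)."""
    t = frames.shape[0]
    if t < window:
        # Assumption: pad short clips by repeating the last frame.
        pad = np.repeat(frames[-1:], window - t, axis=0)
        frames = np.concatenate([frames, pad], axis=0)
    starts = window_starts(frames.shape[0], window, context)
    return np.stack([frames[s:s + window] for s in starts]), starts


if __name__ == "__main__":
    video = np.zeros((250, 96, 128, 3), dtype=np.uint8)  # dummy 250-frame clip
    windows, starts = make_windows(normalize_frames(video), window=100, context=10)
    print(windows.shape, starts)  # (3, 100, 96, 128, 3) [0, 90, 150]
```

The channels-last `(T, H, W, C)` layout matches what MLX convolution layers such as `mlx.nn.Conv2d` expect (NHWC). Converting a window with `mlx.core.array(...)` would be the last step before calling the model, whose exact input signature is not documented here.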