| # OmniShotCut MLX |
|
|
| Shot Boundary Detection with OmniShotCut, ported to Apple MLX for native Mac inference. |
|
|
| Based on the paper: [OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer](https://arxiv.org/abs/2604.24762). |
|
|
| ## Features |
|
|
| - Pure MLX inference: runs natively on Apple Silicon, with no PyTorch dependency at runtime |
| - Detects hard cuts, dissolves, fades, wipes, slides, zooms, doorways, and sudden jumps |
| - Tunable sensitivity for different video types (action, interview, vlog, film) |
|
|
| ## Requirements |
|
|
| - macOS with Apple Silicon (M1/M2/M3/M4) |
| - Python 3.10+ |
| - `ffmpeg` (for video I/O) |
|
|
| ```bash |
| pip install mlx mlx-metal numpy |
| ``` |
|
|
| ## Quick Start |
|
|
| ```bash |
| # 1. Clone the repository (install the packages from Requirements first) |
| git clone https://github.com/eisneim/OmniShotCut_mlx.git |
| cd OmniShotCut_mlx |
| |
| # 2. Download weights from HuggingFace |
| python omnishotcut_mlx/download_weights.py |
| |
| # 3. Run on test videos |
| python run_inference.py |
| ``` |
|
|
| ## Download Weights |
|
|
| ```bash |
| # Auto-download from HuggingFace Hub (requires huggingface_hub) |
| pip install huggingface_hub |
| python omnishotcut_mlx/download_weights.py |
| |
| # Or manually download from: |
| # https://huggingface.co/eisneim/OmniShotCut_mlx |
| # Place OmniShotCut.safetensors and config.json into ./weights/ |
| |
| # Alternative: download without huggingface_hub (curl does not create the target directory) |
| mkdir -p weights |
| curl -L -o weights/OmniShotCut.safetensors https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/OmniShotCut.safetensors |
| curl -L -o weights/config.json https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/config.json |
| ``` |
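
Where neither `huggingface_hub` nor `curl` is convenient, the same two files could be fetched with the Python standard library. The following is a minimal sketch and not part of the repository: the helper names `weight_urls` and `download_weights` are hypothetical, and the URLs are simply the `resolve/main` links listed above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL and file names taken from the manual download instructions.
REPO_URL = "https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main"
FILES = ("OmniShotCut.safetensors", "config.json")


def weight_urls(base=REPO_URL, files=FILES):
    """Map each weight file name to its download URL (hypothetical helper)."""
    return {name: f"{base}/{name}" for name in files}


def download_weights(target_dir="weights"):
    """Download every file into target_dir, skipping files already present."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)  # make sure ./weights exists
    paths = []
    for name, url in weight_urls().items():
        dest = target / name
        if not dest.exists():
            urlretrieve(url, dest)
        paths.append(dest)
    return paths
```

Calling `download_weights()` from the repository root would place both files in `./weights/`, which is where the manual instructions above say they belong.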
|
|
| ## Usage |
|
|
| ```bash |
| # Default: balanced detection |
| python run_inference.py |
| |
| # Sensitive mode: more cuts, good for action/vlog videos |
| python run_inference.py --sensitive |
| |
| # Conservative mode: fewer false positives, good for interviews/long takes |
| python run_inference.py --conservative |
| |
| # Single video |
| python run_inference.py --video /path/to/video.mp4 |
| |
| # Custom output directory |
| python run_inference.py --output ./my_shots |
| |
| # Fine-tuned control |
| python run_inference.py --context 12 --min-shot 0.8 --conf 0.1 |
| ``` |
|
|
| ### Tunable Parameters |
|
|
| | Parameter | Default | Range | Effect | |
| |-----------|---------|-------|--------| |
| | `--context` | 10 | 0–20 | Number of frames shared by consecutive windows. Higher = fewer missed boundaries, but slower | |
| | `--min-shot` | 0.5 | 0.1–5.0 | Minimum shot duration in seconds. Higher = fewer false positives | |
| | `--conf` | 0.0 | 0.0–1.0 | Intra-class confidence threshold. E.g. 0.3 = keep only predictions the model is more than 30% confident in | |
| | `--sensitive` | n/a | n/a | Shortcut: context=15, min-shot=0.3, conf=0 | |
| | `--conservative` | n/a | n/a | Shortcut: context=5, min-shot=1.5, conf=0.15 | |
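
The relationship between the shortcuts and the individual flags can be sketched with `argparse`. This is only an illustration of the table above, not the actual parser in `run_inference.py`: the function names are hypothetical, and two behaviours are assumptions, namely that `--sensitive` and `--conservative` are mutually exclusive and that an explicitly passed flag overrides the preset value.

```python
import argparse

# Defaults and shortcut values copied from the parameter table.
DEFAULTS = {"context": 10, "min_shot": 0.5, "conf": 0.0}
PRESETS = {
    "sensitive": {"context": 15, "min_shot": 0.3, "conf": 0.0},
    "conservative": {"context": 5, "min_shot": 1.5, "conf": 0.15},
}


def build_parser():
    parser = argparse.ArgumentParser()
    mode = parser.add_mutually_exclusive_group()  # assumption: one preset at a time
    mode.add_argument("--sensitive", action="store_true")
    mode.add_argument("--conservative", action="store_true")
    # default=None distinguishes "flag not given" from an explicit value.
    parser.add_argument("--context", type=int, default=None)
    parser.add_argument("--min-shot", type=float, default=None)
    parser.add_argument("--conf", type=float, default=None)
    return parser


def resolve_settings(argv):
    """Return the effective context / min_shot / conf for a command line."""
    args = build_parser().parse_args(argv)
    settings = dict(DEFAULTS)
    for name, values in PRESETS.items():
        if getattr(args, name):
            settings.update(values)
    # Assumption: explicit flags take precedence over the chosen preset.
    for key in settings:
        value = getattr(args, key)
        if value is not None:
            settings[key] = value
    return settings
```

Under these assumptions, `resolve_settings(["--conservative", "--min-shot", "2.0"])` keeps the conservative `context` and `conf` but raises the minimum shot length to 2.0 seconds.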
|
|
| ### Parameter Guide by Video Type |
|
|
| | Video Type | Recommended | Why | |
| |------------|------------|-----| |
| | Action / Sports | `--sensitive` | Fast cuts, many short shots | |
| | Vlog / YouTube | default or `--context 15` | Moderate pace, varied editing | |
| | Interview / Podcast | `--conservative` | Long takes, few cuts | |
| | Film / Cinema | default | Balanced | |
| | Animation | `--sensitive` | Frequent scene changes | |
| | Screen Recording | `--conservative` or `--min-shot 2.0` | Mostly static | |
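
To see why a larger `--min-shot` suits mostly static footage while a smaller one suits fast editing, consider a simple greedy filter over detected boundaries. This is a hedged sketch of one way such a filter could work; the function `drop_short_shots` is hypothetical and the repository's actual post-processing may differ (for example, this version ignores the length of the final shot).

```python
def drop_short_shots(boundaries, fps, min_shot):
    """Drop boundaries that would create a shot shorter than min_shot seconds.

    boundaries: sorted frame indices at which a new shot starts (frame 0 excluded).
    """
    min_frames = min_shot * fps
    kept = []
    last_start = 0  # the first shot always starts at frame 0
    for frame in boundaries:
        # Keep the boundary only if the shot it closes is long enough.
        if frame - last_start >= min_frames:
            kept.append(frame)
            last_start = frame
    return kept


candidates = [10, 40, 45, 120]  # made-up boundary frames for a 25 fps clip
print(drop_short_shots(candidates, fps=25, min_shot=0.3))  # sensitive preset
print(drop_short_shots(candidates, fps=25, min_shot=0.5))  # default
print(drop_short_shots(candidates, fps=25, min_shot=2.0))  # screen recording
```

With these made-up candidates the three calls keep three, two, and one boundary respectively, which is the trade-off the table describes: a longer minimum duration discards closely spaced detections.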
|
|
| ## Project Structure |
|
|
| ``` |
| OmniShotCut_mlx/ |
| ├── run_inference.py # Main entry point |
| ├── omnishotcut_mlx/ |
| │   ├── model.py # OmniShotCut MLX model |
| │   ├── transformer.py # Transformer encoder/decoder |
| │   ├── resnet.py # ResNet18 backbone |
| │   ├── position_encoding.py # 3D sinusoidal position encoding |
| │   ├── load_weights.py # Weight loader (from safetensors) |
| │   └── download_weights.py # HuggingFace weight downloader |
| ├── weights/ |
| │   ├── OmniShotCut.safetensors # MLX-native weights (~157MB) |
| │   └── config.json # Model configuration |
| └── test_data/ # Place test videos here |
| ``` |
|
|
| ## Output |
|
|
| Shots are saved as `shot_0000.mp4`, `shot_0001.mp4`, ... under `test_data/output/<video_name>/`. |
|
|
| Each shot file is a self-contained H.264/AAC MP4 clip with the detected shot boundary transitions removed. |
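
The naming scheme and codecs above can be illustrated by building the `ffmpeg` command for each shot. This is a sketch under stated assumptions rather than the code in `run_inference.py`: the function `shot_commands` is hypothetical, it cuts at the raw boundary frames without removing transition frames, and it only constructs argument lists (nothing is executed).

```python
def shot_commands(video, boundaries, total_frames, fps, out_dir):
    """Build one ffmpeg argument list per shot, named shot_0000.mp4, shot_0001.mp4, ..."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [total_frames]
    commands = []
    for index, (start, end) in enumerate(zip(starts, ends)):
        commands.append([
            "ffmpeg", "-y",
            "-ss", f"{start / fps:.3f}",         # seek to the shot start (seconds)
            "-i", str(video),
            "-t", f"{(end - start) / fps:.3f}",  # shot duration (seconds)
            "-c:v", "libx264", "-c:a", "aac",    # re-encode as H.264 video / AAC audio
            f"{out_dir}/shot_{index:04d}.mp4",
        ])
    return commands


for cmd in shot_commands("clip.mp4", [40, 120], 200, 25.0, "test_data/output/clip"):
    print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` to write the clip. Re-encoding rather than stream-copying is assumed here because it allows cuts at arbitrary frames instead of only at keyframes.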
|
|
| ## Model |
|
|
| - **Architecture**: Shot-Query Transformer (DETR-style), 6 encoder + 6 decoder layers, ResNet18 backbone |
| - **Input**: 100-frame windows at 128×96, ImageNet normalization |
| - **Output**: Shot boundary frame indices + intra-shot relation (dissolve, wipe, fade, ...) + inter-shot relation (hard cut, sudden jump, ...) |
| - **Weights**: Converted from the official PyTorch checkpoint, 363 tensors, float32 |
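
Since the model sees 100 frames at a time, a longer video has to be split into windows, and `--context` controls how many frames consecutive windows share. The sketch below shows one plausible way to compute window start positions. It is an assumption about the scheme, not the repository's implementation; in particular, the name `window_starts` and the handling of the final partial window are hypothetical.

```python
WINDOW = 100  # frames per window, from the model input description


def window_starts(total_frames, context=10, window=WINDOW):
    """Start indices of overlapping windows that cover total_frames frames."""
    if not 0 <= context < window:
        raise ValueError("context must satisfy 0 <= context < window")
    stride = window - context  # consecutive windows share `context` frames
    starts = list(range(0, max(total_frames - window, 0) + 1, stride))
    # Assumption: if the stride leaves trailing frames uncovered,
    # add one last window aligned with the end of the video.
    if total_frames > window and starts[-1] + window < total_frames:
        starts.append(total_frames - window)
    return starts


print(window_starts(250, context=10))
print(window_starts(1000, context=0))
print(window_starts(1000, context=20))
```

A larger overlap shrinks the stride, so the same video needs more windows. This matches the note in the parameter table that a higher `--context` is slower, and a boundary near the edge of one window is more likely to fall well inside a neighbouring one.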
|
|
| ## License & Credits |
|
|
| Paper: [OmniShotCut (arXiv 2604.24762)](https://arxiv.org/abs/2604.24762) by Boyang Wang et al. |
|
|
| MLX port by [@eisneim](https://github.com/eisneim). Weights hosted at [eisneim/OmniShotCut_mlx](https://huggingface.co/eisneim/OmniShotCut_mlx). |
|
|