# OmniShotCut MLX
Shot Boundary Detection with OmniShotCut, ported to Apple MLX for native Mac inference.
Based on the paper: [OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer](https://arxiv.org/abs/2604.24762).
## Features
- Pure MLX inference: runs natively on Apple Silicon, with no PyTorch dependency at runtime
- Detects hard cuts, dissolves, fades, wipes, slides, zooms, doorways, and sudden jumps
- Tunable sensitivity for different video types (action, interview, vlog, film)
## Requirements
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
- `ffmpeg` (for video I/O)
```bash
pip install mlx mlx-metal numpy
```
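The requirements above can be checked before running anything. The following is a minimal sketch, not part of the repository; the function name `check_requirements` is hypothetical, and it only uses the Python standard library.

```python
import platform
import shutil
import sys


def check_requirements(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    # A native Python on an Apple Silicon Mac reports Darwin / arm64.
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        problems.append("macOS on Apple Silicon (arm64) is required")
    # ffmpeg must be discoverable on PATH for video I/O.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("warning:", problem)
```

Note that a Python interpreter running under Rosetta reports `x86_64`, so this check would flag it even on Apple Silicon hardware.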
## Quick Start
```bash
# 1. Clone the repository
git clone https://github.com/eisneim/OmniShotCut_mlx.git
cd OmniShotCut_mlx
# 2. Download weights from HuggingFace
python omnishotcut_mlx/download_weights.py
# 3. Run on test videos
python run_inference.py
```
## Download Weights
```bash
# Auto-download from HuggingFace Hub (requires huggingface_hub)
pip install huggingface_hub
python omnishotcut_mlx/download_weights.py
# Or manually download from:
# https://huggingface.co/eisneim/OmniShotCut_mlx
# Place OmniShotCut.safetensors and config.json into ./weights/
# Alternative: download without huggingface_hub
mkdir -p weights
curl -L -o weights/OmniShotCut.safetensors https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/OmniShotCut.safetensors
curl -L -o weights/config.json https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/config.json
```
```
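For scripted setups, the `curl` route can also be expressed in Python with only the standard library. This is a sketch under assumptions: `resolve_url` and `download_weights` are hypothetical helper names, not functions from `omnishotcut_mlx/download_weights.py`, and the URL pattern is simply the one used in the `curl` commands above.

```python
import urllib.request
from pathlib import Path

REPO_ID = "eisneim/OmniShotCut_mlx"
FILES = ("OmniShotCut.safetensors", "config.json")


def resolve_url(filename, repo_id=REPO_ID, revision="main"):
    """Build the direct-download URL for a file in a Hugging Face model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_weights(dest="weights"):
    """Fetch each file into dest, skipping files that already exist."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for name in FILES:
        target = dest_dir / name
        if not target.exists():
            urllib.request.urlretrieve(resolve_url(name), str(target))
    return [dest_dir / name for name in FILES]
```

Calling `download_weights()` from the repository root would place both files in `./weights/`. An interrupted download leaves a partial file that this sketch would not retry, so delete it before running again.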
## Usage
```bash
# Default: balanced detection
python run_inference.py
# Sensitive mode: more cuts, good for action/vlog videos
python run_inference.py --sensitive
# Conservative mode: fewer false positives, good for interviews/long takes
python run_inference.py --conservative
# Single video
python run_inference.py --video /path/to/video.mp4
# Custom output directory
python run_inference.py --output ./my_shots
# Fine-tuned control
python run_inference.py --context 12 --min-shot 0.8 --conf 0.1
```
### Tunable Parameters
| Parameter | Default | Range | Effect |
|-----------|---------|-------|--------|
| `--context` | 10 | 0–20 | Overlap frames between windows. Higher = fewer missed boundaries, but slower |
| `--min-shot` | 0.5 | 0.1–5.0 | Minimum shot duration in seconds. Higher = fewer false positives |
| `--conf` | 0.0 | 0.0–1.0 | Intra-class confidence threshold. E.g. 0.3 = keep only predictions the model is more than 30% confident in |
| `--sensitive` | n/a | n/a | Shortcut: context=15, min-shot=0.3, conf=0 |
| `--conservative` | n/a | n/a | Shortcut: context=5, min-shot=1.5, conf=0.15 |
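
To make the table concrete, here is an illustrative sketch of how the presets and thresholds could be modelled. The preset values and ranges come from the table; everything else is an assumption: `DetectionParams`, `resolve_params`, and `filter_boundaries` are hypothetical names, and the actual post-processing in `run_inference.py` may differ (for example in whether explicit flags override a preset, or in how short shots are merged).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DetectionParams:
    context: int = 10      # overlap frames between windows
    min_shot: float = 0.5  # minimum shot duration in seconds
    conf: float = 0.0      # confidence threshold


PRESETS = {
    "default": DetectionParams(),
    "sensitive": DetectionParams(context=15, min_shot=0.3, conf=0.0),
    "conservative": DetectionParams(context=5, min_shot=1.5, conf=0.15),
}


def resolve_params(preset="default", **overrides):
    """Start from a preset, apply overrides (an assumed policy), and check the table's ranges."""
    params = replace(PRESETS[preset], **overrides)
    if not 0 <= params.context <= 20:
        raise ValueError("context must be in 0..20")
    if not 0.1 <= params.min_shot <= 5.0:
        raise ValueError("min_shot must be in 0.1..5.0")
    if not 0.0 <= params.conf <= 1.0:
        raise ValueError("conf must be in 0.0..1.0")
    return params


def filter_boundaries(candidates, params):
    """Filter (time_sec, confidence) pairs that are sorted by time.

    A boundary is dropped if its confidence is below params.conf, or if it
    would create a shot shorter than params.min_shot seconds.
    """
    kept = []
    last = 0.0  # the video start counts as the previous boundary
    for time_sec, confidence in candidates:
        if confidence < params.conf:
            continue
        if time_sec - last < params.min_shot:
            continue
        kept.append(time_sec)
        last = time_sec
    return kept
```

Raising `min_shot` or `conf` in this sketch can only remove boundaries, which matches the "fewer false positives" effect described in the table.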
### Parameter Guide by Video Type
| Video Type | Recommended | Why |
|------------|------------|-----|
| Action / Sports | `--sensitive` | Fast cuts, many short shots |
| Vlog / YouTube | default or `--context 15` | Moderate pace, varied editing |
| Interview / Podcast | `--conservative` | Long takes, few cuts |
| Film / Cinema | default | Balanced |
| Animation | `--sensitive` | Frequent scene changes |
| Screen Recording | `--conservative` or `--min-shot 2.0` | Mostly static |
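
When processing a mixed library of videos, the recommendations above can be turned into command lines programmatically. The sketch below only uses flags documented in this README; the dictionary keys and the `build_command` helper are hypothetical, and where the table offers two options (vlog, screen recording) the sketch picks one arbitrarily.

```python
import shlex
import sys

# Flags taken from the table above; the keys are illustrative labels.
RECOMMENDED_FLAGS = {
    "action": ["--sensitive"],
    "vlog": ["--context", "15"],
    "interview": ["--conservative"],
    "film": [],
    "animation": ["--sensitive"],
    "screen_recording": ["--min-shot", "2.0"],
}


def build_command(video_path, video_type="film", output_dir=None):
    """Return the argument list for running run_inference.py on one video."""
    cmd = [sys.executable, "run_inference.py", "--video", str(video_path)]
    cmd += RECOMMENDED_FLAGS[video_type]
    if output_dir is not None:
        cmd += ["--output", str(output_dir)]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_command("talk.mp4", "interview", "./my_shots")))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.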
## Project Structure
```
OmniShotCut_mlx/
├── run_inference.py              # Main entry point
├── omnishotcut_mlx/
│   ├── model.py                  # OmniShotCut MLX model
│   ├── transformer.py            # Transformer encoder/decoder
│   ├── resnet.py                 # ResNet18 backbone
│   ├── position_encoding.py      # 3D sinusoidal position encoding
│   ├── load_weights.py           # Weight loader (from safetensors)
│   └── download_weights.py       # HuggingFace weight downloader
├── weights/
│   ├── OmniShotCut.safetensors   # MLX-native weights (~157MB)
│   └── config.json               # Model configuration
└── test_data/                    # Place test videos here
```
## Output
Shots are saved as `shot_0000.mp4`, `shot_0001.mp4`, ... under `test_data/output/<video_name>/`.
Each shot file is a self-contained H.264/AAC MP4 clip with the detected shot boundary transitions removed.
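
The zero-padded naming scheme means the shot files sort correctly by name, so collecting them for further processing is straightforward. A small sketch, assuming `<video_name>` is the directory name exactly as written by the tool (whether it includes the file extension is not stated here); `list_shots` is a hypothetical helper:

```python
from pathlib import Path


def list_shots(video_name, output_root="test_data/output"):
    """Return the shot clips for one video in playback order."""
    shot_dir = Path(output_root) / video_name
    # Four-digit zero padding makes lexicographic order equal numeric order.
    return sorted(shot_dir.glob("shot_*.mp4"))


if __name__ == "__main__":
    for clip in list_shots("my_video"):
        print(clip)
```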
## Model
- **Architecture**: Shot-Query Transformer (DETR-style), 6 encoder + 6 decoder layers, ResNet18 backbone
- **Input**: 100-frame windows at 128×96, ImageNet normalization
- **Output**: Shot boundary frame indices + intra-shot relation (dissolve, wipe, fade, ...) + inter-shot relation (hard cut, sudden jump, ...)
- **Weights**: Converted from the official PyTorch checkpoint, 363 tensors, float32
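
To illustrate how a long video might be split into the 100-frame windows mentioned above, here is a sketch of overlapping window placement. It assumes the stride equals the window length minus the `--context` overlap and that the last window is shifted back to end on the final frame; the actual scheme in `run_inference.py` may differ (for example, it could pad the final window instead). `window_starts` is a hypothetical name.

```python
def window_starts(num_frames, window=100, context=10):
    """Return start indices of overlapping windows that cover all frames."""
    if not 0 <= context < window:
        raise ValueError("context must satisfy 0 <= context < window")
    if num_frames <= 0:
        return []
    if num_frames <= window:
        return [0]  # a single (possibly short) window covers the video
    stride = window - context
    starts = list(range(0, num_frames - window + 1, stride))
    # If the regular stride leaves trailing frames uncovered,
    # add one last window that ends exactly at the final frame.
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return starts


if __name__ == "__main__":
    print(window_starts(250))
```

A larger `context` shrinks the stride, so more windows are evaluated per video. That is consistent with the trade-off described for `--context`: fewer missed boundaries at the cost of speed.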
## License & Credits
Paper: [OmniShotCut (arXiv 2604.24762)](https://arxiv.org/abs/2604.24762) by Boyang Wang et al.
MLX port by [@eisneim](https://github.com/eisneim). Weights hosted at [eisneim/OmniShotCut_mlx](https://huggingface.co/eisneim/OmniShotCut_mlx).