
OmniShotCut MLX

Shot Boundary Detection with OmniShotCut, ported to Apple MLX for native Mac inference.

Based on the paper: OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer.

Features

  • Pure MLX inference: runs natively on Apple Silicon, zero PyTorch dependency at runtime
  • Detects hard cuts, dissolves, fades, wipes, slides, zooms, doorways, and sudden jumps
  • Tunable sensitivity for different video types (action, interview, vlog, film)

Requirements

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.10+
  • ffmpeg (for video I/O)
Install the Python dependencies:

pip install mlx mlx-metal numpy

Quick Start

# 1. Clone and install
git clone https://github.com/eisneim/OmniShotCut_mlx.git
cd OmniShotCut_mlx

# 2. Download weights from HuggingFace
python omnishotcut_mlx/download_weights.py

# 3. Run on test videos
python run_inference.py

Download Weights

# Auto-download from HuggingFace Hub (requires huggingface_hub)
pip install huggingface_hub
python omnishotcut_mlx/download_weights.py

# Or manually download from:
# https://huggingface.co/eisneim/OmniShotCut_mlx
# Place OmniShotCut.safetensors and config.json into ./weights/

# Alternative: download without huggingface_hub
curl -L -o weights/OmniShotCut.safetensors https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/OmniShotCut.safetensors
curl -L -o weights/config.json https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/config.json
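For scripted downloads, the manual `curl` commands above boil down to building the Hub's `resolve` URL for each file. A minimal sketch (the repo id and filenames come from the URLs above; the helper name and layout are illustrative, not the repo's actual `download_weights.py`):

```python
# Sketch: build HuggingFace Hub "resolve" URLs for the weight files.
# Repo id and filenames come from the curl commands above; the helper
# itself is illustrative, not the repo's API.

REPO_ID = "eisneim/OmniShotCut_mlx"
FILES = ["OmniShotCut.safetensors", "config.json"]

def hub_resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Return the direct-download URL for a file in a Hub model repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

if __name__ == "__main__":
    for name in FILES:
        print(hub_resolve_url(REPO_ID, name))
```

Fetching those URLs and saving the two files into `./weights/` is equivalent to the `curl` commands above.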

Usage

# Default: balanced detection
python run_inference.py

# Sensitive mode: more cuts, good for action/vlog videos
python run_inference.py --sensitive

# Conservative mode: fewer false positives, good for interviews/long takes
python run_inference.py --conservative

# Single video
python run_inference.py --video /path/to/video.mp4

# Custom output directory
python run_inference.py --output ./my_shots

# Fine-tuned control
python run_inference.py --context 12 --min-shot 0.8 --conf 0.1
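If you want to drive detection from your own script, the documented flags map onto a plain `argparse` interface. The sketch below mirrors the options listed in this README with their documented defaults; the parser itself is illustrative, not the actual code from `run_inference.py`:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Mirrors the flags documented in this README; defaults follow the
    # "Tunable Parameters" table. Illustrative only, not the repo's parser.
    p = argparse.ArgumentParser(description="OmniShotCut MLX shot detection")
    p.add_argument("--video", type=str, default=None, help="single input video")
    p.add_argument("--output", type=str, default=None, help="output directory")
    p.add_argument("--context", type=int, default=10,
                   help="overlap frames between windows")
    p.add_argument("--min-shot", type=float, default=0.5, dest="min_shot",
                   help="minimum shot duration in seconds")
    p.add_argument("--conf", type=float, default=0.0,
                   help="intra-class confidence threshold")
    p.add_argument("--sensitive", action="store_true")
    p.add_argument("--conservative", action="store_true")
    return p

args = build_parser().parse_args(["--sensitive", "--video", "clip.mp4"])
print(args.sensitive, args.video)
```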

Tunable Parameters

| Parameter | Default | Range | Effect |
|---|---|---|---|
| `--context` | 10 | 0–20 | Overlap frames between windows. Higher = fewer missed boundaries, but slower |
| `--min-shot` | 0.5 | 0.1–5.0 | Minimum shot duration in seconds. Higher = fewer false positives |
| `--conf` | 0.0 | 0.0–1.0 | Intra-class confidence threshold. E.g. 0.3 = keep only predictions the model is >30% sure about |
| `--sensitive` | – | – | Shortcut: context=15, min-shot=0.3, conf=0 |
| `--conservative` | – | – | Shortcut: context=5, min-shot=1.5, conf=0.15 |
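The two post-processing knobs compose in a straightforward way: predictions below the `--conf` threshold are dropped, and any boundary that would create a shot shorter than `--min-shot` is discarded. A minimal sketch of that logic, assuming a simplified `(frame, confidence)` output format (the real model emits richer predictions):

```python
def filter_boundaries(boundaries, fps, conf=0.0, min_shot=0.5):
    """boundaries: list of (frame_index, confidence) pairs, sorted by frame.

    Keeps boundaries above the confidence threshold, then drops any
    boundary that would create a shot shorter than min_shot seconds.
    Assumed output format; illustrative, not the repo's implementation.
    """
    kept = []
    last_frame = 0  # start frame of the current shot
    for frame, score in boundaries:
        if score < conf:
            continue  # below intra-class confidence threshold
        if (frame - last_frame) / fps < min_shot:
            continue  # would create a shot shorter than min_shot seconds
        kept.append(frame)
        last_frame = frame
    return kept

print(filter_boundaries([(10, 0.9), (14, 0.8), (60, 0.2)],
                        fps=25, conf=0.3, min_shot=0.5))
```

With these inputs only frame 14 survives: frame 10 would yield a 0.4 s shot and frame 60 falls below the 0.3 confidence threshold.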

Parameter Guide by Video Type

| Video Type | Recommended | Why |
|---|---|---|
| Action / Sports | `--sensitive` | Fast cuts, many short shots |
| Vlog / YouTube | default or `--context 15` | Moderate pace, varied editing |
| Interview / Podcast | `--conservative` | Long takes, few cuts |
| Film / Cinema | default | Balanced |
| Animation | `--sensitive` | Frequent scene changes |
| Screen Recording | `--conservative` or `--min-shot 2.0` | Mostly static |

Project Structure

OmniShotCut_mlx/
├── run_inference.py               # Main entry point
├── omnishotcut_mlx/
│   ├── model.py                   # OmniShotCut MLX model
│   ├── transformer.py             # Transformer encoder/decoder
│   ├── resnet.py                  # ResNet18 backbone
│   ├── position_encoding.py       # 3D sinusoidal position encoding
│   ├── load_weights.py            # Weight loader (from safetensors)
│   └── download_weights.py        # HuggingFace weight downloader
├── weights/
│   ├── OmniShotCut.safetensors    # MLX-native weights (~157 MB)
│   └── config.json                # Model configuration
└── test_data/                     # Place test videos here

Output

Shots are saved as shot_0000.mp4, shot_0001.mp4, ... under test_data/output/<video_name>/.

Each shot file is a self-contained H.264/AAC MP4 clip with the detected shot boundary transitions removed.
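Cutting one such clip amounts to re-encoding the segment between two boundary timestamps with ffmpeg. A sketch of the command this corresponds to (the exact flags `run_inference.py` uses may differ):

```python
def ffmpeg_cut_cmd(src, dst, start_s, end_s):
    """Build an ffmpeg command that re-encodes [start_s, end_s) of src
    into a self-contained H.264/AAC MP4 clip. Flag choice is illustrative."""
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ss", f"{start_s:.3f}",    # shot start (seconds)
        "-to", f"{end_s:.3f}",      # shot end (seconds)
        "-c:v", "libx264",          # H.264 video
        "-c:a", "aac",              # AAC audio
        "-movflags", "+faststart",  # streaming-friendly MP4
        dst,
    ]

print(" ".join(ffmpeg_cut_cmd("video.mp4", "shot_0000.mp4", 0.0, 3.2)))
```

Running the returned list through `subprocess.run` (with ffmpeg on `PATH`) produces one self-contained clip per detected shot.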

Model

  • Architecture: Shot-Query Transformer (DETR-style), 6 encoder + 6 decoder layers, ResNet18 backbone
  • Input: 100-frame windows at 128×96, ImageNet normalization
  • Output: Shot boundary frame indices + intra-shot relation (dissolve, wipe, fade, ...) + inter-shot relation (hard cut, sudden jump, ...)
  • Weights: Converted from the official PyTorch checkpoint, 363 tensors, float32
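The 100-frame input window and the `--context` overlap interact as a standard sliding-window scheme: each window starts `100 - context` frames after the previous one, so a boundary near one window's edge is seen again inside the next window. A sketch of the window placement (an assumed scheme, inferred from the `--context` description, not the repo's exact code):

```python
WINDOW = 100  # frames per model input window

def window_starts(num_frames: int, context: int = 10):
    """Start indices of overlapping 100-frame windows covering the video.

    Consecutive windows share `context` frames; the final window is
    shifted back so the tail of the video is always covered.
    Assumed scheme, illustrative only.
    """
    stride = WINDOW - context
    starts = list(range(0, max(num_frames - WINDOW, 0) + 1, stride))
    if starts[-1] + WINDOW < num_frames:
        starts.append(num_frames - WINDOW)  # cover the remaining tail
    return starts

print(window_starts(250, context=10))
```

A higher `--context` means a smaller stride, hence more windows per video: fewer boundaries missed at window edges, at the cost of extra inference passes.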

License & Credits

Paper: OmniShotCut (arXiv 2604.24762) by Boyang Wang et al.

MLX port by @eisneim. Weights hosted at eisneim/OmniShotCut_mlx.