OmniShotCut MLX

Shot Boundary Detection with OmniShotCut, ported to Apple MLX for native Mac inference.

Based on the paper: OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer.

Features

  • Pure MLX inference: runs natively on Apple Silicon, with no PyTorch dependency at runtime
  • Detects hard cuts, dissolves, fades, wipes, slides, zooms, doorways, and sudden jumps
  • Tunable sensitivity for different video types (action, interview, vlog, film)

Requirements

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.10+
  • ffmpeg (for video I/O)
pip install mlx mlx-metal numpy

Quick Start

# 1. Clone the repository (install the dependencies listed under Requirements first)
git clone https://github.com/eisneim/OmniShotCut_mlx.git
cd OmniShotCut_mlx

# 2. Download weights from HuggingFace
python omnishotcut_mlx/download_weights.py

# 3. Run on test videos
python run_inference.py

Download Weights

# Auto-download from HuggingFace Hub (requires huggingface_hub)
pip install huggingface_hub
python omnishotcut_mlx/download_weights.py

# Or manually download from:
# https://huggingface.co/eisneim/OmniShotCut_mlx
# Place OmniShotCut.safetensors and config.json into ./weights/

# Alternative: download without huggingface_hub
# Create the target directory first; curl -o does not create missing directories
mkdir -p weights
curl -L -o weights/OmniShotCut.safetensors https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/OmniShotCut.safetensors
curl -L -o weights/config.json https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/config.json
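
Whichever download route is used, both files need to end up in ./weights/ before inference. The following is a minimal sketch of a pre-flight check; the helper name `missing_weight_files` is hypothetical and not part of the repository, and only the two file names come from the instructions above.

```python
from pathlib import Path

# File names taken from the download instructions above.
# The helper itself is a hypothetical convenience, not part of the repo.
REQUIRED_FILES = ("OmniShotCut.safetensors", "config.json")


def missing_weight_files(weights_dir="weights"):
    """Return the required file names that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


missing = missing_weight_files()
if missing:
    print("Missing from ./weights:", ", ".join(missing))
else:
    print("All weight files are in place.")
```

Running it from the repository root before `python run_inference.py` reports which file, if any, still has to be downloaded.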

Usage

# Default: balanced detection
python run_inference.py

# Sensitive mode: more cuts, good for action/vlog videos
python run_inference.py --sensitive

# Conservative mode: fewer false positives, good for interviews/long takes
python run_inference.py --conservative

# Single video
python run_inference.py --video /path/to/video.mp4

# Custom output directory
python run_inference.py --output ./my_shots

# Fine-tuned control
python run_inference.py --context 12 --min-shot 0.8 --conf 0.1

Tunable Parameters

| Parameter | Default | Range | Effect |
| --- | --- | --- | --- |
| --context | 10 | 0–20 | Number of overlapping frames between consecutive windows. Higher = fewer missed boundaries, but slower |
| --min-shot | 0.5 | 0.1–5.0 | Minimum shot duration in seconds. Higher = fewer false positives |
| --conf | 0.0 | 0.0–1.0 | Intra-class confidence threshold. E.g. 0.3 keeps only predictions the model is more than 30% confident in |
| --sensitive | - | - | Shortcut for context=15, min-shot=0.3, conf=0 |
| --conservative | - | - | Shortcut for context=5, min-shot=1.5, conf=0.15 |
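
To make the three parameters concrete, here is a rough sketch of how overlapping windows and the two post-filters could interact. It is an illustration only, not the code in run_inference.py: the function names are hypothetical, the stride is assumed to be window size minus context, and the filters are assumed to run in the order shown.

```python
def window_starts(num_frames, window=100, context=10):
    """Start indices of windows that overlap by `context` frames.

    Assumption: consecutive windows are `window - context` frames apart.
    """
    if not 0 <= context < window:
        raise ValueError("context must satisfy 0 <= context < window")
    stride = window - context
    starts = list(range(0, max(num_frames - window, 0) + 1, stride))
    # Add one last window so the tail of the video is covered.
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    return starts


def filter_boundaries(boundaries, fps, min_shot=0.5, conf=0.0):
    """Filter (frame_index, confidence) pairs sorted by frame index.

    Drops candidates below `conf`, then any candidate that would create
    a shot shorter than `min_shot` seconds.
    """
    min_frames = min_shot * fps
    kept = []
    last = 0  # the first shot starts at frame 0
    for frame, score in boundaries:
        if score < conf:
            continue
        if frame - last < min_frames:
            continue
        kept.append(frame)
        last = frame
    return kept


candidates = [(30, 0.9), (35, 0.8), (80, 0.05), (120, 0.6)]
print(window_starts(250))                             # [0, 90, 150]
print(filter_boundaries(candidates, fps=25))          # [30, 80, 120]
print(filter_boundaries(candidates, fps=25, conf=0.1))  # [30, 120]
```

A larger context shrinks the stride, so more windows are evaluated per video, which matches the "fewer missed boundaries, but slower" trade-off in the table.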

Parameter Guide by Video Type

| Video Type | Recommended | Why |
| --- | --- | --- |
| Action / Sports | --sensitive | Fast cuts, many short shots |
| Vlog / YouTube | default or --context 15 | Moderate pace, varied editing |
| Interview / Podcast | --conservative | Long takes, few cuts |
| Film / Cinema | default | Balanced |
| Animation | --sensitive | Frequent scene changes |
| Screen Recording | --conservative or --min-shot 2.0 | Mostly static |
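
When processing a mixed library, the recommendations above can be kept in a small lookup table. The sketch below only builds the command line; the dictionary keys and the `build_command` helper are hypothetical, and it assumes `--video` can be combined with a preset flag, which the usage examples do not show explicitly.

```python
import sys

# Flags copied from the table above (first option per row).
# The keys and this helper are hypothetical, not part of the repo.
RECOMMENDED_FLAGS = {
    "action": ["--sensitive"],
    "vlog": [],
    "interview": ["--conservative"],
    "film": [],
    "animation": ["--sensitive"],
    "screen": ["--conservative"],
}


def build_command(video_path, video_type):
    """Return the argv list for one run_inference.py call."""
    flags = RECOMMENDED_FLAGS[video_type]
    return [sys.executable, "run_inference.py", "--video", video_path, *flags]


print(build_command("test_data/match.mp4", "action"))
# To actually run it from the repository root:
#   import subprocess
#   subprocess.run(build_command("test_data/match.mp4", "action"), check=True)
```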

Project Structure

OmniShotCut_mlx/
├── run_inference.py               # Main entry point
├── omnishotcut_mlx/
│   ├── model.py                   # OmniShotCut MLX model
│   ├── transformer.py             # Transformer encoder/decoder
│   ├── resnet.py                  # ResNet18 backbone
│   ├── position_encoding.py       # 3D sinusoidal position encoding
│   ├── load_weights.py            # Weight loader (from safetensors)
│   └── download_weights.py        # HuggingFace weight downloader
├── weights/
│   ├── OmniShotCut.safetensors    # MLX-native weights (~157MB)
│   └── config.json                # Model configuration
└── test_data/                     # Place test videos here

Output

Shots are saved as shot_0000.mp4, shot_0001.mp4, ... under test_data/output/<video_name>/.

Each shot file is a self-contained H.264/AAC MP4 clip with the detected shot boundary transitions removed.
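
For downstream scripts, the naming scheme above can be reproduced in a few lines. This is a sketch under one assumption: that `<video_name>` is the input file name without its extension. Both helper names are hypothetical.

```python
from pathlib import Path


def shot_path(video_path, index, output_root="test_data/output"):
    """Expected path of shot number `index` for `video_path`."""
    # Assumption: <video_name> is the file name without its extension.
    video_name = Path(video_path).stem
    return Path(output_root) / video_name / f"shot_{index:04d}.mp4"


def list_shots(video_path, output_root="test_data/output"):
    """Existing shot files for `video_path`, in shot order."""
    folder = Path(output_root) / Path(video_path).stem
    # Zero-padded indices make lexicographic order equal to shot order.
    return sorted(folder.glob("shot_*.mp4"))


print(shot_path("test_data/demo.mp4", 1).as_posix())
# test_data/output/demo/shot_0001.mp4
```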

Model

  • Architecture: Shot-Query Transformer (DETR-style), 6 encoder + 6 decoder layers, ResNet18 backbone
  • Input: 100-frame windows at 128×96, ImageNet normalization
  • Output: Shot boundary frame indices + intra-shot relation (dissolve, wipe, fade, ...) + inter-shot relation (hard cut, sudden jump, ...)
  • Weights: Converted from the official PyTorch checkpoint, 363 tensors, float32
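
As an illustration of the input specification, the sketch below normalizes one 100-frame window with the standard ImageNet mean and standard deviation using NumPy. The array layout (frames, height, width, channels) and the reading of 128×96 as width×height are assumptions; the actual preprocessing in the repository may order or resize the data differently.

```python
import numpy as np

# Standard ImageNet statistics for RGB values scaled to [0, 1].
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_window(frames_uint8):
    """Convert (T, H, W, 3) uint8 RGB frames to normalized float32."""
    x = frames_uint8.astype(np.float32) / 255.0
    # Mean and std broadcast over the trailing channel axis.
    return (x - IMAGENET_MEAN) / IMAGENET_STD


# Assumption: 128 is the width and 96 the height; layout is channels-last.
window = np.zeros((100, 96, 128, 3), dtype=np.uint8)
out = normalize_window(window)
print(out.shape, out.dtype)
```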

License & Credits

Paper: OmniShotCut (arXiv 2604.24762) by Boyang Wang et al.

MLX port by @eisneim. Weights hosted at eisneim/OmniShotCut_mlx.
