# OmniShotCut MLX

Shot Boundary Detection with OmniShotCut, ported to Apple MLX for native Mac inference.

Based on the paper: [OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer](https://arxiv.org/abs/2604.24762).

## Features

- Pure MLX inference β€” runs natively on Apple Silicon, zero PyTorch dependency at runtime
- Detects hard cuts, dissolves, fades, wipes, slides, zooms, doorways, and sudden jumps
- Tunable sensitivity for different video types (action, interview, vlog, film)

## Requirements

- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
- `ffmpeg` (for video I/O)

```bash
pip install mlx mlx-metal numpy
```

## Quick Start

```bash
# 1. Clone and install
git clone https://github.com/eisneim/OmniShotCut_mlx.git
cd OmniShotCut_mlx

# 2. Download weights from HuggingFace
python omnishotcut_mlx/download_weights.py

# 3. Run on test videos
python run_inference.py
```

## Download Weights

```bash
# Auto-download from HuggingFace Hub (requires huggingface_hub)
pip install huggingface_hub
python omnishotcut_mlx/download_weights.py

# Or manually download from:
# https://huggingface.co/eisneim/OmniShotCut_mlx
# Place OmniShotCut.safetensors and config.json into ./weights/

# Alternative: download without huggingface_hub
curl -L -o weights/OmniShotCut.safetensors https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/OmniShotCut.safetensors
curl -L -o weights/config.json https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/config.json
```

## Usage

```bash
# Default: balanced detection
python run_inference.py

# Sensitive mode: more cuts, good for action/vlog videos
python run_inference.py --sensitive

# Conservative mode: fewer false positives, good for interviews/long takes
python run_inference.py --conservative

# Single video
python run_inference.py --video /path/to/video.mp4

# Custom output directory
python run_inference.py --output ./my_shots

# Fine-tuned control
python run_inference.py --context 12 --min-shot 0.8 --conf 0.1
```

### Tunable Parameters

| Parameter | Default | Range | Effect |
|-----------|---------|-------|--------|
| `--context` | 10 | 0–20 | Overlap frames between windows. Higher = fewer missed boundaries, but slower |
| `--min-shot` | 0.5 | 0.1–5.0 | Minimum shot duration in seconds. Higher = fewer false positives |
| `--conf` | 0.0 | 0.0–1.0 | Intra-class confidence threshold. E.g. 0.3 keeps only predictions the model is >30% confident about |
| `--sensitive` | β€” | β€” | Shortcut: context=15, min-shot=0.3, conf=0 |
| `--conservative` | β€” | β€” | Shortcut: context=5, min-shot=1.5, conf=0.15 |
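The interaction between `--conf` and `--min-shot` can be sketched as a simple post-processing pass over raw boundary predictions. This is an illustrative sketch only — the function name, the `(frame, confidence)` tuple format, and the filtering order are assumptions, not the repo's actual API:

```python
# Hypothetical post-processing sketch: drop low-confidence boundaries
# (--conf) and merge boundaries that would create shots shorter than
# --min-shot. Data shapes are illustrative, not the repo's actual API.

def filter_boundaries(boundaries, fps, min_shot=0.5, conf=0.0):
    """boundaries: list of (frame_index, confidence) tuples."""
    min_frames = min_shot * fps
    kept = []
    last = 0  # frame index of the previous kept boundary (or video start)
    for frame, score in sorted(boundaries):
        if score < conf:
            continue  # below the intra-class confidence threshold
        if frame - last < min_frames:
            continue  # would create a shot shorter than --min-shot
        kept.append(frame)
        last = frame
    return kept

# At 25 fps with min_shot=0.5 (12.5 frames) and conf=0.3:
print(filter_boundaries([(10, 0.9), (15, 0.8), (40, 0.2), (60, 0.7)],
                        fps=25, min_shot=0.5, conf=0.3))  # [15, 60]
```

Frame 10 is dropped because it falls within 0.5 s of the video start, and frame 40 because its confidence is below the threshold.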

### Parameter Guide by Video Type

| Video Type | Recommended | Why |
|------------|------------|-----|
| Action / Sports | `--sensitive` | Fast cuts, many short shots |
| Vlog / YouTube | default or `--context 15` | Moderate pace, varied editing |
| Interview / Podcast | `--conservative` | Long takes, few cuts |
| Film / Cinema | default | Balanced |
| Animation | `--sensitive` | Frequent scene changes |
| Screen Recording | `--conservative` or `--min-shot 2.0` | Mostly static |
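To make the `--context` trade-off concrete: the model runs on fixed 100-frame windows (see the Model section), and `--context` sets how many frames consecutive windows overlap. The sketch below shows one plausible windowing layout; the repo's actual logic may differ:

```python
# Illustration of overlapping 100-frame inference windows, where `context`
# frames are shared between consecutive windows. A hypothetical sketch, not
# the repo's actual windowing code.

def window_starts(total_frames, window=100, context=10):
    stride = window - context  # each window advances by (window - overlap)
    starts = list(range(0, max(total_frames - window, 0) + 1, stride))
    # Ensure the tail of the video is covered by a final window.
    if starts and starts[-1] + window < total_frames:
        starts.append(total_frames - window)
    return starts

print(window_starts(250))  # [0, 90, 150]
```

A larger `--context` means more overlap, so boundaries near window edges are seen by two windows (fewer misses) at the cost of more windows to evaluate (slower).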

## Project Structure

```
OmniShotCut_mlx/
β”œβ”€β”€ run_inference.py               # Main entry point
β”œβ”€β”€ omnishotcut_mlx/
β”‚   β”œβ”€β”€ model.py                   # OmniShotCut MLX model
β”‚   β”œβ”€β”€ transformer.py             # Transformer encoder/decoder
β”‚   β”œβ”€β”€ resnet.py                  # ResNet18 backbone
β”‚   β”œβ”€β”€ position_encoding.py       # 3D sinusoidal position encoding
β”‚   β”œβ”€β”€ load_weights.py            # Weight loader (from safetensors)
β”‚   └── download_weights.py        # HuggingFace weight downloader
β”œβ”€β”€ weights/
β”‚   β”œβ”€β”€ OmniShotCut.safetensors    # MLX-native weights (~157MB)
β”‚   └── config.json                # Model configuration
└── test_data/                     # Place test videos here
```

## Output

Shots are saved as `shot_0000.mp4`, `shot_0001.mp4`, ... under `test_data/output/<video_name>/`.

Each shot file is a self-contained H.264/AAC MP4 clip with the detected shot boundary transitions removed.
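A clip like this can be produced with `ffmpeg`. The README states the output is H.264/AAC MP4, but the exact flags `run_inference.py` uses are not documented here, so treat this command as an illustrative sketch:

```python
# Sketch of exporting one detected shot as a self-contained H.264/AAC MP4
# with ffmpeg. The actual flags used by run_inference.py may differ.
import subprocess

def export_shot(src, dst, start_sec, end_sec):
    duration = end_sec - start_sec
    cmd = [
        "ffmpeg", "-y",
        "-ss", f"{start_sec:.3f}",   # seek to the shot start
        "-i", src,
        "-t", f"{duration:.3f}",     # shot duration in seconds
        "-c:v", "libx264",           # re-encode video as H.264
        "-c:a", "aac",               # re-encode audio as AAC
        dst,
    ]
    return cmd  # run with subprocess.run(cmd, check=True)

print(export_shot("in.mp4", "shot_0000.mp4", 2.0, 5.5))
```

Re-encoding (rather than `-c copy`) lets each clip start exactly at the detected boundary instead of snapping to the nearest keyframe.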

## Model

- **Architecture**: Shot-Query Transformer (DETR-style), 6 encoder + 6 decoder layers, ResNet18 backbone
- **Input**: 100-frame windows at 128Γ—96, ImageNet normalization
- **Output**: Shot boundary frame indices + intra-shot relation (dissolve, wipe, fade, ...) + inter-shot relation (hard cut, sudden jump, ...)
- **Weights**: Converted from the official PyTorch checkpoint, 363 tensors, float32
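The per-frame preprocessing implied by the input spec above (128×96 RGB, ImageNet normalization) can be sketched as follows. The standard ImageNet mean/std constants are assumed; the repo's actual pipeline may differ in detail:

```python
# Sketch of the per-frame preprocessing implied by the input spec:
# 128x96 RGB frames normalized with the standard ImageNet statistics.
# Illustrative only; the repo's actual pipeline may differ.
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess_frame(frame_uint8):
    """frame_uint8: (96, 128, 3) RGB array with values in 0..255."""
    x = frame_uint8.astype(np.float32) / 255.0   # scale to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD       # channel-wise normalization
    return x

frame = np.zeros((96, 128, 3), dtype=np.uint8)
out = preprocess_frame(frame)
print(out.shape)  # (96, 128, 3)
```

A 100-frame window is then a stack of 100 such normalized frames fed to the ResNet18 backbone.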

## License & Credits

Paper: [OmniShotCut (arXiv 2604.24762)](https://arxiv.org/abs/2604.24762) by Boyang Wang et al.

MLX port by [@eisneim](https://github.com/eisneim). Weights hosted at [eisneim/OmniShotCut_mlx](https://huggingface.co/eisneim/OmniShotCut_mlx).