eisneim committed · Commit c633bae · verified · 1 Parent(s): b9b423c

Upload folder using huggingface_hub

Files changed (3)
  1. OmniShotCut.safetensors +3 -0
  2. README.md +131 -3
  3. config.json +16 -0
OmniShotCut.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce52cb008514a6b5fbfd2f6ee6c2865a9845e464f50166104c0c9ecbfd347152
+ size 164079848
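The three added lines are a Git LFS pointer, not the weights themselves: the `oid` is the SHA-256 digest of the real file and `size` is its length in bytes. The sketch below shows one way a downloaded copy could be checked against this pointer. The function names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of this commit or of any library.

```python
import hashlib
from pathlib import Path

# Pointer text copied verbatim from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ce52cb008514a6b5fbfd2f6ee6c2865a9845e464f50166104c0c9ecbfd347152
size 164079848
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer (hypothetical helper)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's byte size and digest both match the pointer."""
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first; avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Hash in chunks so a large weights file is never fully held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

Assuming the weights were saved to `weights/OmniShotCut.safetensors`, the check would be `matches_pointer(Path("weights/OmniShotCut.safetensors"), parse_lfs_pointer(POINTER))`. The pointer's size of 164079848 bytes is about 164 MB, or roughly 156.5 MiB.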
README.md CHANGED
@@ -1,3 +1,131 @@
- ---
- license: mit
- ---
+ # OmniShotCut MLX
+
+ Shot boundary detection with OmniShotCut, ported to Apple MLX for native Mac inference.
+
+ Based on the paper: [OmniShotCut: Holistic Relational Shot Boundary Detection with Shot-Query Transformer](https://arxiv.org/abs/2604.24762).
+
+ ## Features
+
+ - Pure MLX inference: runs natively on Apple Silicon, with no PyTorch dependency at runtime
+ - Detects hard cuts, dissolves, fades, wipes, slides, zooms, doorways, and sudden jumps
+ - Tunable sensitivity for different video types (action, interview, vlog, film)
+
+ ## Requirements
+
+ - macOS with Apple Silicon (M1/M2/M3/M4)
+ - Python 3.10+
+ - `ffmpeg` (for video I/O)
+
+ ```bash
+ pip install mlx mlx-metal numpy
+ ```
+
+ ## Quick Start
+
+ ```bash
+ # 1. Clone the repository
+ git clone https://github.com/eisneim/OmniShotCut_mlx.git
+ cd OmniShotCut_mlx
+
+ # 2. Download weights from HuggingFace
+ python omnishotcut_mlx/download_weights.py
+
+ # 3. Run on test videos
+ python run_inference.py
+ ```
+
+ ## Download Weights
+
+ ```bash
+ # Auto-download from HuggingFace Hub (requires huggingface_hub)
+ pip install huggingface_hub
+ python omnishotcut_mlx/download_weights.py
+
+ # Or manually download from:
+ # https://huggingface.co/eisneim/OmniShotCut_mlx
+ # Place OmniShotCut.safetensors and config.json into ./weights/
+
+ # Alternative: download without huggingface_hub
+ # (--create-dirs makes curl create ./weights/ if it does not exist yet)
+ curl -L --create-dirs -o weights/OmniShotCut.safetensors https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/OmniShotCut.safetensors
+ curl -L --create-dirs -o weights/config.json https://huggingface.co/eisneim/OmniShotCut_mlx/resolve/main/config.json
+ ```
+
+ ## Usage
+
+ ```bash
+ # Default: balanced detection
+ python run_inference.py
+
+ # Sensitive mode: more cuts, good for action/vlog videos
+ python run_inference.py --sensitive
+
+ # Conservative mode: fewer false positives, good for interviews/long takes
+ python run_inference.py --conservative
+
+ # Single video
+ python run_inference.py --video /path/to/video.mp4
+
+ # Custom output directory
+ python run_inference.py --output ./my_shots
+
+ # Fine-tuned control
+ python run_inference.py --context 12 --min-shot 0.8 --conf 0.1
+ ```
+
+ ### Tunable Parameters
+
+ | Parameter | Default | Range | Effect |
+ |-----------|---------|-------|--------|
+ | `--context` | 10 | 0–20 | Number of overlapping frames between consecutive windows. Higher = fewer missed boundaries, but slower |
+ | `--min-shot` | 0.5 | 0.1–5.0 | Minimum shot duration in seconds. Higher = fewer false positives |
+ | `--conf` | 0.0 | 0.0–1.0 | Intra-class confidence threshold. E.g. 0.3 keeps only predictions the model is more than 30% confident in |
+ | `--sensitive` | n/a | n/a | Shortcut: context=15, min-shot=0.3, conf=0 |
+ | `--conservative` | n/a | n/a | Shortcut: context=5, min-shot=1.5, conf=0.15 |
+
+ ### Parameter Guide by Video Type
+
+ | Video Type | Recommended | Why |
+ |------------|------------|-----|
+ | Action / Sports | `--sensitive` | Fast cuts, many short shots |
+ | Vlog / YouTube | default or `--context 15` | Moderate pace, varied editing |
+ | Interview / Podcast | `--conservative` | Long takes, few cuts |
+ | Film / Cinema | default | Balanced |
+ | Animation | `--sensitive` | Frequent scene changes |
+ | Screen Recording | `--conservative` or `--min-shot 2.0` | Mostly static |
+
+ ## Project Structure
+
+ ```
+ OmniShotCut_mlx/
+ ├── run_inference.py              # Main entry point
+ ├── omnishotcut_mlx/
+ │   ├── model.py                  # OmniShotCut MLX model
+ │   ├── transformer.py            # Transformer encoder/decoder
+ │   ├── resnet.py                 # ResNet18 backbone
+ │   ├── position_encoding.py      # 3D sinusoidal position encoding
+ │   ├── load_weights.py           # Weight loader (from safetensors)
+ │   └── download_weights.py       # HuggingFace weight downloader
+ ├── weights/
+ │   ├── OmniShotCut.safetensors   # MLX-native weights (~157MB)
+ │   └── config.json               # Model configuration
+ └── test_data/                    # Place test videos here
+ ```
+
+ ## Output
+
+ Shots are saved as `shot_0000.mp4`, `shot_0001.mp4`, ... under `test_data/output/<video_name>/`.
+
+ Each shot file is a self-contained H.264/AAC MP4 clip with the detected shot boundary transitions removed.
+
+ ## Model
+
+ - **Architecture**: Shot-Query Transformer (DETR-style), 6 encoder + 6 decoder layers, ResNet18 backbone
+ - **Input**: 100-frame windows at 128×96, ImageNet normalization
+ - **Output**: Shot boundary frame indices + intra-shot relation (dissolve, wipe, fade, ...) + inter-shot relation (hard cut, sudden jump, ...)
+ - **Weights**: Converted from the official PyTorch checkpoint, 363 tensors, float32
+
+ ## License & Credits
+
+ Paper: [OmniShotCut (arXiv 2604.24762)](https://arxiv.org/abs/2604.24762) by Boyang Wang et al.
+
+ MLX port by [@eisneim](https://github.com/eisneim). Weights hosted at [eisneim/OmniShotCut_mlx](https://huggingface.co/eisneim/OmniShotCut_mlx).
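The README describes `--context`, `--min-shot`, and the two presets only in prose and tables. The sketch below illustrates how those settings could plausibly act on a video. The preset values are copied from the README table; the windowing and filtering logic, and the names `Settings`, `window_starts`, and `drop_short_shots`, are assumptions for illustration, since the actual behavior of `run_inference.py` is not shown in this commit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    context: int = 10      # overlap frames between windows (README default)
    min_shot: float = 0.5  # minimum shot duration in seconds (README default)
    conf: float = 0.0      # confidence threshold (README default)


# Preset values as listed in the "Tunable Parameters" table.
PRESETS = {
    "default": Settings(),
    "sensitive": Settings(context=15, min_shot=0.3, conf=0.0),
    "conservative": Settings(context=5, min_shot=1.5, conf=0.15),
}


def window_starts(total_frames: int, window: int = 100, context: int = 10) -> list[int]:
    """Assumed windowing: consecutive windows share `context` frames."""
    if not 0 <= context < window:
        raise ValueError("context must satisfy 0 <= context < window")
    stride = window - context
    starts = [0]
    # Keep adding windows until the last one reaches the end of the video.
    while starts[-1] + window < total_frames:
        starts.append(starts[-1] + stride)
    return starts


def drop_short_shots(boundaries: list[int], total_frames: int,
                     fps: float, min_shot: float) -> list[int]:
    """Assumed filter: skip boundaries that would create a shot shorter than min_shot."""
    min_frames = min_shot * fps
    kept: list[int] = []
    last = 0  # frame index where the current shot started
    for boundary in sorted(boundaries):
        long_enough_before = boundary - last >= min_frames
        long_enough_after = total_frames - boundary >= min_frames
        if long_enough_before and long_enough_after:
            kept.append(boundary)
            last = boundary
    return kept
```

Under these assumptions, a larger `context` shrinks the stride, so a 250-frame clip needs three windows at `context=10` and more as the overlap grows, which matches the README's note that higher values are slower.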
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "model_type": "OmniShotCut",
+   "hidden_dim": 384,
+   "num_encoder_layers": 6,
+   "num_decoder_layers": 6,
+   "nhead": 8,
+   "dim_feedforward": 2048,
+   "num_queries": 24,
+   "num_frames": 100,
+   "process_height": 96,
+   "process_width": 128,
+   "num_intra_relation_classes": 10,
+   "num_inter_relation_classes": 7,
+   "backbone": "resnet18",
+   "dropout": 0.0
+ }
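As a reading aid for the config above, the following sketch loads the same JSON into a typed object and derives two quantities implied by it: the per-head attention width and the shape of one input window. The class `OmniShotCutConfig` and its properties are hypothetical, and the channels-last layout of `window_shape` is an assumption; the repository's own loader in `load_weights.py` may work differently.

```python
import json
from dataclasses import dataclass

# JSON copied from the config.json added in this commit.
CONFIG_JSON = """
{
  "model_type": "OmniShotCut",
  "hidden_dim": 384,
  "num_encoder_layers": 6,
  "num_decoder_layers": 6,
  "nhead": 8,
  "dim_feedforward": 2048,
  "num_queries": 24,
  "num_frames": 100,
  "process_height": 96,
  "process_width": 128,
  "num_intra_relation_classes": 10,
  "num_inter_relation_classes": 7,
  "backbone": "resnet18",
  "dropout": 0.0
}
"""


@dataclass(frozen=True)
class OmniShotCutConfig:
    """Hypothetical typed view of config.json; field names mirror the JSON keys."""
    model_type: str
    hidden_dim: int
    num_encoder_layers: int
    num_decoder_layers: int
    nhead: int
    dim_feedforward: int
    num_queries: int
    num_frames: int
    process_height: int
    process_width: int
    num_intra_relation_classes: int
    num_inter_relation_classes: int
    backbone: str
    dropout: float

    @property
    def head_dim(self) -> int:
        """Width of each attention head; hidden_dim must divide evenly by nhead."""
        if self.hidden_dim % self.nhead != 0:
            raise ValueError("hidden_dim must be divisible by nhead")
        return self.hidden_dim // self.nhead

    @property
    def window_shape(self) -> tuple[int, int, int, int]:
        """Shape of one RGB input window, assuming (frames, height, width, channels)."""
        return (self.num_frames, self.process_height, self.process_width, 3)


config = OmniShotCutConfig(**json.loads(CONFIG_JSON))
```

With these values each of the 8 heads is 48 wide, and one window is 100 frames of 96 rows by 128 columns, consistent with the README's "100-frame windows at 128×96" (width × height).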