SoraExplora committed on
Commit
8456c9f
·
verified ·
1 Parent(s): 5075568

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +182 -3
  2. config.json +37 -0
  3. model.safetensors +3 -0
  4. preprocessor_config.json +26 -0
README.md CHANGED
@@ -1,3 +1,182 @@
- ---
- license: cc0-1.0
- ---
+ ---
+ license: mit
+ ---
+ 🧾 Model Card — VideoMAE-DeepFake-Detector-v1
+ 🧠 Model Overview
+
+ VideoMAE-DeepFake-Detector-v1 is a fine-tuned video deepfake detection model trained to distinguish between authentic and manipulated facial videos. The model builds upon the pretrained VideoMAE architecture and adapts it for binary classification of real versus synthetic videos.
+
+ The base model was originally trained on large-scale video action datasets, enabling strong spatiotemporal feature understanding. It was further fine-tuned on the FaceForensics++ dataset to detect visual artifacts, temporal inconsistencies, and manipulation signatures commonly found in deepfake videos.
+
+ By leveraging transformer-based video representation learning, the model captures both frame-level visual cues and motion patterns across time, allowing it to identify subtle manipulations that traditional image-based detectors may miss.
+
+ The model is designed for applications in media verification, misinformation detection, and AI-generated content monitoring.
+
+ 🏗️ Training Details
+
+ Base Model: MCG-NJU/videomae-base-finetuned-kinetics
+ Framework: Hugging Face Transformers + PyTorch
+ Training Hardware: NVIDIA T4 GPU (Kaggle)
+ Epochs: 15
+ Batch Size: 4
+ Learning Rate: 2e-5
+ Optimizer: AdamW
+ Video Sampling: 16 frames per video clip
+ Resolution: 224 × 224
+
+ Training Strategy: transfer learning with partial freezing
+ ~70% of VideoMAE backbone layers frozen
+ Final transformer layers + classifier head fine-tuned
+
+ Dataset: FaceForensics++ (C23 compression level)
+
+ Classes:
+ 🟢 Real Video
+ 🔴 Deepfake Video
+
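The partial-freezing strategy listed above can be sketched as follows. This is an illustrative reconstruction, not the author's training script: the `freeze_backbone` helper is hypothetical, and the toy `nn.Linear` blocks merely stand in for VideoMAE's 12 transformer encoder blocks (in the real model they would typically be reached via `model.videomae.encoder.layer`).

```python
import torch.nn as nn

def freeze_backbone(layers, freeze_ratio=0.70):
    """Freeze the first freeze_ratio fraction of transformer blocks."""
    n_freeze = int(len(layers) * freeze_ratio)
    for block in layers[:n_freeze]:
        for p in block.parameters():
            p.requires_grad = False  # excluded from gradient updates
    return n_freeze

# Toy stand-in for VideoMAE's 12 encoder blocks.
blocks = nn.ModuleList([nn.Linear(8, 8) for _ in range(12)])
frozen = freeze_backbone(blocks)

trainable = sum(p.requires_grad for b in blocks for p in b.parameters())
total = sum(1 for b in blocks for p in b.parameters())
print(frozen, trainable, total)  # 8 blocks frozen; 8 of 24 param tensors trainable
```

With `freeze_ratio=0.70` and 12 blocks, the first 8 blocks are frozen and only the last 4 (plus the classifier head, in the real model) receive gradient updates.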
+ 📊 Dataset Description
+
+ The model was trained using the FaceForensics++ dataset, a widely used benchmark for deepfake detection research.
+
+ FaceForensics++ contains manipulated videos generated using multiple facial manipulation techniques, including deepfake generation and facial reenactment.
+
+ For this model version, training used a subset consisting of:
+ Original videos (real)
+ Videos manipulated with the Deepfakes method (fake)
+
+ Each video was processed by sampling 16 frames uniformly across its duration to capture both spatial and temporal artifacts.
+
+ | Label | Description |
+ |-------|-------------|
+ | Real  | Authentic, unmodified video |
+ | Fake  | Video manipulated using deepfake synthesis techniques |
+
+ 🎯 Evaluation Metrics
+
+ Evaluation was performed on a held-out validation split of the dataset.
+
+ | Metric          | Score |
+ |-----------------|-------|
+ | Train Loss      | 0.303 |
+ | Validation Loss | 0.506 |
+ | Accuracy        | 88.0% |
+ | F1 Score        | 0.742 |
+ | AUC             | 0.836 |
+
+ ✅ The model demonstrates a strong ability to distinguish between authentic and manipulated videos using temporal visual patterns.
+
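For reference, metrics of this kind are commonly computed with scikit-learn. The labels and probabilities below are made-up illustrations, not the model's actual validation data, and the 0.5 decision threshold is an assumption:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Hypothetical ground truth (0 = real, 1 = fake) and predicted fake-probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_prob = np.array([0.1, 0.4, 0.8, 0.3, 0.2, 0.9, 0.6, 0.7])
y_pred = (y_prob >= 0.5).astype(int)  # hard predictions at a 0.5 threshold

acc = accuracy_score(y_true, y_pred)   # fraction of correct hard predictions
f1 = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_prob)    # threshold-free ranking quality
print(acc, f1, auc)  # 0.75 0.75 0.875
```

Note that AUC is computed from the raw probabilities, not the thresholded predictions, which is why it can differ in character from accuracy and F1.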
+ 💬 Example Usage
+
+ ```python
+ import torch
+ import numpy as np
+ from decord import VideoReader, cpu
+ from PIL import Image
+ from transformers import VideoMAEForVideoClassification, VideoMAEImageProcessor
+
+ model = VideoMAEForVideoClassification.from_pretrained(
+     "your_username/videomae-deepfake-detector"
+ )
+ processor = VideoMAEImageProcessor.from_pretrained(
+     "your_username/videomae-deepfake-detector"
+ )
+
+ def load_video_frames(video_path, num_frames=16):
+     # Sample num_frames indices uniformly across the clip.
+     vr = VideoReader(video_path, ctx=cpu(0))
+     total_frames = len(vr)
+     indices = np.linspace(0, total_frames - 1, num_frames).astype(int)
+     frames = vr.get_batch(indices).asnumpy()
+     return [Image.fromarray(f) for f in frames]
+
+ @torch.no_grad()
+ def predict(video_path):
+     frames = load_video_frames(video_path)
+     inputs = processor(frames, return_tensors="pt")
+     outputs = model(**inputs)
+     probs = torch.softmax(outputs.logits, dim=1)[0]
+     return {
+         "real": float(probs[0]),
+         "fake": float(probs[1]),
+     }
+
+ print(predict("sample_video.mp4"))
+ ```
+
+ Output example:
+
+ {'real': 0.96, 'fake': 0.04}
+ 🧩 Intended Use
+
+ Deepfake detection in video content
+ Media authenticity verification
+ AI-generated video detection pipelines
+ Research on manipulated media detection
+ Integration into misinformation monitoring systems
+
+ ⚠️ Limitations
+
+ The model was trained on a subset of FaceForensics++ and may not generalize perfectly to unseen deepfake generation techniques.
+
+ Performance may degrade on:
+ heavily compressed social media videos
+ unseen manipulation methods
+ partial face occlusions
+ extremely short clips
+
+ This model should be used as an assistive forensic tool, not as a definitive authenticity guarantee.
+
+ 🧑‍💻 Developer
+
+ Author: Vansh Momaya
+ Institution: D. J. Sanghvi College of Engineering
+ Focus Area: Computer Vision, AI Safety, Deepfake Detection, Video Understanding
+ Email: vanshmomaya9@gmail.com
+
+ 🌍 Citation
+
+ If you use this model in research or projects:
+
+ @online{momaya2025videomaedeepfake,
+   author      = {Vansh Momaya},
+   title       = {VideoMAE-DeepFake-Detector-v1},
+   year        = {2025},
+   version     = {v1},
+   url         = {https://huggingface.co/Vansh180/VideoMae-deepfake-detector},
+   institution = {D. J. Sanghvi College of Engineering},
+   note        = {Fine-tuned VideoMAE model for detecting deepfake videos using FaceForensics++},
+   license     = {MIT}
+ }
+ 🚀 Acknowledgements
+
+ VideoMAE — Base architecture for video representation learning
+ FaceForensics++ — Deepfake detection dataset benchmark
+ Hugging Face Transformers — Training and deployment framework
config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "architectures": [
+     "VideoMAEForVideoClassification"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "decoder_hidden_size": 384,
+   "decoder_intermediate_size": 1536,
+   "decoder_num_attention_heads": 6,
+   "decoder_num_hidden_layers": 4,
+   "dtype": "float32",
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "real",
+     "1": "fake"
+   },
+   "image_size": 224,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "fake": 1,
+     "real": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "model_type": "videomae",
+   "norm_pix_loss": false,
+   "num_attention_heads": 12,
+   "num_channels": 3,
+   "num_frames": 16,
+   "num_hidden_layers": 12,
+   "patch_size": 16,
+   "qkv_bias": true,
+   "transformers_version": "5.2.0",
+   "tubelet_size": 2,
+   "use_mean_pooling": true
+ }
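The config's `id2label` mapping is what turns an argmax over the two logits into a class name. A minimal sketch, using hypothetical logit values:

```python
# Mirrors the id2label mapping from config.json; the logit values are hypothetical.
id2label = {0: "real", 1: "fake"}

logits = [2.1, -0.7]  # model output for one video, before softmax
pred_idx = max(range(len(logits)), key=lambda i: logits[i])  # argmax
print(id2label[pred_idx])  # real
```

The inverse `label2id` mapping in the same file is what the training pipeline uses to convert string labels back to class indices.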
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:293c668a5d3289d3162902d8cac6687cd11551cf2358768bad126de32b6d7f29
+ size 344937328
preprocessor_config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "crop_size": {
+     "height": 224,
+     "width": 224
+   },
+   "do_center_crop": true,
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.485,
+     0.456,
+     0.406
+   ],
+   "image_processor_type": "VideoMAEImageProcessor",
+   "image_std": [
+     0.229,
+     0.224,
+     0.225
+   ],
+   "resample": 2,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "shortest_edge": 224
+   }
+ }
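Per this config, each frame is first rescaled by `rescale_factor` (1/255) and then normalized per channel with the ImageNet mean and std listed above. A minimal sketch of that arithmetic on one hypothetical RGB pixel:

```python
# Statistics copied from preprocessor_config.json.
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]
rescale_factor = 1 / 255  # == 0.00392156862745098

pixel = [128, 64, 200]  # hypothetical 8-bit RGB values for one pixel
# Rescale to [0, 1], then standardize each channel: (x - mean) / std.
normalized = [(p * rescale_factor - m) / s for p, m, s in zip(pixel, mean, std)]
print([round(v, 3) for v in normalized])  # [0.074, -0.915, 1.681]
```

`VideoMAEImageProcessor` applies this same rescale-then-normalize step to every sampled frame, so the model always sees inputs standardized against the ImageNet statistics the backbone was pretrained with.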