DeiTFake: DeiT-Based Deepfake Detection

Model Card for sakshamkr1/deepfake-fb-deit-vit-224

Model Description

DeiTFake is a Vision Transformer (ViT) fine-tuned from facebook/deit-base-patch16-224 for deepfake image classification. The model classifies images as 'Fake' or 'Real' and was trained on the Deepfake and Real Images dataset, which is derived from the OpenForensics dataset.

Source Code

The source code of the model is available on GitHub: DeitFake | Polymath-Saksh.

Intended Uses

This model is designed for research and educational purposes in deepfake detection and general image integrity verification.
Possible use cases:

  • Deepfake detection in research pipelines
  • Media authenticity analysis
  • Benchmarking transformer-based vision architectures for binary classification tasks

Not recommended for production-level forensic verification without further validation.

Training Data

The model was fine-tuned on the Deepfake and Real Images dataset (derived from OpenForensics). The dataset includes both artificially generated (fake) and real facial images.
To ensure balanced representation, random over-sampling was applied during the training phase.
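
As a rough illustration of this balancing step (not the authors' exact preprocessing code), random over-sampling duplicates examples from the minority class until both classes have equal counts. The sample structure and label strings below are assumptions:

  import random

  # Hypothetical sketch: balance a list of (image_path, label) pairs by
  # randomly duplicating minority-class samples until the classes match.
  def random_oversample(samples, seed=42):
      rng = random.Random(seed)
      fake = [s for s in samples if s[1] == "Fake"]
      real = [s for s in samples if s[1] == "Real"]
      minority, majority = (fake, real) if len(fake) < len(real) else (real, fake)
      extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
      balanced = samples + extra
      rng.shuffle(balanced)
      return balanced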

Training Procedure

Fine-tuning was performed using Hugging Face’s transformers library (Trainer API), with the settings below (a configuration sketch follows the list):

  • Base model: facebook/deit-base-patch16-224
  • Epochs: 5
  • Learning rate: 1e-5
  • Weight decay: 0.01
  • Optimizer: AdamW
  • Mixed precision: fp16=True
  • Framework: PyTorch (CUDA enabled)
  • Loss function: CrossEntropyLoss
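
These hyperparameters map onto the Trainer API roughly as follows. This is an illustrative sketch rather than the exact training script: dataset loading and image preprocessing are omitted, and train_dataset / eval_dataset are assumed to already exist.

  from transformers import (AutoModelForImageClassification,
                            TrainingArguments, Trainer)

  # Binary labels used by the model card (mapping assumed)
  id2label = {0: "Fake", 1: "Real"}
  label2id = {"Fake": 0, "Real": 1}

  model = AutoModelForImageClassification.from_pretrained(
      "facebook/deit-base-patch16-224",
      num_labels=2, id2label=id2label, label2id=label2id,
      ignore_mismatched_sizes=True,  # replace the 1000-class ImageNet head
  )

  training_args = TrainingArguments(
      output_dir="deitfake",
      num_train_epochs=5,
      learning_rate=1e-5,
      weight_decay=0.01,
      fp16=True,                     # mixed precision on CUDA
      remove_unused_columns=False,   # keep pixel_values from preprocessing
  )

  # train_dataset / eval_dataset are assumed to yield {"pixel_values", "labels"};
  # AdamW and CrossEntropyLoss are the Trainer defaults for this setup.
  trainer = Trainer(model=model, args=training_args,
                    train_dataset=train_dataset, eval_dataset=eval_dataset)
  trainer.train()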

Evaluation Results (V2 Checkpoint)

Final performance metrics on the test set:

Metric          Value
Test Loss       0.0219
Accuracy        0.9922
Macro F1-Score  0.9922
AUROC           0.9997
Runtime (s)     48.26
Samples/sec     395.23
Steps/sec       6.18

Classification Report

Class         Precision  Recall  F1-Score  Support
Fake          0.9909     0.9936  0.9922    9521
Real          0.9936     0.9909  0.9922    9520
Accuracy                         0.9922    19041
Macro avg     0.9922     0.9922  0.9922    19041
Weighted avg  0.9922     0.9922  0.9922    19041
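
For reference, metrics of this kind can be reproduced from model predictions with scikit-learn. A minimal sketch, assuming arrays of true labels, predicted labels, and predicted probabilities for class 1, with the label mapping 0 = Fake, 1 = Real assumed:

  from sklearn.metrics import (accuracy_score, f1_score, roc_auc_score,
                               classification_report)

  # y_true, y_pred: integer labels (0 = Fake, 1 = Real, assumed);
  # y_score: predicted probability of class 1
  def summarize(y_true, y_pred, y_score):
      print("Accuracy:", accuracy_score(y_true, y_pred))
      print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
      print("AUROC:   ", roc_auc_score(y_true, y_score))
      print(classification_report(y_true, y_pred,
                                  target_names=["Fake", "Real"], digits=4))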

How to Use

You can load and use this model easily with the Hugging Face Transformers library:

  from transformers import AutoFeatureExtractor, AutoModelForImageClassification
  from PIL import Image
  import torch

  # Load the fine-tuned model and its feature extractor
  model_id = "sakshamkr1/deepfake-fb-deit-vit-224"
  extractor = AutoFeatureExtractor.from_pretrained(model_id)
  model = AutoModelForImageClassification.from_pretrained(model_id)
  model.eval()

  # Load an image
  image = Image.open("sample_image.jpg").convert("RGB")

  # Prepare inputs
  inputs = extractor(images=image, return_tensors="pt")

  # Run inference
  with torch.no_grad():
      outputs = model(**inputs)

  logits = outputs.logits
  predicted_class = logits.argmax(-1).item()

  # Map the predicted index to its label ('Fake' or 'Real')
  labels = model.config.id2label
  print(f"Predicted class: {labels[predicted_class]}")

Citation

If you use this model in your research, please cite and credit as follows:

  @article{KUMAR2026100734,
  title = {DeiTFake: Deepfake detection model using DeiT multi-stage training},
  journal = {Array},
  pages = {100734},
  year = {2026},
  issn = {2590-0056},
  doi = {10.1016/j.array.2026.100734},
  url = {https://www.sciencedirect.com/science/article/pii/S2590005626000573},
  author = {Saksham Kumar and Ashish Singh and Srinivasarao Thota and Sunil Kumar Singh and Chandan Kumar},
  keywords = {DeepFake detection, DeiT, Vision transformers, Transfer learning, Progressive training, OpenForensics},
  abstract = {Deepfakes are major threats to the integrity of digital media. We propose DeiTFake, a DeiT-based transformer and a two-stage progressive training strategy with increasing augmentation complexity. The approach applies an initial transfer-learning phase with standard augmentations, followed by a fine-tuning phase using advanced affine and color-based augmentations. We use DeiT models pre-trained weights, providing a strong initialization for learning manipulation artifacts, increasing the robustness of the detection model. Trained on a face-cropped dataset derived from the OpenForensics dataset (190,335 images), DeiTFake achieves 98.71% accuracy after stage one and 99.22% accuracy with an AUROC of 99.97%, after stage two, achieving strong performance under the same face-level evaluation setting. We analyze augmentation impact and training schedules, and provide practical benchmarks for facial deepfake detection.}
  }

arXiv Version (Pre-Print): arxiv.org/abs/2511.12048

Author

Developed by Saksham Kumar
LinkedIn: sakshamkr1
