alianassmaaa committed · Commit d878216 · verified · 1 Parent(s): fffc143

Update README with French docs and Space link

Files changed (1)
  1. README.md +36 -46
README.md CHANGED
@@ -14,67 +14,57 @@ datasets:
  pipeline_tag: image-classification
  ---

- # Multimodal Deepfake Detection Model

- A multimodal ensemble model that classifies images, video frames, and text as **real** or **AI-generated/fake**, with confidence scores and GradCAM explainability maps.

- ## Architecture

- **Visual Branch**: EfficientNet-B0 (ImageNet pretrained) with L2-normalized features for image/video frame classification
- **Text Branch**: RoBERTa-base with mean pooling and MLP head for AI-generated text detection
- **Fusion Layer**: Learnable weighted late ensemble combining visual + text probabilities
- **Explainability**: GradCAM heatmaps on EfficientNet convolutional layers
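
The Fusion Layer line above describes a learnable weighted late ensemble over the visual and text probabilities. A minimal sketch of one way such a fusion could work, assuming the learnable parameters are two logits normalized with a softmax. The function names, the `fusion_logits` parameter, the label strings, and the 0.5 threshold are illustrative assumptions, not taken from `model.py`:

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(p_visual, p_text, fusion_logits=(0.0, 0.0)):
    """Combine per-branch 'fake' probabilities with normalized weights.

    fusion_logits stands in for the learnable parameters (assumption);
    during training an optimizer would update them.
    """
    w_visual, w_text = softmax(list(fusion_logits))
    p_fake = w_visual * p_visual + w_text * p_text
    return {
        "prediction": "fake" if p_fake >= 0.5 else "real",
        "confidence": max(p_fake, 1.0 - p_fake),
        "fusion_weights": {"visual": w_visual, "text": w_text},
    }


if __name__ == "__main__":
    # Equal logits give equal weights, so this is a plain average.
    print(late_fusion(0.9, 0.3))
```

Because the weights pass through a softmax they stay positive and sum to one, so the fused value remains a valid probability whatever the logits are.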
 
 
 
- ## Usage

  ```python
- from inference import load_model, classify_image, classify_text, classify_video, classify_multimodal

  model, config = load_model('multimodal_ensemble.pt', device='cuda')

- # Image with GradCAM explainability
- result = classify_image(model, 'face.jpg', device='cuda', return_gradcam=True)
- print(f"Prediction: {result['prediction']} (confidence: {result['confidence']:.2%})")
- # result['gradcam'] contains the explainability heatmap
-
- # Text classification
- result = classify_text(model, 'This text was written by...')
- print(f"Prediction: {result['prediction']} (confidence: {result['confidence']:.2%})")

- # Video classification
- result = classify_video(model, 'video.mp4', num_frames=32, aggregation='mean')
- print(f"Video: {result['prediction']} (confidence: {result['confidence']:.2%})")

- # Multimodal (image + text)
  result = classify_multimodal(model, image_path_or_pil='face.jpg', text='Caption...')
- print(f"Combined: {result['prediction']} — Weights: {result['fusion_weights']}")
  ```
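
The removed `classify_video` call takes `num_frames=32` and `aggregation='mean'`. A minimal sketch of what mean aggregation over sampled frames might look like, assuming evenly spaced frame sampling and one "fake" probability per frame. Both helper names are hypothetical and are not part of `inference.py`:

```python
def sample_frame_indices(total_frames, num_frames=32):
    """Pick up to num_frames evenly spaced frame indices (assumed strategy)."""
    if total_frames <= 0 or num_frames <= 0:
        return []
    n = min(num_frames, total_frames)
    return [i * total_frames // n for i in range(n)]


def aggregate_frame_scores(frame_probs, aggregation="mean"):
    """Reduce per-frame 'fake' probabilities to one video-level result."""
    if not frame_probs:
        raise ValueError("no frame scores to aggregate")
    if aggregation != "mean":
        # Only the mode named in the README is sketched here.
        raise ValueError(f"unsupported aggregation: {aggregation!r}")
    p_fake = sum(frame_probs) / len(frame_probs)
    return {
        "prediction": "fake" if p_fake >= 0.5 else "real",
        "confidence": max(p_fake, 1.0 - p_fake),
    }


if __name__ == "__main__":
    print(sample_frame_indices(300, num_frames=32)[:5])
    print(aggregate_frame_scores([0.8, 0.6, 0.7]))
```

Averaging smooths out single-frame errors, at the cost of diluting a manipulation that only affects a few frames.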
52
 
53
- ## Training
54
-
55
- ### Datasets
56
- - **Visual**: [Hemg/deepfake-and-real-images](https://huggingface.co/datasets/Hemg/deepfake-and-real-images) — 140K+ face images (real vs deepfake)
57
- - **Text**: [artem9k/ai-text-detection-pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile) — 1.9GB human vs AI-generated text
58
-
59
- ### Recipe
60
- | Component | Config |
61
- |-----------|--------|
62
- | Visual backbone | EfficientNet-B0 |
63
- | Visual optimizer | Adam, lr=1e-4, cosine annealing, 8 epochs |
64
- | Text backbone | RoBERTa-base |
65
- | Text optimizer | AdamW, lr=2e-5, warmup+cosine, 5 epochs |
66
- | Augmentations | RandomFlip, Rotation, ColorJitter, GaussianBlur, RandomErasing |
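
The recipe table names cosine annealing for the visual branch and warmup plus cosine for the text branch. A minimal sketch of such a schedule as a pure function of the step index. The linear warmup shape, the warmup length, and the per-step granularity are assumptions, since the table only names the schedule types:

```python
import math


def lr_at_step(step, total_steps, base_lr, warmup_steps=0, min_lr=0.0):
    """Learning rate with optional linear warmup followed by cosine decay."""
    if step < warmup_steps:
        # Linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min((step - warmup_steps) / decay_steps, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    # Visual branch: lr=1e-4, cosine only. Text branch: lr=2e-5 with warmup.
    # The step counts below are made up for illustration.
    for step in (0, 50, 100):
        print(step, lr_at_step(step, 100, 1e-4))
    for step in (0, 9, 10, 100):
        print(step, lr_at_step(step, 100, 2e-5, warmup_steps=10))
```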

- ### Based on Research
- - **AWARE-NET** (arxiv:2505.00312): Learnable weighted fusion
- - **CLIP Deepfake** (arxiv:2503.19683): L2-normalized feature spaces
- - **DeTeCtive** (arxiv:2410.20964): RoBERTa for AI text detection

- ## Files
- - `model.py` — Architecture (GradCAM, EfficientNet, RoBERTa, Fusion)
- - `preprocessing.py` — Data pipeline (images, video frames, text)
- - `inference.py` — Inference API (single/modality, multimodal, video)
- - `train.py` — Training script

- ## License
  Apache-2.0
 
  pipeline_tag: image-classification
  ---

+ # 🕵️ Multimodal Deepfake Detection Model

+ Multimodal ensemble model for detecting AI-generated content (images, videos, text), with GradCAM explainability.

+ ## 🏗️ Architecture

+ ```
+ Visual Branch   EfficientNet-B0/B4 ──┐
+                                      ├──► Weighted fusion ──► Confidence [0-1]
+ Text Branch     RoBERTa-base       ──┘
+
+ GradCAM Heatmap (explainability)
+ ```

+ ## 🚀 Quick Start

  ```python
+ from inference import load_model, classify_image, classify_text, classify_multimodal

  model, config = load_model('multimodal_ensemble.pt', device='cuda')

+ # Image + GradCAM
+ result = classify_image(model, 'face.jpg', return_gradcam=True)
+ print(f"{result['prediction']} confidence: {result['confidence']:.2%}")
+ # result['gradcam'] holds the heatmap, shape (224, 224)

+ # Text
+ result = classify_text(model, 'This essay analyzes the impacts...')
+ print(f"{result['prediction']} confidence: {result['confidence']:.2%}")

+ # Multimodal (image + text)
  result = classify_multimodal(model, image_path_or_pil='face.jpg', text='Caption...')
+ print(f"Fusion: {result['prediction']}, weights: {result['fusion_weights']}")
  ```
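
The snippet above returns a GradCAM heatmap alongside the prediction. A minimal sketch of the Grad-CAM computation itself on nested lists, assuming access to one convolutional layer's activations and to the gradients of the target score with respect to them. `grad_cam` is a hypothetical helper, not the code in `model.py`, and upsampling the coarse map to (224, 224) is left out:

```python
def grad_cam(activations, gradients):
    """Grad-CAM heatmap from [K][H][W] activations and matching gradients.

    Returns an H x W nested list scaled to [0, 1].
    """
    channels = len(activations)
    height = len(activations[0])
    width = len(activations[0][0])

    # Channel weights: global average of each channel's gradient map.
    weights = [
        sum(sum(row) for row in grad) / (height * width) for grad in gradients
    ]

    # Weighted sum of activation maps, then ReLU to keep positive evidence.
    cam = [
        [
            max(0.0, sum(weights[c] * activations[c][i][j] for c in range(channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Normalize by the peak so the map can be rendered as a heatmap.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


if __name__ == "__main__":
    acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
    grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
    print(grad_cam(acts, grads))
```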

+ ## 📊 Datasets
+ - **Visual**: [Hemg/deepfake-and-real-images](https://huggingface.co/datasets/Hemg/deepfake-and-real-images) (528K images)
+ - **Text**: [artem9k/ai-text-detection-pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile) (1.88 GB)

+ ## 📁 Files
+ | File | Description |
+ |---------|-------------|
+ | `model.py` | Full architecture |
+ | `preprocessing.py` | Data pipeline |
+ | `inference.py` | Inference API |
+ | `train.py` / `train_optimised.py` | Training scripts |
+ | `multimodal_ensemble.pt` | Main checkpoint |
+ | `gradcam_examples/` | Explainability visualizations |

+ ## 🔗 Demo Space
+ [alianassmaaa/multimodal-deepfake-space](https://huggingface.co/spaces/alianassmaaa/multimodal-deepfake-space)

+ ## 📄 License
  Apache-2.0