Update README with French docs and Space link
README.md
CHANGED
@@ -14,67 +14,57 @@ datasets:
pipeline_tag: image-classification
---

- # Multimodal Deepfake Detection Model

- ## Architecture

- ##

```python
- from inference import load_model, classify_image, classify_text,

model, config = load_model('multimodal_ensemble.pt', device='cuda')

- # Image
- result = classify_image(model, 'face.jpg',
- print(f"
- # result['gradcam']
-
- # Text classification
- result = classify_text(model, 'This text was written by...')
- print(f"Prediction: {result['prediction']} (confidence: {result['confidence']:.2%})")

- #
- result =
- print(f"

- # Multimodal (image +
result = classify_multimodal(model, image_path_or_pil='face.jpg', text='Caption...')
- print(f"
```

- ##
- - **Visual**: [Hemg/deepfake-and-real-images](https://huggingface.co/datasets/Hemg/deepfake-and-real-images): 140K+ face images (real vs deepfake)
- - **Text**: [artem9k/ai-text-detection-pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile): 1.9GB of human vs AI-generated text
- ### Recipe
- | Component | Config |
- |-----------|--------|
- | Visual backbone | EfficientNet-B0 |
- | Visual optimizer | Adam, lr=1e-4, cosine annealing, 8 epochs |
- | Text backbone | RoBERTa-base |
- | Text optimizer | AdamW, lr=2e-5, warmup+cosine, 5 epochs |
- | Augmentations | RandomFlip, Rotation, ColorJitter, GaussianBlur, RandomErasing |
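The Recipe table names two learning-rate schedules without giving their exact shape. Below is a minimal pure-Python sketch of both curves under stated assumptions: the visual schedule uses the standard cosine-annealing closed form (the one documented for PyTorch's `CosineAnnealingLR`), decaying to 0 over 8 epochs; the text schedule is linear warmup followed by cosine decay to 0, evaluated per optimizer step. `steps_per_epoch = 100` and the 10% warmup fraction are hypothetical values, not taken from the card.

```python
import math


def cosine_annealing_lr(base_lr: float, epoch: int, total_epochs: int,
                        min_lr: float = 0.0) -> float:
    """Cosine annealing from base_lr (epoch 0) to min_lr (epoch == total_epochs).

    Same closed form as PyTorch's CosineAnnealingLR with T_max=total_epochs.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def warmup_cosine_lr(base_lr: float, step: int, total_steps: int,
                     warmup_steps: int) -> float:
    """Linear warmup from 0 to base_lr, then cosine decay to 0 over the remaining steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Visual branch: Adam, lr=1e-4, 8 epochs (one scheduler update per epoch assumed)
visual_lrs = [cosine_annealing_lr(1e-4, e, total_epochs=8) for e in range(8)]

# Text branch: AdamW, lr=2e-5, 5 epochs; steps_per_epoch and warmup are hypothetical
steps_per_epoch = 100
total_steps = 5 * steps_per_epoch
warmup_steps = int(0.1 * total_steps)
text_lrs = [warmup_cosine_lr(2e-5, s, total_steps, warmup_steps) for s in range(total_steps)]
```

Per-step updates are the usual choice for transformer fine-tuning because the warmup phase is short relative to an epoch; the visual schedule could equally be stepped per batch by passing fractional epochs.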

- ##
- -

- ##
- -
- - `preprocessing.py`: Data pipeline (images, video frames, text)
- - `inference.py`: Inference API (single-modality, multimodal, video)
- - `train.py`: Training script

- ##
Apache-2.0

pipeline_tag: image-classification
---

+ # 🕵️ Multimodal Deepfake Detection Model

+ Multimodal ensemble model that detects AI-generated content (images, videos, text), with GradCAM explainability.

+ ## 🏗️ Architecture

```
+ Visual Branch   EfficientNet-B0/B4 ──┐
+                                      ├──► Weighted fusion ──► Confidence [0-1]
+ Text Branch     RoBERTa-base       ──┘
+                                                  ↓
+                                  GradCAM heatmap (explainability)
```
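The diagram shows the visual and text branches merged by a weighted fusion into one confidence in [0, 1]. The card does not say how the fusion weights are set, so this is only a minimal late-fusion sketch: it assumes each branch outputs a single logit for the "fake" class, averages the two branch probabilities with hypothetical weights (0.6 visual, 0.4 text), and renormalizes the weights when a modality is missing. The result keys mimic the `prediction`, `confidence` and `fusion_weights` fields used in the quick-start example, but the function name `weighted_fusion` and the labels `"fake"`/`"real"` are illustrative.

```python
import math
from typing import Dict, Optional


def sigmoid(x: float) -> float:
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def weighted_fusion(visual_logit: Optional[float] = None,
                    text_logit: Optional[float] = None,
                    w_visual: float = 0.6,
                    w_text: float = 0.4,
                    threshold: float = 0.5) -> Dict[str, object]:
    """Late fusion: weighted average of per-branch P(fake).

    The default weights are hypothetical. Missing modalities are dropped and
    the remaining weights renormalized so the fused score stays in [0, 1].
    """
    branches = {}
    if visual_logit is not None:
        branches["visual"] = (w_visual, sigmoid(visual_logit))
    if text_logit is not None:
        branches["text"] = (w_text, sigmoid(text_logit))
    if not branches:
        raise ValueError("at least one modality is required")

    total = sum(w for w, _ in branches.values())
    weights = {name: w / total for name, (w, _) in branches.items()}
    p_fake = sum(weights[name] * p for name, (_, p) in branches.items())

    prediction = "fake" if p_fake >= threshold else "real"
    # Confidence in the predicted label; lies in [0.5, 1] when threshold is 0.5
    confidence = p_fake if prediction == "fake" else 1.0 - p_fake
    return {"prediction": prediction, "confidence": confidence,
            "p_fake": p_fake, "fusion_weights": weights}


result = weighted_fusion(visual_logit=2.0, text_logit=-1.0)
print(f"Fusion: {result['prediction']} (weights: {result['fusion_weights']})")
```

Fixed weights keep the sketch simple; a trained model might instead learn the weights or use a small fusion head over concatenated embeddings.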

+ ## 🚀 Quick Start

```python
+ from inference import load_model, classify_image, classify_text, classify_multimodal

model, config = load_model('multimodal_ensemble.pt', device='cuda')

+ # Image + GradCAM
+ result = classify_image(model, 'face.jpg', return_gradcam=True)
+ print(f"{result['prediction']} (confidence: {result['confidence']:.2%})")
+ # result['gradcam'] → heatmap of shape (224, 224)

+ # Text
+ result = classify_text(model, 'This essay analyzes the impacts...')
+ print(f"{result['prediction']} (confidence: {result['confidence']:.2%})")

+ # Multimodal (image + text)
result = classify_multimodal(model, image_path_or_pil='face.jpg', text='Caption...')
+ print(f"Fusion: {result['prediction']} (weights: {result['fusion_weights']})")
```
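With `return_gradcam=True` the image classifier returns a (224, 224) heatmap, but the card does not show how it is computed. The standard Grad-CAM recipe weights each channel of a convolutional feature map by the spatial mean of its gradient, sums the weighted maps, applies a ReLU, normalizes, and upsamples to the input size. A minimal NumPy sketch of that recipe follows, assuming the chosen layer's activations and gradients are already available as (C, H, W) arrays (for EfficientNet-B0 at 224 x 224 input the last feature map is 1280 x 7 x 7). Nearest-neighbour upsampling is used for brevity where real implementations usually interpolate bilinearly, and the function name `grad_cam` is illustrative, not taken from `inference.py`.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray,
             out_size: int = 224) -> np.ndarray:
    """Grad-CAM heatmap from one conv layer's activations and gradients.

    activations, gradients: (C, H, W) arrays for a single image, where the
    gradients are of the target-class score w.r.t. the activations.
    Returns an (out_size, out_size) map scaled to [0, 1].
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Channel weights: global-average-pooled gradient per channel, shape (C,)
    weights = gradients.mean(axis=(1, 2))
    # Weighted sum over channels -> (H, W); ReLU keeps only positive evidence
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    # Scale to [0, 1]; an all-zero map is left as zeros
    peak = cam.max()
    if peak > 0:
        cam = cam / peak
    # Nearest-neighbour upsampling to the input resolution
    h, w = cam.shape
    rows = np.arange(out_size) * h // out_size
    cols = np.arange(out_size) * w // out_size
    return cam[np.ix_(rows, cols)]


# Random stand-ins for EfficientNet-B0's last feature map and its gradients
rng = np.random.default_rng(0)
acts = rng.random((1280, 7, 7))
grads = rng.standard_normal((1280, 7, 7))
heatmap = grad_cam(acts, grads)
print(heatmap.shape)  # (224, 224)
```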

+ ## 📊 Datasets
+ - **Visual**: [Hemg/deepfake-and-real-images](https://huggingface.co/datasets/Hemg/deepfake-and-real-images): 528K images
+ - **Text**: [artem9k/ai-text-detection-pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile): 1.88GB

+ ## 📁 Files
+ | File | Description |
+ |------|-------------|
+ | `model.py` | Full model architecture |
+ | `preprocessing.py` | Data pipeline |
+ | `inference.py` | Inference API |
+ | `train.py` / `train_optimised.py` | Training scripts |
+ | `multimodal_ensemble.pt` | Main checkpoint |
+ | `gradcam_examples/` | Explainability visualizations |

+ ## 🔗 Demo Space
+ [alianassmaaa/multimodal-deepfake-space](https://huggingface.co/spaces/alianassmaaa/multimodal-deepfake-space)

+ ## 📄 License
Apache-2.0