# DziriBERT for Algerian Darija Misinformation Detection
## Model Description

A DziriBERT model fine-tuned to detect misinformation in Algerian Darija (Algerian Arabic) text.

- **Base model:** `alger-ia/dziribert`
- **Task:** multi-class classification (5 classes)
## Classes

- **F**: Fake
- **R**: Real
- **N**: Non-news (not a news item)
- **M**: Misleading
- **S**: Satire
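The usage example further down hard-codes the index-to-label order below. This ordering is an assumption taken from that example; if the checkpoint's `config.json` defines `id2label`, prefer reading the mapping from the loaded config rather than hard-coding it:

```python
# Assumed label order (mirrors the hard-coded map in the usage example;
# if the checkpoint's config.json carries id2label, read it from there instead).
ID2LABEL = {0: 'F', 1: 'R', 2: 'N', 3: 'M', 4: 'S'}
LABEL_NAMES = {'F': 'Fake', 'R': 'Real', 'N': 'Non-news',
               'M': 'Misleading', 'S': 'Satire'}

for idx, short in ID2LABEL.items():
    print(f"{idx}: {short} = {LABEL_NAMES[short]}")
```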
## Performance
| Metric | Score |
|---|---|
| Accuracy | 77.27% |
| Macro F1 | 67.49% |
| Macro Precision | 68.51% |
| Macro Recall | 66.87% |
### Per-Class Performance
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| F | 84.84% | 84.66% | 84.75% | 952 |
| R | 78.13% | 75.83% | 76.96% | 848 |
| N | 83.83% | 84.40% | 84.11% | 872 |
| M | 59.40% | 63.80% | 61.53% | 594 |
| S | 36.36% | 25.64% | 30.08% | 78 |
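As a sanity check, the macro scores above are the unweighted means of the per-class scores, so the small Satire class (78 examples) counts as much as the Fake class (952) and pulls the macro F1 well below the accuracy. A pure-Python check against the tables:

```python
# Per-class scores from the table above: (precision, recall, F1, support).
per_class = {
    'F': (84.84, 84.66, 84.75, 952),
    'R': (78.13, 75.83, 76.96, 848),
    'N': (83.83, 84.40, 84.11, 872),
    'M': (59.40, 63.80, 61.53, 594),
    'S': (36.36, 25.64, 30.08, 78),
}

n = len(per_class)
macro_precision = sum(p for p, _, _, _ in per_class.values()) / n
macro_recall = sum(r for _, r, _, _ in per_class.values()) / n
macro_f1 = sum(f for _, _, f, _ in per_class.values()) / n

# Accuracy equals the support-weighted mean of per-class recall.
total = sum(s for _, _, _, s in per_class.values())
accuracy = sum(r * s for _, r, _, s in per_class.values()) / total

print(f"Macro precision: {macro_precision:.2f}%")  # 68.51%
print(f"Macro recall:    {macro_recall:.2f}%")     # 66.87%
print(f"Macro F1:        {macro_f1:.2f}%")         # 67.49%
print(f"Accuracy:        {accuracy:.2f}%")         # 77.27%
```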
## Usage

```python
# test_load_from_hub.py
import os

# CRITICAL: disable TensorFlow before importing transformers
os.environ['USE_TF'] = '0'
os.environ['USE_TORCH'] = '1'

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load from the Hugging Face Hub
REPO_ID = "aurelius2023/dziribert-algerian-misinformation"
print("Loading model from Hugging Face Hub...")
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForSequenceClassification.from_pretrained(REPO_ID)
model.eval()
print("✓ Model loaded successfully!")

# Test prediction. The Darija text roughly translates to: "Algerian youth
# minister reveals that European countries are asking Algeria for solutions
# to address their own youth's social problems."
text = "وزير الشباب الجزائري يكشف ان الدول الاوروبيه تطلب من الجزائر حلولًا لمعالجه المشكلات الاجتماعيه لشبابها"
inputs = tokenizer(text, return_tensors="pt", max_length=128,
                   truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=1)
pred = torch.argmax(probs, dim=1).item()
confidence = probs[0][pred].item()

label_map = {0: 'F', 1: 'R', 2: 'N', 3: 'M', 4: 'S'}
label_names = {
    'F': 'Fake', 'R': 'Real', 'N': 'Non-news',
    'M': 'Misleading', 'S': 'Satire'
}

print("\nTest Prediction:")
print(f"Text: {text}")
print(f"Predicted: {label_names[label_map[pred]]} ({label_map[pred]})")
print(f"Confidence: {confidence:.2%}")
```
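The confidence reported above is simply the softmax probability of the arg-max class. As a minimal sketch of that post-processing step without torch (the logits here are illustrative values, not real model output):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the five classes (F, R, N, M, S) -- not real model output.
logits = [2.1, 0.3, -0.5, 0.9, -1.2]
probs = softmax(logits)
pred = max(range(len(probs)), key=probs.__getitem__)
labels = ['F', 'R', 'N', 'M', 'S']
print(f"Predicted: {labels[pred]}, confidence: {probs[pred]:.2%}")
```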
## Contact

For questions or issues, please open an issue on the model repository.