# Roman Urdu Emotion Classifier (XLM-R)

A fine-tuned XLM-RoBERTa model for 7-class emotion classification in Roman Urdu text, covering Ekman's six basic emotions plus a `none` class.
## Model Details
| Field | Value |
|---|---|
| Base model | Khubaib01/roman-urdu-sentiment-xlm-r |
| Task | Multi-class emotion classification |
| Language | Roman Urdu (Romanized Urdu / code-switched) |
| Classes | 7 (anger, disgust, fear, joy, sadness, surprise, none) |
| Parameters | ~278M (XLM-R base + custom head) |
| Max input length | 128 tokens |
## Emotion Classes
| Label | Ekman Emotion |
|---|---|
| anger | Basic |
| disgust | Basic |
| fear | Basic |
| joy | Basic |
| sadness | Basic |
| surprise | Basic |
| none | No emotion |
## Performance

Metrics computed on the held-out (unseen) test set.
| Metric | Score |
|---|---|
| Macro F1 | 0.7149 |
| Accuracy | 0.7148 |
### Per-class F1 (Test Set)
| Emotion | F1 Score |
|---|---|
| Anger | 0.5962 |
| Disgust | 0.5672 |
| Fear | 0.8467 |
| Joy | 0.7276 |
| Sadness | 0.6379 |
| Surprise | 0.9349 |
| None | 0.6937 |
## Architecture

```
Input (Roman Urdu text)
  → XLM-R Tokenizer (max_length=128)
  → XLM-R Encoder (12 layers, hidden=768)
  → [CLS] token representation
  → LayerNorm → Dropout(0.3)
  → Linear(768 → 256) → GELU → Dropout(0.15)
  → Linear(256 → 7)
  → CrossEntropyLoss (label_smoothing=0.1)
```
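The head described above can be sketched as a small PyTorch module. This is a minimal illustration assuming the `[CLS]` representation comes from an XLM-R base encoder (hidden size 768); the class name `EmotionHead` and layer names are assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    """Illustrative sketch of the classification head: LayerNorm ->
    Dropout(0.3) -> Linear(768, 256) -> GELU -> Dropout(0.15) -> Linear(256, 7)."""

    def __init__(self, hidden=768, mid=256, num_classes=7):
        super().__init__()
        self.norm = nn.LayerNorm(hidden)
        self.drop1 = nn.Dropout(0.3)
        self.proj = nn.Linear(hidden, mid)
        self.act = nn.GELU()
        self.drop2 = nn.Dropout(0.15)
        self.out = nn.Linear(mid, num_classes)

    def forward(self, cls_repr):
        # cls_repr: (batch, 768) -- the [CLS] token representation
        x = self.drop1(self.norm(cls_repr))
        x = self.drop2(self.act(self.proj(x)))
        return self.out(x)  # logits: (batch, 7)

head = EmotionHead()
logits = head(torch.randn(4, 768))
# Training objective as listed above: cross-entropy with label smoothing 0.1
loss = nn.CrossEntropyLoss(label_smoothing=0.1)(logits, torch.tensor([0, 3, 6, 2]))
print(logits.shape)
```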
## Training Details
| Hyperparameter | Value |
|---|---|
| Base LR (encoder) | 2e-5 |
| Head LR | 1e-4 |
| Layer-wise LR decay | 0.95 per layer |
| Scheduler | Cosine with 10% warm-up |
| Epochs | Up to 10 (early stopping) |
| Batch size | 16 |
| Label smoothing | 0.1 |
| Dropout | 0.3 |
| Optimizer | AdamW |
| Mixed precision | fp16 |
Key technique: discriminative layer-wise learning rates. The classification head trains at 5× the base encoder rate, while lower encoder layers decay progressively to preserve the pre-trained multilingual representations.
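The rates implied by the table above (head LR 1e-4, base LR 2e-5, decay 0.95 per layer) can be computed directly. This is an illustrative sketch; the function name and parameter grouping are assumptions, not the training script's actual code.

```python
def layerwise_lrs(base_lr=2e-5, head_lr=1e-4, decay=0.95, n_layers=12):
    """Return per-component learning rates: the head at head_lr, the top
    encoder layer at base_lr, and each lower layer decayed by `decay`."""
    lrs = {"head": head_lr}
    for i in range(n_layers - 1, -1, -1):  # layer 11 (top) down to layer 0
        lrs[f"encoder.layer.{i}"] = base_lr * decay ** (n_layers - 1 - i)
    return lrs

lrs = layerwise_lrs()
print(lrs["head"] / lrs["encoder.layer.11"])  # head trains at 5x the base rate
print(lrs["encoder.layer.0"])                 # deepest layer, most decayed
```

In practice these values would be passed to `torch.optim.AdamW` as per-parameter-group learning rates.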
## Dataset
- Total samples: 21,000
- Split: 80% train / 10% val / 10% test
- Balance: Perfectly balanced (3,000 samples per class)
- Language: Roman Urdu (social media messages)
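Because the dataset is perfectly balanced, an 80/10/10 split per class yields balanced splits as well. A minimal sketch of such a stratified split, using placeholder samples (the real data is not reproduced here):

```python
import random

random.seed(42)
classes = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "none"]
# Placeholder dataset: 3,000 samples per class, 21,000 total, as described above
data = [(f"sample_{c}_{i}", c) for c in classes for i in range(3000)]

train, val, test = [], [], []
for c in classes:
    items = [x for x in data if x[1] == c]
    random.shuffle(items)
    n = len(items)
    train += items[: int(0.8 * n)]            # 80% train
    val += items[int(0.8 * n): int(0.9 * n)]  # 10% validation
    test += items[int(0.9 * n):]              # 10% test

print(len(train), len(val), len(test))  # 16800 2100 2100
```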
## Usage
> ⚠️ **Important: Custom Architecture**
>
> This model uses a custom PyTorch `nn.Module`, not a standard HuggingFace `PreTrainedModel`. You must define the model class before loading the weights, or pass `trust_remote_code=True` as below.
```python
from transformers import pipeline

HF_REPO_ID = "Khubaib01/roman-urdu-emotion-xlmr"

pipe = pipeline(
    "text-classification",
    model=HF_REPO_ID,
    trust_remote_code=True,
    top_k=None,
)

pipe("bhai ab mera kia hoga")
```
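With `top_k=None`, the pipeline returns a score for every class. A minimal sketch of picking the top emotion from that output; the scores below are made up for illustration, not actual model predictions:

```python
# Hypothetical pipeline output: one {"label", "score"} dict per class
scores = [
    {"label": "fear", "score": 0.62},
    {"label": "sadness", "score": 0.21},
    {"label": "none", "score": 0.08},
    {"label": "anger", "score": 0.04},
    {"label": "surprise", "score": 0.03},
    {"label": "joy", "score": 0.01},
    {"label": "disgust", "score": 0.01},
]

# Pick the highest-scoring emotion
top = max(scores, key=lambda s: s["score"])
print(top["label"])  # fear
```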
## Limitations
- Trained on a specific Roman Urdu dataset; performance may vary on highly code-switched (Urdu + English) or dialectal text.
- The `none` class captures emotionally neutral text but may misfire on sarcasm or implicit sentiment.
- Not evaluated on formal/literary Urdu.
## Citation

If you use this model in your research, please cite:
```bibtex
@misc{roman-urdu-emotion-xlmr,
  author    = {Muhammad Khubaib Ahmad},
  title     = {Roman Urdu Emotion Classifier (XLM-R)},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Khubaib01/roman-urdu-emotion-xlmr}
}
```