Roman Urdu Emotion Classifier (XLM-R)

A fine-tuned XLM-RoBERTa model for 7-class emotion classification in Roman Urdu text, covering Ekman's six basic emotions plus a none class.

Model Details

| Field | Value |
| --- | --- |
| Base model | Khubaib01/roman-urdu-sentiment-xlm-r |
| Task | Multi-class emotion classification |
| Language | Roman Urdu (Romanized Urdu / code-switched) |
| Classes | 7 (anger, disgust, fear, joy, sadness, surprise, none) |
| Parameters | ~278M (XLM-R base + custom head) |
| Max input length | 128 tokens |

Emotion Classes

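| Label | Category |
| --- | --- |
| anger | Ekman basic emotion |
| disgust | Ekman basic emotion |
| fear | Ekman basic emotion |
| joy | Ekman basic emotion |
| sadness | Ekman basic emotion |
| surprise | Ekman basic emotion |
| none | No emotion (neutral) |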

Performance

Metrics are computed on the held-out (unseen) test set.

| Metric | Score |
| --- | --- |
| Macro F1 | 0.7149 |
| Accuracy | 0.7148 |

Per-class F1 (Test Set)

| Emotion | F1 |
| --- | --- |
| Anger | 0.5962 |
| Disgust | 0.5672 |
| Fear | 0.8467 |
| Joy | 0.7276 |
| Sadness | 0.6379 |
| Surprise | 0.9349 |
| None | 0.6937 |
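As a consistency check, you can recompute the reported macro F1 from the per-class table. Macro F1 is the unweighted mean of the seven per-class scores. The minimal Python sketch below copies the scores from the table and keys them by the class names used in this card.

```python
# Per-class test F1 scores copied from the table above.
per_class_f1 = {
    "anger": 0.5962,
    "disgust": 0.5672,
    "fear": 0.8467,
    "joy": 0.7276,
    "sadness": 0.6379,
    "surprise": 0.9349,
    "none": 0.6937,
}

# Macro F1 is the unweighted mean of the per-class F1 scores,
# so every class counts equally regardless of how many test samples it has.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"Macro F1: {macro_f1:.4f}")
```

The result rounds to the 0.7149 reported above. The spread between surprise (0.93) and disgust (0.57) shows that the headline number hides large per-class differences.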

Architecture

Input (Roman Urdu text)
    → XLM-R Tokenizer (max_length=128)
    → XLM-R Encoder (12 layers, hidden=768)
    → [CLS] token representation
    → LayerNorm → Dropout(0.3)
    → Linear(768 → 256) → GELU → Dropout(0.15)
    → Linear(256 → 7)
    → CrossEntropyLoss (label_smoothing=0.1)
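In PyTorch's `CrossEntropyLoss`, `label_smoothing=ε` replaces the one-hot target with a mixture of that one-hot vector and a uniform distribution over the K classes. The pure-Python sketch below reimplements that definition for a single example so the arithmetic is visible. It is an illustration of the loss, not the model's training code, and the logits are arbitrary example values.

```python
import math

def log_softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    The target for class k is q_k = (1 - smoothing) * [k == target] + smoothing / K.
    """
    k = len(logits)
    loss = 0.0
    for i, log_p in enumerate(log_softmax(logits)):
        q = (1.0 - smoothing) * (1.0 if i == target else 0.0) + smoothing / k
        loss -= q * log_p
    return loss

# Seven classes, as in this model; the logits are arbitrary example values.
logits = [4.0, 0.5, 0.2, 0.1, 0.0, -0.3, -1.0]
print("no smoothing: ", smoothed_cross_entropy(logits, target=0, smoothing=0.0))
print("smoothing=0.1:", smoothed_cross_entropy(logits, target=0, smoothing=0.1))
```

With smoothing, a confident and correct prediction still pays a small penalty on the non-target classes. This discourages the head from pushing logits toward extreme values.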

Training Details

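| Hyperparameter | Value |
| --- | --- |
| Base LR (encoder) | 2e-5 |
| Head LR | 1e-4 |
| Layer-wise LR decay | 0.95 per layer |
| Scheduler | Cosine with 10% warm-up |
| Epochs | Up to 10 (early stopping) |
| Batch size | 16 |
| Label smoothing | 0.1 |
| Dropout | 0.3 (after LayerNorm), 0.15 (inside the head) |
| Optimizer | AdamW |
| Mixed precision | fp16 |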
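The "cosine with 10% warm-up" scheduler can be sketched as a learning-rate multiplier. The rate ramps linearly from 0 to its peak over the first 10% of steps, then follows a half cosine down to 0. This mirrors the shape of `get_cosine_schedule_with_warmup` in Transformers, but the card does not say which implementation was used. The run length below is also hypothetical: it assumes 16,800 training samples at batch size 16 for the full 10 epochs, whereas early stopping may end training sooner.

```python
import math

def cosine_with_warmup(step, total_steps, warmup_ratio=0.1):
    """Learning-rate multiplier in [0, 1]: linear warm-up, then half-cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return step / max(1, warmup_steps)
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Hypothetical run: 16,800 samples / batch 16 = 1,050 steps per epoch, 10 epochs.
total = 1050 * 10
base_lr = 2e-5
for step in (0, 525, 1050, 5775, 10500):
    print(step, base_lr * cosine_with_warmup(step, total))
```

The same multiplier scales every parameter group, so the head (1e-4) and each encoder layer keep their relative learning rates throughout training.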

Key technique: discriminative layer-wise learning rates. The classification head trains at 5× the base encoder rate (1e-4 vs. 2e-5). Within the encoder, each layer further down is multiplied by another factor of 0.95, so the lower layers change the least and keep their pre-trained multilingual representations.
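The layer-wise rates described above can be computed with the sketch below. The indexing convention is an assumption, not confirmed by the card: the top encoder layer gets exactly 2e-5, each layer below it gets one more factor of 0.95, and the embeddings sit one step below the lowest layer. The group names (`encoder.layer.N`, `embeddings`, `head`) are hypothetical labels for illustration.

```python
def layerwise_learning_rates(base_lr=2e-5, head_lr=1e-4, decay=0.95, num_layers=12):
    """Return a learning rate per (hypothetical) parameter group under layer-wise LR decay.

    Assumed convention: the top encoder layer uses base_lr, each lower layer is
    multiplied by one more factor of `decay`, and the embeddings get one step more.
    """
    lrs = {"head": head_lr}
    for layer in range(num_layers):
        depth_from_top = num_layers - 1 - layer
        lrs[f"encoder.layer.{layer}"] = base_lr * decay ** depth_from_top
    lrs["embeddings"] = base_lr * decay ** num_layers
    return lrs

lrs = layerwise_learning_rates()
for name in ("head", "encoder.layer.11", "encoder.layer.0", "embeddings"):
    print(f"{name:18s} {lrs[name]:.3e}")
```

Under this convention, the lowest encoder layer trains at about 1.14e-5, roughly 57% of the top layer's rate. In PyTorch, each entry would become one parameter group passed to `torch.optim.AdamW` as a dict with `"params"` and `"lr"` keys.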

Dataset

  • Total samples: 21,000
  • Split: 80% train / 10% val / 10% test
  • Balance: Perfectly balanced (3,000 samples per class)
  • Language: Roman Urdu (social media messages)
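The split sizes follow directly from the numbers above. The per-class counts assume the 80/10/10 split was stratified by class, which the card does not state explicitly. The sketch below does the arithmetic.

```python
TOTAL = 21_000
NUM_CLASSES = 7
SPLITS = {"train": 0.8, "val": 0.1, "test": 0.1}

per_class = TOTAL // NUM_CLASSES  # 3,000 samples per class
counts = {name: round(TOTAL * frac) for name, frac in SPLITS.items()}
# Assumes a stratified split, so each class keeps the same share in every split.
per_class_counts = {name: round(per_class * frac) for name, frac in SPLITS.items()}

for name in SPLITS:
    print(f"{name:5s} {counts[name]:6,d} total, {per_class_counts[name]:5,d} per class")
```

Under that assumption, each per-class F1 in the test table is measured on about 300 examples.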

Usage

⚠️ Important: Custom Architecture

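This model uses a custom PyTorch nn.Module, not a standard Hugging Face PreTrainedModel. The pipeline example below passes trust_remote_code=True, which allows Transformers to run the custom model code stored in the repository. If you load the weights manually instead, you must define the model class before loading them.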

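from transformers import pipeline

HF_REPO_ID = "Khubaib01/roman-urdu-emotion-xlmr"

# trust_remote_code=True runs the custom model code shipped with the repository.
# top_k=None returns scores for all 7 emotion classes instead of only the best one.
pipe = pipeline(
    "text-classification",
    model=HF_REPO_ID,
    trust_remote_code=True,
    top_k=None,
)

# Roman Urdu: "brother, what will become of me now?"
print(pipe("bhai ab mera kia hoga"))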
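Because `top_k=None` returns a score for every class, you will usually want to pick out the top prediction. The helper below, `top_emotion`, is hypothetical and not part of this repository. It assumes each prediction is a dict with `label` and `score` keys, the standard text-classification pipeline format. The exact label strings come from the repository's config, so the class names used here are an assumption.

```python
def top_emotion(predictions):
    """Return (label, score) for the highest-scoring class.

    Accepts a flat list of {"label": ..., "score": ...} dicts or a list that
    wraps one such list, since the nesting depends on whether a single string
    or a batch was passed (and on the Transformers version).
    """
    if predictions and isinstance(predictions[0], list):
        predictions = predictions[0]
    best = max(predictions, key=lambda p: p["score"])
    return best["label"], best["score"]

# Made-up scores for illustration only; these are not real model outputs.
example = [
    {"label": "fear", "score": 0.61},
    {"label": "sadness", "score": 0.22},
    {"label": "none", "score": 0.07},
    {"label": "anger", "score": 0.04},
    {"label": "surprise", "score": 0.03},
    {"label": "disgust", "score": 0.02},
    {"label": "joy", "score": 0.01},
]
print(top_emotion(example))
```

Using `max` rather than relying on the list order keeps the helper correct even if the pipeline returns the scores unsorted.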

Limitations

  • Trained on a specific Roman Urdu dataset; performance may vary on highly code-switched (Urdu + English) or dialectal text.
  • The none class captures emotionally neutral text but may misfire on sarcasm or implicit sentiment.
  • Not evaluated on formal/literary Urdu.

Citation

If you use this model in your research, please cite:

@misc{roman-urdu-emotion-xlmr,
  author    = {Muhammad Khubaib Ahmad},
  title     = {Roman Urdu Emotion Classifier (XLM-R)},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/Khubaib01/roman-urdu-emotion-xlmr}
}