Kabyle Emotion Classifier (AfroXLMR-Large + GoEmotions)
A fine-tuned AfroXLMR-Large model for 28-class emotion recognition in Kabyle (Taqbaylit), a low-resource Amazigh (Berber) language of the Afro-Asiatic family spoken in Algeria.
This is the third iteration of the Kabyle emotion model, upgrading from XLM-RoBERTa-base to AfroXLMR-Large and from 6-class Ekman labels to 28-class GoEmotions fine-grained labels.
Model Details
| Attribute | Value |
|---|---|
| Base model | Davlan/afro-xlmr-large (AfroXLMR-Large, ~560M params) |
| Architecture | XLM-RoBERTa for Sequence Classification |
| Parameters | ~560M |
| Language | Kabyle (kab) |
| Task | Text Classification (Emotion Detection) |
| Classes | 28 — GoEmotions taxonomy |
| Best checkpoint | Epoch 5 (loaded via load_best_model_at_end) |
28 Emotion Classes
admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, neutral, optimism, pride, realization, relief, remorse, sadness, surprise
Training Data
The model was trained via cross-lingual label transfer from English to Kabyle using parallel sentence pairs:
- Round-trip parallel corpus (eng_kab_roundtrip_good.tsv): 131,301 English-Kabyle sentence pairs with back-translation quality scores.
- Tatoeba parallel corpus: 138,353 additional English-Kabyle linked sentences from tatoeba.org.
Labeling pipeline:
- English sentences were labeled with cirimus/modernbert-base-go-emotions (28-class GoEmotions classifier).
- The single best GoEmotions label and its raw sigmoid confidence were transferred to the Kabyle side via sentence alignment.
- Per-class adaptive thresholds and caps were applied to balance the dataset across all 28 labels.
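The filtering and capping step can be pictured with a minimal sketch like the one below. It is not the author's pipeline: the threshold and cap values are hypothetical (the card only states the 2,000-example cap on neutral), the `balance` helper is an invented name, and keeping the most confident rows when a label exceeds its cap is an assumption.

```python
from collections import defaultdict

# Hypothetical values: the card does not publish its thresholds or caps,
# apart from the 2,000-example cap on "neutral".
DEFAULT_THRESHOLD = 0.50
THRESHOLDS = {"neutral": 0.80, "grief": 0.30}
DEFAULT_CAP = 3000
CAPS = {"neutral": 2000}


def balance(rows, thresholds=THRESHOLDS, caps=CAPS,
            default_threshold=DEFAULT_THRESHOLD, default_cap=DEFAULT_CAP):
    """Keep confident rows per label, then cap each label's count.

    rows: iterable of (kabyle_text, label, confidence) tuples, where label
    and confidence come from the English side of the aligned pair.
    """
    by_label = defaultdict(list)
    for text, label, conf in rows:
        # Per-class threshold: rarer labels may tolerate lower confidence.
        if conf >= thresholds.get(label, default_threshold):
            by_label[label].append((text, label, conf))

    kept = []
    for label, items in by_label.items():
        # Assumption: prefer the most confident rows when over the cap.
        items.sort(key=lambda r: r[2], reverse=True)
        kept.extend(items[: caps.get(label, default_cap)])
    return kept
```

Lower thresholds for rare labels trade label noise for coverage, while caps stop frequent labels such as neutral from dominating training.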
Final balanced dataset:
- Total labeled rows (raw): ~204,000
- Final training set: 46,516 rows
- Validation set: 6,203 rows
- Test set: 9,304 rows
Performance
Validation Set (Epoch 5)
| Metric | Score |
|---|---|
| F1 (weighted) | 0.817 |
| Accuracy | 0.815 |
Test Set Results (9,304 samples)
| Emotion | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| admiration | 0.663 | 0.523 | 0.585 | 900 |
| amusement | 0.746 | 0.730 | 0.738 | 137 |
| anger | 0.577 | 0.518 | 0.546 | 326 |
| annoyance | 0.326 | 0.127 | 0.183 | 118 |
| approval | 0.519 | 0.388 | 0.444 | 417 |
| caring | 0.622 | 0.313 | 0.416 | 521 |
| confusion | 0.701 | 0.653 | 0.676 | 288 |
| curiosity | 0.938 | 0.977 | 0.957 | 1200 |
| desire | 0.880 | 0.885 | 0.882 | 479 |
| disappointment | 0.319 | 0.285 | 0.301 | 130 |
| disapproval | 0.691 | 0.724 | 0.707 | 648 |
| disgust | 0.108 | 0.061 | 0.078 | 66 |
| embarrassment | 0.231 | 0.500 | 0.316 | 42 |
| excitement | 0.201 | 0.243 | 0.220 | 111 |
| fear | 0.738 | 0.684 | 0.710 | 247 |
| gratitude | 0.957 | 0.892 | 0.923 | 148 |
| grief | 0.273 | 0.882 | 0.417 | 17 |
| joy | 0.677 | 0.417 | 0.516 | 357 |
| love | 0.832 | 0.780 | 0.805 | 513 |
| nervousness | 0.280 | 0.535 | 0.368 | 99 |
| neutral | 0.579 | 0.833 | 0.683 | 1200 |
| optimism | 0.502 | 0.779 | 0.611 | 280 |
| pride | 0.476 | 0.833 | 0.606 | 36 |
| realization | 0.150 | 0.570 | 0.237 | 100 |
| relief | 0.111 | 0.071 | 0.087 | 14 |
| remorse | 0.718 | 0.761 | 0.739 | 134 |
| sadness | 0.537 | 0.225 | 0.317 | 547 |
| surprise | 0.802 | 0.672 | 0.732 | 229 |
- Accuracy: 0.648
- Weighted Avg F1: 0.641
- Macro Avg F1: 0.529
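The gap between the weighted and macro averages follows directly from the per-class table: macro F1 treats every class equally, while weighted F1 scales each class by its support, so large, well-predicted classes such as curiosity lift the weighted figure. A short sketch that recomputes both averages from the table's F1 and support columns (the helper names are illustrative):

```python
# (F1, support) per class, copied from the test-set table.
PER_CLASS = {
    "admiration": (0.585, 900), "amusement": (0.738, 137),
    "anger": (0.546, 326), "annoyance": (0.183, 118),
    "approval": (0.444, 417), "caring": (0.416, 521),
    "confusion": (0.676, 288), "curiosity": (0.957, 1200),
    "desire": (0.882, 479), "disappointment": (0.301, 130),
    "disapproval": (0.707, 648), "disgust": (0.078, 66),
    "embarrassment": (0.316, 42), "excitement": (0.220, 111),
    "fear": (0.710, 247), "gratitude": (0.923, 148),
    "grief": (0.417, 17), "joy": (0.516, 357),
    "love": (0.805, 513), "nervousness": (0.368, 99),
    "neutral": (0.683, 1200), "optimism": (0.611, 280),
    "pride": (0.606, 36), "realization": (0.237, 100),
    "relief": (0.087, 14), "remorse": (0.739, 134),
    "sadness": (0.317, 547), "surprise": (0.732, 229),
}


def macro_f1(per_class):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in per_class.values()) / len(per_class)


def weighted_f1(per_class):
    """Support-weighted mean of per-class F1."""
    total = sum(n for _, n in per_class.values())
    return sum(f1 * n for f1, n in per_class.values()) / total


print(f"macro F1:    {macro_f1(PER_CLASS):.3f}")
print(f"weighted F1: {weighted_f1(PER_CLASS):.3f}")
```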
How to Use
Quick inference with transformers
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="boffire/kabyle-emotion-afro-xlmr",
    device=0,  # use -1 for CPU
)

# Example sentences
examples = [
    "Ur d-yelli ara wid akken ttwali",
    "Lliɣ d aɣeznay i uqeddic-agi",
    "Ihi, ma yella, ad nerr",
    "Ahat ad yemmut umdan-nni",
    "Tameddakelt-iw tezwared-iyi",
]

for text in examples:
    scores = classifier(text, top_k=None)  # scores for all 28 labels
    # Some transformers versions wrap a single input's scores in an outer list.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    top = max(scores, key=lambda x: x["score"])
    print(f"{text} -> {top['label']} ({top['score']:.3f})")
Loading the model directly
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("boffire/kabyle-emotion-afro-xlmr")
model = AutoModelForSequenceClassification.from_pretrained("boffire/kabyle-emotion-afro-xlmr")

# Tokenize and run a forward pass; outputs.logits has shape (1, 28)
inputs = tokenizer("Tura, Jeǧǧiga tesɛa 20 n yiseggasen.", return_tensors="pt", truncation=True)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)
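The forward pass returns raw logits rather than a label. A minimal sketch of turning them into ranked emotions follows, assuming the model is a single-label classifier whose logits should be passed through a softmax (the accuracy-based evaluation suggests this, but the card does not state it). The `top_k_emotions` helper is an illustrative name, and the demo uses made-up logits and a made-up three-class label map.

```python
import math


def top_k_emotions(logits, id2label, k=3):
    """Softmax a 1-D list of logits and return the k most probable labels."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Toy demo: made-up logits and label map, not real model output.
demo = top_k_emotions([0.2, 2.5, -1.0], {0: "anger", 1: "joy", 2: "fear"}, k=2)
print(demo)
```

With the model loaded as shown, the call would look like `top_k_emotions(outputs.logits[0].tolist(), model.config.id2label)`.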
Training Details
| Hyperparameter | Value |
|---|---|
| Epochs | 5 (early stopping patience=2) |
| Batch size | 16 per device (effective 64 with gradient accumulation) |
| Gradient accumulation | 4 |
| Learning rate | 2e-5 |
| Max sequence length | 96 |
| Weight decay | 0.01 |
| Warmup steps | ~10% of total steps |
| Optimizer | AdamW |
| Class weights | Balanced (sklearn.utils.class_weight.compute_class_weight) |
| Mixed precision | None (float32) |
| Best checkpoint | Epoch 5 |
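Two rows of the table can be made concrete with a little arithmetic. The sketch below reimplements the "balanced" heuristic documented for scikit-learn's `compute_class_weight` (n_samples / (n_classes * count)) on a toy label list, then derives the effective batch size and an approximate warmup length from the table. It assumes a single device and rounds steps per epoch up, so the step counts are estimates rather than the values used in training.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * count(class))."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


# Toy label list for illustration only (not the real training distribution).
toy = ["neutral"] * 6 + ["joy"] * 3 + ["grief"] * 1
weights = balanced_class_weights(toy)  # rarer classes get larger weights

# Schedule arithmetic from the table, assuming a single device.
train_rows, per_device_batch, grad_accum, epochs = 46_516, 16, 4, 5
effective_batch = per_device_batch * grad_accum
steps_per_epoch = math.ceil(train_rows / effective_batch)  # approximate
total_steps = steps_per_epoch * epochs
warmup_steps = int(0.10 * total_steps)  # "~10% of total steps"

print(weights)
print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```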
Limitations & Caveats
- Silver labels: Ground-truth emotions were projected from an English GoEmotions classifier. Some labels may not perfectly capture Kabyle cultural or emotional nuance.
- Rare class weakness: Classes with very few test examples (relief: 14, grief: 17, disgust: 66) have low F1 scores. The model struggles to learn reliable patterns for these.
- Neutral class: While neutral now comes from a real GoEmotions label (not synthetic uncertainty), it still dominates the raw distribution and is capped to 2,000 training examples.
- Translation quality: The parallel corpus includes round-trip translated sentences. Imperfect translations may introduce label noise.
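- No native speaker validation: The test set was held out from the same silver-labeled pool, so test scores measure agreement with the projected labels rather than with human judgment. A small native-annotated benchmark would give a more reliable estimate of real-world accuracy.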
- Domain limitation: Training data comes from Tatoeba (simple, short sentences) and round-trip translations. Performance may degrade on longer, more complex Kabyle text (social media, literature, etc.).
- Kabyle not in AfroXLMR pre-training corpus: AfroXLMR-Large was trained on 17 African languages, but Kabyle was not among them. The model relies on transfer from related Afro-Asiatic languages (e.g., Amharic, Arabic).
Intended Use
- Research in low-resource NLP and Afro-Asiatic / Amazigh language processing.
- Downstream applications requiring fine-grained emotion signals in Kabyle text (e.g., content moderation, mental-health screening, customer feedback analysis).
- Baseline for future Kabyle emotion models trained on native annotations.
Citation
If you use this model, please cite:
@misc{boffire_kabyle_emotion_afro_xlmr,
  title = {Kabyle Emotion Classifier (AfroXLMR-Large + GoEmotions)},
  author = {Boffire},
  year = {2026},
  howpublished = {\url{https://huggingface.co/boffire/kabyle-emotion-afro-xlmr}},
  note = {Fine-tuned AfroXLMR-Large for 28-class GoEmotions detection in Kabyle via cross-lingual label transfer from English}
}
Acknowledgments
- Davlan for the afro-xlmr-large base model and African-centric pre-training.
- cirimus for the modernbert-base-go-emotions English emotion classifier.
- Google Research for the GoEmotions dataset.
- Tatoeba Project for the English-Kabyle parallel corpus.
- Hugging Face transformers, datasets, and accelerate teams for the training infrastructure.
License
This model is released under the Apache 2.0 license.
The base model (Davlan/afro-xlmr-large) and English emotion classifier (cirimus/modernbert-base-go-emotions) are subject to their respective MIT licenses. The GoEmotions dataset is Apache 2.0.