metadata
language:
- kab
- ber
license: mit
library_name: transformers
tags:
- kabyle
- tamazight
- emotion-classification
- sentiment-analysis
- xlm-roberta
- low-resource
- cross-lingual-transfer
datasets:
- tatoeba
base_model: xlm-roberta-base
metrics:
- f1
- accuracy
Kabyle Emotion Classifier
A fine-tuned XLM-RoBERTa model for 7-class emotion recognition in Kabyle (Taqbaylit), a low-resource Amazigh (Berber) language of the Afro-Asiatic family spoken in Algeria.
Model Details
| Attribute | Value |
|---|---|
| Base model | xlm-roberta-base |
| Architecture | XLM-RoBERTa for Sequence Classification |
| Parameters | ~278 M |
| Languages | Kabyle (kab) |
| Task | Text Classification (Emotion Detection) |
| Classes | 7 — anger, disgust, fear, joy, sadness, surprise, neutral |
Training Data
The model was trained via cross-lingual label transfer from English to Kabyle using parallel sentence pairs:
- Round-trip parallel corpus (eng_kab_roundtrip_good.tsv): 131,301 English–Kabyle sentence pairs with back-translation quality scores.
- Tatoeba parallel corpus: 138,353 additional English–Kabyle linked sentences downloaded from tatoeba.org.
Labeling pipeline:
- English sentences were labeled with j-hartmann/emotion-english-distilroberta-base.
- Labels were transferred to the Kabyle side via sentence alignment.
- Low-confidence predictions (< 0.75) were filtered out.
- The neutral class was capped at 2,000 examples to reduce imbalance.
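The last two steps of the labeling pipeline can be sketched as follows. This is a minimal illustration, not the author's script: the function name `filter_and_cap`, the tuple layout, and the use of seeded random subsampling for the cap are all assumptions. Only the 0.75 threshold and the 2,000-row cap come from the description above.

```python
import random


def filter_and_cap(rows, min_conf=0.75, cap_label="neutral", cap=2000, seed=0):
    """Filter silver-labeled rows and cap one class.

    rows: iterable of (kabyle_text, label, confidence) tuples (hypothetical layout).
    """
    # Step 1: drop predictions whose confidence is below the threshold.
    kept = [r for r in rows if r[2] >= min_conf]

    # Step 2: cap the chosen class. Random subsampling is an assumption;
    # the source only states that the class was capped.
    capped = [r for r in kept if r[1] == cap_label]
    others = [r for r in kept if r[1] != cap_label]
    if len(capped) > cap:
        capped = random.Random(seed).sample(capped, cap)

    return others + capped
```

Called with the defaults, the function keeps every confident non-neutral row and at most 2,000 confident neutral rows.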
Final balanced dataset:
- Total labeled rows (raw): 225,036
- Final training set: 54,486 rows
- joy: 12,539 | disgust: 11,983 | sadness: 9,666 | surprise: 6,418 | fear: 6,334 | anger: 5,546 | neutral: 2,000
- Train / Val / Test split: 40,864 / 5,449 / 8,173
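As a quick sanity check on the figures above, the short sketch below confirms that the per-class counts and the split sizes both add up to the final training set size, and derives the split proportions. The variable names are illustrative; the numbers are copied from this section.

```python
# Per-class counts of the final dataset, as listed above.
class_counts = {
    "joy": 12539, "disgust": 11983, "sadness": 9666, "surprise": 6418,
    "fear": 6334, "anger": 5546, "neutral": 2000,
}
split = {"train": 40864, "val": 5449, "test": 8173}

total = sum(class_counts.values())
split_total = sum(split.values())

# Fraction of the dataset assigned to each split.
ratios = {name: n / total for name, n in split.items()}

print(total, split_total)
print({name: round(r, 2) for name, r in ratios.items()})
```

Both totals come to 54,486, and the split works out to roughly 75% / 10% / 15%.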
Performance
Test Set Results (8,173 samples)
| Emotion | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| anger | 0.70 | 0.75 | 0.73 | 832 |
| disgust | 0.81 | 0.64 | 0.72 | 1,797 |
| fear | 0.72 | 0.74 | 0.73 | 950 |
| joy | 0.82 | 0.80 | 0.81 | 1,881 |
| sadness | 0.72 | 0.77 | 0.75 | 1,450 |
| surprise | 0.87 | 0.87 | 0.87 | 963 |
| neutral | 0.22 | 0.39 | 0.28 | 300 |
- Accuracy: 0.74
- Weighted Avg F1: 0.75
- Macro Avg F1: 0.69
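The two averaged F1 figures can be approximately recomputed from the per-class table. The sketch below uses the rounded F1 values and supports shown above, so its results may differ from the reported ones in the last digit. The variable names are illustrative.

```python
# (F1, support) per class, copied from the test-set table.
report = {
    "anger": (0.73, 832),
    "disgust": (0.72, 1797),
    "fear": (0.73, 950),
    "joy": (0.81, 1881),
    "sadness": (0.75, 1450),
    "surprise": (0.87, 963),
    "neutral": (0.28, 300),
}

total_support = sum(n for _, n in report.values())

# Macro average: every class counts equally, so the weak neutral class pulls it down.
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1 * n for f1, n in report.values()) / total_support

print(f"macro F1 ~ {macro_f1:.2f}, weighted F1 ~ {weighted_f1:.2f}")
```

The gap between the two averages is mostly explained by the neutral class, which has a low F1 but only 300 of the 8,173 test samples.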
How to Use
Quick inference with transformers
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="boffire/kabyle-emotion-xlmr",
    device=0,  # use -1 for CPU
)

# Example sentences
examples = [
    "Tafyirt 1",      # → sadness (0.98)
    "Tafyirt tis 2",  # → joy (0.72)
    "Tafyirt tis 3",  # → neutral / sadness
]

for text in examples:
    # Called without top_k, the pipeline returns only the highest-scoring label
    top = classifier(text)[0]
    print(f"{text} → {top['label']} ({top['score']:.3f})")
Loading the model directly
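import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("boffire/kabyle-emotion-xlmr")
model = AutoModelForSequenceClassification.from_pretrained("boffire/kabyle-emotion-xlmr")

# Tokenize and predict
inputs = tokenizer("Imir-a, Muiriel tesɛa 20 n yiseggasen.", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities and look up the predicted label name
probs = torch.softmax(logits, dim=-1)[0]
pred_id = int(probs.argmax())
print(model.config.id2label[pred_id], f"{probs[pred_id].item():.3f}")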
Training Details
| Hyperparameter | Value |
|---|---|
| Epochs | 6 (with early stopping, patience=2) |
| Batch size | 32 per device (effective 64 with gradient accumulation) |
| Learning rate | 2e-5 |
| Max sequence length | 96 |
| Weight decay | 0.01 |
| Warmup steps | ~10% of total steps |
| Mixed precision | FP16 |
| Class weights | Balanced (sklearn.utils.class_weight.compute_class_weight) |
| Optimizer | AdamW (Hugging Face default) |
| Best checkpoint | Epoch 6 (loaded automatically via load_best_model_at_end) |
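The "balanced" class-weight heuristic named in the table assigns each class the weight n_samples / (n_classes * class_count). The sketch below applies that formula in plain Python to the full-dataset class counts from the Training Data section. This is an assumption made for illustration: the weights used in training were presumably computed on the training split, whose per-class counts are not listed here.

```python
# Class counts of the final 54,486-row dataset (illustrative stand-in
# for the training-split counts, which are not published in this card).
class_counts = {
    "joy": 12539, "disgust": 11983, "sadness": 9666, "surprise": 6418,
    "fear": 6334, "anger": 5546, "neutral": 2000,
}

n_samples = sum(class_counts.values())
n_classes = len(class_counts)

# Balanced heuristic: rarer classes receive proportionally larger weights.
weights = {
    label: n_samples / (n_classes * count)
    for label, count in class_counts.items()
}

for label, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<9} {w:.3f}")
```

Under this assumption the neutral class gets the largest weight and joy the smallest, which is the intended effect of weighting the loss against class imbalance.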
Limitations & Caveats
- Silver labels: Ground-truth emotions were projected from an English classifier. Some labels may not perfectly capture Kabyle cultural or emotional nuance.
- Neutral class weakness: The neutral class performs poorly (F1 ~0.28) because it contains many low-confidence English predictions. Consider treating it as a "no strong emotion" fallback rather than a reliable label.
- Translation quality: The parallel corpus includes round-trip translated sentences. Imperfect translations may introduce label noise.
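- No native speaker validation: The test set was held out from the same silver-labeled pool, so the reported scores measure agreement with projected labels rather than with human judgment. A small native-annotated benchmark would give a more reliable estimate of real-world performance.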
- Imbalanced source: Tatoeba data is skewed toward simple, short sentences. Performance may degrade on longer, more complex Kabyle text (social media, literature, etc.).
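One way to apply the "no strong emotion" fallback suggested above is to post-process the per-class scores. The sketch below is a hypothetical helper, not part of the model or of `transformers`: the name `pick_label` and the 0.5 threshold are assumptions and should be tuned on validation data. It expects one text's scores as a list of `{"label", "score"}` dicts, the shape the text-classification pipeline produces when all class scores are requested.

```python
def pick_label(scores, min_score=0.5, fallback="neutral"):
    """Return (label, score), falling back when no emotion is confident.

    scores: list of {"label": str, "score": float} dicts for a single text.
    min_score: hypothetical threshold below which the fallback is used.
    """
    top = max(scores, key=lambda s: s["score"])
    # Treat an uncertain top prediction as "no strong emotion".
    if top["score"] < min_score:
        return fallback, top["score"]
    return top["label"], top["score"]


# Made-up scores for illustration only, not real model output.
example = [
    {"label": "joy", "score": 0.41},
    {"label": "sadness", "score": 0.38},
    {"label": "neutral", "score": 0.21},
]
print(pick_label(example))
```

With these made-up scores, the top score of 0.41 is below the threshold, so the helper returns the fallback label together with that score.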
Intended Use
- Research in low-resource NLP and Afro-Asiatic language processing.
- Downstream applications requiring coarse emotion signals in Kabyle text (e.g., content moderation, mental-health screening, customer feedback analysis).
- Baseline for future Kabyle emotion models trained on native annotations.
Citation
If you use this model, please cite:
@misc{boffire_kabyle_emotion_xlmr,
title = {Kabyle Emotion Classifier},
author = {Boffire},
year = {2026},
howpublished = {\url{https://huggingface.co/boffire/kabyle-emotion-xlmr}},
note = {Fine-tuned XLM-RoBERTa for 7-class emotion detection in Kabyle via cross-lingual label transfer from English}
}
License
This model is released under the Apache 2.0 license. The base XLM-RoBERTa weights and the English emotion classifier (j-hartmann/emotion-english-distilroberta-base) are subject to their respective original licenses.
Acknowledgments
- Tatoeba Project for the English–Kabyle parallel corpus.
- j-hartmann for the English emotion classifier used for label projection.
- Hugging Face transformers, datasets, and accelerate teams for the training infrastructure.