| language: | |
| - kab | |
| - ber | |
| license: mit | |
| library_name: transformers | |
| tags: | |
| - kabyle | |
| - tamazight | |
| - emotion-classification | |
| - sentiment-analysis | |
| - xlm-roberta | |
| - low-resource | |
| - cross-lingual-transfer | |
| datasets: | |
| - tatoeba | |
| base_model: xlm-roberta-base | |
| metrics: | |
| - f1 | |
| - accuracy | |
| # Kabyle Emotion Classifier | |
| A fine-tuned XLM-RoBERTa model for **7-class emotion recognition in Kabyle** (Taqbaylit), a low-resource Amazigh (Berber) language of the Afro-Asiatic family spoken in Algeria. | |
| ## Model Details | |
| | Attribute | Value | | |
| |-----------|-------| | |
| | **Base model** | `xlm-roberta-base` (fine-tuned and published as `boffire/kabyle-emotion-xlmr`) | | |
| | **Architecture** | XLM-RoBERTa for Sequence Classification | | |
| | **Parameters** | ~278 M | | |
| | **Languages** | Kabyle (`kab`) | | |
| | **Task** | Text Classification (Emotion Detection) | | |
| | **Classes** | 7 — `anger`, `disgust`, `fear`, `joy`, `sadness`, `surprise`, `neutral` | | |
| ## Training Data | |
| The model was trained via **cross-lingual label transfer** from English to Kabyle using parallel sentence pairs: | |
| 1. **Round-trip parallel corpus** (`eng_kab_roundtrip_good.tsv`) — 131,301 English–Kabyle sentence pairs with back-translation quality scores. | |
| 2. **Tatoeba parallel corpus** — 138,353 additional English–Kabyle linked sentences downloaded from [tatoeba.org](https://tatoeba.org). | |
| **Labeling pipeline:** | |
| - English sentences were labeled with [`j-hartmann/emotion-english-distilroberta-base`](https://huggingface.co/j-hartmann/emotion-english-distilroberta-base). | |
| - Labels were transferred to the Kabyle side via sentence alignment. | |
| - Low-confidence predictions (`< 0.75`) were filtered out. | |
| - The `neutral` class was capped at 2,000 examples to reduce imbalance. | |
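The filtering and capping steps above can be sketched in plain Python. This is a minimal illustration, not the author's actual script: the `LabeledPair` record, the `filter_and_cap` helper, the random downsampling, and the toy rows are all hypothetical. Only the 0.75 threshold and the 2,000-row `neutral` cap come from the description above.

```python
import random
from collections import Counter
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.75  # from the model card
NEUTRAL_CAP = 2_000          # from the model card


@dataclass
class LabeledPair:
    """Hypothetical record: a Kabyle sentence plus the label projected from its English pair."""
    kabyle: str
    label: str
    confidence: float


def filter_and_cap(pairs, threshold=CONFIDENCE_THRESHOLD, neutral_cap=NEUTRAL_CAP, seed=0):
    """Drop rows below the confidence threshold, then keep at most `neutral_cap` neutral rows."""
    kept = [p for p in pairs if p.confidence >= threshold]
    neutral = [p for p in kept if p.label == "neutral"]
    others = [p for p in kept if p.label != "neutral"]
    if len(neutral) > neutral_cap:
        # Assumption: rows are chosen at random; the card does not say how the cap was applied.
        neutral = random.Random(seed).sample(neutral, neutral_cap)
    return others + neutral


# Toy data with placeholder strings, not real corpus rows.
toy = [
    LabeledPair("tafyirt 1", "joy", 0.91),
    LabeledPair("tafyirt 2", "sadness", 0.60),  # below the threshold, dropped
    LabeledPair("tafyirt 3", "neutral", 0.88),
    LabeledPair("tafyirt 4", "neutral", 0.95),
    LabeledPair("tafyirt 5", "neutral", 0.80),
]
result = filter_and_cap(toy, neutral_cap=2)
counts = Counter(p.label for p in result)
print(counts)
```

The cap is lowered to 2 in the toy call only so that the downsampling branch is exercised on five rows.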
| **Final dataset:** | |
| - **Total labeled rows (raw):** 225,036 | |
| - **After confidence filtering and neutral capping (all splits):** 54,486 rows | |
|   - joy: 12,539; disgust: 11,983; sadness: 9,666; surprise: 6,418; fear: 6,334; anger: 5,546; neutral: 2,000 | |
| - **Train / Val / Test split:** 40,864 / 5,449 / 8,173 | |
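The class counts above also show how uneven the classes remain after capping. As a rough illustration, the "balanced" weighting heuristic listed under Training Details, `n_samples / (n_classes * count)`, can be applied to them by hand. This sketch uses the counts of the full dataset because per-split counts are not given, so the weights actually used in training may differ slightly.

```python
# Class counts of the final dataset, copied from the list above.
counts = {
    "joy": 12_539,
    "disgust": 11_983,
    "sadness": 9_666,
    "surprise": 6_418,
    "fear": 6_334,
    "anger": 5_546,
    "neutral": 2_000,
}

n_samples = sum(counts.values())
n_classes = len(counts)

# "Balanced" heuristic: weight_c = n_samples / (n_classes * count_c),
# so rare classes get weights above 1 and frequent classes below 1.
weights = {label: n_samples / (n_classes * n) for label, n in counts.items()}

for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:<9}{weight:.3f}")
```

Under these assumptions `neutral` receives a weight of about 3.9 while `joy` receives about 0.62, roughly a sixfold difference in how strongly each example contributes to the loss.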
| ## Performance | |
| ### Test Set Results (8,173 samples) | |
| | Emotion | Precision | Recall | F1-Score | Support | | |
| |---------|-----------|--------|----------|---------| | |
| | anger | 0.70 | 0.75 | **0.73** | 832 | | |
| | disgust | 0.81 | 0.64 | **0.72** | 1,797 | | |
| | fear | 0.72 | 0.74 | **0.73** | 950 | | |
| | joy | 0.82 | 0.80 | **0.81** | 1,881 | | |
| | sadness | 0.72 | 0.77 | **0.75** | 1,450 | | |
| | surprise | 0.87 | 0.87 | **0.87** | 963 | | |
| | neutral | 0.22 | 0.39 | **0.28** | 300 | | |
| - **Accuracy:** 0.74 | |
| - **Weighted Avg F1:** **0.75** | |
| - **Macro Avg F1:** 0.69 | |
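The aggregate figures can be cross-checked against the per-class table. The sketch below recomputes them from the rounded table values, so it can only agree with the reported numbers to within rounding: it gives about 0.74 accuracy, 0.75 weighted F1, and 0.70 macro F1 (reported as 0.69).

```python
# (recall, f1, support) per class, copied from the test-set table above.
table = {
    "anger":    (0.75, 0.73, 832),
    "disgust":  (0.64, 0.72, 1797),
    "fear":     (0.74, 0.73, 950),
    "joy":      (0.80, 0.81, 1881),
    "sadness":  (0.77, 0.75, 1450),
    "surprise": (0.87, 0.87, 963),
    "neutral":  (0.39, 0.28, 300),
}

total = sum(support for _, _, support in table.values())

# Macro average: every class counts equally, so the weak neutral class pulls it down.
macro_f1 = sum(f1 for _, f1, _ in table.values()) / len(table)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1 * support for _, f1, support in table.values()) / total

# Accuracy equals the support-weighted mean of per-class recall.
accuracy = sum(recall * support for recall, _, support in table.values()) / total

print(f"support={total} macro_f1={macro_f1:.3f} weighted_f1={weighted_f1:.3f} accuracy={accuracy:.3f}")
```

The gap between macro and weighted F1 comes almost entirely from `neutral`, which has the lowest F1 and the smallest support.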
| ## How to Use | |
| ### Quick inference with `transformers` | |
| ```python | |
| from transformers import pipeline | |
| classifier = pipeline( | |
|     "text-classification", | |
|     model="boffire/kabyle-emotion-xlmr", | |
|     device=0,  # first GPU; use -1 for CPU | |
| ) | |
| # Example sentences (placeholders; replace with real Kabyle text) | |
| examples = [ | |
|     "Tafyirt 1", # → sadness (0.98) | |
|     "Tafyirt tis 2", # → joy (0.72) | |
|     "Tafyirt tis 3", # → neutral / sadness | |
| ] | |
| for text in examples: | |
|     # top_k=None returns a score for every class | |
|     scores = classifier(text, top_k=None) | |
|     # Some transformers versions wrap single-input results in an extra list | |
|     if scores and isinstance(scores[0], list): | |
|         scores = scores[0] | |
|     top = max(scores, key=lambda x: x["score"]) | |
|     print(f"{text} → {top['label']} ({top['score']:.3f})") | |
| ``` | |
| ### Loading the model directly | |
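| ```python | |
| import torch | |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification | |
| tokenizer = AutoTokenizer.from_pretrained("boffire/kabyle-emotion-xlmr") | |
| model = AutoModelForSequenceClassification.from_pretrained("boffire/kabyle-emotion-xlmr") | |
| # Tokenize and predict (max_length=96 matches the training sequence length) | |
| inputs = tokenizer("Imir-a, Muiriel tesɛa 20 n yiseggasen.", return_tensors="pt", truncation=True, max_length=96) | |
| with torch.no_grad(): | |
|     outputs = model(**inputs) | |
| # Convert logits to probabilities and look up the predicted label | |
| probs = outputs.logits.softmax(dim=-1)[0] | |
| pred_id = int(probs.argmax()) | |
| print(model.config.id2label[pred_id], float(probs[pred_id])) | |
| ``` | |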
| ## Training Details | |
| | Hyperparameter | Value | | |
| |----------------|-------| | |
| | Epochs | 6 (with early stopping, patience=2) | | |
| | Batch size | 32 per device (effective 64 with gradient accumulation) | | |
| | Learning rate | 2e-5 | | |
| | Max sequence length | 96 | | |
| | Weight decay | 0.01 | | |
| | Warmup steps | ~10% of total steps | | |
| | Mixed precision | FP16 | | |
| | Class weights | Balanced (`sklearn.utils.class_weight.compute_class_weight`) | | |
| | Optimizer | AdamW (Hugging Face default) | | |
| | Best checkpoint | Epoch 6 (loaded automatically via `load_best_model_at_end`) | | |
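The batch-size and warmup rows can be turned into approximate step counts. The sketch below is back-of-the-envelope arithmetic rather than the author's configuration: it assumes a single device (`NUM_DEVICES = 1`), that the final partial batch of each epoch is kept, and that all 6 epochs run. The exact count reported by a given trainer may differ by a few steps depending on how it rounds.

```python
import math

TRAIN_ROWS = 40_864      # training split size from the model card
PER_DEVICE_BATCH = 32    # from the hyperparameter table
EFFECTIVE_BATCH = 64     # from the hyperparameter table
EPOCHS = 6               # from the hyperparameter table
WARMUP_FRACTION = 0.10   # "~10% of total steps"
NUM_DEVICES = 1          # assumption: not stated in the card

# Gradient accumulation needed to reach the effective batch size.
grad_accum_steps = EFFECTIVE_BATCH // (PER_DEVICE_BATCH * NUM_DEVICES)

# One optimizer step consumes one effective batch.
steps_per_epoch = math.ceil(TRAIN_ROWS / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP_FRACTION)

print(f"grad_accum={grad_accum_steps} steps/epoch={steps_per_epoch} "
      f"total={total_steps} warmup={warmup_steps}")
```

Under these assumptions training takes roughly 3,800 optimizer steps, of which about 380 are warmup.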
| ## Limitations & Caveats | |
| 1. **Silver labels:** Training and test labels were projected from an English classifier rather than annotated by Kabyle speakers. Some labels may not capture Kabyle cultural or emotional nuance. | |
| 2. **Neutral class weakness:** The `neutral` class performs poorly (precision 0.22, recall 0.39, F1 0.28 on 300 test samples). It is the smallest class, capped at 2,000 rows, and contains many low-confidence English predictions. Treat it as a "no strong emotion" fallback rather than a reliable label. | |
| 3. **Translation quality:** The parallel corpus includes round-trip translated sentences. Imperfect translations may introduce label noise. | |
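| 4. **No native speaker validation:** The test set was held out from the same silver-labeled pool, so the reported scores measure agreement with projected labels, not with human judgment. A small native-annotated benchmark would give a more reliable estimate of real-world performance. | |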
| 5. **Imbalanced source:** Tatoeba data is skewed toward simple, short sentences. Performance may degrade on longer, more complex Kabyle text (social media, literature, etc.). | |
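One way to act on the `neutral` caveat above is to ignore the `neutral` score and report an emotion only when another class is confident enough. The helper below is a hypothetical sketch: `resolve_emotion` and the 0.5 threshold are illustrative choices, not part of the model or the library, and the score lists are made-up values in the shape the text-classification pipeline returns with `top_k=None`.

```python
def resolve_emotion(scores, min_score=0.5):
    """Return the best non-neutral label, or a fallback when no emotion is confident.

    `scores` is a list of {"label": str, "score": float} dicts.
    `min_score` is a hypothetical threshold; tune it on your own data.
    """
    emotional = [s for s in scores if s["label"] != "neutral"]
    best = max(emotional, key=lambda s: s["score"], default=None)
    if best is None or best["score"] < min_score:
        return "no strong emotion"
    return best["label"]


# Made-up score lists for illustration only.
confident = [
    {"label": "joy", "score": 0.72},
    {"label": "neutral", "score": 0.20},
    {"label": "sadness", "score": 0.08},
]
uncertain = [
    {"label": "neutral", "score": 0.45},
    {"label": "sadness", "score": 0.30},
    {"label": "joy", "score": 0.25},
]

print(resolve_emotion(confident))
print(resolve_emotion(uncertain))
```

This trades recall on genuinely neutral text for fewer spurious emotion labels, which may or may not suit a given application.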
| ## Intended Use | |
| - **Research** in low-resource NLP and Afro-Asiatic language processing. | |
| - **Downstream applications** requiring coarse emotion signals in Kabyle text (e.g., content moderation, mental-health screening, customer feedback analysis). | |
| - **Baseline** for future Kabyle emotion models trained on native annotations. | |
| ## Citation | |
| If you use this model, please cite: | |
| ```bibtex | |
| @misc{boffire_kabyle_emotion_xlmr, | |
| title = {Kabyle Emotion Classifier}, | |
| author = {Boffire}, | |
| year = {2026}, | |
| howpublished = {\url{https://huggingface.co/boffire/kabyle-emotion-xlmr}}, | |
| note = {Fine-tuned XLM-RoBERTa for 7-class emotion detection in Kabyle via cross-lingual label transfer from English} | |
| } | |
| ``` | |
| ## License | |
| This model is released under the **Apache 2.0** license. The base XLM-RoBERTa weights and the English emotion classifier (`j-hartmann/emotion-english-distilroberta-base`) are subject to their respective original licenses. | |
| ## Acknowledgments | |
| - [Tatoeba Project](https://tatoeba.org) for the English–Kabyle parallel corpus. | |
| - [j-hartmann](https://huggingface.co/j-hartmann) for the English emotion classifier used for label projection. | |
| - Hugging Face `transformers`, `datasets`, and `accelerate` teams for the training infrastructure. |