
Kabyle Emotion Classifier

A fine-tuned XLM-RoBERTa model for 7-class emotion recognition in Kabyle (Taqbaylit), a low-resource Amazigh (Berber) language of the Afro-Asiatic family spoken in Algeria.

Model Details

| Attribute | Value |
| --- | --- |
| Base model | xlm-roberta-base (fine-tuned to produce boffire/kabyle-emotion-xlmr) |
| Architecture | XLM-RoBERTa for Sequence Classification |
| Parameters | ~278 M |
| Languages | Kabyle (kab) |
| Task | Text Classification (Emotion Detection) |
| Classes | 7 (anger, disgust, fear, joy, sadness, surprise, neutral) |

Training Data

The model was trained via cross-lingual label transfer from English to Kabyle using parallel sentence pairs:

  1. Round-trip parallel corpus (eng_kab_roundtrip_good.tsv) — 131,301 English–Kabyle sentence pairs with back-translation quality scores.
  2. Tatoeba parallel corpus — 138,353 additional English–Kabyle linked sentences downloaded from tatoeba.org.

Labeling pipeline:

  • English sentences were labeled with j-hartmann/emotion-english-distilroberta-base.
  • Labels were transferred to the Kabyle side via sentence alignment.
  • Low-confidence predictions (< 0.75) were filtered out.
  • The neutral class was capped at 2,000 examples to reduce imbalance.
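The labeling steps above can be sketched as a small filter over (English, Kabyle) pairs. This is only an illustration under stated assumptions: `project_labels`, `fake_predict`, and the `kab_*` placeholder strings are hypothetical names, and the way the neutral cap was actually applied (first 2,000 rows versus a random sample) is not stated in this card, so the sketch simply keeps the first rows it sees.

```python
from collections import Counter
from typing import Callable, Iterable


def project_labels(
    pairs: Iterable[tuple[str, str]],
    predict: Callable[[str], tuple[str, float]],
    min_conf: float = 0.75,
    neutral_cap: int = 2000,
) -> list[tuple[str, str]]:
    """Turn (english, kabyle) pairs into (kabyle, label) rows."""
    kept: list[tuple[str, str]] = []
    counts: Counter = Counter()
    for english, kabyle in pairs:
        label, score = predict(english)   # label the English side
        if score < min_conf:              # drop low-confidence predictions
            continue
        if label == "neutral" and counts["neutral"] >= neutral_cap:
            continue                      # cap the neutral class
        counts[label] += 1
        kept.append((kabyle, label))      # transfer the label to the Kabyle side
    return kept


# Toy stand-in for the English classifier (made-up scores, illustration only).
def fake_predict(text: str) -> tuple[str, float]:
    table = {
        "I am happy": ("joy", 0.95),
        "Maybe": ("neutral", 0.60),
        "It is a table": ("neutral", 0.90),
        "It is a chair": ("neutral", 0.92),
    }
    return table[text]


pairs = [
    ("I am happy", "kab_1"),
    ("Maybe", "kab_2"),
    ("It is a table", "kab_3"),
    ("It is a chair", "kab_4"),
]
rows = project_labels(pairs, fake_predict, neutral_cap=1)
print(rows)
```

In a real run, `predict` would presumably wrap a `transformers` text-classification pipeline loaded with j-hartmann/emotion-english-distilroberta-base and return its top label and score.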

Final balanced dataset:

  • Total labeled rows (raw): 225,036
  • Final training set: 54,486 rows
    • joy: 12,539 | disgust: 11,983 | sadness: 9,666 | surprise: 6,418 | fear: 6,334 | anger: 5,546 | neutral: 2,000
  • Train / Val / Test split: 40,864 / 5,449 / 8,173
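The split sizes are consistent with a 75 / 10 / 15 split stratified by class. The card does not state this explicitly, so treat it as an inference from the numbers. The short check below recomputes the split sizes and the per-class test counts from the class totals above; the per-class values match the Support column of the test results.

```python
# Class totals of the final training set, as listed above.
counts = {
    "joy": 12539, "disgust": 11983, "sadness": 9666, "surprise": 6418,
    "fear": 6334, "anger": 5546, "neutral": 2000,
}
total = sum(counts.values())

# Assumed ratios (inferred, not stated in the card): 10% validation, 15% test.
test_size = round(0.15 * total)
val_size = round(0.10 * total)
train_size = total - val_size - test_size

# Per-class test counts under a stratified 15% split.
test_per_class = {label: round(0.15 * n) for label, n in counts.items()}

print(total, train_size, val_size, test_size)
print(test_per_class)
```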

Performance

Test Set Results (8,173 samples)

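| Emotion | Precision | Recall | F1-Score | Support |
| --- | --- | --- | --- | --- |
| anger | 0.70 | 0.75 | 0.73 | 832 |
| disgust | 0.81 | 0.64 | 0.72 | 1,797 |
| fear | 0.72 | 0.74 | 0.73 | 950 |
| joy | 0.82 | 0.80 | 0.81 | 1,881 |
| sadness | 0.72 | 0.77 | 0.75 | 1,450 |
| surprise | 0.87 | 0.87 | 0.87 | 963 |
| neutral | 0.22 | 0.39 | 0.28 | 300 |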
  • Accuracy: 0.74
  • Weighted Avg F1: 0.75
  • Macro Avg F1: 0.69
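To make the difference between the two averages concrete, the sketch below recomputes them from the per-class F1 scores and supports in the table. Because the table values are already rounded to two decimals, the recomputed macro average (about 0.70) can differ from the reported 0.69 in the last digit.

```python
# (F1, support) per class, copied from the test results table.
rows = {
    "anger": (0.73, 832),
    "disgust": (0.72, 1797),
    "fear": (0.73, 950),
    "joy": (0.81, 1881),
    "sadness": (0.75, 1450),
    "surprise": (0.87, 963),
    "neutral": (0.28, 300),
}

n_test = sum(support for _, support in rows.values())

# Macro average: every class counts equally, so the weak neutral class pulls it down.
macro_f1 = sum(f1 for f1, _ in rows.values()) / len(rows)

# Weighted average: each class counts in proportion to its support.
weighted_f1 = sum(f1 * support for f1, support in rows.values()) / n_test

print(f"samples={n_test} macro={macro_f1:.3f} weighted={weighted_f1:.3f}")
```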

How to Use

Quick inference with transformers

import torch
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="boffire/kabyle-emotion-xlmr",
    device=0 if torch.cuda.is_available() else -1,  # first GPU if present, else CPU
)

# Example sentences
examples = [
    "Tafyirt 1",      # → sadness (0.98)
    "Tafyirt tis 2",  # → joy (0.72)
    "Tafyirt tis 3",  # → neutral / sadness
]

# Passing a list with top_k=None returns, for each input, the scores of all 7 classes.
for text, scores in zip(examples, classifier(examples, top_k=None)):
    top = max(scores, key=lambda x: x["score"])
    print(f"{text} → {top['label']} ({top['score']:.3f})")

Loading the model directly

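import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("boffire/kabyle-emotion-xlmr")
model = AutoModelForSequenceClassification.from_pretrained("boffire/kabyle-emotion-xlmr")

# Tokenize (max_length=96 matches the training setup) and predict without tracking gradients
inputs = tokenizer("Imir-a, Muiriel tesɛa 20 n yiseggasen.", return_tensors="pt", truncation=True, max_length=96)
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and map the top class index to its label
probs = outputs.logits.softmax(dim=-1)[0]
pred_id = int(probs.argmax())
print(model.config.id2label[pred_id], f"{probs[pred_id].item():.3f}")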

Training Details

| Hyperparameter | Value |
| --- | --- |
| Epochs | 6 (with early stopping, patience=2) |
| Batch size | 32 per device (effective 64 with gradient accumulation) |
| Learning rate | 2e-5 |
| Max sequence length | 96 |
| Weight decay | 0.01 |
| Warmup steps | ~10% of total steps |
| Mixed precision | FP16 |
| Class weights | Balanced (sklearn.utils.class_weight.compute_class_weight) |
| Optimizer | AdamW (Hugging Face default) |
| Best checkpoint | Epoch 6 (loaded automatically via load_best_model_at_end) |
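For reference, the "balanced" mode of `compute_class_weight` assigns each class the weight `n_samples / (n_classes * class_count)`. The sketch below applies that formula in plain Python to the class totals of the full 54,486-row dataset. This is an assumption made for illustration: the actual weights were presumably computed on the training split only, which has nearly the same class proportions.

```python
# Class totals of the final dataset (full set; the training split is assumed proportional).
counts = {
    "joy": 12539, "disgust": 11983, "sadness": 9666, "surprise": 6418,
    "fear": 6334, "anger": 5546, "neutral": 2000,
}

n_samples = sum(counts.values())
n_classes = len(counts)

# Same formula as scikit-learn's class_weight="balanced".
weights = {label: n_samples / (n_classes * n) for label, n in counts.items()}

for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:9s} {weight:.2f}")
```

Under this formula the capped neutral class receives the largest weight (about 3.89) and joy the smallest (about 0.62). Such weights are typically passed to a weighted cross-entropy loss, though the card does not say how they were wired into training.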

Limitations & Caveats

  1. Silver labels: Ground-truth emotions were projected from an English classifier. Some labels may not perfectly capture Kabyle cultural or emotional nuance.
  2. Neutral class weakness: The neutral class performs poorly (F1 ~0.28) because it contains many low-confidence English predictions. Consider treating it as a "no strong emotion" fallback rather than a reliable label.
  3. Translation quality: The parallel corpus includes round-trip translated sentences. Imperfect translations may introduce label noise.
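  4. No native speaker validation: The test set was held out from the same silver-labeled pool, so the reported scores measure agreement with projected labels rather than with human judgment. A small native-annotated benchmark would give a more reliable estimate of real-world accuracy.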
  5. Imbalanced source: Tatoeba data is skewed toward simple, short sentences. Performance may degrade on longer, more complex Kabyle text (social media, literature, etc.).
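One way to act on the neutral-class caveat is to collapse both neutral predictions and low-confidence predictions into a single "no strong emotion" fallback. The helper below is a minimal sketch of that idea; `coarse_emotion` and the 0.5 threshold are hypothetical choices, not part of the released model, and the input is assumed to be the per-class score list that a text-classification pipeline returns with `top_k=None`.

```python
def coarse_emotion(scores, min_conf=0.5, fallback="no strong emotion"):
    """Return the top label, or a fallback for neutral or low-confidence outputs."""
    top = max(scores, key=lambda s: s["score"])
    if top["label"] == "neutral" or top["score"] < min_conf:
        return fallback
    return top["label"]


# Hand-written score lists, for illustration only.
confident = [{"label": "joy", "score": 0.90}, {"label": "neutral", "score": 0.10}]
neutral = [{"label": "neutral", "score": 0.60}, {"label": "sadness", "score": 0.40}]
unsure = [{"label": "fear", "score": 0.35}, {"label": "anger", "score": 0.33},
          {"label": "sadness", "score": 0.32}]

print(coarse_emotion(confident))
print(coarse_emotion(neutral))
print(coarse_emotion(unsure))
```

The threshold should be tuned on held-out data; a higher value trades coverage for precision.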

Intended Use

  • Research in low-resource NLP and Afro-Asiatic language processing.
  • Downstream applications requiring coarse emotion signals in Kabyle text (e.g., content moderation, mental-health screening, customer feedback analysis).
  • Baseline for future Kabyle emotion models trained on native annotations.

Citation

If you use this model, please cite:

@misc{boffire_kabyle_emotion_xlmr,
  title = {Kabyle Emotion Classifier},
  author = {Boffire},
  year = {2026},
  howpublished = {\url{https://huggingface.co/boffire/kabyle-emotion-xlmr}},
  note = {Fine-tuned XLM-RoBERTa for 7-class emotion detection in Kabyle via cross-lingual label transfer from English}
}

License

This model is released under the Apache 2.0 license. The base XLM-RoBERTa weights and the English emotion classifier (j-hartmann/emotion-english-distilroberta-base) are subject to their respective original licenses.

Acknowledgments

  • Tatoeba Project for the English–Kabyle parallel corpus.
  • j-hartmann for the English emotion classifier used for label projection.
  • Hugging Face transformers, datasets, and accelerate teams for the training infrastructure.