A fine-tuned XLM-RoBERTa model for 7-class emotion recognition in Kabyle (Taqbaylit), a low-resource Amazigh (Berber) language of the Afro-Asiatic family spoken in Algeria.

| Attribute | Value |
|---|---|
| Base model | FacebookAI/xlm-roberta-base |
| Architecture | XLM-RoBERTa for Sequence Classification |
| Parameters | ~278 M |
| Languages | Kabyle (kab) |
| Task | Text Classification (Emotion Detection) |
| Classes | 7 — anger, disgust, fear, joy, sadness, surprise, neutral |
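
The model was trained via cross-lingual label transfer from English to Kabyle using parallel sentence pairs.

Source data (eng_kab_roundtrip_good.tsv): 131,301 English–Kabyle sentence pairs with back-translation quality scores.

Labeling pipeline:

1. The English side of each pair was labeled with j-hartmann/emotion-english-distilroberta-base, and the label was carried over to the Kabyle sentence.
2. Examples with a score below the threshold (< 0.75) were filtered out.
3. The neutral class was capped at 2,000 examples to reduce imbalance.

These steps produced the final balanced dataset.
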
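The filtering and capping steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the row schema (`kab`, `label`, `score`) is hypothetical, the 0.75 threshold is assumed to apply to the `score` field, and random sampling is an assumed way to enforce the cap, since the card does not say how the 2,000 neutral examples were chosen.

```python
import random


def filter_and_cap(rows, min_score=0.75, capped_label="neutral", cap=2000, seed=0):
    """Drop rows scoring below min_score, then cap one label at `cap` rows.

    rows: list of dicts with 'kab', 'label' and 'score' keys (hypothetical schema).
    """
    # Step 2 of the pipeline: rows with score < min_score are filtered out
    kept = [r for r in rows if r["score"] >= min_score]

    # Step 3: limit the over-represented class (sampling strategy is an assumption)
    capped = [r for r in kept if r["label"] == capped_label]
    others = [r for r in kept if r["label"] != capped_label]
    if len(capped) > cap:
        capped = random.Random(seed).sample(capped, cap)
    return others + capped


# Toy data, not taken from the real dataset
toy_rows = [
    {"kab": "s1", "label": "joy", "score": 0.91},
    {"kab": "s2", "label": "joy", "score": 0.40},      # below threshold, dropped
    {"kab": "s3", "label": "neutral", "score": 0.88},
    {"kab": "s4", "label": "neutral", "score": 0.95},
    {"kab": "s5", "label": "neutral", "score": 0.80},
]
print(len(filter_and_cap(toy_rows, cap=2)))  # 1 joy row + 2 neutral rows
```
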
Per-class evaluation metrics:

| Emotion | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| anger | 0.70 | 0.75 | 0.73 | 832 |
| disgust | 0.81 | 0.64 | 0.72 | 1,797 |
| fear | 0.72 | 0.74 | 0.73 | 950 |
| joy | 0.82 | 0.80 | 0.81 | 1,881 |
| sadness | 0.72 | 0.77 | 0.75 | 1,450 |
| surprise | 0.87 | 0.87 | 0.87 | 963 |
| neutral | 0.22 | 0.39 | 0.28 | 300 |
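
The card reports per-class scores only. As a rough aid for reading the table, the sketch below derives macro and support-weighted F1 from the rounded values shown above, and re-derives each F1 from its precision and recall. These aggregates are computed here from rounded figures, not reported by the author, so treat them as approximate.

```python
# (label, precision, recall, f1, support) copied from the table above
rows = [
    ("anger",    0.70, 0.75, 0.73,  832),
    ("disgust",  0.81, 0.64, 0.72, 1797),
    ("fear",     0.72, 0.74, 0.73,  950),
    ("joy",      0.82, 0.80, 0.81, 1881),
    ("sadness",  0.72, 0.77, 0.75, 1450),
    ("surprise", 0.87, 0.87, 0.87,  963),
    ("neutral",  0.22, 0.39, 0.28,  300),
]


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


total_support = sum(r[4] for r in rows)
macro_f1 = sum(r[3] for r in rows) / len(rows)                  # every class counts equally
weighted_f1 = sum(r[3] * r[4] for r in rows) / total_support    # classes weighted by support

for label, p, r, f1, _ in rows:
    # Recomputed values may differ from the table by about 0.01 because inputs are rounded
    print(f"{label:9s} table F1={f1:.2f} recomputed={f1_from(p, r):.2f}")
print(f"macro F1 = {macro_f1:.2f}, weighted F1 = {weighted_f1:.2f}")
```

With these inputs the macro F1 comes out near 0.70 and the weighted F1 near 0.75. The gap is driven mostly by the neutral class, which has a low F1 but little support.
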

Using the transformers pipeline:

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="boffire/kabyle-emotion-xlmr",
    device=0,  # GPU 0; use -1 for CPU
)

# Example sentences
examples = [
    "Tafyirt 1",      # → sadness (0.98)
    "Tafyirt tis 2",  # → joy (0.72)
    "Tafyirt tis 3",  # → neutral / sadness
]

# Passing a list returns one list of per-class scores for each input
for text, scores in zip(examples, classifier(examples, top_k=None)):
    top = max(scores, key=lambda x: x["score"])
    print(f"{text} → {top['label']} ({top['score']:.3f})")
```

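
Given the weak neutral row in the metrics table (F1 0.28), one option is to post-process the pipeline scores so that neutral, or any low-confidence top prediction, is reported as a fallback instead of a firm label. The helper below is a minimal sketch: `pick_emotion`, the `min_score` value of 0.5 and the fallback string are hypothetical choices, not part of the model or the library.

```python
def pick_emotion(scores, min_score=0.5, fallback="no strong emotion"):
    """Return (label, score) for one input, mapping weak predictions to a fallback.

    scores: list of {'label': str, 'score': float} dicts for ONE input, the
    per-input shape a text-classification pipeline returns with top_k=None.
    """
    top = max(scores, key=lambda s: s["score"])
    # Treat 'neutral' and low-confidence winners as "no strong emotion"
    if top["label"] == "neutral" or top["score"] < min_score:
        return fallback, top["score"]
    return top["label"], top["score"]


# Hand-written scores for illustration (not real model output)
print(pick_emotion([{"label": "joy", "score": 0.72}, {"label": "neutral", "score": 0.20}]))
print(pick_emotion([{"label": "neutral", "score": 0.61}, {"label": "sadness", "score": 0.30}]))
```

The threshold should be tuned on labeled Kabyle data if any is available; 0.5 is only a placeholder.
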
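
Loading the model and tokenizer directly:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("boffire/kabyle-emotion-xlmr")
model = AutoModelForSequenceClassification.from_pretrained("boffire/kabyle-emotion-xlmr")

# Tokenize and predict
inputs = tokenizer("Imir-a, Muiriel tesɛa 20 n yiseggasen.", return_tensors="pt", truncation=True)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Map the highest-scoring logit to its label name
predicted = model.config.id2label[outputs.logits.argmax(dim=-1).item()]
print(predicted)
```
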
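
The model output contains raw logits rather than probabilities. The sketch below shows, in plain Python, how a softmax turns one row of logits into per-label probabilities. In real use the inputs would be `outputs.logits[0].tolist()` and `model.config.id2label`; the logits and label order used here are made up for illustration and do not reflect the model's actual label mapping.

```python
import math


def logits_to_probs(logits, id2label):
    """Softmax over one row of logits, keyed by label name."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {id2label[i]: e / total for i, e in enumerate(exps)}


# Hypothetical logits and label mapping, for illustration only
toy_id2label = {0: "joy", 1: "sadness", 2: "neutral"}
probs = logits_to_probs([2.0, 0.0, 0.0], toy_id2label)
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```
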
| Hyperparameter | Value |
|---|---|
| Epochs | 6 (with early stopping, patience=2) |
| Batch size | 32 per device (effective 64 with gradient accumulation) |
| Learning rate | 2e-5 |
| Max sequence length | 96 |
| Weight decay | 0.01 |
| Warmup steps | ~10% of total steps |
| Mixed precision | FP16 |
| Class weights | Balanced (sklearn.utils.class_weight.compute_class_weight) |
| Optimizer | AdamW (Hugging Face default) |
| Best checkpoint | Epoch 6 (loaded automatically via load_best_model_at_end) |
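
As a rough guide to how the table maps onto a Hugging Face training setup, the sketch below collects the values as `TrainingArguments` keyword names and reproduces the "balanced" class-weight formula in plain Python. The author's training script is not shown in the card, so this is an approximation: `gradient_accumulation_steps=2` assumes a single device (32 × 2 = 64), `warmup_ratio=0.1` stands in for "~10% of total steps", and the toy label list is hypothetical.

```python
from collections import Counter

# Keys are TrainingArguments parameter names; values marked "assumed" are not in the table
training_kwargs = {
    "num_train_epochs": 6,
    "per_device_train_batch_size": 32,
    "gradient_accumulation_steps": 2,  # assumed: 32 x 2 = 64 on a single device
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,               # assumed stand-in for "~10% of total steps"
    "fp16": True,
    "load_best_model_at_end": True,
}


def effective_batch_size(kwargs, num_devices=1):
    """Samples consumed per optimizer step."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"]
            * num_devices)


def balanced_class_weights(labels):
    """Same formula as sklearn's 'balanced' mode: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy labels (hypothetical, not the real training distribution)
toy_labels = ["joy"] * 6 + ["anger"] * 3 + ["neutral"] * 1
print(effective_batch_size(training_kwargs))
print(balanced_class_weights(toy_labels))
```

Rare classes receive larger weights, which is how the loss compensates for imbalance. Early stopping with patience 2 would typically be added through `EarlyStoppingCallback(early_stopping_patience=2)`, and `load_best_model_at_end` requires matching evaluation and save strategies.
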

The neutral class performs poorly (F1 ≈ 0.28) because it contains many low-confidence English predictions. Consider treating it as a "no strong emotion" fallback rather than a reliable label.

If you use this model, please cite:

```bibtex
@misc{boffire_kabyle_emotion_xlmr,
  title = {Kabyle Emotion Classifier},
  author = {Boffire},
  year = {2026},
  howpublished = {\url{https://huggingface.co/boffire/kabyle-emotion-xlmr}},
  note = {Fine-tuned XLM-RoBERTa for 7-class emotion detection in Kabyle via cross-lingual label transfer from English}
}
```

This model is released under the Apache 2.0 license. The base XLM-RoBERTa weights and the English emotion classifier (j-hartmann/emotion-english-distilroberta-base) are subject to their respective original licenses.

Thanks to the transformers, datasets, and accelerate teams for the training infrastructure.

Base model: FacebookAI/xlm-roberta-base