🎭 Levantine Arabic Sentiment Classifier (Ordinal MARBERTv2)

This model is a fine-tuned version of MARBERTv2, designed to predict the sentiment of Levantine Arabic tweets (Jordanian, Lebanese, Palestinian, Syrian).

Technical Highlight: This model was trained with an ordinal loss function that combines Mean Squared Error with Cross-Entropy. The loss is "distance-aware": it penalizes distant mistakes (such as labeling a Positive tweet as Negative) more heavily than adjacent ones (such as labeling it Neutral). The aim is fewer polarity-flipping errors on hard or ambiguous tweets.
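
The exact formulation used in training is not spelled out above. As a minimal sketch of what a distance-aware loss of this kind can look like, the pure-Python example below adds cross-entropy to the squared error between the expected class index (under the softmax) and the true label. The `alpha` weight, the function names, and the use of the expected index are assumptions made for illustration, not the actual training code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ordinal_loss(logits, target, alpha=1.0):
    """Cross-entropy plus alpha * squared distance between the expected
    class index and the true class index (assumed formulation)."""
    probs = softmax(logits)
    ce = -math.log(probs[target])
    expected_index = sum(i * p for i, p in enumerate(probs))
    mse = (expected_index - target) ** 2
    return ce + alpha * mse

# True label is Positive (2). Both predictions give the true class the same
# probability, so cross-entropy alone cannot tell them apart.
near_miss = ordinal_loss([0.0, 3.0, 0.0], target=2)  # model leans Neutral
far_miss = ordinal_loss([3.0, 0.0, 0.0], target=2)   # model leans Negative
print(near_miss, far_miss)
```

In this sketch the squared-error term is what makes the Negative-for-Positive mistake cost more than the Neutral-for-Positive one.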

📊 Performance

| Metric | Score | Description |
| --- | --- | --- |
| Accuracy | 79.25% | Overall correctness on the test set. |
| F1 (Macro) | 0.7635 | Unweighted mean of the per-class F1 scores across all 3 classes. |
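
For reference, macro F1 averages the per-class F1 scores with equal weight, so each class counts the same regardless of how often it appears. The sketch below computes both metrics on made-up labels; the toy data is purely illustrative and unrelated to the scores reported above.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def macro_f1(y_true, y_pred, num_classes=3):
    """Unweighted mean of per-class F1 scores."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes

# Toy labels using the IDs 0 = Negative, 1 = Neutral, 2 = Positive
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```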

📖 Labels

| ID | Label | Meaning |
| --- | --- | --- |
| 0 | Negative 😠 | Anger, complaints, sadness, or frustration. |
| 1 | Neutral 😐 | Objective facts, mixed emotions, or ambiguous statements. |
| 2 | Positive 😃 | Joy, praise, excitement, or satisfaction. |
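
Depending on how the model's config is set up, a pipeline may return either label names like those above or generic `LABEL_<id>` strings. Which form this model returns is an assumption to verify against its actual output. A hypothetical helper that maps either form back to the integer IDs in the table:

```python
# Mapping taken from the label table; the helper name is illustrative.
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}
LABEL2ID = {name.lower(): idx for idx, name in ID2LABEL.items()}

def label_to_id(label):
    """Accept 'Positive', 'positive', or 'LABEL_2' and return the class ID."""
    key = label.strip().lower()
    if key.startswith("label_"):
        return int(key.split("_", 1)[1])
    return LABEL2ID[key]

print(label_to_id("LABEL_2"), label_to_id("Negative"))
```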

🚀 How to Use (Python)

Because this is a standard 3-class model, you can easily load it using Hugging Face's built-in pipeline.

from transformers import pipeline

# 1. Load Pipeline
model_id = "amitca71/marabert2-levantine-sentiment"
classifier = pipeline("text-classification", model=model_id)

def predict_sentiment(text):
    # Get the top prediction
    result = classifier(text)[0]

    # Format the output cleanly
    return {"text": text, "label": result['label'], "confidence": round(result['score'], 4)}

# 2. Test Examples (rough English glosses in the comments)
# "The weather is amazing today! We're heading out." -> should be Positive
print(predict_sentiment("الجو اليوم بيعقد! طالعين مشوار"))
# "I swear I'm fed up with this traffic, it's disgusting." -> should be Negative
print(predict_sentiment("والله طقت روحي من هالزحمة، شي بيقرف"))
# "I got home a little while ago." -> should be Neutral
print(predict_sentiment("وصلت عالبيت من شوي."))
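
To see how close a call was, it can help to look at all three class scores rather than only the top one. The text-classification pipeline accepts `top_k=None` to return a score for every class. The sketch below post-processes that output; the helper name, the `margin` field, and the made-up scores are illustrative assumptions, and the label strings may differ from what the model actually returns.

```python
def summarize_scores(all_scores):
    """Summarize per-class scores as top label, confidence, and margin."""
    # Some transformers versions wrap single-input results in an extra list.
    if all_scores and isinstance(all_scores[0], list):
        all_scores = all_scores[0]
    ranked = sorted(all_scores, key=lambda s: s["score"], reverse=True)
    top, runner_up = ranked[0], ranked[1]
    return {
        "label": top["label"],
        "confidence": round(top["score"], 4),
        "margin": round(top["score"] - runner_up["score"], 4),
    }

# With the real model: summarize_scores(classifier(text, top_k=None))
# Made-up scores standing in for pipeline output:
fake_output = [
    {"label": "Negative", "score": 0.05},
    {"label": "Neutral", "score": 0.40},
    {"label": "Positive", "score": 0.55},
]
print(summarize_scores(fake_output))
```

A small margin between the top two classes is a hint that the prediction deserves a second look.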

โš ๏ธ Limitations

  • Dialect Focus: Optimized heavily for Levantine Twitter. It may underperform or misunderstand idioms in Egyptian, Gulf, or Maghrebi dialects.
  • The "Neutral" Bottleneck: As with most sentiment models, true "Neutral" text is the hardest class to detect, because human annotators often place objective facts alongside subtly sarcastic text in this category.
  • Arabizi: While MARBERTv2 has some exposure to Arabizi (Arabic written in English/Latin letters), this model performs best on native Arabic script.
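
Since the model performs best on native Arabic script, it can help to flag Arabizi-heavy inputs before classifying them. The sketch below is a hypothetical pre-check: the function names, the 0.5 threshold, and the restriction to the basic Arabic Unicode block (U+0600 to U+06FF) are illustrative assumptions.

```python
def arabic_script_ratio(text):
    """Share of alphabetic characters that fall in the basic Arabic block."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return arabic / len(letters)

def looks_like_arabizi(text, threshold=0.5):
    """True when the text has letters but most of them are not Arabic script."""
    has_letters = any(ch.isalpha() for ch in text)
    return has_letters and arabic_script_ratio(text) < threshold

print(looks_like_arabizi("وصلت عالبيت من شوي."))     # native Arabic script
print(looks_like_arabizi("wselt 3al bet mn shway"))  # Latin-letter Arabizi
```

Inputs flagged this way could be routed to manual review or reported with a lower-confidence note.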
