---
library_name: transformers
license: mit
base_model: xlm-roberta-base
tags:
  - generated_from_trainer
  - emotion-classification
  - midwest-emo
  - multilingual
metrics:
  - accuracy
datasets:
  - anggars/mbti-emotion
model-index:
  - name: xlm-emotion
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: anggars/mbti-emotion
          type: anggars/mbti-emotion
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.8876
---

# xlm-emotion

This model is a fine-tuned version of xlm-roberta-base for Emotion Classification (28 labels). It is specifically trained to recognize emotional nuances in Midwest Emo and Math Rock lyrical styles.

To improve reliability and prevent the model from being biased toward majority classes (such as Grief), training was conducted on a balanced version of the anggars/mbti-emotion dataset, produced by undersampling.
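
The balancing script itself is not published with this card; the following is a minimal sketch of per-class random undersampling with the 500-sample cap described below (the `undersample` helper and its signature are illustrative, not the original code):

```python
import random
from collections import defaultdict

def undersample(examples, label_key="label", cap=500, seed=42):
    """Cap each class at `cap` examples by random undersampling."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[label_key]].append(ex)
    balanced = []
    for group in by_label.values():
        rng.shuffle(group)
        balanced.extend(group[:cap])  # classes smaller than `cap` are kept whole
    rng.shuffle(balanced)
    return balanced
```

Classes below the cap are left untouched, so the result is balanced only up to the cap rather than strictly uniform.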

## Model description

- Model Type: XLM-RoBERTa Base
- Labels: 28 emotion categories
- Dataset: Balanced anggars/mbti-emotion (max 500 samples per class combination)
- Languages: English & Indonesian
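
The card does not include a usage snippet; a minimal inference sketch with the `transformers` pipeline API, assuming the model is published under the repo id `anggars/xlm-emotion` (inferred from the card title, not stated in the card):

```python
from transformers import pipeline

# Repo id assumed from the card title; adjust if the model
# lives under a different namespace.
classifier = pipeline("text-classification", model="anggars/xlm-emotion")

result = classifier("I miss the house we grew up in")
print(result[0]["label"], result[0]["score"])
```

Passing `top_k=None` to the call instead returns scores for all 28 emotion labels rather than only the top prediction.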

## Training results

The following results were achieved on the evaluation set during the 3-epoch training process:

| Training Loss | Epoch | Step  | Validation Loss | Accuracy |
|:-------------:|:-----:|:-----:|:---------------:|:--------:|
| 0.5297        | 1.0   | 5618  | 0.4876          | 0.8480   |
| 0.3340        | 2.0   | 11236 | 0.4059          | 0.8732   |
| 0.2116        | 3.0   | 16854 | 0.3954          | 0.8876   |

## Intended uses & limitations

This model is designed for sentiment and emotion analysis in creative writing, specifically song lyrics.

Limitations: Because it was trained on a narrow subgenre context (Midwest Emo and Math Rock lyrics), it may perform differently on general conversational text or technical documents.

## Training procedure

### Training hyperparameters

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 3
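
The original training script is not published; the hyperparameters above can be expressed as a `TrainingArguments` configuration fragment (the `output_dir` value is an assumption, everything else mirrors the list):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir is an
# assumption, not taken from the original run.
training_args = TrainingArguments(
    output_dir="xlm-emotion",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```

AdamW with these betas and epsilon is the `Trainer` default, so only the remaining arguments change behavior relative to a stock run.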

### Framework versions

- Transformers 4.44.2
- PyTorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3