---
language:
  - hu
license: cc-by-4.0
extra_gated_fields:
  Country: country
  Institution: text
  Institution Email: text
  Full Name: text
  Please specify the academic project/use case for which you want to use the models: text
extra_gated_prompt: >-
  Our models are intended for academic projects and academic research only. If
  you are not affiliated with an academic institution, please reach out to us at
  huggingface [at] poltextlab [dot] com for further inquiries. If we cannot
  clearly determine your academic affiliation and use case from your form
  data, your request may be rejected. Please allow us a few business days to
  review access requests manually.
---

[README UNDER CONSTRUCTION]

emBERT is a Hungarian text classification model that assigns each sentence one of seven emotions or a neutral state. It uses the huBERT tokenizer and was fine-tuned from a huBERT base model on a proprietary dataset of sentences from Hungarian online news sites. The sentences in the fine-tuning set were labeled manually by experts in a double-blind setup, and inconsistencies were resolved manually. The validation results of the fine-tuning were:

| Emotion | Precision | Recall | F1-score |
|---|---|---|---|
| 0 - Anger | 0.70 | 0.74 | 0.72 |
| 1 - Disgust | 0.72 | 0.73 | 0.73 |
| 2 - Fear | 0.61 | 0.47 | 0.53 |
| 3 - Happiness | 0.38 | 0.37 | 0.38 |
| 4 - Neutral | 0.65 | 0.62 | 0.63 |
| 5 - Sad | 0.74 | 0.72 | 0.73 |
| 6 - Successful | 0.79 | 0.81 | 0.80 |
| 7 - Trustful | 0.76 | 0.78 | 0.77 |
| weighted avg | 0.73 | 0.74 | 0.73 |

Overall accuracy reached 74%.

The emotion categories are based on Plutchik (1980), with anticipation replaced by a neutral category.

To use the model, load the huBERT tokenizer and the fine-tuned classifier:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Note: since the repository is gated, loading may require an authenticated
# Hugging Face session (e.g. `huggingface-cli login`).
tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
```
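Once the model produces logits for a sentence, they can be mapped back to the emotion labels. A minimal post-processing sketch, assuming the id-to-label mapping follows the validation table above (verify against the model's `config.id2label` before relying on it):

```python
import math

# Hypothetical mapping inferred from the validation table above;
# confirm against config.id2label of poltextlab/emBERT.
ID2LABEL = {
    0: "Anger", 1: "Disgust", 2: "Fear", 3: "Happiness",
    4: "Neutral", 5: "Sad", 6: "Successful", 7: "Trustful",
}

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_emotion(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]
```

With a real forward pass, the per-sentence logits can be fed in as, for example, `top_emotion(outputs.logits[0].tolist())`.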

The model was created by György Márk Kis, Orsolya Ring, and Miklós Sebők of the Center for Social Sciences.