---
language:
  - hu
license: cc-by-4.0
extra_gated_fields:
  Country: country
  Institution: text
  Institution Email: text
  Full Name: text
  Please specify the academic project/use case for which you want to use the models: text
extra_gated_prompt: >-
  Our models are intended for academic projects and academic research only. If
  you are not affiliated with an academic institution, please reach out to us at
  huggingface [at] poltextlab [dot] com for further inquiries. If we cannot
  clearly determine your academic affiliation and use case from your form
  data, your request may be rejected. Please allow us a few business days to
  review access requests manually.
---

[README UNDER CONSTRUCTION]

emBERT is a Hungarian text classification model that assigns each sentence one of seven emotions or a neutral state. It uses the huBERT tokenizer and was fine-tuned from a huBERT base model on a proprietary dataset of sentences from Hungarian online news sites. The sentences in the fine-tuning set were labeled manually by experts in a double-blind setup, and inconsistencies were resolved manually. The validation results of the fine-tuning were:

| Emotion | Precision | Recall | F1-score |
|---|---|---|---|
| 0 - Anger | 0.70 | 0.74 | 0.72 |
| 1 - Disgust | 0.72 | 0.73 | 0.73 |
| 2 - Fear | 0.61 | 0.47 | 0.53 |
| 3 - Happiness | 0.38 | 0.37 | 0.38 |
| 4 - Neutral | 0.65 | 0.62 | 0.63 |
| 5 - Sad | 0.74 | 0.72 | 0.73 |
| 6 - Successful | 0.79 | 0.81 | 0.80 |
| 7 - Trustful | 0.76 | 0.78 | 0.77 |
| weighted avg | 0.73 | 0.74 | 0.73 |

Overall accuracy reached 74%.

The emotion categories are based on Plutchik (1980), with anticipation replaced by a neutral category.

To use the model, load the huBERT tokenizer and the fine-tuned classifier:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Note: since the repository is gated, loading may require an authenticated
# Hugging Face session (e.g. `huggingface-cli login`).
tokenizer = AutoTokenizer.from_pretrained("SZTAKI-HLT/hubert-base-cc")
model = AutoModelForSequenceClassification.from_pretrained("poltextlab/emBERT")
```
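Once the model produces logits for a sentence, they can be mapped back to the emotion labels. A minimal post-processing sketch, assuming the id-to-label mapping follows the validation table above (verify against the model's `config.id2label` before relying on it):

```python
import math

# Hypothetical mapping inferred from the validation table above;
# confirm against config.id2label of poltextlab/emBERT.
ID2LABEL = {
    0: "Anger", 1: "Disgust", 2: "Fear", 3: "Happiness",
    4: "Neutral", 5: "Sad", 6: "Successful", 7: "Trustful",
}

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_emotion(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]
```

With a real forward pass, the per-sentence logits can be fed in as, for example, `top_emotion(outputs.logits[0].tolist())`.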

The model was created by György Márk Kis, Orsolya Ring, and Miklós Sebők of the Center for Social Sciences.