| --- |
| license: apache-2.0 |
| language: |
| - hu |
| metrics: |
| - accuracy |
| model-index: |
| - name: huBERTPlain |
| results: |
| - task: |
| type: text-classification |
| metrics: |
| - type: f1 |
| value: 0.77 |
| extra_gated_fields: |
| Country: country |
| Institution: text |
| Institution Email: text |
| Full Name: text |
| Please specify your academic project/use case you want to use the models for: text |
| extra_gated_prompt: Our models are intended for academic projects and academic research |
| only. If you are not affiliated with an academic institution, please reach out to |
| us at huggingface [at] poltextlab [dot] com for further inquiry. If we cannot clearly |
| determine your academic affiliation and use case based on your form data, your request |
| may be rejected. Please allow us a few business days to manually review subscriptions. |
| --- |
| |
| ## Model description |
|
|
A cased, fine-tuned BERT model for Hungarian, trained on manually annotated parliamentary pre-agenda speeches scraped from `parlament.hu`.
|
|
| ## Intended uses & limitations |
|
|
The model can be used like any other cased BERT model. It was evaluated on sentence-level emotion recognition in parliamentary pre-agenda speeches, with the following labels:
* `Label_0`: Neutral
* `Label_1`: Fear
* `Label_2`: Sadness
* `Label_3`: Anger
* `Label_4`: Disgust
* `Label_5`: Success
* `Label_6`: Joy
* `Label_7`: Trust
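For convenience, the label list above can be expressed as a lookup table. This is a minimal sketch: the helper name `label_to_emotion` is hypothetical, and the exact label strings the model emits should be verified against `model.config.id2label`.

```python
# Emotion names for the eight output labels, per the list above.
# Assumption: label ids follow the order documented in this card.
ID2EMOTION = {
    0: "Neutral",
    1: "Fear",
    2: "Sadness",
    3: "Anger",
    4: "Disgust",
    5: "Success",
    6: "Joy",
    7: "Trust",
}

def label_to_emotion(label: str) -> str:
    """Map a 'Label_5'-style string to its emotion name."""
    return ID2EMOTION[int(label.rsplit("_", 1)[-1])]
```

For example, `label_to_emotion("Label_5")` returns `"Success"`.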
|
|
| ## Training |
|
|
A fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on the HunEmPoli corpus.
|
|
| Category | Count | Ratio  | Sentiment | Count | Ratio  |
| -------- | ----- | ------ | --------- | ----- | ------ |
| Neutral  | 351   | 1.85%  | Neutral   | 351   | 1.85%  |
| Fear     | 162   | 0.85%  | Negative  | 11180 | 58.84% |
| Sadness  | 4258  | 22.41% |           |       |        |
| Anger    | 643   | 3.38%  |           |       |        |
| Disgust  | 6117  | 32.19% |           |       |        |
| Success  | 6602  | 34.74% | Positive  | 7471  | 39.32% |
| Joy      | 441   | 2.32%  |           |       |        |
| Trust    | 428   | 2.25%  |           |       |        |
| Sum      | 19002 |        |           |       |        |
|
|
| ## Eval results |
|
|
| | Class | Precision | Recall | F-Score | |
| |-----|------------|------------|------| |
| | Fear | 0.625 | 0.625 | 0.625 | |
| | Sadness | 0.8535 | 0.6291 | 0.7243 | |
| | Anger | 0.7857 | 0.3437 | 0.4782 | |
| | Disgust | 0.7154 | 0.8790 | 0.7888 | |
| | Success | 0.8579 | 0.8683 | 0.8631 | |
| | Joy | 0.549 | 0.6363 | 0.5894 | |
| | Trust | 0.4705 | 0.5581 | 0.5106 | |
| | Macro AVG | 0.7134 | 0.6281 | 0.6497 | |
| | Weighted AVG | 0.791 | 0.7791 | 0.7743 | |
|
|
|
|
| ## Usage |
|
|
| ```py |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| |
| tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT8") |
| model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT8") |
| ``` |
|
|
| ### BibTeX entry and citation info |
|
|
| If you use the model, please cite the following paper: |
|
|
| ```bibtex |
| @ARTICLE{10149341, |
  author={{\"U}veges, Istv{\'a}n and Ring, Orsolya},
| journal={IEEE Access}, |
| title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication}, |
| year={2023}, |
| volume={11}, |
| number={}, |
| pages={60267-60278}, |
| doi={10.1109/ACCESS.2023.3285536} |
| } |
| ``` |