Upload folder using huggingface_hub

Browse files
- .DS_Store +0 -0
- .gitattributes +1 -0
- README.md +27 -23
- labeled_data.csv +3 -0
- model.safetensors +1 -1
- training_args.bin +1 -1
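Per the commit title, the folder was pushed with `huggingface_hub`. A hedged sketch of what that call can look like; the `upload_checkpoint` wrapper, the local folder path, and the `dry_run` guard are illustrative additions, while `HfApi.upload_folder` and its `folder_path`/`repo_id`/`repo_type`/`commit_message` parameters come from the library:

```python
def upload_checkpoint(folder_path, repo_id,
                      commit_message="Upload folder using huggingface_hub",
                      dry_run=True):
    """Push a local folder to a Hugging Face model repo.

    With dry_run=True the call just returns the arguments it would send,
    so this sketch runs without credentials or network access.
    """
    kwargs = dict(folder_path=folder_path, repo_id=repo_id,
                  repo_type="model", commit_message=commit_message)
    if dry_run:
        return kwargs
    from huggingface_hub import HfApi  # imported lazily; needs a logged-in token
    HfApi().upload_folder(**kwargs)
    return kwargs

# Dry run with the repo id from the model card; the folder path is illustrative.
args = upload_checkpoint("./checkpoint", "thomasrenault/emotion")
```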
.DS_Store CHANGED

Binary files a/.DS_Store and b/.DS_Store differ
.gitattributes CHANGED

@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 checkpoint/labeled_data.csv filter=lfs diff=lfs merge=lfs -text
+labeled_data.csv filter=lfs diff=lfs merge=lfs -text
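The line added to `.gitattributes` is the kind of entry `git lfs track` generates. A sketch of how that tracking rule could be (re)created, assuming `git-lfs` is installed; the actual commit may have been produced differently:

```shell
# Track the new CSV with git-LFS; this appends the matching
# "filter=lfs diff=lfs merge=lfs -text" line to .gitattributes.
git lfs track "labeled_data.csv"
git add .gitattributes labeled_data.csv
git commit -m "Track labeled_data.csv with LFS"
```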
README.md CHANGED

@@ -1,17 +1,25 @@
 ---
+language: en
+license: mit
+tags:
+- text-classification
+- multi-label-classification
+- emotion-analysis
+- political-text
+- tweets
+- distilbert
 datasets:
-- thomasrenault/us_tweet_speech_congress
-
-
-
-
+- thomasrenault/us_tweet_speech_congress
+metrics:
+- rmse
+- mae
+base_model: distilbert-base-uncased
 pipeline_tag: text-classification
 ---
 
 # thomasrenault/emotion
 
-A multi-label emotion intensity classifier fine-tuned on US
-Built on `distilbert-base-uncased` using an **active learning** pipeline with GPT-4o-mini annotation.
+A multi-label emotion intensity classifier fine-tuned on US tweets, campaign speeches and congressional speeches. Built on `distilbert-base-uncased` with GPT-4o-mini annotation via the OpenAI Batch API.
 
 ## Labels
 
@@ -37,17 +45,13 @@ Scores are **independent** — multiple emotions can be high simultaneously.
 | Base model | `distilbert-base-uncased` |
 | Architecture | `DistilBertForSequenceClassification` (multi-label) |
 | Problem type | `multi_label_classification` |
-| Training data | ~
+| Training data | ~200,000 labeled documents |
 | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
-
-| Seed size | 1,000 documents (random) |
-| AL query size | 25,000 documents / round |
-| Epochs (seed) | 4 |
-| Epochs (AL) | 2 (warm-start) |
+| Epochs | 4 |
 | Learning rate | 2e-5 |
 | Batch size | 16 |
 | Max length | 512 tokens |
-| Domain | US
+| Domain | US tweets about policy, campaign speeches and congressional floor speeches |
 
 ## Usage
 
@@ -68,21 +72,19 @@ def predict(text):
     probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
     return dict(zip(EMOTIONS, probs))
 
-print(predict("
-{'anger': 0.7810273766517639, 'sadness': 0.2732137441635132, 'fear': 0.17815309762954712, 'disgust': 0.6877616047859192, 'pride': 0.02167985402047634, 'joy': 0.002112995134666562, 'gratitude': 0.0007932100561447442, 'hope': 0.042895104736089706}
+print(predict("We must stand together and fight for justice!"))
 
 ```
 
 ## Intended Use
 
 - Academic research on emotion in political communication
-- Analysis of congressional speeches and
+- Analysis of congressional speeches and social media
 - Temporal trend analysis of emotional rhetoric
 
 ## Limitations
 
 - Trained exclusively on **US English political text** — performance may degrade on other domains
-- Annotation by GPT-4o-mini may carry its own biases
 - Emotions are subjective; inter-annotator agreement on intensity scores is inherently noisy
 - Labels are silver-standard (LLM-generated), not human-verified gold labels
 
@@ -91,9 +93,11 @@ print(predict("This tax policy is a disgrace. It punishes work, protects the wel
 If you use this model, please cite:
 
 ```
-@
-
-
-year={
+@misc{renault2025emotion,
+author = {Renault, Thomas},
+title = {thomasrenault/emotion: Multi-label emotion classifier for US political text},
+year = {2025},
+publisher = {HuggingFace},
+url = {https://huggingface.co/thomasrenault/emotion}
 }
-```
+```
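The card's `predict` helper squashes each logit independently with `torch.sigmoid`, which is what makes the scores multi-label rather than mutually exclusive. A torch-free sketch of that scoring step; the label order comes from the card's example output, and the logits below are made up for illustration:

```python
import math

# The eight labels scored by the model (order taken from the card's example output).
EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]

def sigmoid(x: float) -> float:
    """Logistic function: maps a raw logit to an independent score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def scores_from_logits(logits):
    """Mirror of the card's dict(zip(EMOTIONS, probs)) step, without torch.

    Each logit is squashed independently, so several emotions can score
    high at once; unlike softmax, the values need not sum to 1.
    """
    return {label: sigmoid(z) for label, z in zip(EMOTIONS, logits)}

# Illustrative logits (not real model output): strong anger/disgust, weak joy.
example = scores_from_logits([1.3, -1.0, -1.5, 0.8, -3.8, -6.2, -7.1, -3.1])
```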
labeled_data.csv ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3a4ce676be1a930d0a20f249e1e4160487e6f60cd8d6e7fbe5dc5c09ae73ec70
+size 72441116
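The three-line file above is a git-LFS pointer: the real ~72 MB CSV lives in LFS storage, and git only versions this stub. A small parser for the pointer format (one `key value` pair per line, per the spec URL in the pointer itself):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-LFS pointer file into its version/oid/size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is the byte count of the real file
    return fields

# The pointer added for labeled_data.csv in this commit.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3a4ce676be1a930d0a20f249e1e4160487e6f60cd8d6e7fbe5dc5c09ae73ec70
size 72441116
"""
info = parse_lfs_pointer(pointer)
```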
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:6a1564b0fa17e4a3a3193e6ce6722326acbb48f6a277186092d11d3499550f59
 size 267851024
training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:a02060ce1c1ce5ff8f312c873e62c5580ccb947b9ac8a4f70d6f4f73f23ceef6
 size 5201