thomasrenault committed
Commit f35ab69 · verified · 1 Parent(s): 9b9204b

Upload folder using huggingface_hub

Files changed (6)
  1. .DS_Store +0 -0
  2. .gitattributes +1 -0
  3. README.md +27 -23
  4. labeled_data.csv +3 -0
  5. model.safetensors +1 -1
  6. training_args.bin +1 -1
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 checkpoint/labeled_data.csv filter=lfs diff=lfs merge=lfs -text
+labeled_data.csv filter=lfs diff=lfs merge=lfs -text
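The `.gitattributes` change above adds `labeled_data.csv` to the set of LFS-tracked patterns. A rough Python sketch of how those patterns select paths (an approximation: real gitattributes matching has extra rules, e.g. slash-free patterns match at any depth):

```python
from fnmatch import fnmatch

# LFS-tracked patterns from the updated .gitattributes in this commit.
lfs_patterns = [
    "*.zst",
    "*tfevents*",
    "checkpoint/labeled_data.csv",
    "labeled_data.csv",  # added by this commit
]

def is_lfs_tracked(path):
    """Return True if any LFS pattern matches the given path (approximate)."""
    return any(fnmatch(path, pattern) for pattern in lfs_patterns)
</imports>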
README.md CHANGED
@@ -1,17 +1,25 @@
1
  ---
 
 
 
 
 
 
 
 
 
2
  datasets:
3
- - thomasrenault/us_tweet_speech_congress
4
- language:
5
- - en
6
- base_model:
7
- - distilbert/distilbert-base-uncased
8
  pipeline_tag: text-classification
9
  ---
10
 
11
  # thomasrenault/emotion
12
 
13
- A multi-label emotion intensity classifier fine-tuned on US political tweets and congressional speeches.
14
- Built on `distilbert-base-uncased` using an **active learning** pipeline with GPT-4o-mini annotation.
15
 
16
  ## Labels
17
 
@@ -37,17 +45,13 @@ Scores are **independent** — multiple emotions can be high simultaneously.
37
  | Base model | `distilbert-base-uncased` |
38
  | Architecture | `DistilBertForSequenceClassification` (multi-label) |
39
  | Problem type | `multi_label_classification` |
40
- | Training data | ~216,000 labeled documents |
41
  | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
42
- | Strategy | Active learning (uncertainty sampling, 12 rounds) |
43
- | Seed size | 1,000 documents (random) |
44
- | AL query size | 25,000 documents / round |
45
- | Epochs (seed) | 4 |
46
- | Epochs (AL) | 2 (warm-start) |
47
  | Learning rate | 2e-5 |
48
  | Batch size | 16 |
49
  | Max length | 512 tokens |
50
- | Domain | US political tweets and congressional floor speeches |
51
 
52
  ## Usage
53
 
@@ -68,21 +72,19 @@ def predict(text):
68
  probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
69
  return dict(zip(EMOTIONS, probs))
70
 
71
- print(predict("This tax policy is a disgrace. It punishes work, protects the well-connected, and pretends it’s “fair” while shifting the burden onto those who can least afford it. Calling this reform is an insult. It’s not policy, it’s a blatant failure of responsibility."))
72
- {'anger': 0.7810273766517639, 'sadness': 0.2732137441635132, 'fear': 0.17815309762954712, 'disgust': 0.6877616047859192, 'pride': 0.02167985402047634, 'joy': 0.002112995134666562, 'gratitude': 0.0007932100561447442, 'hope': 0.042895104736089706}
73
 
74
  ```
75
 
76
  ## Intended Use
77
 
78
  - Academic research on emotion in political communication
79
- - Analysis of congressional speeches and political social media
80
  - Temporal trend analysis of emotional rhetoric
81
 
82
  ## Limitations
83
 
84
  - Trained exclusively on **US English political text** — performance may degrade on other domains
85
- - Annotation by GPT-4o-mini may carry its own biases
86
  - Emotions are subjective; inter-annotator agreement on intensity scores is inherently noisy
87
  - Labels are silver-standard (LLM-generated), not human-verified gold labels
88
 
@@ -91,9 +93,11 @@ print(predict("This tax policy is a disgrace. It punishes work, protects the wel
91
  If you use this model, please cite:
92
 
93
  ```
94
- @article{algan2026emotions,
95
- title={Emotions and policy views},
96
- author={Algan, Y, Davoine, E., Renault, T., and Stantcheva, S},
97
- year={2026}
 
 
98
  }
99
- ```
 
1
  ---
2
+ language: en
3
+ license: mit
4
+ tags:
5
+ - text-classification
6
+ - multi-label-classification
7
+ - emotion-analysis
8
+ - political-text
9
+ - tweets
10
+ - distilbert
11
  datasets:
12
+ - thomasrenault/us_tweet_speech_congress
13
+ metrics:
14
+ - rmse
15
+ - mae
16
+ base_model: distilbert-base-uncased
17
  pipeline_tag: text-classification
18
  ---
19
 
20
  # thomasrenault/emotion
21
 
22
+ A multi-label emotion intensity classifier fine-tuned on US tweets, campaign speeches and congressional speeches. Built on `distilbert-base-uncased` with GPT-4o-mini annotation via the OpenAI Batch API.
 
23
 
24
  ## Labels
25
 
 
45
  | Base model | `distilbert-base-uncased` |
46
  | Architecture | `DistilBertForSequenceClassification` (multi-label) |
47
  | Problem type | `multi_label_classification` |
48
+ | Training data | ~200,000 labeled documents |
49
  | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
50
+ | Epochs | 4 |
 
 
 
 
51
  | Learning rate | 2e-5 |
52
  | Batch size | 16 |
53
  | Max length | 512 tokens |
54
+ | Domain | US tweets about policy, campaign speeches and congressional floor speeches |
55
 
56
  ## Usage
57
 
 
72
  probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
73
  return dict(zip(EMOTIONS, probs))
74
 
75
+ print(predict("We must stand together and fight for justice!"))
 
76
 
77
  ```
78
 
79
  ## Intended Use
80
 
81
  - Academic research on emotion in political communication
82
+ - Analysis of congressional speeches and social media
83
  - Temporal trend analysis of emotional rhetoric
84
 
85
  ## Limitations
86
 
87
  - Trained exclusively on **US English political text** — performance may degrade on other domains
 
88
  - Emotions are subjective; inter-annotator agreement on intensity scores is inherently noisy
89
  - Labels are silver-standard (LLM-generated), not human-verified gold labels
90
 
 
93
  If you use this model, please cite:
94
 
95
  ```
96
+ @misc{renault2025emotion,
97
+ author = {Renault, Thomas},
98
+ title = {thomasrenault/emotion: Multi-label emotion classifier for US political text},
99
+ year = {2025},
100
+ publisher = {HuggingFace},
101
+ url = {https://huggingface.co/thomasrenault/emotion}
102
  }
103
+ ```
labeled_data.csv ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3a4ce676be1a930d0a20f249e1e4160487e6f60cd8d6e7fbe5dc5c09ae73ec70
3
+ size 72441116
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:57018e57be82c7dca611b9b73a55d660de05ead9f6c6ab70572836186209a20e
3
  size 267851024
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6a1564b0fa17e4a3a3193e6ce6722326acbb48f6a277186092d11d3499550f59
3
  size 267851024
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:29e0ebcea2c58eb8c3b9c9a2f5862e7c5d7214d529544b9dfdad666021ba1906
3
  size 5201
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a02060ce1c1ce5ff8f312c873e62c5580ccb947b9ac8a4f70d6f4f73f23ceef6
3
  size 5201