thomasrenault committed
Commit f35ab69 · verified · 1 Parent(s): 9b9204b

Upload folder using huggingface_hub

Files changed (6)
  1. .DS_Store +0 -0
  2. .gitattributes +1 -0
  3. README.md +27 -23
  4. labeled_data.csv +3 -0
  5. model.safetensors +1 -1
  6. training_args.bin +1 -1
.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 checkpoint/labeled_data.csv filter=lfs diff=lfs merge=lfs -text
+labeled_data.csv filter=lfs diff=lfs merge=lfs -text
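The `.gitattributes` change above adds `labeled_data.csv` to the set of LFS-tracked patterns. A rough Python sketch of how those patterns select paths (an approximation: real gitattributes matching has extra rules, e.g. slash-free patterns match at any depth):

```python
from fnmatch import fnmatch

# LFS-tracked patterns from the updated .gitattributes in this commit.
lfs_patterns = [
    "*.zst",
    "*tfevents*",
    "checkpoint/labeled_data.csv",
    "labeled_data.csv",  # added by this commit
]

def is_lfs_tracked(path):
    """Return True if any LFS pattern matches the given path (approximate)."""
    return any(fnmatch(path, pattern) for pattern in lfs_patterns)
</imports>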
README.md CHANGED
@@ -1,17 +1,25 @@
1
  ---
 
 
 
 
 
 
 
 
 
2
  datasets:
3
- - thomasrenault/us_tweet_speech_congress
4
- language:
5
- - en
6
- base_model:
7
- - distilbert/distilbert-base-uncased
8
  pipeline_tag: text-classification
9
  ---
10
 
11
  # thomasrenault/emotion
12
 
13
- A multi-label emotion intensity classifier fine-tuned on US political tweets and congressional speeches.
14
- Built on `distilbert-base-uncased` using an **active learning** pipeline with GPT-4o-mini annotation.
15
 
16
  ## Labels
17
 
@@ -37,17 +45,13 @@ Scores are **independent** — multiple emotions can be high simultaneously.
37
  | Base model | `distilbert-base-uncased` |
38
  | Architecture | `DistilBertForSequenceClassification` (multi-label) |
39
  | Problem type | `multi_label_classification` |
40
- | Training data | ~216,000 labeled documents |
41
  | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
42
- | Strategy | Active learning (uncertainty sampling, 12 rounds) |
43
- | Seed size | 1,000 documents (random) |
44
- | AL query size | 25,000 documents / round |
45
- | Epochs (seed) | 4 |
46
- | Epochs (AL) | 2 (warm-start) |
47
  | Learning rate | 2e-5 |
48
  | Batch size | 16 |
49
  | Max length | 512 tokens |
50
- | Domain | US political tweets and congressional floor speeches |
51
 
52
  ## Usage
53
 
@@ -68,21 +72,19 @@ def predict(text):
68
  probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
69
  return dict(zip(EMOTIONS, probs))
70
 
71
- print(predict("This tax policy is a disgrace. It punishes work, protects the well-connected, and pretends it’s “fair” while shifting the burden onto those who can least afford it. Calling this reform is an insult. It’s not policy, it’s a blatant failure of responsibility."))
72
- {'anger': 0.7810273766517639, 'sadness': 0.2732137441635132, 'fear': 0.17815309762954712, 'disgust': 0.6877616047859192, 'pride': 0.02167985402047634, 'joy': 0.002112995134666562, 'gratitude': 0.0007932100561447442, 'hope': 0.042895104736089706}
73
 
74
  ```
75
 
76
  ## Intended Use
77
 
78
  - Academic research on emotion in political communication
79
- - Analysis of congressional speeches and political social media
80
  - Temporal trend analysis of emotional rhetoric
81
 
82
  ## Limitations
83
 
84
  - Trained exclusively on **US English political text** — performance may degrade on other domains
85
- - Annotation by GPT-4o-mini may carry its own biases
86
  - Emotions are subjective; inter-annotator agreement on intensity scores is inherently noisy
87
  - Labels are silver-standard (LLM-generated), not human-verified gold labels
88
 
@@ -91,9 +93,11 @@ print(predict("This tax policy is a disgrace. It punishes work, protects the wel
91
  If you use this model, please cite:
92
 
93
  ```
94
- @article{algan2026emotions,
95
- title={Emotions and policy views},
96
- author={Algan, Y, Davoine, E., Renault, T., and Stantcheva, S},
97
- year={2026}
 
 
98
  }
99
- ```
 
1
  ---
2
+ language: en
3
+ license: mit
4
+ tags:
5
+ - text-classification
6
+ - multi-label-classification
7
+ - emotion-analysis
8
+ - political-text
9
+ - tweets
10
+ - distilbert
11
  datasets:
12
+ - thomasrenault/us_tweet_speech_congress
13
+ metrics:
14
+ - rmse
15
+ - mae
16
+ base_model: distilbert-base-uncased
17
  pipeline_tag: text-classification
18
  ---
19
 
20
  # thomasrenault/emotion
21
 
22
+ A multi-label emotion intensity classifier fine-tuned on US tweets, campaign speeches and congressional speeches. Built on `distilbert-base-uncased` with GPT-4o-mini annotation via the OpenAI Batch API.
 
23
 
24
  ## Labels
25
 
 
45
  | Base model | `distilbert-base-uncased` |
46
  | Architecture | `DistilBertForSequenceClassification` (multi-label) |
47
  | Problem type | `multi_label_classification` |
48
+ | Training data | ~200,000 labeled documents |
49
  | Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
50
+ | Epochs | 4 |
 
 
 
 
51
  | Learning rate | 2e-5 |
52
  | Batch size | 16 |
53
  | Max length | 512 tokens |
54
+ | Domain | US tweets about policy, campaign speeches and congressional floor speeches |
55
 
56
  ## Usage
57
 
 
72
  probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
73
  return dict(zip(EMOTIONS, probs))
74
 
75
+ print(predict("We must stand together and fight for justice!"))
 
76
 
77
  ```
78
 
79
  ## Intended Use
80
 
81
  - Academic research on emotion in political communication
82
+ - Analysis of congressional speeches and social media
83
  - Temporal trend analysis of emotional rhetoric
84
 
85
  ## Limitations
86
 
87
  - Trained exclusively on **US English political text** — performance may degrade on other domains
 
88
  - Emotions are subjective; inter-annotator agreement on intensity scores is inherently noisy
89
  - Labels are silver-standard (LLM-generated), not human-verified gold labels
90
 
 
93
  If you use this model, please cite:
94
 
95
  ```
96
+ @misc{renault2025emotion,
97
+ author = {Renault, Thomas},
98
+ title = {thomasrenault/emotion: Multi-label emotion classifier for US political text},
99
+ year = {2025},
100
+ publisher = {HuggingFace},
101
+ url = {https://huggingface.co/thomasrenault/emotion}
102
  }
103
+ ```
labeled_data.csv ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3a4ce676be1a930d0a20f249e1e4160487e6f60cd8d6e7fbe5dc5c09ae73ec70
3
+ size 72441116
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:57018e57be82c7dca611b9b73a55d660de05ead9f6c6ab70572836186209a20e
3
  size 267851024
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6a1564b0fa17e4a3a3193e6ce6722326acbb48f6a277186092d11d3499550f59
3
  size 267851024
training_args.bin CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:29e0ebcea2c58eb8c3b9c9a2f5862e7c5d7214d529544b9dfdad666021ba1906
3
  size 5201
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a02060ce1c1ce5ff8f312c873e62c5580ccb947b9ac8a4f70d6f4f73f23ceef6
3
  size 5201