---
language: en
license: mit
tags:
- text-classification
- multi-label-classification
- emotion-analysis
- political-text
- tweets
- distilbert
datasets:
- thomasrenault/us_tweet_speech_congress
metrics:
- rmse
- mae
base_model: distilbert-base-uncased
pipeline_tag: text-classification
---

# thomasrenault/emotion

A multi-label emotion intensity classifier fine-tuned on US tweets, campaign speeches, and congressional speeches. Built on `distilbert-base-uncased`, with training labels generated by GPT-4o-mini via the OpenAI Batch API.

## Labels

The model predicts **8 independent emotion intensities** (sigmoid outputs, range 0–1):

| Label |
|---|
| `anger` |
| `sadness` |
| `fear` |
| `disgust` |
| `pride` |
| `joy` |
| `gratitude` |
| `hope` |

Scores are **independent**: multiple emotions can be high simultaneously.

## Training

| Setting | Value |
|---|---|
| Base model | `distilbert-base-uncased` |
| Architecture | `DistilBertForSequenceClassification` (multi-label) |
| Problem type | `multi_label_classification` |
| Training data | ~200,000 labeled documents |
| Annotation | GPT-4o-mini (temperature=0) via OpenAI Batch API |
| Epochs | 4 |
| Learning rate | 2e-5 |
| Batch size | 16 |
| Max length | 512 tokens |
| Domain | US tweets about policy, campaign speeches and congressional floor speeches |

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "thomasrenault/emotion"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]

def predict(text):
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        # Sigmoid, not softmax: each emotion is scored independently.
        probs = torch.sigmoid(model(**enc).logits).squeeze().tolist()
    return dict(zip(EMOTIONS, probs))

print(predict("We must stand together and fight for justice!"))
```

## Intended Use

- Academic research on emotion in
political communication
- Analysis of congressional speeches and social media
- Temporal trend analysis of emotional rhetoric

## Limitations

- Trained exclusively on **US English political text**; performance may degrade on other domains
- Emotions are subjective; intensity annotations are inherently noisy
- Labels are silver-standard (LLM-generated), not human-verified gold labels

## Citation

If you use this model, please cite:

```
@misc{renault2025emotion,
  author    = {Renault, Thomas},
  title     = {thomasrenault/emotion: Multi-label emotion classifier for US political text},
  year      = {2025},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/thomasrenault/emotion}
}
```
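
## Thresholding Scores

Because the scores are independent sigmoids, converting them into discrete emotion tags is a per-label threshold test rather than an argmax over labels. A minimal pure-Python sketch, using `math.exp` in place of `torch.sigmoid`; the 0.5 cut-off and the `tag_emotions` helper are illustrative assumptions, not part of the model:

```python
import math

EMOTIONS = ["anger", "sadness", "fear", "disgust", "pride", "joy", "gratitude", "hope"]

def tag_emotions(logits, threshold=0.5):
    """Return every emotion whose sigmoid intensity clears the threshold.

    Unlike softmax classification, zero, one, or several emotions
    may be returned for the same document.
    """
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [name for name, p in zip(EMOTIONS, probs) if p >= threshold]

# Illustrative logits: strong anger and hope, everything else below zero.
print(tag_emotions([2.0, -3.0, -1.0, -2.0, -0.1, -1.5, -2.5, 1.2]))
# → ['anger', 'hope']
```

In practice a single global cut-off is rarely optimal; per-emotion thresholds calibrated on held-out data generally work better for multi-label outputs.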