NBA Press Conference Sentiment - Fine-tuned RoBERTa

A RoBERTa model fine-tuned for 3-class sentiment analysis on NBA playoff press conference transcripts.

Model Description

Base model: cardiffnlp/twitter-roberta-base-sentiment

Fine-tuned on 2,050 NBA press conference speaker turns (50 hand-labeled seed turns + 2,000 GPT-4o-mini weak labels), covering Conference Finals and NBA Finals transcripts from 2013-2022 (2,790 transcripts, 23,166 speaker turns total).

Labels: NEGATIVE (0), NEUTRAL (1), POSITIVE (2)

Performance

Evaluated on a 50-turn hand-labeled seed set:

| Model | Accuracy | Macro F1 |
|---|---|---|
| This model (fine-tuned) | 92% | 0.932 |
| Twitter RoBERTa (base, no fine-tune) | 54% | 0.467 |
| DistilBERT SST-2 | 52% | 0.380 |
| FinBERT | 34% | 0.288 |

Fine-tuning improves accuracy by 38 percentage points over the best off-the-shelf baseline. General-purpose sentiment models fail on sports language because athletes and coaches systematically frame losses in positive terms ("we competed hard", "we'll make adjustments") rather than expressing raw negativity.
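Macro F1 here is the unweighted mean of per-class F1 scores, so NEGATIVE, NEUTRAL, and POSITIVE count equally regardless of class frequency. A minimal pure-Python sketch, using the 0/1/2 label encoding above:

```python
def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-class F1 over the given label set."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

In practice the project likely computed this with a library metric; the sketch just makes the averaging explicit.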

Training Details

  • Base model: cardiffnlp/twitter-roberta-base-sentiment
  • Training data: 2,050 labeled speaker turns (80/20 train/val split)
  • Weak labeling: GPT-4o-mini with sports-specific 3-class definitions, batched 20/call
  • Framework: Hugging Face Trainer
  • Epochs: 5 (early stopping patience=2; best checkpoint at epoch 4)
  • Learning rate: 2e-5 with linear warmup (10%)
  • Batch size: 16
  • Experiment tracking: MLflow
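The weak-labeling step above sends speaker turns to GPT-4o-mini 20 at a time. A minimal batching helper along these lines (the name and signature are illustrative, not the project's actual code):

```python
def chunk(items, size=20):
    """Yield successive fixed-size batches; the last batch may be smaller."""
    for i in range(0, len(items), size):
        yield items[i:i + size]
```

Each yielded batch would then be formatted into a single prompt, amortizing per-call overhead across 20 turns.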

Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="EgeDenizPekel/nba-press-sentiment-roberta"
)

classifier("We competed hard tonight. We'll make some adjustments and come back stronger.")
# [{'label': 'POSITIVE', 'score': 0.87}]

classifier("We got killed out there. That was embarrassing.")
# [{'label': 'NEGATIVE', 'score': 0.94}]
```
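For downstream analysis it can be convenient to collapse pipeline output into a single signed score. The convention below (POSITIVE → +score, NEGATIVE → -score, NEUTRAL → 0) is one reasonable choice, not necessarily the project's own:

```python
def polarity(pred):
    """Map a pipeline prediction dict to a signed sentiment score in [-1, 1]."""
    sign = {"POSITIVE": 1, "NEGATIVE": -1, "NEUTRAL": 0}
    return sign[pred["label"]] * pred["score"]
```

A per-game sentiment value could then be the mean polarity over that game's speaker turns.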

Research Context

Built as part of an end-to-end NLP portfolio project investigating whether post-game press conference sentiment correlates with NBA playoff outcomes.

Key finding: No statistically significant correlation between post-game sentiment and point differential (r=-0.088, p=0.30, n=141 games). Press conference framing is strategically managed and does not leak game-level performance.
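The reported r is a Pearson correlation between per-game sentiment and point differential. A self-contained sketch of the statistic itself (the p-value computation is omitted):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

With n=141 games, an |r| of 0.088 is well inside the range expected by chance, consistent with the p=0.30 reported above.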

Full project: press-conference-sentiment-analyzer
