# Pythia-14M Sentiment Classifier
A sentiment classification model built on EleutherAI's Pythia-14M (14.1M parameters), fine-tuned for binary movie review sentiment prediction.
## Key Results

| Metric | Best | Mean ± Std (3 seeds) |
|---|---|---|
| Test Accuracy | 70.5% | 70.2% ± 0.2% |
| Test F1 | 0.7401 | 0.7153 |
| Test Precision | 0.7000 | |
| Test Recall | 0.7850 | |
## What Makes This Interesting

This is a 14-million-parameter model performing real sentiment classification at 70.5% accuracy. The best configuration was found via Bayesian hyperparameter optimization (Optuna TPE, 40 trials) over full fine-tuning, LoRA, and frozen-backbone configurations.

Full fine-tuning won decisively at this scale: all top 10 configurations were full FT. Neither LoRA nor a frozen backbone could compete on a model this small.
## Training Details
- Base Model: EleutherAI/pythia-14m
- Architecture: GPT-NeoX backbone + classification head (Dropout → Linear → GELU → Dropout → Linear)
- Dataset: jtatman/movie_sentiment_reviews (1600 train / 200 val / 200 test)
- Method: Full fine-tuning with differential learning rates (classifier head at 10× the backbone LR)
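The differential-LR setup above can be sketched as a helper that splits parameters into two optimizer groups. This is a minimal sketch, not the repo's actual code: the `"classifier"` name prefix and the `build_param_groups` helper are assumptions about how the head is named.

```python
def build_param_groups(named_params, base_lr, head_multiplier=10.0, head_prefix="classifier"):
    """Split (name, param) pairs, as yielded by PyTorch's model.named_parameters(),
    into backbone and classifier-head groups with different learning rates.

    The "classifier" prefix is an assumption; check the repo for the real name.
    """
    backbone, head = [], []
    for name, param in named_params:
        (head if name.startswith(head_prefix) else backbone).append(param)
    return [
        {"params": backbone, "lr": base_lr},                 # e.g. 7.36e-05 from the sweep
        {"params": head, "lr": base_lr * head_multiplier},   # head trains 10x faster
    ]
```

The returned list can be passed directly to `torch.optim.AdamW(groups, weight_decay=0.0017)`, since PyTorch optimizers accept per-group learning rates.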
## Optimal Hyperparameters (Bayesian Search)
| Parameter | Value |
|---|---|
| Learning Rate | 7.36e-05 |
| Batch Size | 16 |
| Epochs | 3 |
| Max Seq Length | 512 |
| Weight Decay | 0.0017 |
| Warmup Ratio | 16.5% |
| LR Scheduler | cosine |
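Combining these values with the dataset and batch size above gives the concrete step counts per run. This sketch assumes no gradient accumulation (an assumption; the repo may differ):

```python
import math

train_examples = 1600     # train split size from Training Details
batch_size = 16
epochs = 3
warmup_ratio = 0.165

steps_per_epoch = math.ceil(train_examples / batch_size)  # 100
total_steps = steps_per_epoch * epochs                    # 300
warmup_steps = int(total_steps * warmup_ratio)            # 49

print(steps_per_epoch, total_steps, warmup_steps)  # 100 300 49
```

These counts are what a cosine schedule with warmup consumes, e.g. `transformers.get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)`.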
## Classification Report (Best Seed)

```
              precision    recall  f1-score   support

    negative       0.71      0.61      0.66        93
    positive       0.70      0.79      0.74       107

    accuracy                           0.70       200
   macro avg       0.71      0.70      0.70       200
weighted avg       0.71      0.70      0.70       200
```
## Confusion Matrix

```
                Predicted
                Neg   Pos
Actual Neg   [  57    36 ]
Actual Pos   [  23    84 ]
```
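The headline metrics can be recomputed directly from this matrix (treating "positive" as the positive class), which confirms the table and the report are consistent:

```python
# Confusion matrix entries from above
tn, fp = 57, 36   # actual negative row
fn, tp = 23, 84   # actual positive row

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # 141 / 200
precision = tp / (tp + fp)                    # 84 / 120
recall    = tp / (tp + fn)                    # 84 / 107
f1        = 2 * precision * recall / (precision + recall)

print(f"{accuracy:.3f} {precision:.4f} {recall:.4f} {f1:.4f}")
# 0.705 0.7000 0.7850 0.7401
```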
## Usage

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, GPTNeoXModel
from huggingface_hub import hf_hub_download

# Load tokenizer (Pythia has no pad token by default, so reuse EOS)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-14m")
tokenizer.pad_token = tokenizer.eos_token

# Load model (see repo for full class definition)
# Quick inference example:
text = "This movie was absolutely fantastic, I loved every minute!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding="max_length")
# ... load model and predict
```
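For orientation, the classification head described in Training Details (Dropout → Linear → GELU → Dropout → Linear) can be sketched as below. This is a hypothetical reconstruction, not the repo's definition: the intermediate width, dropout rate, and pooling strategy are all assumptions.

```python
import torch
import torch.nn as nn

def make_classifier_head(hidden_size: int, num_labels: int = 2, dropout: float = 0.1) -> nn.Sequential:
    """Dropout -> Linear -> GELU -> Dropout -> Linear, as described in the card.

    The intermediate width and dropout rate here are assumptions; see the
    repo for the exact definition.
    """
    return nn.Sequential(
        nn.Dropout(dropout),
        nn.Linear(hidden_size, hidden_size),
        nn.GELU(),
        nn.Dropout(dropout),
        nn.Linear(hidden_size, num_labels),
    )

# A pooled backbone representation (e.g. the last non-pad token's hidden state
# from GPTNeoXModel) would be fed to the head; pythia-14m's hidden size is 128.
head = make_classifier_head(hidden_size=128)
head.eval()
with torch.no_grad():
    logits = head(torch.randn(1, 128))  # shape: (1, 2)
pred = "positive" if logits.argmax(-1).item() == 1 else "negative"
```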
## Limitations
- English only
- Trained on movie reviews — domain transfer to other sentiment tasks not tested
- Small dataset (1600 training examples) — more data would likely improve performance
- 14M params inherently limits capacity
## Training Infrastructure
- GPU: NVIDIA RTX 3050 (4GB VRAM)
- Training time: ~3 minutes per run
- Sweep: 40 Bayesian trials via Optuna TPE sampler
## Citation

```bibtex
@misc{pythia-14m-sentiment,
  author = {James Tatman},
  title = {Pythia-14M Sentiment Classifier},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/jtatman/pythia-14m-sentiment}
}
```