# Pythia-14M Sentiment Classifier
A sentiment classification model built on EleutherAI's Pythia-14M (14.1M parameters), fine-tuned for binary movie review sentiment prediction.
## Key Results

| Metric | Best | Mean ± Std (3 seeds) |
|---|---|---|
| Test Accuracy | 70.5% | 70.2% ± 0.2% |
| Test F1 | 0.7401 | 0.7153 |
| Test Precision | 0.7000 | |
| Test Recall | 0.7850 | |
## What Makes This Interesting

This is a 14-million-parameter model performing real sentiment classification at 70.5% accuracy. The best configuration was found via Bayesian hyperparameter optimization (Optuna TPE, 40 trials) over full fine-tuning, LoRA, and frozen-backbone configurations.

Full fine-tuning won decisively at this scale: all top 10 configurations were full FT. Neither LoRA nor a frozen backbone could compete on a model this small.
## Training Details
- Base Model: EleutherAI/pythia-14m
- Architecture: GPT-NeoX backbone + classification head (Dropout → Linear → GELU → Dropout → Linear)
- Dataset: jtatman/movie_sentiment_reviews (1600 train / 200 val / 200 test)
- Method: Full fine-tuning with differential learning rates (classifier head at 10× the backbone LR)
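The differential-LR setup above can be sketched as a helper that splits parameters into two optimizer groups. This is a minimal sketch, not the repo's actual code: the `"classifier"` name prefix and the `build_param_groups` helper are assumptions about how the head is named.

```python
def build_param_groups(named_params, base_lr, head_multiplier=10.0, head_prefix="classifier"):
    """Split (name, param) pairs, as yielded by PyTorch's model.named_parameters(),
    into backbone and classifier-head groups with different learning rates.

    The "classifier" prefix is an assumption; check the repo for the real name.
    """
    backbone, head = [], []
    for name, param in named_params:
        (head if name.startswith(head_prefix) else backbone).append(param)
    return [
        {"params": backbone, "lr": base_lr},                 # e.g. 7.36e-05 from the sweep
        {"params": head, "lr": base_lr * head_multiplier},   # head trains 10x faster
    ]
```

The returned list can be passed directly to `torch.optim.AdamW(groups, weight_decay=0.0017)`, since PyTorch optimizers accept per-group learning rates.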
## Optimal Hyperparameters (Bayesian Search)
| Parameter | Value |
|---|---|
| Learning Rate | 7.36e-05 |
| Batch Size | 16 |
| Epochs | 3 |
| Max Seq Length | 512 |
| Weight Decay | 0.0017 |
| Warmup Ratio | 16.5% |
| LR Scheduler | cosine |
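Combining these values with the dataset and batch size above gives the concrete step counts per run. This sketch assumes no gradient accumulation (an assumption; the repo may differ):

```python
import math

train_examples = 1600     # train split size from Training Details
batch_size = 16
epochs = 3
warmup_ratio = 0.165

steps_per_epoch = math.ceil(train_examples / batch_size)  # 100
total_steps = steps_per_epoch * epochs                    # 300
warmup_steps = int(total_steps * warmup_ratio)            # 49

print(steps_per_epoch, total_steps, warmup_steps)  # 100 300 49
```

These counts are what a cosine schedule with warmup consumes, e.g. `transformers.get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)`.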
## Classification Report (Best Seed)

```
              precision    recall  f1-score   support

    negative       0.71      0.61      0.66        93
    positive       0.70      0.79      0.74       107

    accuracy                           0.70       200
   macro avg       0.71      0.70      0.70       200
weighted avg       0.71      0.70      0.70       200
```
## Confusion Matrix

```
                Predicted
                Neg   Pos
Actual Neg   [  57    36 ]
Actual Pos   [  23    84 ]
```
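The headline metrics can be recomputed directly from this matrix (treating "positive" as the positive class), which confirms the table and the report are consistent:

```python
# Confusion matrix entries from above
tn, fp = 57, 36   # actual negative row
fn, tp = 23, 84   # actual positive row

accuracy  = (tp + tn) / (tp + tn + fp + fn)   # 141 / 200
precision = tp / (tp + fp)                    # 84 / 120
recall    = tp / (tp + fn)                    # 84 / 107
f1        = 2 * precision * recall / (precision + recall)

print(f"{accuracy:.3f} {precision:.4f} {recall:.4f} {f1:.4f}")
# 0.705 0.7000 0.7850 0.7401
```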
## Usage

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, GPTNeoXModel
from huggingface_hub import hf_hub_download

# Load tokenizer (Pythia has no pad token by default, so reuse EOS)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-14m")
tokenizer.pad_token = tokenizer.eos_token

# Load model (see repo for full class definition)
# Quick inference example:
text = "This movie was absolutely fantastic, I loved every minute!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding="max_length")
# ... load model and predict
```
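For orientation, the classification head described in Training Details (Dropout → Linear → GELU → Dropout → Linear) can be sketched as below. This is a hypothetical reconstruction, not the repo's definition: the intermediate width, dropout rate, and pooling strategy are all assumptions.

```python
import torch
import torch.nn as nn

def make_classifier_head(hidden_size: int, num_labels: int = 2, dropout: float = 0.1) -> nn.Sequential:
    """Dropout -> Linear -> GELU -> Dropout -> Linear, as described in the card.

    The intermediate width and dropout rate here are assumptions; see the
    repo for the exact definition.
    """
    return nn.Sequential(
        nn.Dropout(dropout),
        nn.Linear(hidden_size, hidden_size),
        nn.GELU(),
        nn.Dropout(dropout),
        nn.Linear(hidden_size, num_labels),
    )

# A pooled backbone representation (e.g. the last non-pad token's hidden state
# from GPTNeoXModel) would be fed to the head; pythia-14m's hidden size is 128.
head = make_classifier_head(hidden_size=128)
head.eval()
with torch.no_grad():
    logits = head(torch.randn(1, 128))  # shape: (1, 2)
pred = "positive" if logits.argmax(-1).item() == 1 else "negative"
```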
## Limitations
- English only
- Trained on movie reviews — domain transfer to other sentiment tasks not tested
- Small dataset (1600 training examples) — more data would likely improve performance
- 14M params inherently limits capacity
## Training Infrastructure
- GPU: NVIDIA RTX 3050 (4GB VRAM)
- Training time: ~3 minutes per run
- Sweep: 40 Bayesian trials via Optuna TPE sampler
## Citation

```bibtex
@misc{pythia-14m-sentiment,
  author = {James Tatman},
  title = {Pythia-14M Sentiment Classifier},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/jtatman/pythia-14m-sentiment}
}
```