Pythia-14M Sentiment Classifier

A sentiment classification model built on EleutherAI's Pythia-14M (14.1M parameters), fine-tuned for binary movie review sentiment prediction.

Key Results

Metric           Best      Mean ± Std (3 seeds)
Test Accuracy    70.5%     70.2% ± 0.2%
Test F1          0.7401    0.7153
Test Precision   0.7000
Test Recall      0.7850

What Makes This Interesting

This is a 14-million-parameter model doing real sentiment classification at 70.5% accuracy. The winning configuration was found via Bayesian hyperparameter optimization (Optuna TPE, 40 trials) over full fine-tuning, LoRA, and frozen-backbone setups.

Full fine-tuning won decisively at this scale — all top 10 configurations were full FT. LoRA and frozen backbone couldn't compete on a model this small.

Training Details

  • Base Model: EleutherAI/pythia-14m
  • Architecture: GPT-NeoX backbone + classification head (Dropout → Linear → GELU → Dropout → Linear)
  • Dataset: jtatman/movie_sentiment_reviews (1600 train / 200 val / 200 test)
  • Method: Full fine-tuning with differential learning rates (classifier head trained at 10× the backbone LR)
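The head layout and differential-LR setup above might look like this minimal sketch. The class name, layer widths, dropout rate, and the stand-in backbone module are assumptions (Pythia-14M's hidden size of 128 is used here for illustration):

```python
import torch
import torch.nn as nn

class SentimentHead(nn.Module):
    """Classification head: Dropout -> Linear -> GELU -> Dropout -> Linear."""
    def __init__(self, hidden_size=128, num_labels=2, dropout=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Dropout(dropout),
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, num_labels),
        )

    def forward(self, hidden_states):
        return self.net(hidden_states)

# Differential learning rates: the classifier head trains at 10x the backbone LR.
backbone = nn.Linear(128, 128)  # stand-in for the GPT-NeoX backbone
head = SentimentHead()
base_lr = 7.36e-5
optimizer = torch.optim.AdamW(
    [
        {"params": backbone.parameters(), "lr": base_lr},
        {"params": head.parameters(), "lr": base_lr * 10},
    ],
    weight_decay=0.0017,
)
```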

Optimal Hyperparameters (Bayesian Search)

Parameter        Value
Learning Rate    7.36e-05
Batch Size       16
Epochs           3
Max Seq Length   512
Weight Decay     0.0017
Warmup Ratio     16.5%
LR Scheduler     cosine
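These values translate directly into a schedule: 1600 training examples at batch size 16 give 100 steps per epoch, so 3 epochs yield 300 optimizer steps, and a 16.5% warmup ratio is about 49 warmup steps. A sketch with a stand-in module (the module itself is illustrative):

```python
import torch
from transformers import get_cosine_schedule_with_warmup

total_steps = 3 * (1600 // 16)           # 3 epochs x 100 steps/epoch = 300
warmup_steps = int(0.165 * total_steps)  # ~49 warmup steps

model = torch.nn.Linear(8, 2)  # stand-in for the fine-tuned model
optimizer = torch.optim.AdamW(model.parameters(), lr=7.36e-5, weight_decay=0.0017)
scheduler = get_cosine_schedule_with_warmup(optimizer, warmup_steps, total_steps)

# LR ramps up linearly during warmup, then decays along a cosine curve to 0.
lrs = []
for _ in range(total_steps):
    optimizer.step()
    scheduler.step()
    lrs.append(scheduler.get_last_lr()[0])
```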

Classification Report (Best Seed)

              precision    recall  f1-score   support

    negative       0.71      0.61      0.66        93
    positive       0.70      0.79      0.74       107

    accuracy                           0.70       200
   macro avg       0.71      0.70      0.70       200
weighted avg       0.71      0.70      0.70       200

Confusion Matrix

              Predicted
              Neg    Pos
Actual Neg  [  57     36 ]
Actual Pos  [  23     84 ]
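The headline metrics can be re-derived from these counts as a sanity check:

```python
# Counts read off the confusion matrix above (best seed).
tn, fp = 57, 36  # actual-negative row
fn, tp = 23, 84  # actual-positive row

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 141/200 = 0.705
precision = tp / (tp + fp)                          # 84/120  = 0.70
recall = tp / (tp + fn)                             # 84/107  ~= 0.7850
f1 = 2 * precision * recall / (precision + recall)  # ~= 0.7401

print(accuracy, precision, round(recall, 4), round(f1, 4))
```

These reproduce the reported test accuracy (70.5%), positive-class precision (0.70), recall (0.785), and F1 (0.7401).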

Usage

import torch
import torch.nn as nn
from transformers import AutoTokenizer, GPTNeoXModel
from huggingface_hub import hf_hub_download

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-14m")
tokenizer.pad_token = tokenizer.eos_token

# Load model (see repo for full class definition)
# Quick inference example:
text = "This movie was absolutely fantastic, I loved every minute!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512, padding="max_length")
# ... load model and predict
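Filling in the elided step, an end-to-end sketch might look like the following. The wrapper class, mean pooling, and label order (index 1 = positive) are assumptions, not the repo's actual API; prefer the full class definition in the repo, and note that the head here is untrained until the fine-tuned weights are loaded:

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, GPTNeoXModel

class PythiaSentimentModel(nn.Module):
    # Hypothetical wrapper mirroring the architecture described above;
    # the repo's actual class definition may differ.
    def __init__(self, num_labels=2, dropout=0.1):
        super().__init__()
        self.backbone = GPTNeoXModel.from_pretrained("EleutherAI/pythia-14m")
        hidden = self.backbone.config.hidden_size
        self.head = nn.Sequential(
            nn.Dropout(dropout), nn.Linear(hidden, hidden), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(hidden, num_labels),
        )

    def forward(self, input_ids, attention_mask):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # Mean-pool over non-padding tokens (pooling strategy is an assumption).
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (out.last_hidden_state * mask).sum(1) / mask.sum(1)
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-14m")
tokenizer.pad_token = tokenizer.eos_token

model = PythiaSentimentModel().eval()
# Load the fine-tuned checkpoint from the repo here (filename not shown in
# this card); without it, the randomly initialized head gives arbitrary output.

inputs = tokenizer("This movie was absolutely fantastic, I loved every minute!",
                   return_tensors="pt", truncation=True, max_length=512,
                   padding="max_length")
with torch.no_grad():
    logits = model(inputs["input_ids"], inputs["attention_mask"])
pred = "positive" if logits.argmax(-1).item() == 1 else "negative"
```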

Limitations

  • English only
  • Trained on movie reviews — domain transfer to other sentiment tasks not tested
  • Small dataset (1600 training examples) — more data would likely improve performance
  • 14M params inherently limits capacity

Training Infrastructure

  • GPU: NVIDIA RTX 3050 (4GB VRAM)
  • Training time: ~3 minutes per run
  • Sweep: 40 Bayesian trials via Optuna TPE sampler

Citation

@misc{pythia-14m-sentiment,
  author = {James Tatman},
  title = {Pythia-14M Sentiment Classifier},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/jtatman/pythia-14m-sentiment}
}