Georgia Sports Llama 3 DPO

A fine-tuned version of Meta Llama 3.1 8B Instruct, trained with Direct Preference Optimization (DPO) on Georgia high school sports content from GPB Sports.

This model is designed to answer questions about Georgia high school athletics — football, basketball, baseball, and more — with accurate, well-sourced responses grounded in real sports journalism.

What is DPO?

Direct Preference Optimization is a technique for aligning language models with human preferences. Instead of just training a model on "correct" answers, DPO shows the model pairs of responses — one chosen (better) and one rejected (worse) — and teaches it to prefer the style and quality of the better response.

Think of it like a coach reviewing game film: "This play was good, this one wasn't — learn the difference."
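Concretely, DPO trains directly on the preference pairs with a log-sigmoid loss over sequence log-probabilities from the policy and a frozen reference model. A minimal sketch of the per-pair loss (the function name is illustrative, not from the training code):

```python
import math

def dpo_pair_loss(policy_chosen_lp, policy_rejected_lp,
                  ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """DPO loss for one chosen/rejected pair, given sequence
    log-probabilities under the policy and the frozen reference model."""
    # How much more (or less) the policy likes each response than the reference does
    chosen_margin = policy_chosen_lp - ref_chosen_lp
    rejected_margin = policy_rejected_lp - ref_rejected_lp
    # -log(sigmoid(beta * margin difference)): small when the policy prefers
    # the chosen response more strongly than the reference model does
    logits = beta * (chosen_margin - rejected_margin)
    return math.log(1.0 + math.exp(-logits))
```

With beta = 0.1 (the value used for this model), the loss drops below log 2 as soon as the policy shifts probability mass toward the chosen response relative to the reference.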

How This Model Was Made

Step 1: Collect Source Material

We scraped ~1,350 articles from GPB Sports covering Georgia high school athletics from April 2022 through September 2025.

Step 2: Generate Questions

For each article, a local LLM (Mistral 7B) read the article and generated three challenging questions answerable from the article's content. This produced 4,078 question prompts in total.
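The generation step can be sketched roughly as a prompt template plus a parser for the model's numbered reply (the prompt wording and helper names here are illustrative; the pipeline's actual prompts were not published):

```python
import re

def build_question_prompt(article_text: str, n: int = 3) -> str:
    """Ask the generator model for n questions answerable from the article."""
    return (
        f"Read the article below and write {n} challenging questions "
        "that can be answered using only the article's content.\n\n"
        f"Article:\n{article_text}\n\nQuestions:"
    )

def parse_questions(reply: str, n: int = 3) -> list[str]:
    """Pull up to n numbered questions out of the model's reply."""
    return re.findall(r"\d+[.)]\s*([^\n]*\?)", reply)[:n]
```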

Step 3: Generate Competing Answers

Two different local models answered each question independently:

  • Llama 3.1 8B
  • Mistral 7B Instruct

Both models received the full article text as context and generated their best answer.

Step 4: Judge the Answers

A judge model (Llama 3.1 8B) scored each response on a 1–5 scale for accuracy, specificity, and faithfulness to the source article. The higher-scoring response became the chosen response; the lower-scoring one became the rejected response. Ties were discarded.
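In code, turning two judged answers into a training pair is straightforward (a sketch; the helper name is hypothetical, and the prompt/chosen/rejected field names follow the usual DPO dataset convention):

```python
def build_dpo_pair(prompt, answer_a, score_a, answer_b, score_b):
    """Return a chosen/rejected pair for DPO, or None for a tie (discarded)."""
    if score_a == score_b:
        return None
    chosen, rejected = ((answer_a, answer_b) if score_a > score_b
                        else (answer_b, answer_a))
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```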

Step 5: DPO Fine-Tuning

The 4,078 chosen/rejected pairs were used to fine-tune Llama 3.1 8B Instruct using:

  • QLoRA (4-bit quantization + Low-Rank Adaptation) to fit training on a single GPU
  • TRL's DPOTrainer from Hugging Face
  • Training on a Google Colab T4 GPU
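Put together, the training setup looks roughly like the following configuration sketch (against recent versions of transformers, peft, and trl; exact argument names vary between TRL releases, and the dataset is assumed to hold the prompt/chosen/rejected pairs):

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import DPOConfig, DPOTrainer

base = "meta-llama/Llama-3.1-8B-Instruct"

# 4-bit NF4 quantization so the 8B model fits on a single T4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA adapters on the attention projections only
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

train_dataset = load_dataset("kslote/georgia-high-school-sports", split="train")

args = DPOConfig(
    output_dir="georgia-sports-llama3-dpo",
    beta=0.1,
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    max_length=1024,
    optim="paged_adamw_8bit",
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```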

Training Details

Parameter            Value
Base model           meta-llama/Llama-3.1-8B-Instruct
Method               DPO with QLoRA
Quantization         4-bit NormalFloat (NF4)
LoRA rank            16
LoRA alpha           32
LoRA target modules  q_proj, k_proj, v_proj, o_proj
Learning rate        5e-5
Batch size           2 (with 4× gradient accumulation)
Epochs               3
DPO beta             0.1
Max sequence length  1024
Optimizer            Paged AdamW 8-bit
Hardware             Google Colab T4 (16 GB)
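As a sanity check on the adapter size: with rank 16 on the four attention projections, the trainable LoRA parameters are a tiny fraction of the 8B base. Assuming Llama 3.1 8B's published shapes (32 layers, hidden size 4096, grouped-query attention with a 1024-dim KV projection):

```python
r = 16
layers = 32
hidden = 4096
kv_dim = 1024  # 8 KV heads × 128 head dim (grouped-query attention)

# A LoRA adapter on a (d_in -> d_out) weight adds r * (d_in + d_out) params
per_layer = (
    r * (hidden + hidden)    # q_proj: 4096 -> 4096
    + r * (hidden + kv_dim)  # k_proj: 4096 -> 1024
    + r * (hidden + kv_dim)  # v_proj: 4096 -> 1024
    + r * (hidden + hidden)  # o_proj: 4096 -> 4096
)
total = per_layer * layers
print(total)  # roughly 13.6M trainable parameters
```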

Dataset Summary

Stat                          Value
Total DPO pairs               4,078
Source articles               ~1,350
Date range                    April 2022 – September 2025
Topic                         Georgia high school sports (GHSA)
Avg chosen response length    525 characters
Avg rejected response length  467 characters

Which model "won" more often?

Model                Times Chosen (Better)  Times Rejected (Worse)
Llama 3.1 8B         2,349 (57.6%)          1,729 (42.4%)
Mistral 7B Instruct  1,729 (42.4%)          2,349 (57.6%)

Rating distributions

Chosen responses:

Rating  Count
5       1,203
4       2,759
3       109
2       7

Rejected responses:

Rating  Count
4       1,079
3       1,420
2       1,560
1       19

How to Use

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the fine-tuned model in half precision
model_name = "kslote/georgia-sports-llama3-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a knowledgeable Georgia high school sports analyst."},
    {"role": "user", "content": "Who were the top football programs in Georgia in 2024?"},
]

# Format the conversation with the Llama 3 chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

Limitations

  • Knowledge cutoff: The training data covers April 2022 through September 2025; the model has no knowledge of events after September 2025.
  • Georgia-specific: This model is specialized for Georgia high school sports (GHSA). It may not perform well on questions about other states or college/professional sports.
  • Source-dependent: The training data comes exclusively from GPB Sports. Coverage may be uneven across sports, regions, or school classifications.
  • Small model: At 8B parameters with QLoRA, this model trades some capability for accessibility. It can run on consumer hardware but won't match larger models on complex reasoning.

Dataset

The full DPO training dataset is available at kslote/georgia-high-school-sports.
