# Georgia Sports Llama 3 DPO
A fine-tuned version of Meta Llama 3.1 8B Instruct, trained with Direct Preference Optimization (DPO) on Georgia high school sports content from GPB Sports.
This model is designed to answer questions about Georgia high school athletics — football, basketball, baseball, and more — with accurate, well-sourced responses grounded in real sports journalism.
## What is DPO?
Direct Preference Optimization is a technique for aligning language models with human preferences. Instead of just training a model on "correct" answers, DPO shows the model pairs of responses — one chosen (better) and one rejected (worse) — and teaches it to prefer the style and quality of the better response.
Think of it like a coach reviewing game film: "This play was good, this one wasn't — learn the difference."
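The preference objective described above can be written as a small function. Below is a minimal, framework-free sketch of the per-pair DPO loss (in practice TRL computes this batched over token-level log-probabilities; the function and argument names here are illustrative):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response under
    the policy (trained) model or the frozen reference model.
    """
    # Implicit rewards: how much more the policy favors each response
    # than the reference model does, scaled by beta
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)): small when the policy already prefers the
    # chosen response, large when it prefers the rejected one
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; training pushes the margin positive, driving the loss toward zero.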
## How This Model Was Made
### Step 1: Collect Source Material
We scraped ~1,350 articles from GPB Sports covering Georgia high school athletics from April 2022 through September 2025.
### Step 2: Generate Questions
For each article, a local LLM (Mistral 7B) read the text and generated 3 challenging questions answerable from the article's content. This produced 4,078 question prompts.
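Since the generation ran locally through Ollama (see Acknowledgments), the step above can be sketched as building a request for Ollama's `/api/generate` endpoint. The prompt wording and model tag here are illustrative, not the exact ones used for this dataset:

```python
def build_question_request(article_text, n_questions=3, model="mistral:7b"):
    """Build an Ollama /api/generate payload asking a local model to write
    questions answerable from a single article.

    The prompt text is a hypothetical reconstruction; the dataset's exact
    prompt is not published.
    """
    prompt = (
        f"Read the following article and write {n_questions} challenging "
        "questions that can be answered using only its content.\n\n"
        f"ARTICLE:\n{article_text}\n\nQUESTIONS:"
    )
    # stream=False returns one complete JSON response instead of chunks
    return {"model": model, "prompt": prompt, "stream": False}

# To run against a local Ollama server, POST the payload, e.g.:
# requests.post("http://localhost:11434/api/generate", json=payload)
```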
### Step 3: Generate Competing Answers
Two different local models answered each question independently:
- Llama 3.1 8B
- Mistral 7B Instruct
Both models received the full article text as context and generated their best answer.
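Grounding each answer in the article maps naturally onto a chat-style request, with the article supplied as system context. A hedged sketch (model tags and prompt wording are illustrative, matching Ollama's `/api/chat` payload shape):

```python
def build_answer_request(model, question, article_text):
    """Ollama /api/chat payload: ask one local model to answer a question
    using only the supplied article as context.

    Prompt wording is a hypothetical reconstruction, not the dataset's
    exact prompt.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using only the article below.\n\n" + article_text},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }

# One request per competitor model:
candidates = [
    build_answer_request(m, "Who won the game?", "Hawks won 21-14.")
    for m in ("llama3.1:8b", "mistral:7b-instruct")
]
```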
### Step 4: Judge the Answers
A judge model (Llama 3.1 8B) scored each response on a 1–5 scale for accuracy, specificity, and faithfulness to the source article. The higher-scoring response became the chosen response; the lower-scoring one became the rejected response. Ties were discarded.
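The selection rule above reduces to a few lines. This sketch emits records with the `prompt`/`chosen`/`rejected` keys that TRL's `DPOTrainer` expects for preference data (the function name is illustrative):

```python
def make_dpo_pair(question, resp_a, resp_b, score_a, score_b):
    """Turn two judged responses into a chosen/rejected preference pair.

    Returns None on a tie, mirroring the discard rule described above.
    """
    if score_a == score_b:
        return None  # ties carry no preference signal
    chosen, rejected = (resp_a, resp_b) if score_a > score_b else (resp_b, resp_a)
    return {"prompt": question, "chosen": chosen, "rejected": rejected}
```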
### Step 5: DPO Fine-Tuning
The 4,078 chosen/rejected pairs were used to fine-tune Llama 3.1 8B Instruct using:
- QLoRA (4-bit quantization + Low-Rank Adaptation) to fit training on a single GPU
- TRL's DPOTrainer from Hugging Face
- Training on a Google Colab T4 GPU
## Training Details
| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-3.1-8B-Instruct |
| Method | DPO with QLoRA |
| Quantization | 4-bit NormalFloat (NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj |
| Learning rate | 5e-5 |
| Batch size | 2 (with 4× gradient accumulation) |
| Epochs | 3 |
| DPO beta | 0.1 |
| Max sequence length | 1024 |
| Optimizer | Paged AdamW 8-bit |
| Hardware | Google Colab T4 16GB |
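The hyperparameters in the table map onto the Hugging Face stack roughly as follows. This is an illustrative sketch assuming a recent TRL/PEFT/bitsandbytes release (argument names have shifted across TRL versions), not the actual training script:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig

# 4-bit NF4 quantization so the 8B base model fits on a 16GB T4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapters on the attention projections only
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# DPO training arguments matching the table above
training_args = DPOConfig(
    output_dir="georgia-sports-llama3-dpo",
    beta=0.1,
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    max_length=1024,
    optim="paged_adamw_8bit",
)

# The trainer then ties everything together, e.g. (API varies by TRL version):
# trainer = DPOTrainer(model, args=training_args, train_dataset=dataset,
#                      processing_class=tokenizer, peft_config=peft_config)
# trainer.train()
```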
## Dataset Summary
| Stat | Value |
|---|---|
| Total DPO pairs | 4,078 |
| Source articles | ~1,350 |
| Date range | April 2022 – September 2025 |
| Topic | Georgia high school sports (GHSA) |
| Avg chosen response length | 525 characters |
| Avg rejected response length | 467 characters |
### Which model "won" more often?
| Model | Times Chosen (Better) | Times Rejected (Worse) |
|---|---|---|
| Llama 3.1 8B | 2,349 (57.6%) | 1,729 (42.4%) |
| Mistral 7B Instruct | 1,729 (42.4%) | 2,349 (57.6%) |
### Rating distributions
Chosen responses:
| Rating | Count |
|---|---|
| 5 | 1,203 |
| 4 | 2,759 |
| 3 | 109 |
| 2 | 7 |
Rejected responses:
| Rating | Count |
|---|---|
| 4 | 1,079 |
| 3 | 1,420 |
| 2 | 1,560 |
| 1 | 19 |
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "kslote/georgia-sports-llama3-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a knowledgeable Georgia high school sports analyst."},
    {"role": "user", "content": "Who were the top football programs in Georgia in 2024?"},
]

# Build the Llama 3.1 chat prompt and tokenize it
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
## Limitations
- Knowledge cutoff: The training data covers April 2022 through September 2025. The model has no knowledge of events after that period.
- Georgia-specific: This model is specialized for Georgia high school sports (GHSA). It may not perform well on questions about other states or college/professional sports.
- Source-dependent: The training data comes exclusively from GPB Sports. Coverage may be uneven across sports, regions, or school classifications.
- Small model: At 8B parameters with QLoRA, this model trades some capability for accessibility. It can run on consumer hardware but won't match larger models on complex reasoning.
## Dataset
The full DPO training dataset is available at kslote/georgia-high-school-sports.
## Acknowledgments
- Source articles from GPB Sports — Georgia's home for high school sports coverage
- Built with Hugging Face TRL, PEFT, and bitsandbytes
- DPO preference pairs generated locally using Ollama with Llama 3.1 8B and Mistral 7B on Apple Silicon