# Georgia Sports Llama 3 DPO
A fine-tuned version of Meta Llama 3.1 8B Instruct, trained with Direct Preference Optimization (DPO) on Georgia high school sports content from GPB Sports.
This model is designed to answer questions about Georgia high school athletics — football, basketball, baseball, and more — with accurate, well-sourced responses grounded in real sports journalism.
## What is DPO?
Direct Preference Optimization is a technique for aligning language models with human preferences. Instead of just training a model on "correct" answers, DPO shows the model pairs of responses — one chosen (better) and one rejected (worse) — and teaches it to prefer the style and quality of the better response.
Think of it like a coach reviewing game film: "This play was good, this one wasn't — learn the difference."
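The preference objective described above can be written as a small function. Below is a minimal, framework-free sketch of the per-pair DPO loss (in practice TRL computes this batched over token-level log-probabilities; the function and argument names here are illustrative):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single preference pair.

    Each argument is the summed log-probability of a full response under
    the policy (trained) model or the frozen reference model.
    """
    # Implicit rewards: how much more the policy favors each response
    # than the reference model does, scaled by beta
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)): small when the policy already prefers the
    # chosen response, large when it prefers the rejected one
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference agree exactly, the margin is zero and the loss is log 2; training pushes the margin positive, driving the loss toward zero.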
## How This Model Was Made
### Step 1: Collect Source Material
We scraped ~1,350 articles from GPB Sports covering Georgia high school athletics from April 2022 through September 2025.
### Step 2: Generate Questions
For each article, a local LLM (Mistral 7B) read the text and generated 3 challenging questions answerable from the article's content. This produced 4,078 question prompts.
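Since the generation ran locally through Ollama (see Acknowledgments), the step above can be sketched as building a request for Ollama's `/api/generate` endpoint. The prompt wording and model tag here are illustrative, not the exact ones used for this dataset:

```python
def build_question_request(article_text, n_questions=3, model="mistral:7b"):
    """Build an Ollama /api/generate payload asking a local model to write
    questions answerable from a single article.

    The prompt text is a hypothetical reconstruction; the dataset's exact
    prompt is not published.
    """
    prompt = (
        f"Read the following article and write {n_questions} challenging "
        "questions that can be answered using only its content.\n\n"
        f"ARTICLE:\n{article_text}\n\nQUESTIONS:"
    )
    # stream=False returns one complete JSON response instead of chunks
    return {"model": model, "prompt": prompt, "stream": False}

# To run against a local Ollama server, POST the payload, e.g.:
# requests.post("http://localhost:11434/api/generate", json=payload)
```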
### Step 3: Generate Competing Answers
Two different local models answered each question independently:
- Llama 3.1 8B
- Mistral 7B Instruct
Both models received the full article text as context and generated their best answer.
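Grounding each answer in the article maps naturally onto a chat-style request, with the article supplied as system context. A hedged sketch (model tags and prompt wording are illustrative, matching Ollama's `/api/chat` payload shape):

```python
def build_answer_request(model, question, article_text):
    """Ollama /api/chat payload: ask one local model to answer a question
    using only the supplied article as context.

    Prompt wording is a hypothetical reconstruction, not the dataset's
    exact prompt.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using only the article below.\n\n" + article_text},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }

# One request per competitor model:
candidates = [
    build_answer_request(m, "Who won the game?", "Hawks won 21-14.")
    for m in ("llama3.1:8b", "mistral:7b-instruct")
]
```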
### Step 4: Judge the Answers
A judge model (Llama 3.1 8B) scored each response on a 1–5 scale for accuracy, specificity, and faithfulness to the source article. The higher-scoring response became the chosen response; the lower-scoring one became the rejected response. Ties were discarded.
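The selection rule above reduces to a few lines. This sketch emits records with the `prompt`/`chosen`/`rejected` keys that TRL's `DPOTrainer` expects for preference data (the function name is illustrative):

```python
def make_dpo_pair(question, resp_a, resp_b, score_a, score_b):
    """Turn two judged responses into a chosen/rejected preference pair.

    Returns None on a tie, mirroring the discard rule described above.
    """
    if score_a == score_b:
        return None  # ties carry no preference signal
    chosen, rejected = (resp_a, resp_b) if score_a > score_b else (resp_b, resp_a)
    return {"prompt": question, "chosen": chosen, "rejected": rejected}
```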
### Step 5: DPO Fine-Tuning
The 4,078 chosen/rejected pairs were used to fine-tune Llama 3.1 8B Instruct using:
- QLoRA (4-bit quantization + Low-Rank Adaptation) to fit training on a single GPU
- TRL's DPOTrainer from Hugging Face
- Training on a Google Colab T4 GPU
## Training Details
| Parameter | Value |
|---|---|
| Base model | meta-llama/Llama-3.1-8B-Instruct |
| Method | DPO with QLoRA |
| Quantization | 4-bit NormalFloat (NF4) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj |
| Learning rate | 5e-5 |
| Batch size | 2 (with 4× gradient accumulation) |
| Epochs | 3 |
| DPO beta | 0.1 |
| Max sequence length | 1024 |
| Optimizer | Paged AdamW 8-bit |
| Hardware | Google Colab T4 16GB |
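The hyperparameters in the table map onto the Hugging Face stack roughly as follows. This is an illustrative sketch assuming a recent TRL/PEFT/bitsandbytes release (argument names have shifted across TRL versions), not the actual training script:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
from trl import DPOConfig

# 4-bit NF4 quantization so the 8B base model fits on a 16GB T4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA adapters on the attention projections only
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# DPO training arguments matching the table above
training_args = DPOConfig(
    output_dir="georgia-sports-llama3-dpo",
    beta=0.1,
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    max_length=1024,
    optim="paged_adamw_8bit",
)

# The trainer then ties everything together, e.g. (API varies by TRL version):
# trainer = DPOTrainer(model, args=training_args, train_dataset=dataset,
#                      processing_class=tokenizer, peft_config=peft_config)
# trainer.train()
```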
## Dataset Summary
| Stat | Value |
|---|---|
| Total DPO pairs | 4,078 |
| Source articles | ~1,350 |
| Date range | April 2022 – September 2025 |
| Topic | Georgia high school sports (GHSA) |
| Avg chosen response length | 525 characters |
| Avg rejected response length | 467 characters |
### Which model "won" more often?
| Model | Times Chosen (Better) | Times Rejected (Worse) |
|---|---|---|
| Llama 3.1 8B | 2,349 (57.6%) | 1,729 (42.4%) |
| Mistral 7B Instruct | 1,729 (42.4%) | 2,349 (57.6%) |
### Rating distributions
Chosen responses:
| Rating | Count |
|---|---|
| 5 | 1,203 |
| 4 | 2,759 |
| 3 | 109 |
| 2 | 7 |
Rejected responses:
| Rating | Count |
|---|---|
| 4 | 1,079 |
| 3 | 1,420 |
| 2 | 1,560 |
| 1 | 19 |
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "kslote/georgia-sports-llama3-dpo"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a knowledgeable Georgia high school sports analyst."},
    {"role": "user", "content": "Who were the top football programs in Georgia in 2024?"},
]

# Build the Llama 3.1 chat prompt and tokenize it
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
## Limitations
- Knowledge cutoff: The training data covers April 2022 through September 2025. The model has no knowledge of events after that period.
- Georgia-specific: This model is specialized for Georgia high school sports (GHSA). It may not perform well on questions about other states or college/professional sports.
- Source-dependent: The training data comes exclusively from GPB Sports. Coverage may be uneven across sports, regions, or school classifications.
- Small model: At 8B parameters with QLoRA, this model trades some capability for accessibility. It can run on consumer hardware but won't match larger models on complex reasoning.
## Dataset
The full DPO training dataset is available at kslote/georgia-high-school-sports.
## Acknowledgments
- Source articles from GPB Sports — Georgia's home for high school sports coverage
- Built with Hugging Face TRL, PEFT, and bitsandbytes
- DPO preference pairs generated locally using Ollama with Llama 3.1 8B and Mistral 7B on Apple Silicon