Qwen2.5 Tax 3B — IRS Tax Code Expert

A 3B-parameter model fine-tuned on the Internal Revenue Code (IRC) using a 3-stage post-training pipeline:

1. SFT (Supervised Fine-Tuning) on 16,909 RAG-grounded Q&A pairs covering all 2,113 IRC sections
2. DPO (Direct Preference Optimization) on 1,311 hard-negative pairs plus on-policy error correction
3. GRPO (Group Relative Policy Optimization) with a citation-accuracy reward signal

Training Pipeline

Stage	Data	Iterations	Key Metric
SFT	16,909 grounded pairs	1,000	Val loss: 0.765
DPO	1,311 preference pairs	500	Loss: 0.005
GRPO	16,909 prompts	300	Avg reward: 0.978

Model Versions

v1: Initial SFT + DPO + GRPO (basic reward)
v2: Improved grounded data + hard-negative DPO (best factual accuracy)
v3: v2 plus on-policy DPO and a citation-accuracy reward (best citation specificity)

Usage with Ollama

# Download the GGUF
wget https://huggingface.co/dennisonb/qwen25-tax-3b/resolve/main/qwen25-tax-3b-v3-q8_0.gguf

# Create the Ollama model (reads a file named Modelfile from the current directory)
ollama create qwen25-tax-3b -f Modelfile

# Run
ollama run qwen25-tax-3b "What is the standard deduction for a single filer?"
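
The `ollama create` step above reads a Modelfile, which the source does not show. A minimal sketch of one follows, assuming the GGUF was downloaded into the current directory under the filename used above. The temperature value and system prompt are illustrative assumptions, not the author's settings. Run it before `ollama create`.

```shell
# Sketch only: FROM points at the GGUF downloaded above.
# The temperature and system prompt are assumptions for illustration,
# not the settings used by the model's author.
cat > Modelfile <<'EOF'
FROM ./qwen25-tax-3b-v3-q8_0.gguf
PARAMETER temperature 0.2
SYSTEM """You answer questions about the Internal Revenue Code and cite the relevant section."""
EOF
```

Depending on the Ollama version, a `TEMPLATE` instruction may also be needed if the chat template is not picked up from the GGUF metadata.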

Evaluation Results

Model	5-Q Score	GRPO Reward	Notes
v1	2.5/5	0.605	Frequent hallucinations
v2	4.5/5	0.828	Best factual accuracy
v3	3.5/5	0.978	Best citation specificity

Limitations

A 3B model cannot reliably memorize all IRC section numbers and dollar thresholds
May hallucinate specific amounts (e.g., Section 179 limits)
Best used with RAG (retrieval-augmented generation) for production
Not a substitute for professional tax advice
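
Since the model is best used with RAG, one way to ground a query is to put retrieved statute text in the prompt. The sketch below shows only that prompt-assembly step. The helper name `build_grounded_prompt`, the prompt wording, and the input file are hypothetical, and the retrieval step itself is assumed to happen elsewhere.

```shell
# Hypothetical helper (not part of the released model): wrap a question with
# retrieved statute text so the model answers from the supplied context.
#   $1 = path to a text file holding the retrieved statute text (assumed to exist)
#   $2 = the user's question
build_grounded_prompt() {
  statute_file="$1"
  question="$2"
  printf 'Answer using only the statute text below and cite the section you rely on.\n\n'
  printf 'Statute text:\n'
  cat "$statute_file"
  printf '\nQuestion: %s\n' "$question"
}
```

With a retrieved excerpt saved to a file such as `irc_179.txt` (a placeholder name), the result could be passed to the model with `ollama run qwen25-tax-3b "$(build_grounded_prompt irc_179.txt "What is the Section 179 deduction limit?")"`.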

Training Data

All training data was generated using RAG from the actual IRC statutory text:

Source: 2,113 IRC sections parsed from the US Code
Generation: GPT-4o-mini with actual statute text in context
Validation: Cross-reference checking, citation accuracy validation
Cost: ~$9 total in API fees for full dataset generation
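
The source does not describe how citation accuracy was validated. As a rough illustration of one piece such a check might include, the sketch below extracts the section numbers cited in a text and reports any that are missing from a list of known sections. The function names and the one-number-per-line list format are assumptions, not the author's pipeline.

```shell
# Hypothetical helper: list the distinct IRC section numbers cited on stdin.
# Only matches the forms "Section 179" and "Sec. 179" (case-sensitive).
extract_cited_sections() {
  grep -oE '(Section|Sec\.) ?[0-9]+[A-Z]?' | grep -oE '[0-9]+[A-Z]?' | sort -u
}

# Hypothetical check: print cited sections that are NOT in the known-sections
# file ($1, one section number per line). Empty output means every citation
# was recognized.
unknown_citations() {
  known_file="$1"
  extract_cited_sections | grep -vxFf "$known_file" || true
}
```

For example, `ollama run qwen25-tax-3b "..." | unknown_citations known_sections.txt` would flag cited section numbers that do not appear in `known_sections.txt` (a placeholder file). This checks only that a section number exists, not that it supports the claim.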

Built With

MLX - Apple Silicon native ML framework
Qwen2.5-3B-Instruct - Base model
Ollama - Local model deployment
OpenAI Batch API - Training data generation

Model Details

Format: GGUF (8-bit)
Model size: 3B params
Architecture: qwen2