Qwen2.5 Tax 3B โ€” IRS Tax Code Expert

A 3B parameter model fine-tuned on the Internal Revenue Code using a 3-stage RL pipeline:

  1. SFT (Supervised Fine-Tuning) on 16,909 RAG-grounded Q&A pairs from all 2,113 IRC sections
  2. DPO (Direct Preference Optimization) with 1,311 hard-negative pairs + on-policy error correction
  3. GRPO (Group Relative Policy Optimization) with citation accuracy reward signal

Training Pipeline

Stage Data Iterations Key Metric
SFT 16,909 grounded pairs 1,000 Val loss: 0.765
DPO 1,311 preference pairs 500 Loss: 0.005
GRPO 16,909 prompts 300 Avg reward: 0.978

Model Versions

  • v1: Initial SFT + DPO + GRPO (basic reward)
  • v2: Improved grounded data + hard-negative DPO (best factual accuracy)
  • v3: + On-policy DPO + citation accuracy reward (best citation specificity)

Usage with Ollama

# Download the GGUF
wget https://huggingface.co/dennisonb/qwen25-tax-3b/resolve/main/qwen25-tax-3b-v3-q8_0.gguf

# Create Ollama model
ollama create qwen25-tax-3b -f Modelfile

# Run
ollama run qwen25-tax-3b "What is the standard deduction for a single filer?"

Evaluation Results

Model 5-Q Score GRPO Reward Notes
v1 2.5/5 0.605 Frequent hallucinations
v2 4.5/5 0.828 Best factual accuracy
v3 3.5/5 0.978 Best citation specificity

Limitations

  • 3B model cannot reliably memorize all IRC section numbers and dollar thresholds
  • May hallucinate specific amounts (e.g., Section 179 limits)
  • Best used with RAG (retrieval-augmented generation) for production
  • Not a substitute for professional tax advice

Training Data

All training data was generated using RAG from the actual IRC statutory text:

  • Source: 2,113 IRC sections parsed from the US Code
  • Generation: GPT-4o-mini with actual statute text in context
  • Validation: Cross-reference checking, citation accuracy validation
  • Cost: ~$9 total API cost for full dataset generation

Built With

  • MLX - Apple Silicon native ML framework
  • Qwen2.5-3B-Instruct - Base model
  • Ollama - Local model deployment
  • OpenAI Batch API - Training data generation
Downloads last month
47
GGUF
Model size
3B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

8-bit

Video Preview
loading

Model tree for dennisonb/qwen25-tax-3b

Base model

Qwen/Qwen2.5-3B
Quantized
(204)
this model