Qwen2.5 Tax 3B โ IRS Tax Code Expert
A 3B parameter model fine-tuned on the Internal Revenue Code using a 3-stage RL pipeline:
- SFT (Supervised Fine-Tuning) on 16,909 RAG-grounded Q&A pairs from all 2,113 IRC sections
- DPO (Direct Preference Optimization) with 1,311 hard-negative pairs + on-policy error correction
- GRPO (Group Relative Policy Optimization) with citation accuracy reward signal
Training Pipeline
| Stage | Data | Iterations | Key Metric |
|---|---|---|---|
| SFT | 16,909 grounded pairs | 1,000 | Val loss: 0.765 |
| DPO | 1,311 preference pairs | 500 | Loss: 0.005 |
| GRPO | 16,909 prompts | 300 | Avg reward: 0.978 |
Model Versions
- v1: Initial SFT + DPO + GRPO (basic reward)
- v2: Improved grounded data + hard-negative DPO (best factual accuracy)
- v3: + On-policy DPO + citation accuracy reward (best citation specificity)
Usage with Ollama
# Download the GGUF
wget https://huggingface.co/dennisonb/qwen25-tax-3b/resolve/main/qwen25-tax-3b-v3-q8_0.gguf
# Create Ollama model
ollama create qwen25-tax-3b -f Modelfile
# Run
ollama run qwen25-tax-3b "What is the standard deduction for a single filer?"
Evaluation Results
| Model | 5-Q Score | GRPO Reward | Notes |
|---|---|---|---|
| v1 | 2.5/5 | 0.605 | Frequent hallucinations |
| v2 | 4.5/5 | 0.828 | Best factual accuracy |
| v3 | 3.5/5 | 0.978 | Best citation specificity |
Limitations
- 3B model cannot reliably memorize all IRC section numbers and dollar thresholds
- May hallucinate specific amounts (e.g., Section 179 limits)
- Best used with RAG (retrieval-augmented generation) for production
- Not a substitute for professional tax advice
Training Data
All training data was generated using RAG from the actual IRC statutory text:
- Source: 2,113 IRC sections parsed from the US Code
- Generation: GPT-4o-mini with actual statute text in context
- Validation: Cross-reference checking, citation accuracy validation
- Cost: ~$9 total API cost for full dataset generation
Built With
- MLX - Apple Silicon native ML framework
- Qwen2.5-3B-Instruct - Base model
- Ollama - Local model deployment
- OpenAI Batch API - Training data generation
- Downloads last month
- 47
Hardware compatibility
Log In to add your hardware
8-bit