# qwen25-tax-3b-v5-adapters
LoRA adapters for Qwen 2.5 3B Instruct fine-tuned on the IRS Tax Code (IRC Title 26 + 26 CFR Treasury Regulations) using a three-stage pipeline.
## v5 Training Details
| Stage | Dataset Size | Iterations | Notes |
|---|---|---|---|
| SFT | 99K examples | 1,500 iters | LoRA rank 32, lr 1e-5, cosine decay |
| DPO | 23K pairs | 1,500 iters | Fixed length normalization bug from v4 |
| GRPO | 99K prompts | 1,000 iters | Rule-based rewards: citation accuracy + completeness |
## v5 Improvements over v4
- Fixed DPO length normalization bug – the DPO loss is now correctly normalized per token
- 20x inflation upsampling – IRC inflation-adjustment sections upsampled to improve coverage
- CFR data included – 26 CFR Treasury Regulations added to all three training stages
- Larger SFT and GRPO datasets (99K examples vs ~50K in v4)
- More DPO iterations (1,500 vs 1,000 in v4)
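The length-normalization fix can be illustrated with a toy DPO loss. This sketch ignores the reference-model log-ratios for brevity and is not the training code; it only shows why summing token log-probs (the v4-style behavior) penalizes longer responses of equal per-token quality, while per-token averaging does not:

```python
import math

def dpo_loss(chosen_logps, rejected_logps, beta=0.1, per_token=True):
    """Toy DPO loss for one preference pair of per-token log-prob lists.

    per_token=False mimics the v4 bug: raw sums, so a longer chosen
    response with identical per-token quality looks artificially worse.
    """
    def agg(logps):
        total = sum(logps)
        return total / len(logps) if per_token else total

    margin = beta * (agg(chosen_logps) - agg(rejected_logps))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

chosen = [-1.0] * 20    # longer response, same per-token log-prob
rejected = [-1.0] * 5   # shorter response

# Normalized: equal per-token quality gives margin 0, loss = log 2.
# Unnormalized: the longer chosen sum is more negative, inflating the loss.
```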
## Base Model

- Model: `Qwen/Qwen2.5-3B-Instruct`
- Architecture: Transformer, 3B parameters
- Context window: 2,048 tokens during training
## Adapter Files

```
sft/    – Stage 1 adapter (after supervised fine-tuning)
├── adapter_config.json
├── adapters.safetensors            (~102 MB)
├── 0000200_adapters.safetensors    (checkpoint at step 200)
├── 0000400_adapters.safetensors
├── 0000600_adapters.safetensors
├── 0000800_adapters.safetensors
├── 0001000_adapters.safetensors
├── 0001200_adapters.safetensors
└── 0001400_adapters.safetensors

dpo/    – Stage 2 adapter (after DPO on top of SFT)
├── adapter_config.json
├── adapters.safetensors            (~102 MB)
└── adapters_best.safetensors       (~102 MB, best checkpoint)

grpo/   – Stage 3 adapter (final, after GRPO on top of DPO)
├── adapter_config.json
├── adapters.safetensors            (~102 MB, recommended)
└── adapters_best.safetensors       (~102 MB, best checkpoint)
```
## Usage

These are MLX LoRA adapters. To use with `mlx_lm`:

```shell
pip install mlx-lm

# Inference with the GRPO adapter (final stage)
python -m mlx_lm.generate \
  --model Qwen/Qwen2.5-3B-Instruct \
  --adapter-path grpo/ \
  --prompt "What is the standard deduction for a single filer in 2024?"
```
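The adapters can also be loaded programmatically. A sketch using the `mlx_lm` Python API (requires Apple silicon; exact keyword arguments may vary by `mlx-lm` version, so treat this as an assumption, not the card's documented interface):

```python
from mlx_lm import load, generate

# Load the base model with the final-stage GRPO adapter applied.
model, tokenizer = load("Qwen/Qwen2.5-3B-Instruct", adapter_path="grpo/")

prompt = "What is the standard deduction for a single filer in 2024?"
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```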
## Training Hardware

Trained on an Apple M4 Max (128 GB unified memory) using `mlx_lm`.
## Repository

Source code: https://github.com/dennisonbertram/rl-irs-tax-code