qwen25-tax-3b-v5-adapters

LoRA adapters for Qwen 2.5 3B Instruct, fine-tuned on the U.S. tax code (IRC Title 26 plus the 26 CFR Treasury Regulations) using a three-stage SFT → DPO → GRPO pipeline.

v5 Training Details

| Stage | Dataset size | Iterations | Notes |
|-------|--------------|------------|-------|
| SFT   | 99K examples | 1,500      | LoRA rank 32, lr 1e-5, cosine decay |
| DPO   | 23K pairs    | 1,500      | Fixed length-normalization bug from v4 |
| GRPO  | 99K prompts  | 1,000      | Rule-based rewards: citation accuracy + completeness |
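
A rule-based reward of the kind used in the GRPO stage can be sketched as follows. The helper names, substring matching, and the 50/50 weighting are illustrative assumptions, not the repo's actual implementation:

```python
# Hypothetical sketch of a rule-based GRPO reward combining citation
# accuracy and completeness. Matching rules and weights are assumptions.

def citation_reward(response: str, gold_citations: set) -> float:
    """Fraction of expected citations appearing verbatim in the response."""
    if not gold_citations:
        return 0.0
    return sum(c in response for c in gold_citations) / len(gold_citations)

def completeness_reward(response: str, required_terms: set) -> float:
    """Fraction of required key terms the response covers (case-insensitive)."""
    if not required_terms:
        return 0.0
    lower = response.lower()
    return sum(t.lower() in lower for t in required_terms) / len(required_terms)

def total_reward(response, gold_citations, required_terms,
                 w_cite=0.5, w_comp=0.5):
    """Weighted sum used as the scalar reward for a sampled completion."""
    return (w_cite * citation_reward(response, gold_citations)
            + w_comp * completeness_reward(response, required_terms))
```

Substring matching is deliberately crude here; a production reward would presumably parse section citations more robustly.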

v5 Improvements over v4

  • Fixed DPO length-normalization bug: the DPO loss is now correctly normalized per token
  • 20x inflation upsampling: IRC inflation-adjustment sections upsampled to improve coverage
  • CFR data included: 26 CFR Treasury Regulations added to all three training stages
  • Larger SFT and GRPO datasets (99K vs ~50K in v4)
  • More DPO iterations (1,500 vs 1,000 in v4)
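
The length-normalization fix can be illustrated with a minimal numeric sketch. The symbols and the per-token scheme are assumptions about the fix, not the repo's exact code:

```python
import math

def dpo_loss(logp_c, logp_r, ref_logp_c, ref_logp_r,
             len_c, len_r, beta=0.1):
    """DPO loss with per-token length normalization.

    Each sequence log-probability difference (policy minus reference) is
    divided by that sequence's token count before forming the preference
    margin, so long responses are not favored merely for having more
    tokens. A v4-style bug would skip this division.
    """
    ratio_c = (logp_c - ref_logp_c) / len_c  # chosen, per-token log-ratio
    ratio_r = (logp_r - ref_logp_r) / len_r  # rejected, per-token log-ratio
    margin = beta * (ratio_c - ratio_r)
    return math.log1p(math.exp(-margin))     # equals -log(sigmoid(margin))
```

With equal per-token log-ratios the margin is zero and the loss sits at log 2; as the chosen response gains relative per-token log-probability, the loss falls below that.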

Base Model

  • Model: Qwen/Qwen2.5-3B-Instruct
  • Architecture: Transformer, 3B parameters
  • Context window: 2,048 tokens during training

Adapter Files

sft/    - Stage 1 adapter (after supervised fine-tuning)
  adapter_config.json
  adapters.safetensors      (~102 MB)
  0000200_adapters.safetensors  (checkpoint at step 200)
  0000400_adapters.safetensors
  0000600_adapters.safetensors
  0000800_adapters.safetensors
  0001000_adapters.safetensors
  0001200_adapters.safetensors
  0001400_adapters.safetensors

dpo/    - Stage 2 adapter (after DPO on top of SFT)
  adapter_config.json
  adapters.safetensors      (~102 MB)
  adapters_best.safetensors (~102 MB, best checkpoint)

grpo/   - Stage 3 adapter (final, after GRPO on top of DPO)
  adapter_config.json
  adapters.safetensors      (~102 MB)  ← recommended
  adapters_best.safetensors (~102 MB, best checkpoint)
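
The final grpo/ adapter can also be fused into the base weights to produce a standalone model via mlx_lm's fuse entry point. The save path below is an example; check `python -m mlx_lm.fuse --help` for the options in your mlx-lm version:

```shell
# Fuse base model + GRPO adapter into standalone weights
python -m mlx_lm.fuse \
  --model Qwen/Qwen2.5-3B-Instruct \
  --adapter-path grpo/ \
  --save-path qwen25-tax-3b-v5-fused
```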

Usage

These are MLX LoRA adapters. To use with mlx_lm:

```shell
pip install mlx-lm

# Inference with GRPO adapter (final stage)
python -m mlx_lm.generate \
  --model Qwen/Qwen2.5-3B-Instruct \
  --adapter-path grpo/ \
  --prompt "What is the standard deduction for a single filer in 2024?"
```
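
The same call can be made from Python with mlx_lm's load/generate API. A minimal sketch, assuming a recent mlx-lm release (argument names may differ in older versions):

```python
# Load base model + GRPO adapter, then generate (downloads the base
# weights from the Hugging Face Hub on first use).
from mlx_lm import load, generate

model, tokenizer = load("Qwen/Qwen2.5-3B-Instruct", adapter_path="grpo/")
text = generate(
    model,
    tokenizer,
    prompt="What is the standard deduction for a single filer in 2024?",
    max_tokens=256,
)
print(text)
```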

Training Hardware

Trained on Apple M4 Max (128 GB unified memory) using mlx_lm.

Repository

Source code: https://github.com/dennisonbertram/rl-irs-tax-code
