# qwen25-tax-3b-v5-adapters
LoRA adapters for Qwen 2.5 3B Instruct fine-tuned on the IRS Tax Code (IRC Title 26 + 26 CFR Treasury Regulations) using a three-stage pipeline.
## v5 Training Details
| Stage | Dataset Size | Iterations | Notes |
|---|---|---|---|
| SFT | 99K examples | 1,500 iters | LoRA rank 32, lr 1e-5, cosine decay |
| DPO | 23K pairs | 1,500 iters | Fixed length normalization bug from v4 |
| GRPO | 99K prompts | 1,000 iters | Rule-based rewards: citation accuracy + completeness |
## v5 Improvements over v4
- Fixed DPO length normalization bug – the DPO loss is now correctly normalized per token
- 20x inflation upsampling – IRC inflation-adjustment sections upsampled to improve coverage
- CFR data included – 26 CFR Treasury Regulations added to all three training stages
- Larger SFT and GRPO datasets (99K examples vs ~50K in v4)
- More DPO iterations (1,500 vs 1,000 in v4)
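The length-normalization fix can be illustrated with a toy DPO loss. This sketch ignores the reference-model log-ratios for brevity and is not the training code; it only shows why summing token log-probs (the v4-style behavior) penalizes longer responses of equal per-token quality, while per-token averaging does not:

```python
import math

def dpo_loss(chosen_logps, rejected_logps, beta=0.1, per_token=True):
    """Toy DPO loss for one preference pair of per-token log-prob lists.

    per_token=False mimics the v4 bug: raw sums, so a longer chosen
    response with identical per-token quality looks artificially worse.
    """
    def agg(logps):
        total = sum(logps)
        return total / len(logps) if per_token else total

    margin = beta * (agg(chosen_logps) - agg(rejected_logps))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

chosen = [-1.0] * 20    # longer response, same per-token log-prob
rejected = [-1.0] * 5   # shorter response

# Normalized: equal per-token quality gives margin 0, loss = log 2.
# Unnormalized: the longer chosen sum is more negative, inflating the loss.
```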
## Base Model

- Model: `Qwen/Qwen2.5-3B-Instruct`
- Architecture: Transformer, 3B parameters
- Context window: 2,048 tokens during training
## Adapter Files

```
sft/    – Stage 1 adapter (after supervised fine-tuning)
├── adapter_config.json
├── adapters.safetensors            (~102 MB)
├── 0000200_adapters.safetensors    (checkpoint at step 200)
├── 0000400_adapters.safetensors
├── 0000600_adapters.safetensors
├── 0000800_adapters.safetensors
├── 0001000_adapters.safetensors
├── 0001200_adapters.safetensors
└── 0001400_adapters.safetensors

dpo/    – Stage 2 adapter (after DPO on top of SFT)
├── adapter_config.json
├── adapters.safetensors            (~102 MB)
└── adapters_best.safetensors       (~102 MB, best checkpoint)

grpo/   – Stage 3 adapter (final, after GRPO on top of DPO)
├── adapter_config.json
├── adapters.safetensors            (~102 MB, recommended)
└── adapters_best.safetensors       (~102 MB, best checkpoint)
```
## Usage

These are MLX LoRA adapters. To use with `mlx_lm`:

```shell
pip install mlx-lm

# Inference with the GRPO adapter (final stage)
python -m mlx_lm.generate \
  --model Qwen/Qwen2.5-3B-Instruct \
  --adapter-path grpo/ \
  --prompt "What is the standard deduction for a single filer in 2024?"
```
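The adapters can also be loaded programmatically. A sketch using the `mlx_lm` Python API (requires Apple silicon; exact keyword arguments may vary by `mlx-lm` version, so treat this as an assumption, not the card's documented interface):

```python
from mlx_lm import load, generate

# Load the base model with the final-stage GRPO adapter applied.
model, tokenizer = load("Qwen/Qwen2.5-3B-Instruct", adapter_path="grpo/")

prompt = "What is the standard deduction for a single filer in 2024?"
text = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(text)
```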
## Training Hardware

Trained on an Apple M4 Max (128 GB unified memory) using `mlx_lm`.
## Repository

Source code: https://github.com/dennisonbertram/rl-irs-tax-code