hexa-forge-code-7b-qwen2.5-lora-r64-v0.4.1-delegate (r41)

⚠️ LABELED EXPERIMENT, NOT GA. This is the v0.4.1 rebalanced-SFT follow-up to r40 (round 41). Changes relative to r40: the delegation share was rebalanced from 25% to 9%, 4 new blocks were added (T4-RL-reinforce, over-delegate-counter, refusal-shape, OOD-extension), the LR was cut 2.5× (5e-5 → 2e-5), and the epoch count was doubled (1 → 2).

Result: essentially flat vs r40 (Mk.I 82.71% → 83.01%). The specialist↔routing tradeoff in 7B+LoRA SFT is fundamental, not a hyperparameter problem. The actual v0.4.0 GA is dancinlab/hexa-forge-code-7b-qwen2.5-lora-r64-v0.4.0-rl-t4-v3-t3patch (r39, 94.29% Mk.I). Use that one for production.
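
For readers who want to act on the "use r39 for production" advice, the following is a minimal sketch of attaching a LoRA adapter to the base model with `transformers` and `peft`. It assumes the adapter repositories are stored in standard PEFT format and that the tokenizer is taken from the base model; the helper name `load_adapter` and the constant names are hypothetical, and only the repository IDs come from this card.

```python
# Repository IDs are taken from this model card; everything else is illustrative.
BASE_MODEL_ID = "Qwen/Qwen2.5-Coder-7B"
GA_ADAPTER_ID = (
    "dancinlab/hexa-forge-code-7b-qwen2.5-lora-r64-v0.4.0-rl-t4-v3-t3patch"
)
EXPERIMENT_ADAPTER_ID = (
    "dancinlab/hexa-forge-code-7b-qwen2.5-lora-r64-v0.4.1-delegate"
)


def load_adapter(adapter_id: str = GA_ADAPTER_ID):
    """Load the base model and attach a LoRA adapter (downloads weights).

    Assumption: the adapter repo is in standard PEFT format.
    """
    # Imported lazily so the constants above are usable without the heavy deps.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```

Calling `load_adapter()` with no argument picks the r39 GA adapter; passing `EXPERIMENT_ADAPTER_ID` loads this experimental r41 adapter instead, which is only useful for reproducing the scores reported here.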

Why this exists

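Taken together, r40 and r41 show empirically that SFT-only delegation training does not work on a saturated specialist: two different data mixes and hyperparameter settings both land far below the r39 GA. The remaining viable path is routing-RL (GRPO with a binary route-correctness reward), queued as v0.4.2.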
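
To make the planned reward signal concrete, here is a rough sketch of a binary route-correctness reward combined with GRPO-style group-relative advantage normalization. It is not the v0.4.2 implementation: the route labels (`"delegate"`, `"answer"`, `"refuse"`), the function names, and the use of the population standard deviation are all assumptions for illustration.

```python
from statistics import mean, pstdev


def route_reward(predicted_route: str, gold_route: str) -> float:
    """Binary route-correctness reward: 1.0 on an exact match, else 0.0."""
    return 1.0 if predicted_route == gold_route else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one prompt's rollout group (GRPO-style).

    Each advantage is (reward - group mean) / (group std + eps). If every
    rollout in the group gets the same reward, all advantages are zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one prompt whose correct route is "delegate".
rollouts = ["delegate", "answer", "delegate", "refuse"]
rewards = [route_reward(r, "delegate") for r in rollouts]
advantages = group_relative_advantages(rewards)
```

In this toy group the two correct rollouts receive positive advantages and the two incorrect ones negative advantages of equal magnitude. A group where every rollout routes the same way contributes no gradient signal, which is one reason a binary reward needs prompts the policy still gets wrong some of the time.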

Scores (Mk.I 665 strict + DLG-mk0 routing eval)

| family | r39 GA | r40 v18 (25% del) | r41 v19 (9% del) | Δ r41 vs r40 (points) |
|---|---|---|---|---|
| Mk.I overall | 94.29% | 82.71% | 83.01% | +0.30 (flat) |
| T1 syntax | 97.6% | 76.5% | 75.3% | −1.2 |
| T2 atlas | 87.0% | 78.0% | 85.0% | +7.0 (rambling-cover artifact) |
| T3 @grace | 100.0% | 98.8% | 98.8% | 0 |
| T4 enum | 100.0% | 77.0% | 73.0% | −4.0 ⚠ |
| T5 HX-codes | 94.8% | 86.5% | 89.6% | +3.1 |
| T6 triples | 95.5% | 92.4% | 87.9% | −4.5 |
| T7 stdlib | 87.9% | 89.7% | 89.7% | 0 |
| T8 refusal | 90.0% | 68.8% | 68.8% | 0 ⚠ |
| 5-NL i18n | 96% | 60% | 52% | −8 ⚠ |
| DLG-mk0 | n/a | 0.7652 | 0.7760 | +0.0108 (+1.08 pts; still below the 0.85 gate) |
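
The Δ column and the gate check can be recomputed from the r40 and r41 columns. The sketch below copies the scores from the table; the variable and function names are illustrative, and it assumes DLG-mk0 is reported on a 0 to 1 scale with its delta quoted in points (×100).

```python
# (r40, r41) scores in percent, copied from the table above.
MK1_SCORES = {
    "Mk.I overall": (82.71, 83.01),
    "T1 syntax": (76.5, 75.3),
    "T2 atlas": (78.0, 85.0),
    "T3 @grace": (98.8, 98.8),
    "T4 enum": (77.0, 73.0),
    "T5 HX-codes": (86.5, 89.6),
    "T6 triples": (92.4, 87.9),
    "T7 stdlib": (89.7, 89.7),
    "T8 refusal": (68.8, 68.8),
    "5-NL i18n": (60.0, 52.0),
}
DLG_MK0 = (0.7652, 0.7760)  # (r40, r41), assumed to be on a 0-1 scale
DLG_GATE = 0.85


def deltas(scores: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return r41 minus r40 per family, rounded to two decimals."""
    return {name: round(r41 - r40, 2) for name, (r40, r41) in scores.items()}


mk1_deltas = deltas(MK1_SCORES)
regressions = [name for name, d in mk1_deltas.items() if d < 0]
dlg_delta_points = round((DLG_MK0[1] - DLG_MK0[0]) * 100, 2)
passes_gate = DLG_MK0[1] >= DLG_GATE
```

Here `regressions` lists the four families that got worse from r40 to r41 (T1, T4, T6, and 5-NL), and `passes_gate` is `False` because 0.7760 is below the 0.85 gate.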

DLG-mk0 per-category (r40 → r41):

in-domain s_route: 86.25 → 87.5 (Block J slight help)
OOD-delegate s_route: 30 → 35 (still very low)
security-refuse s_route: 60 → 73.3 (+13 ✅, K+dilution helped)
long-context s_route: 90 → 60 (−30 ⚠, OOD-extension misrouted long-ctx)

Lessons (full writeup in `dancinlab/hexa-codex/lm_foundry/ROADMAP.md` r41)

SFT-only can't escape the specialist↔routing tradeoff in 7B+LoRA.
RL decision boundary can't be reinforced by 50 SFT examples (0.4% of the 12k rollouts that originally taught it).
Refusal shape needs ≥ 100 SFT pairs OR non-SFT signal.
OOD-extension causes cross-dimension routing artifacts.
5-NL is a non-trivial cross-family casualty of delegation-heavy training.
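
As a quick check on the ratio quoted in the second lesson, the arithmetic is shown below. It assumes "12k" means exactly 12,000 rollouts; the variable names are illustrative.

```python
sft_examples = 50      # SFT examples in the T4-RL-reinforce block (from the lesson above)
rl_rollouts = 12_000   # assumption: "12k" taken as exactly 12,000

share = sft_examples / rl_rollouts
share_percent = round(share * 100, 1)  # 0.4, matching the quoted 0.4%
```

Put differently, the SFT block carried roughly one example for every 240 rollouts that originally shaped the T4 decision boundary.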

License

MIT (adapter weights). Base model: Qwen/Qwen2.5-Coder-7B.

