hexa-forge-code-7b-qwen2.5-lora-r64-v0.4.1-delegate (r41)

⚠️ LABELED EXPERIMENT, NOT GA. This is the v0.4.1 rebalanced-SFT follow-up to r40 (round 41). Changes relative to r40: rebalanced the delegation share from 25% to 9%, added 4 new blocks (T4-RL-reinforce, over-delegate-counter, refusal-shape, OOD-extension), lowered the learning rate (5e-5 → 2e-5), and doubled the epochs (1 → 2).

Result: essentially flat vs r40 (Mk.I 82.71% → 83.01%). The specialist↔routing tradeoff in 7B+LoRA SFT is fundamental, not a hyperparameter problem. The actual v0.4.0 GA is dancinlab/hexa-forge-code-7b-qwen2.5-lora-r64-v0.4.0-rl-t4-v3-t3patch (r39, 94.29% Mk.I). Use that one for production.
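
The recipe changes between r40 and r41 can be written down as data, which makes the size of each change explicit. This is only a sketch: the `SftRecipe` class and its field names are hypothetical and are not taken from the training code; the values come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SftRecipe:
    """Hypothetical container for the recipe knobs named in this card."""
    round_name: str
    delegation_share: float   # fraction of SFT examples that are delegation examples
    learning_rate: float
    epochs: int
    new_blocks: tuple = ()    # blocks added in this round only


R40 = SftRecipe("r40", delegation_share=0.25, learning_rate=5e-5, epochs=1)
R41 = SftRecipe(
    "r41",
    delegation_share=0.09,
    learning_rate=2e-5,
    epochs=2,
    new_blocks=(
        "T4-RL-reinforce",
        "over-delegate-counter",
        "refusal-shape",
        "OOD-extension",
    ),
)


def describe_changes(old: SftRecipe, new: SftRecipe) -> dict:
    """Express each knob of `new` as a ratio of the same knob in `old`."""
    return {
        "delegation_share_ratio": new.delegation_share / old.delegation_share,
        "learning_rate_ratio": new.learning_rate / old.learning_rate,
        "epochs_ratio": new.epochs / old.epochs,
        "blocks_added": len(new.new_blocks),
    }


changes = describe_changes(R40, R41)
for name, value in changes.items():
    print(f"{name}: {value}")
```

Read this way, r41 uses roughly a third of r40's delegation share, 0.4× the learning rate, and twice the epochs, plus four additional blocks.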

Why this exists

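Taken together, r40 and r41 are empirical evidence against SFT-only delegation training on a saturated specialist: neither a 25% nor a 9% delegation share reached the 0.85 DLG-mk0 gate, and both cost more than 11 points of Mk.I versus r39. The remaining viable path is routing-RL (GRPO with a binary route-correctness reward), queued as v0.4.2.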
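
To make the routing-RL idea concrete, here is a minimal sketch of a binary route-correctness reward combined with GRPO-style group-relative advantages. It is an illustration under assumptions, not the planned v0.4.2 implementation: the route labels (`answer`, `delegate`, `refuse`) and both function names are hypothetical, and population standard deviation is used for normalization, which not every GRPO implementation does.

```python
from statistics import mean, pstdev

# Hypothetical route labels; the card does not list the actual label set.
ROUTES = ("answer", "delegate", "refuse")


def route_reward(predicted: str, expected: str) -> float:
    """Binary reward: 1.0 if the sampled route matches the expected one, else 0.0."""
    return 1.0 if predicted == expected else 0.0


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Normalize each reward against the mean and std of its own sample group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions, each reduced to its routing decision.
expected_route = "delegate"
sampled_routes = ["delegate", "answer", "answer", "delegate"]

rewards = [route_reward(r, expected_route) for r in sampled_routes]
advantages = group_relative_advantages(rewards)

for route, reward, adv in zip(sampled_routes, rewards, advantages):
    print(f"{route:9s} reward={reward:.0f} advantage={adv:+.3f}")
```

One property of a binary reward shows up directly in this sketch: if every sample in a group routes the same way, all advantages are zero and that prompt contributes no gradient signal, so prompts where the policy is already uniformly right or uniformly wrong teach nothing.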

Scores (Mk.I 665 strict + DLG-mk0 routing eval)

| Family | r39 GA | r40 v18 (25% del) | r41 v19 (9% del) | Δ vs r40 |
|---|---|---|---|---|
| Mk.I overall | 94.29% | 82.71% | 83.01% | +0.30 (flat) |
| T1 syntax | 97.6% | 76.5% | 75.3% | −1.2 |
| T2 atlas | 87.0% | 78.0% | 85.0% | +7.0 (rambling-cover artifact) |
| T3 @grace | 100.0% | 98.8% | 98.8% | 0 |
| T4 enum | 100.0% | 77.0% | 73.0% | −4.0 |
| T5 HX-codes | 94.8% | 86.5% | 89.6% | +3.1 |
| T6 triples | 95.5% | 92.4% | 87.9% | −4.5 |
| T7 stdlib | 87.9% | 89.7% | 89.7% | 0 |
| T8 refusal | 90.0% | 68.8% | 68.8% | 0 |
| 5-NL i18n | 96% | 60% | 52% | −8 |
| DLG-mk0 | n/a | 0.7652 | 0.7760 | +1.08 pts (still < 0.85 gate) |
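
The Δ column compares r41 only with r40. The short script below recomputes that column and adds the gap to the r39 GA, using the percentages reported above. The dictionary layout and function names are illustrative and not part of any eval harness.

```python
# (r39 GA, r40, r41) scores in percent, copied from the table above.
SCORES = {
    "Mk.I overall": (94.29, 82.71, 83.01),
    "T1 syntax": (97.6, 76.5, 75.3),
    "T2 atlas": (87.0, 78.0, 85.0),
    "T3 @grace": (100.0, 98.8, 98.8),
    "T4 enum": (100.0, 77.0, 73.0),
    "T5 HX-codes": (94.8, 86.5, 89.6),
    "T6 triples": (95.5, 92.4, 87.9),
    "T7 stdlib": (87.9, 89.7, 89.7),
    "T8 refusal": (90.0, 68.8, 68.8),
    "5-NL i18n": (96.0, 60.0, 52.0),
}


def compute_deltas(scores: dict) -> dict:
    """Return {family: (r41 - r40, r41 - r39)} rounded to two decimals."""
    return {
        family: (round(r41 - r40, 2), round(r41 - r39, 2))
        for family, (r39, r40, r41) in scores.items()
    }


deltas = compute_deltas(SCORES)

# Family with the largest drop relative to the GA, and families that beat the GA.
worst_vs_ga = min(deltas, key=lambda family: deltas[family][1])
above_ga = [family for family, (_, vs_ga) in deltas.items() if vs_ga > 0]

for family, (vs_r40, vs_ga) in deltas.items():
    print(f"{family:13s} vs r40 {vs_r40:+6.2f}   vs r39 GA {vs_ga:+7.2f}")
print("largest regression vs GA:", worst_vs_ga)
print("families above GA:", above_ga)
```

Against the GA rather than r40, T7 stdlib is the only family where r41 comes out ahead, and 5-NL i18n shows the largest drop.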

DLG-mk0 per-category (r40 → r41):

  • in-domain s_route: 86.25 → 87.5 (Block J slight help)
  • OOD-delegate s_route: 30 → 35 (still very low)
  • security-refuse s_route: 60 → 73.3 (+13 ✅, K+dilution helped)
  • long-context s_route: 90 → 60 (−30 ⚠, OOD-extension misrouted long-ctx)
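
The per-category movement and the distance to the gate can be checked with a few lines of arithmetic. This sketch uses only the numbers reported above. It makes no assumption about how the categories are weighted into the aggregate DLG-mk0 score, and it assumes the gate is passed at a score of 0.85 or higher.

```python
# s_route per category as (r40, r41), copied from the list above.
S_ROUTE = {
    "in-domain": (86.25, 87.5),
    "OOD-delegate": (30.0, 35.0),
    "security-refuse": (60.0, 73.3),
    "long-context": (90.0, 60.0),
}

# Aggregate DLG-mk0 scores and the release gate from the scores table.
DLG_MK0 = {"r40": 0.7652, "r41": 0.7760}
DLG_GATE = 0.85  # assumption: a round passes when its score is >= this value

category_delta = {
    category: round(r41 - r40, 2) for category, (r40, r41) in S_ROUTE.items()
}
regressed = [category for category, delta in category_delta.items() if delta < 0]

passes_gate = {name: score >= DLG_GATE for name, score in DLG_MK0.items()}
gap_to_gate = round(DLG_GATE - DLG_MK0["r41"], 4)

for category, delta in category_delta.items():
    print(f"{category:16s} {delta:+.2f}")
print("regressed categories:", regressed)
print("gate passed:", passes_gate)
print("r41 gap to gate:", gap_to_gate)
```

The long-context drop is larger than the other three changes combined, which is why the aggregate barely moves despite the security-refuse gain.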

Lessons (full writeup in dancinlab/hexa-codex/lm_foundry/ROADMAP.md r41)

  1. SFT-only can't escape the specialist↔routing tradeoff in 7B+LoRA.
  2. RL decision boundary can't be reinforced by 50 SFT examples (0.4% of the 12k rollouts that originally taught it).
  3. Refusal shape needs ≥ 100 SFT pairs OR non-SFT signal.
  4. OOD-extension causes cross-dimension routing artifacts.
  5. 5-NL is a non-trivial cross-family casualty of delegation-heavy training.

License

MIT (adapter weights). Base model: Qwen/Qwen2.5-Coder-7B.

