Text Generation · MLX · Safetensors · GGUF · Rust · qwen3_5_text

Tags: 4b · agentic-coding · android · apple-silicon · attested · bash · c · chain-of-custody · chinese · code · code-completion · code-generation · code-infill · coder · coding · consumer-gpu · cpp · cryptographically-verified · css · delta-forge · edge-inference · embedded · english · forge-alloy · function-calling · ggml · go · html · iphone · java · javascript · kotlin · llama-cpp · lm-studio · local-inference · macbook · mobile · multilingual · ollama · on-device · php · programming · python · q4-k-m · quantized · qwen · qwen3 · qwen3.5 · raspberry-pi · reproducible · ruby · software-engineering · sql · swift · typescript
Add Qwen2.5-Coder-1.5B benchmark comparison — forged model beats purpose-built coder
README.md (CHANGED)

@@ -33,7 +33,7 @@ datasets:
 
 # qwen3.5-4b-code-forged
 
-**
+**Beats Qwen2.5-Coder-1.5B** — a purpose-built coder pre-trained on trillions of code tokens — **with a general model forged in 3 hours.** 53.0% vs 51.8% HumanEval (Q4_K_M). Forged from [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) for **code** tasks (+26.6% perplexity improvement).
 
 **Not quantized. Not distilled. Structurally reshaped.**
 
@@ -61,10 +61,11 @@ The architecture co-evolves with training: heads that contribute to the domain s
 | StarCoder2-3B | 3B | 31.7% | — |
 | Qwen2.5-Coder-3B | 3B | ~31% | — |
 | Phi-2 | 2.7B | 47.6% | — |
+| Qwen2.5-Coder-1.5B Q4_K_M | ~1GB | 51.8% | 48.2% |
 | **qwen3.5-4b-code-forged** | **3.4B** | **57.3%** | **49.4%** |
 | **qwen3.5-4b-code-forged Q4_K_M** | **2.6GB** | **53.0%** | **47.0%** |
 
-**+20% above Phi-2, +82% above StarCoder2-3B** in the sub-5B class.
+**Beats Qwen2.5-Coder-1.5B** (purpose-built coder, ~1GB) at Q4_K_M: 53.0% vs 51.8%. **+20% above Phi-2, +82% above StarCoder2-3B** in the sub-5B class.
 
 - **HumanEval**: 57.3% pass@1 (94/164 base problems)
 - **HumanEval+**: 49.4% pass@1 (81/164 base + extra tests)
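As a quick sanity check on the figures above, the quoted pass@1 percentages follow directly from the solved-problem counts in the README; a minimal Python sketch (using only the 94/164 and 81/164 counts from the diff, nothing model-specific) reproduces them:

```python
# Sanity check: reproduce the pass@1 percentages quoted in the README diff
# from the solved-problem counts it reports (94/164 and 81/164).
TOTAL = 164  # number of problems in the HumanEval base set

results = {
    "HumanEval": 94,    # solved problems reported in the README
    "HumanEval+": 81,   # solved problems reported in the README
}

for bench, solved in results.items():
    print(f"{bench}: {solved}/{TOTAL} = {solved / TOTAL:.1%} pass@1")
# Prints 57.3% and 49.4%, matching the table and bullet points above.
```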