EnricoFermi committed on
Commit ab22f51 · verified · 1 parent: f5f576d

Add Qwen2.5-Coder-1.5B benchmark comparison — forged model beats purpose-built coder

Files changed (1): README.md (+3 -2)
@@ -33,7 +33,7 @@ datasets:
 
 # qwen3.5-4b-code-forged
 
-**+26.6% better than baseline.** Forged from [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) for **code** tasks.
+**Beats Qwen2.5-Coder-1.5B** a purpose-built coder pre-trained on trillions of code tokens — **with a general model forged in 3 hours.** 53.0% vs 51.8% HumanEval (Q4_K_M). Forged from [Qwen/Qwen3.5-4B](https://huggingface.co/Qwen/Qwen3.5-4B) for **code** tasks (+26.6% perplexity improvement).
 
 **Not quantized. Not distilled. Structurally reshaped.**
 
@@ -61,10 +61,11 @@ The architecture co-evolves with training: heads that contribute to the domain s
 | StarCoder2-3B | 3B | 31.7% | — |
 | Qwen2.5-Coder-3B | 3B | ~31% | — |
 | Phi-2 | 2.7B | 47.6% | — |
+| Qwen2.5-Coder-1.5B Q4_K_M | ~1GB | 51.8% | 48.2% |
 | **qwen3.5-4b-code-forged** | **3.4B** | **57.3%** | **49.4%** |
 | **qwen3.5-4b-code-forged Q4_K_M** | **2.6GB** | **53.0%** | **47.0%** |
 
-**+20% above Phi-2, +82% above StarCoder2-3B** in the sub-5B class.
+**Beats Qwen2.5-Coder-1.5B** (purpose-built coder, ~1GB) at Q4_K_M: 53.0% vs 51.8%. **+20% above Phi-2, +82% above StarCoder2-3B** in the sub-5B class.
 
 - **HumanEval**: 57.3% pass@1 (94/164 base problems)
 - **HumanEval+**: 49.4% pass@1 (81/164 base + extra tests)
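The percentages quoted in this diff can be recomputed from the README's own table. Below is a minimal Python sketch, assuming that pass@1 here is simply problems solved divided by 164 (one scored sample per problem) and that the "+20%" and "+82%" figures are relative gains of the 57.3% HumanEval row over each baseline. The helper names `pass_at_1` and `relative_gain` are hypothetical; nothing here re-runs a benchmark.

```python
# Recompute the headline numbers in the README diff from its table values.
# Assumption: pass@1 = solved / total with one scored sample per problem.

def pass_at_1(solved: int, total: int) -> float:
    """pass@1 as a percentage (hypothetical helper, single sample per problem)."""
    return 100.0 * solved / total

def relative_gain(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline

humaneval = pass_at_1(94, 164)       # 94/164 base problems
humaneval_plus = pass_at_1(81, 164)  # 81/164 with extra tests
print(f"HumanEval pass@1:  {humaneval:.1f}%")
print(f"HumanEval+ pass@1: {humaneval_plus:.1f}%")

# Assumption: the "+20% / +82%" claims compare the unquantized 57.3% row.
forged = 57.3
baselines = {"Phi-2": 47.6, "StarCoder2-3B": 31.7}
for name, score in baselines.items():
    print(f"vs {name}: {relative_gain(forged, score):+.1f}%")

# The Qwen2.5-Coder-1.5B comparison is between Q4_K_M builds.
print(f"Q4_K_M margin: {53.0 - 51.8:.1f} points "
      f"({relative_gain(53.0, 51.8):+.1f}% relative)")
```

Run as-is, this reproduces 57.3% and 49.4%, gives about +20.4% over Phi-2 and about +80.8% over StarCoder2-3B, and shows a 1.2-point (about 2.3% relative) margin over Qwen2.5-Coder-1.5B at Q4_K_M.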