continuum-ai/qwen3.5-4b-code-forged

EnricoFermi committed on Mar 30

Commit 44c62d8 (verified) · 1 Parent(s): 73d687b

Add HumanEval benchmark results (57.3% pass@1)

Files changed (1)

README.md CHANGED

@@ -54,6 +54,22 @@ The architecture co-evolves with training: heads that contribute to the domain s
 | Cycles | 3 |
 | Steps/Cycle | 500 |
+## Benchmarks
+| Model | Size | HumanEval | HumanEval+ |
+|-------|------|-----------|------------|
+| StarCoder2-3B | 3B | 31.7% | — |
+| Qwen2.5-Coder-3B | 3B | ~31% | — |
+| Phi-2 | 2.7B | 47.6% | — |
+| **qwen3.5-4b-code-forged** | **3.4B** | **57.3%** | **49.4%** |
+**Relative to the baselines above: +20% over Phi-2 and +81% over StarCoder2-3B** in the sub-5B class.
+- **HumanEval**: 57.3% pass@1 (94/164 base problems)
+- **HumanEval+**: 49.4% pass@1 (81/164 base + extra tests)
+- **Method**: Greedy decoding (temperature 0), single sample, EvalPlus framework
+- **Hardware**: Evaluated as fp16 HuggingFace transformers on RTX 5090
 ## Runs On
 | Device | Format | Verified |
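
The headline figures in the added Benchmarks section can be rechecked from the reported counts. The following is a minimal sketch, not part of the commit: the helper names `pass_at_1` and `relative_gain` are hypothetical, and it assumes that with greedy decoding and a single sample per problem, pass@1 is simply the fraction of the 164 problems solved.

```python
def pass_at_1(solved: int, total: int) -> float:
    """pass@1 as a percentage, assuming one greedy sample per problem."""
    return 100.0 * solved / total


def relative_gain(score: float, baseline: float) -> float:
    """Relative improvement of score over baseline, as a percentage."""
    return 100.0 * (score - baseline) / baseline


# Counts reported in the README diff: 94/164 (HumanEval), 81/164 (HumanEval+).
humaneval = pass_at_1(94, 164)
humaneval_plus = pass_at_1(81, 164)

# Baseline scores copied from the README table (Phi-2, StarCoder2-3B).
gain_vs_phi2 = relative_gain(humaneval, 47.6)
gain_vs_starcoder2 = relative_gain(humaneval, 31.7)

print(f"HumanEval pass@1:  {humaneval:.1f}%")
print(f"HumanEval+ pass@1: {humaneval_plus:.1f}%")
print(f"Relative gain vs Phi-2:         {gain_vs_phi2:.0f}%")
print(f"Relative gain vs StarCoder2-3B: {gain_vs_starcoder2:.0f}%")
```

Note that these gains are relative percentages, not percentage-point differences; in percentage points the gap to Phi-2 is about 9.7.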