Text Generation
MLX
Safetensors
GGUF
Rust
qwen3_5_text
4b
agentic-coding
android
apple-silicon
attested
bash
c
chain-of-custody
chinese
code
code-completion
code-generation
code-infill
coder
coding
consumer-gpu
cpp
cryptographically-verified
css
delta-forge
edge-inference
embedded
english
forge-alloy
function-calling
ggml
go
html
iphone
java
javascript
kotlin
llama-cpp
lm-studio
local-inference
macbook
mobile
multilingual
ollama
on-device
php
programming
python
q4-k-m
quantized
qwen
qwen3
qwen3.5
raspberry-pi
reproducible
ruby
software-engineering
sql
swift
typescript
Add GGUF Q4_K_M benchmark results (HumanEval 53.0%, HumanEval+ 47.0%)
README.md
CHANGED
@@ -62,6 +62,7 @@ The architecture co-evolves with training: heads that contribute to the domain s
 | Qwen2.5-Coder-3B | 3B | ~31% | — |
 | Phi-2 | 2.7B | 47.6% | — |
 | **qwen3.5-4b-code-forged** | **3.4B** | **57.3%** | **49.4%** |
+| **qwen3.5-4b-code-forged Q4_K_M** | **2.6GB** | **53.0%** | **47.0%** |
 
 **+20% above Phi-2, +82% above StarCoder2-3B** in the sub-5B class.
 
@@ -69,6 +70,8 @@ The architecture co-evolves with training: heads that contribute to the domain s
 - **HumanEval+**: 49.4% pass@1 (81/164 base + extra tests)
 - **Method**: Greedy decoding (temperature 0), single sample, EvalPlus framework
 - **Hardware**: Evaluated as fp16 HuggingFace transformers on RTX 5090
+- **GGUF Q4_K_M**: 53.0% / 47.0% — only -4.3 points (7.5% relative drop from fp16)
+- **GGUF evaluated via**: llama-cpp-python on RTX 5090
 
 ## Runs On
 
@@ -76,6 +79,8 @@ The architecture co-evolves with training: heads that contribute to the domain s
 |--------|--------|----------|
 | MacBook Pro 16GB | fp16 | Yes |
 | MacBook Pro 32GB | fp16 | Yes |
+| RTX 5090 | GGUF Q4_K_M | Yes (HumanEval 53.0%) |
+| MacBook Pro M1 | GGUF Q4_K_M | Yes (llama.cpp Metal) |
 
 These models are designed for **consumer hardware**. No A100s required. Your MacBook, your gaming PC, your home server.
 
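The quantization figures quoted in the diff can be checked with a few lines of arithmetic. The sketch below recomputes the point drop and relative drop from the fp16 and Q4_K_M scores that appear in the README table, and the fp16 HumanEval+ pass@1 from the 81/164 count. The function and variable names are illustrative and not taken from the repository.

```python
# Scores copied from the README diff above; helper names are illustrative only.

def pass_at_1(passed: int, total: int) -> float:
    """pass@1 in percent when each problem gets one greedy sample."""
    return 100.0 * passed / total


def point_drop(baseline: float, quantized: float) -> float:
    """Absolute drop in percentage points."""
    return baseline - quantized


def relative_drop(baseline: float, quantized: float) -> float:
    """Drop expressed as a percentage of the baseline score."""
    return 100.0 * (baseline - quantized) / baseline


FP16 = {"HumanEval": 57.3, "HumanEval+": 49.4}
Q4_K_M = {"HumanEval": 53.0, "HumanEval+": 47.0}

for name in FP16:
    pts = point_drop(FP16[name], Q4_K_M[name])
    rel = relative_drop(FP16[name], Q4_K_M[name])
    print(f"{name}: -{pts:.1f} points, {rel:.1f}% relative drop")

print(f"HumanEval+ fp16 from counts: {pass_at_1(81, 164):.1f}%")
```

Run as written, this shows that the "-4.3 points (7.5% relative drop)" in the added README line refers to HumanEval; the corresponding HumanEval+ drop works out to 2.4 points, or about 4.9% relative.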
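The README states that the GGUF build was evaluated with llama-cpp-python using greedy decoding. A minimal sketch of what such a call might look like follows; it is not the author's evaluation harness. The model filename, context size, and token limit are assumptions chosen for illustration, and `llama-cpp-python` must be installed separately for `load_model` to work.

```python
from typing import Any, Dict

# Greedy decoding as described in the README (temperature 0, single sample).
# max_tokens is an assumed limit, not a value taken from the source.
GREEDY: Dict[str, Any] = {"temperature": 0.0, "max_tokens": 512}


def load_model(model_path: str):
    """Load a GGUF file, offloading all layers to the GPU backend if available."""
    from llama_cpp import Llama  # requires: pip install llama-cpp-python

    # n_ctx=4096 is an assumption; n_gpu_layers=-1 requests full offload.
    return Llama(model_path=model_path, n_gpu_layers=-1, n_ctx=4096, verbose=False)


def complete(llm: Any, prompt: str) -> str:
    """Return the text of one greedy completion for the given prompt."""
    out = llm.create_completion(prompt, **GREEDY)
    return out["choices"][0]["text"]


# Example (hypothetical filename):
#   llm = load_model("qwen3.5-4b-code-forged-Q4_K_M.gguf")
#   print(complete(llm, "def fibonacci(n):\n"))
```

On Apple Silicon the same code would use the Metal backend, provided llama-cpp-python was built with Metal support, which matches the "llama.cpp Metal" row added to the table.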