hexa-forge-code-3b-Q5_K_M-GGUF-v0.2.0-r4

GGUF Q5_K_M quantization of the v0.2.0-r4 LoRA-merged Qwen2.5-Coder-3B model.

Size: 2.07 GB (5.75 BPW), 64% smaller than F16, and runs on 4 GB of VRAM.

Eval baselines (STRICT, real hexa-cc compile + spec matchers)

Same r4 adapter as the F16 sibling. Quantization is applied after the LoRA merge, so the FP16 eval numbers below carry over (with sub-percent Q5_K_M drift).

bench               r4 (FP16 baseline)   gate target
hexa-eval Mk.0.1    60.71% (17/28)       ≥80% (v1.0.0)
5-NL Mk.0.1         92% (23/25)          ≥70% ✅
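
As a quick sanity check on the table, the percentages can be recomputed from the pass counts and compared against each gate. This is only an illustrative sketch: the dictionary layout and the function names are made up here, and the numbers are copied from the table rather than produced by the real hexa-eval harness.

```python
# Pass counts and gate thresholds copied from the table above.
# The structure and names are illustrative, not part of any hexa tooling.
BENCHES = {
    "hexa-eval Mk.0.1": {"passed": 17, "total": 28, "gate": 0.80},
    "5-NL Mk.0.1": {"passed": 23, "total": 25, "gate": 0.70},
}


def gate_report(benches):
    """Return {bench: (pass rate in percent, gate met?)} for each bench."""
    report = {}
    for name, b in benches.items():
        rate = b["passed"] / b["total"]
        report[name] = (round(rate * 100, 2), rate >= b["gate"])
    return report


if __name__ == "__main__":
    for name, (pct, ok) in gate_report(BENCHES).items():
        status = "meets gate" if ok else "below gate"
        print(f"{name}: {pct}% ({status})")
```

Under these numbers hexa-eval sits below its v1.0.0 gate while 5-NL clears its gate, matching the single check mark in the table.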

Quantization details

type:        Q5_K_M
size:        2,224,814,240 bytes (2.07 GB)
bpw:         5.75 bits per weight
src:         hexa-forge-code-3b-v0.2.0-r4.f16.gguf (6.17 GB)
tool:        llama-quantize from llama.cpp HEAD (built 2026-05-12)
quant time:  28.6 s
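
The size figures above can be cross-checked with a little arithmetic. The sketch below assumes the 6.17 GB F16 figure is in decimal gigabytes (the card does not say) and treats the whole file as weights, ignoring GGUF metadata, so the implied parameter count is only approximate. All function names are illustrative.

```python
# Figures copied from the quantization details above.
Q5_BYTES = 2_224_814_240   # Q5_K_M file size in bytes
BPW = 5.75                 # reported bits per weight (rounded)
F16_GB = 6.17              # F16 source size; ASSUMED to be decimal GB


def to_gib(n_bytes):
    """Bytes to binary gigabytes (2**30 bytes)."""
    return n_bytes / 2**30


def to_gb(n_bytes):
    """Bytes to decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


def implied_params(n_bytes, bpw):
    """Approximate weight count implied by file size and bits per weight."""
    return n_bytes * 8 / bpw


def reduction(small_bytes, large_bytes):
    """Fractional size reduction of the small file relative to the large one."""
    return 1 - small_bytes / large_bytes


if __name__ == "__main__":
    print(f"Q5_K_M: {to_gib(Q5_BYTES):.2f} GiB / {to_gb(Q5_BYTES):.2f} GB")
    print(f"implied params: {implied_params(Q5_BYTES, BPW) / 1e9:.2f}B")
    print(f"smaller than F16: {reduction(Q5_BYTES, F16_GB * 1e9):.0%}")
```

Read this way, the 2.07 figure corresponds to binary gigabytes (about 2.22 decimal GB), and the 64% reduction is reproduced when the F16 size is taken as decimal GB.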

Inference

./llama-cli -m hexa-forge-code-3b-v0.2.0-r4.Q5_K_M.gguf \
    -p "### User:\nWrite a hexa function add(a: i32, b: i32) -> i32.\n### Assistant:\n"
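
The same call can be driven from a script. The following is a minimal sketch, not an official client: it wraps the `llama-cli` command shown above with `subprocess`, reuses the `### User:` / `### Assistant:` prompt layout from that command, and adds `-n` to cap the number of generated tokens. The helper names, the 128-token default, and the binary path are assumptions.

```python
import subprocess

# File name taken from the command above; adjust to where the GGUF lives.
MODEL_PATH = "hexa-forge-code-3b-v0.2.0-r4.Q5_K_M.gguf"


def build_prompt(user_message):
    """Wrap a request in the prompt layout used in the command above."""
    return f"### User:\n{user_message}\n### Assistant:\n"


def build_command(prompt, model_path=MODEL_PATH, max_tokens=128,
                  binary="./llama-cli"):
    """Build the argv list; -m, -p and -n are standard llama-cli options."""
    return [binary, "-m", model_path, "-p", prompt, "-n", str(max_tokens)]


def generate(user_message, **kwargs):
    """Run llama-cli once and return its stdout (requires a local build)."""
    cmd = build_command(build_prompt(user_message), **kwargs)
    result = subprocess.run(cmd, capture_output=True, text=True,
                            stdin=subprocess.DEVNULL, check=True)
    return result.stdout
```

Calling `generate("Write a hexa function add(a: i32, b: i32) -> i32.")` mirrors the shell example. Because the prompt is passed as a Python string, the newlines are real characters rather than `\n` escapes. Some llama.cpp builds start `llama-cli` in an interactive conversation mode by default, so the invocation may need adjusting for the build in use.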

Lineage

  • base: Qwen/Qwen2.5-Coder-3B
  • adapter: dancinlab/hexa-forge-code-3b-qwen2.5-lora-r16-v0.2.0-r4
  • f16 GGUF: dancinlab/hexa-forge-code-3b-GGUF-f16-v0.2.0-r4

GGUF metadata

model size:    3B params
architecture:  qwen2