docs: add base vs tuned bench comparison
README.md
@@ -111,3 +111,22 @@ LoRA weights: **cc-by-sa-4.0** — see License chain table above for derivation
 ## Related
 
 See the full [Ailiance-fr LoRA collection](https://huggingface.co/Ailiance-fr).
+
+
+## Bench comparison (2026-05-11)
+
+### Base model (Devstral-Small-2-24B-MLX-4bit) capability
+
+| Task | Score | Notes |
+|---|---:|---|
+| GSM8K-CoT flex EM | **0.96** | W3 lm-eval-harness (--limit 100) |
+| ARC-Easy acc / acc_norm | **0.80 / 0.75** | |
+| MMLU-Pro Computer Science | **0.64** | |
+
+Source: <https://github.com/ailiance/ailiance/tree/main/output/lm-eval-base-2026-05-11>
+
+### This LoRA (tuned) — bench PENDING
+
+Will include kicad-sch / iact-bench validators + W3 lm-eval delta. See spec for
+methodology:
+<https://github.com/ailiance/ailiance-bench/blob/main/docs/superpowers/specs/2026-05-11-kicad-sch-gap-design.md>
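The pending tuned section promises a "W3 lm-eval delta" against the base scores. A minimal sketch of how such a per-task delta could be tabulated from two lm-eval-style score dicts, assuming a simple `{task: score}` shape; the task keys and the tuned values below are illustrative placeholders, not real results (the tuned bench is still pending):

```python
def score_delta(base: dict[str, float], tuned: dict[str, float]) -> dict[str, float]:
    """Return tuned-minus-base score deltas for tasks present in both runs."""
    return {task: round(tuned[task] - base[task], 4)
            for task in base if task in tuned}

# Base scores from the table above; tuned values are made-up placeholders
# (the tuned benchmark has not been run yet).
base_scores = {"gsm8k_cot_flex_em": 0.96, "arc_easy_acc": 0.80, "mmlu_pro_cs": 0.64}
tuned_scores = {"gsm8k_cot_flex_em": 0.95, "arc_easy_acc": 0.82, "mmlu_pro_cs": 0.70}

print(score_delta(base_scores, tuned_scores))
```

Restricting the output to tasks present in both runs keeps the delta well-defined if the tuned run covers a different task subset than the base run.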