Commit efbda71 (verified) · committed by clemsail · 1 parent: 55dfd8e

docs: add base vs tuned bench comparison

Files changed (1)
  1. README.md +19 -0
README.md CHANGED
@@ -121,3 +121,22 @@ LoRA weights: **cc-by-sa-4.0** — see License chain table above for derivation
  ## Related
 
  See the full [Ailiance-fr LoRA collection](https://huggingface.co/Ailiance-fr).
+
+
+ ## Bench comparison (2026-05-11)
+
+ ### Base model (Devstral-Small-2-24B-MLX-4bit) capability
+
+ | Task | Score | Notes |
+ |---|---:|---|
+ | GSM8K-CoT flex EM | **0.96** | W3 lm-eval-harness (--limit 100) |
+ | ARC-Easy acc / acc_norm | **0.80 / 0.75** | |
+ | MMLU-Pro Computer Science | **0.64** | |
+
+ Source: <https://github.com/ailiance/ailiance/tree/main/output/lm-eval-base-2026-05-11>
+
+ ### This LoRA (tuned) — bench PENDING
+
+ Will include kicad-sch / iact-bench validators + W3 lm-eval delta. See spec for
+ methodology:
+ <https://github.com/ailiance/ailiance-bench/blob/main/docs/superpowers/specs/2026-05-11-kicad-sch-gap-design.md>
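
Reviewer note: the `--limit 100` on the GSM8K row means that score rests on 100 examples, so its sampling noise is worth keeping in mind once the tuned delta is reported. A minimal sketch, assuming each example is an independent pass/fail trial (a binomial approximation, which the commit does not state); `binomial_stderr` is a hypothetical helper, not part of lm-eval-harness:

```python
import math


def binomial_stderr(score: float, n: int) -> float:
    """Standard error of a proportion under a binomial approximation."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    return math.sqrt(score * (1.0 - score) / n)


# GSM8K-CoT flex EM from the table above: 0.96 on 100 examples (--limit 100).
gsm8k_se = binomial_stderr(0.96, 100)
print(f"GSM8K-CoT stderr ~ {gsm8k_se:.4f}")  # prints 0.0196
```

Under that assumption, a base-vs-tuned difference of about two points on this task would be roughly one standard error. The commit does not say whether the same limit applied to the ARC-Easy and MMLU-Pro rows, so no sample size is assumed for them here.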