clemsail committed on
Commit 86df231 · verified · 1 Parent(s): 65d5c18

docs: add base vs tuned bench comparison

Files changed (1)
  1. README.md +16 -0
README.md CHANGED
@@ -260,3 +260,19 @@ Track progress: [ailiance-bench issues](https://github.com/ailiance/ailiance-ben
  For reference benchmarks on the `gemma-4-E4B` base, see the
  [base-vs-LoRA matrix](https://github.com/ailiance/ailiance-bench/blob/main/bench-results/compare_base_vs_lora.md).
 
+
+ ## Bench comparison (2026-05-11)
+
+ ### Base model (Apertus-70B-Instruct-2509) capability
+
+ | Task | Score | Notes |
+ |---|---:|---|
+ | ARC-Easy acc / acc_norm | **0.81 / 0.77** | W3 lm-eval-harness BF16 |
+ | GSM8K-CoT | TIMEOUT (1800s budget) | base 70B BF16 too slow for CoT |
+ | MMLU-Pro Computer Science | TIMEOUT | |
+
+ ### This LoRA (tuned) — bench PENDING
+
+ Production usage: served via gateway alias `ailiance-apertus-<domain>` on
+ <https://www.ailiance.fr> through the Apertus multi-LoRA hot-swap server
+ (Studio :9322, 1 base + 10 LoRA dynamic swap, ~40GB VRAM).
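
Note on the `TIMEOUT (1800s budget)` rows: the diff does not show how the budget was enforced. As a rough illustration only, a per-task wall-clock budget can be applied by running each benchmark as a subprocess with a timeout and recording `TIMEOUT` when it expires. The helper name `run_with_budget` and the two stand-in commands below are hypothetical and are not taken from the ailiance-bench repository.

```python
import subprocess
import sys


def run_with_budget(cmd, budget_s):
    """Run cmd and return its stripped stdout, or "TIMEOUT" if it exceeds budget_s seconds."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=budget_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return "TIMEOUT"
    return done.stdout.strip()


# Hypothetical stand-ins for real benchmark commands (assumptions, not the actual harness calls).
fast_task = [sys.executable, "-c", "print(0.81)"]
slow_task = [sys.executable, "-c", "import time; time.sleep(5)"]

if __name__ == "__main__":
    print("fast task:", run_with_budget(fast_task, 30))
    print("slow task:", run_with_budget(slow_task, 0.5))
```

This sketch ignores non-zero exit codes and stderr; a real runner would likely want to distinguish a crash from a timeout before writing either into the results table.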