docs: add base vs tuned bench comparison
README.md CHANGED
@@ -260,3 +260,19 @@ Track progress: [ailiance-bench issues](https://github.com/ailiance/ailiance-ben
 For reference benchmarks on the `gemma-4-E4B` base, see the
 [base-vs-LoRA matrix](https://github.com/ailiance/ailiance-bench/blob/main/bench-results/compare_base_vs_lora.md).
 
+
+## Bench comparison (2026-05-11)
+
+### Base model (Apertus-70B-Instruct-2509) capability
+
+| Task | Score | Notes |
+|---|---:|---|
+| ARC-Easy acc / acc_norm | **0.81 / 0.77** | W3 lm-eval-harness BF16 |
+| GSM8K-CoT | TIMEOUT (1800s budget) | base 70B BF16 too slow for CoT |
+| MMLU-Pro Computer Science | TIMEOUT | |
+
+### This LoRA (tuned) — bench PENDING
+
+Production usage: served via gateway alias `ailiance-apertus-<domain>` on
+<https://www.ailiance.fr> through the Apertus multi-LoRA hot-swap server
+(Studio :9322, 1 base + 10 LoRA dynamic swap, ~40GB VRAM).
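
The `TIMEOUT (1800s budget)` entries in the added table suggest each task ran under a wall-clock limit. How the actual run enforced that limit is not shown in this change, so the following is only a minimal sketch of one way to do it in Python. The helper names `run_with_budget` and `format_score` are hypothetical, and the benchmark command is a stand-in, not the real lm-eval-harness invocation.

```python
import subprocess
import sys


def run_with_budget(cmd, budget_s=1800.0):
    """Run one benchmark command under a wall-clock budget.

    Returns "OK", "FAILED", or "TIMEOUT". Hypothetical helper: the real
    bench runner may enforce its budget differently.
    """
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=budget_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return "TIMEOUT"
    return "OK" if proc.returncode == 0 else "FAILED"


def format_score(status, budget_s, score=None):
    """Render the Score cell for a results table row."""
    if status == "TIMEOUT":
        return f"TIMEOUT ({budget_s:g}s budget)"
    if status == "OK" and score is not None:
        return f"**{score}**"
    return status


if __name__ == "__main__":
    # Stand-in for a slow benchmark task: sleeps longer than the budget.
    slow_cmd = [sys.executable, "-c", "import time; time.sleep(5)"]
    status = run_with_budget(slow_cmd, budget_s=0.5)
    print(format_score(status, 1800.0))
```

Recording the budget next to the status keeps a timed-out row distinguishable from a failed one, which matters when the tuned LoRA results are later compared against these base-model rows.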