docs: add Benchmark / Training metrics section
README.md CHANGED

```diff
@@ -56,6 +56,24 @@ print(generate(model, tokenizer, prompt="..."))
 
 For per-sample provenance and attribution status, consult the dataset card.
 
+## Benchmark roadmap
+
+This LoRA has **not yet been evaluated** through `electron-bench` (the current
+pipeline supports `gemma-4-E4B` base only). Training was completed with the
+standard `mlx-lm` LoRA trainer (rank 16, alpha 32, scale 2.0, AdamW
+LR 1e-5, 500 iters); full hyperparameters are in the `Training` table above.
+
+Planned evaluations:
+
+- Perplexity on the validation split of the training data
+- Functional benchmark on **devstral**-specific tasks
+- Comparison vs base `mistralai/Devstral-Small-2-24B-Instruct-2512`
+
+Track progress: [ailiance-bench issues](https://github.com/ailiance/ailiance-bench/issues).
+
+For reference benchmarks on the `gemma-4-E4B` base, see the
+[base-vs-LoRA matrix](https://github.com/ailiance/ailiance-bench/blob/main/bench-results/compare_base_vs_lora.md).
+
 ## License chain
 
 | Component | License |
```
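The hyperparameters named in the added section map onto an `mlx-lm` LoRA training config roughly as follows. This is a sketch only: exact key names vary between `mlx-lm` versions, and the `data` path is hypothetical.

```yaml
# Illustrative mlx-lm LoRA config; run with e.g. `mlx_lm.lora --config lora_config.yaml`.
# Key names follow the mlx-lm example configs and may differ in your version.
model: "mistralai/Devstral-Small-2-24B-Instruct-2512"
train: true
data: "data/"            # hypothetical path to train/valid jsonl files
iters: 500
learning_rate: 1e-5
optimizer: adamw
lora_parameters:
  rank: 16
  alpha: 32
  scale: 2.0
```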
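The planned perplexity evaluation reduces to exponentiating the mean negative log-likelihood the model assigns to each next token of the validation split. A toy illustration of the arithmetic, with made-up per-token probabilities:

```python
import math

# Perplexity = exp(mean negative log-likelihood) over next-token predictions.
# The probabilities below are made up for illustration; a real run would take
# them from the model's softmax outputs on the validation split.
token_probs = [0.25, 0.5, 0.125, 0.5]

nll = [-math.log(p) for p in token_probs]
perplexity = math.exp(sum(nll) / len(nll))
print(round(perplexity, 4))  # 3.3636, i.e. the inverse geometric mean of the probs
```

Lower is better: a perplexity of 3.36 means the model was, on average, about as uncertain as a uniform choice over 3.36 tokens.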