---
license: apache-2.0
base_model: swiss-ai/Apertus-70B-Instruct-2509
library_name: peft
tags:
- mlx
- lora
- peft
- ailiance
- apertus
- math
language:
- en
- fr
pipeline_tag: text-generation
---
# Ailiance – Apertus-70B-Instruct math LoRA
LoRA adapter fine-tuned from `swiss-ai/Apertus-70B-Instruct-2509` for **math** tasks.
> Maintained by **Ailiance**, a French AI organization publishing EU AI Act-aligned LoRA adapters and datasets.
## Quick start (MLX)
```python
from mlx_lm import load, generate

# Load the base model and apply this LoRA adapter on top of it.
model, tokenizer = load(
    "swiss-ai/Apertus-70B-Instruct-2509",
    adapter_path="Ailiance-fr/apertus-math-lora",
)
print(generate(model, tokenizer, prompt="..."))  # replace "..." with your prompt
```
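Apertus-70B-Instruct is a chat model, so in practice the prompt should go through the tokenizer's chat template before generation. A minimal sketch, assuming the tokenizer ships a chat template (the question string and `max_tokens` value are illustrative, not part of this repo):

```python
from mlx_lm import load, generate

model, tokenizer = load(
    "swiss-ai/Apertus-70B-Instruct-2509",
    adapter_path="Ailiance-fr/apertus-math-lora",
)

# Wrap the question in the model's chat template before generating.
messages = [{"role": "user", "content": "Solve for x: 3x + 5 = 20."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```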
## Training
| Hyperparameter | Value |
|------------------|------------------------|
| Base model | `swiss-ai/Apertus-70B-Instruct-2509` |
| Method | LoRA via `mlx-lm` |
| Rank | 16 |
| Scale | 2.0 |
| Alpha | 32 |
| Max seq length | 1024 |
| Iterations | 500 |
| Optimizer | Adam, LR 1e-5 |
| Hardware | Apple M3 Ultra 512 GB |
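For reproducibility, a run with the hyperparameters above would look roughly like the sketch below, driving the standard `mlx-lm` LoRA trainer via a YAML config. The dataset path, its JSONL layout, and the exact config keys are assumptions modeled on the example `lora_config.yaml` shipped with mlx-lm, not artifacts of this repo:

```python
# Illustrative reproduction sketch for the training run in the table above.
# Data path/format and config keys are assumptions, not part of this repo.
import subprocess

config = """\
model: swiss-ai/Apertus-70B-Instruct-2509
train: true
data: path/to/math-data  # directory with train.jsonl / valid.jsonl (assumed)
iters: 500
learning_rate: 1.0e-5
max_seq_length: 1024
adapter_path: adapters
lora_parameters:
  rank: 16
  scale: 2.0
  dropout: 0.0
"""

with open("lora_config.yaml", "w") as f:
    f.write(config)

# mlx_lm.lora reads the full hyperparameter set from the YAML config.
subprocess.run(
    ["python", "-m", "mlx_lm.lora", "--config", "lora_config.yaml"],
    check=True,
)
```

Note that mlx-lm parameterizes LoRA with `scale` directly; with rank 16, a scale of 2.0 corresponds to the alpha of 32 listed in the table.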
## Training data lineage
Derived from the internal **eu-kiki / mascarade** curation. All upstream samples
are synthetic, permissively licensed, or generated from Apache-2.0 base resources.
See the [Ailiance-fr catalog](https://huggingface.co/Ailiance-fr) for related cards.
## Benchmark roadmap
This LoRA has **not yet been evaluated** through `electron-bench` (the current
pipeline supports only the `gemma-4-E4B` base). Training used the standard
`mlx-lm` LoRA trainer (rank 16, alpha 32, scale 2.0, Adam, LR 1e-5,
500 iterations); the full hyperparameters are in the `Training` table above.
Planned evaluations:
- Perplexity on the validation split of the training data (see the sketch after this list)
- Functional benchmark on **apertus**-specific tasks
- Comparison vs base `swiss-ai/Apertus-70B-Instruct-2509`
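Until that pipeline lands, validation perplexity can be approximated by exponentiating the mean next-token cross-entropy under the adapted model. A minimal sketch assuming the mlx-lm Python API; the sample string is illustrative, and a 70B model in BF16 needs a correspondingly large unified-memory machine:

```python
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load(
    "swiss-ai/Apertus-70B-Instruct-2509",
    adapter_path="Ailiance-fr/apertus-math-lora",
)

def perplexity(text: str) -> float:
    # Next-token prediction: logits at position t score the token at t+1.
    tokens = mx.array(tokenizer.encode(text))[None]
    logits = model(tokens[:, :-1])
    loss = nn.losses.cross_entropy(logits, tokens[:, 1:], reduction="mean")
    return mx.exp(loss).item()

print(perplexity("Q: What is 17 * 24? A: 408"))  # illustrative sample
```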
Track progress: [ailiance-bench issues](https://github.com/ailiance/ailiance-bench/issues).
For reference benchmarks on the `gemma-4-E4B` base, see the
[base-vs-LoRA matrix](https://github.com/ailiance/ailiance-bench/blob/main/bench-results/compare_base_vs_lora.md).
## License chain
| Component | License |
|-----------------------------------|-------------------|
| Base model (`swiss-ai/Apertus-70B-Instruct-2509`) | apache-2.0 |
| Training data (internal Ailiance curation: synthetic + permissive sources) | apache-2.0 |
| **LoRA adapter (this repo)** | **apache-2.0** |
_All upstream components are Apache 2.0 / MIT; the LoRA inherits their permissive terms._
## EU AI Act compliance
- **Article 53(1)(c)**: training data licenses preserved (per-dataset cards declare upstream licenses).
- **Article 53(1)(d)**: training data summary; see the upstream dataset cards on Ailiance-fr.
- **GPAI Code of Practice (July 2025)**: base `swiss-ai/Apertus-70B-Instruct-2509` released under apache-2.0.
- **No web scraping by Ailiance**, **no restrictively licensed data**, **no PII**.
- Upstream Stack Exchange content (where applicable) is CC-BY-SA-4.0 and propagates to this adapter.
## License
LoRA weights: **apache-2.0**; see the License chain table above for the derivation rationale.
## Citation
```bibtex
@misc{ailiance_apertus_math_2026,
  author    = {Ailiance},
  title     = {Ailiance -- Apertus-70B-Instruct math LoRA},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Ailiance-fr/apertus-math-lora}
}
```
## Related
See the full [Ailiance-fr LoRA collection](https://huggingface.co/Ailiance-fr).
## Bench comparison (2026-05-11)
### Base model (Apertus-70B-Instruct-2509) capability
| Task | Score | Notes |
|---|---:|---|
| ARC-Easy acc / acc_norm | **0.81 / 0.77** | W3 lm-eval-harness BF16 |
| GSM8K-CoT | TIMEOUT (1800 s budget) | 70B base in BF16 exceeds the CoT time budget |
| MMLU-Pro Computer Science | TIMEOUT | |
### This LoRA (tuned) – benchmarks pending
Production usage: served via the gateway alias `ailiance-apertus-<domain>` on
<https://www.ailiance.fr> through the Apertus multi-LoRA hot-swap server
(Studio :9322, 1 base + 10 LoRAs with dynamic swap, ~40 GB VRAM).
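This card does not specify the gateway protocol. Purely as an illustration, if the server exposes an OpenAI-compatible endpoint (a common pattern for multi-LoRA servers, but an assumption here), a client call might look like the following; the base URL, API key handling, and resolved alias are all hypothetical:

```python
# Hypothetical client sketch: assumes an OpenAI-compatible gateway, which
# this card does not actually document. Endpoint and alias are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://www.ailiance.fr/v1", api_key="YOUR_KEY")
resp = client.chat.completions.create(
    model="ailiance-apertus-math",  # assumed instance of ailiance-apertus-<domain>
    messages=[{"role": "user", "content": "Solve for x: 3x + 5 = 20."}],
)
print(resp.choices[0].message.content)
```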
## Upstream base model – official evaluations
This LoRA fine-tunes [`swiss-ai/Apertus-70B-Instruct-2509`](https://huggingface.co/swiss-ai/Apertus-70B-Instruct-2509),
the EU-sovereign open-source LLM released by the Swiss AI Initiative. Below are
the **official scores** reported in the [Apertus Tech Report](https://arxiv.org/abs/2509.14233)
on a suite of multilingual reasoning benchmarks.
| Model | Avg | ARC | HellaSwag | WinoGrande | XNLI | XCOPA | PIQA |
|-----------------------------|------:|------:|----------:|-----------:|------:|------:|------:|
| **Apertus-70B** (this base) | 67.5 | 70.6 | 64.0 | 73.3 | 45.3 | 69.8 | 81.9 |
| Apertus-8B | 65.8 | 72.7 | 59.8 | 70.6 | 45.2 | 66.5 | 79.8 |
| Llama3.1-70B | 67.3 | 74.4 | 56.5 | 79.4 | 44.3 | 66.7 | 82.3 |
| Qwen2.5-72B | 69.8 | 76.2 | 67.5 | 78.0 | 46.9 | 68.2 | 82.0 |
| OLMo2-32B | 67.7 | 76.2 | 66.7 | 78.6 | 42.9 | 60.1 | 82.1 |
| EuroLLM-9B | 62.8 | 67.9 | 57.9 | 68.8 | 41.5 | 61.1 | 79.6 |
Many additional evaluations (pretraining and post-training phases, multilingual
coverage across ~100 languages, long context) are reported in Section 5 of the
[Apertus Tech Report](https://arxiv.org/abs/2509.14233).
**Source:** [official Apertus-70B-Instruct-2509 model card](https://huggingface.co/swiss-ai/Apertus-70B-Instruct-2509).
> **Reading these alongside this LoRA:** Apertus-70B is EU AI Act-compliant
> (`Apertus_EU_Code_of_Practice.pdf`, `Apertus_EU_Public_Summary.pdf` included
> in upstream weights). This LoRA inherits that compliance plus the
> general-capability floor shown above, then adds domain specialization.