---
license: apache-2.0
base_model: mistralai/Devstral-Small-2-24B-Instruct-2512
library_name: peft
tags:
- mlx
- lora
- peft
- ailiance
- devstral
- cpp
language:
- en
- fr
pipeline_tag: text-generation
---
# Ailiance — Devstral-Small-2-24B-Instruct cpp LoRA
LoRA adapter trained on top of `mistralai/Devstral-Small-2-24B-Instruct-2512` for **C++ (cpp)** coding tasks.
> Maintained by **Ailiance** — French AI org publishing EU AI Act aligned LoRA adapters and datasets.
## Quick start (MLX)
```python
from mlx_lm import load, generate
model, tokenizer = load(
    "mistralai/Devstral-Small-2-24B-Instruct-2512",
    adapter_path="Ailiance-fr/devstral-cpp-lora",
)
print(generate(model, tokenizer, prompt="..."))
```
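The card also declares `library_name: peft`, so loading the adapter through Hugging Face `transformers` + `peft` should be possible as well. The snippet below is a minimal sketch under two assumptions that are not confirmed by this card: that the repo ships PEFT-format adapter weights, and that the base checkpoint loads through the standard Auto classes; the prompt and generation settings are purely illustrative.

```python
# Sketch: attach the LoRA adapter via PEFT instead of MLX.
# Assumes PEFT-format adapter weights in this repo (suggested by the
# `library_name: peft` tag) and a transformers-loadable base checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Devstral-Small-2-24B-Instruct-2512"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")

# LoRA weights are applied on top of the frozen base model.
model = PeftModel.from_pretrained(base, "Ailiance-fr/devstral-cpp-lora")

inputs = tokenizer(
    "Write a C++ RAII wrapper around a POSIX file descriptor.",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```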
## Training
| Hyperparameter | Value |
|------------------|------------------------|
| Base model | `mistralai/Devstral-Small-2-24B-Instruct-2512` |
| Method | LoRA via `mlx-lm` |
| Rank | 16 |
| Scale | 2.0 |
| Alpha | 32 |
| Max seq length | 2048 |
| Iterations | 500 |
| Optimizer | Adam, LR 1e-5 |
| Hardware | Apple M3 Ultra 512 GB |
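For readers who want to reproduce a similar run, the dictionary below is a hypothetical reconstruction of the `mlx-lm` LoRA training configuration implied by the table. The key names follow mlx-lm's YAML config convention but were not copied from the actual run, so treat them as illustrative only; the validation settings are taken from the metrics section below.

```python
# Hypothetical mlx-lm LoRA config implied by the hyperparameter table above.
# Key names follow mlx-lm's YAML convention; values mirror this card.
config = {
    "model": "mistralai/Devstral-Small-2-24B-Instruct-2512",
    "train": True,
    "iters": 500,
    "learning_rate": 1e-5,
    "max_seq_length": 2048,
    "val_batches": 5,        # held-out validation batches (see metrics below)
    "steps_per_eval": 200,   # validation loss measured every 200 iterations
    "lora_parameters": {
        "rank": 16,
        "scale": 2.0,        # scale = alpha / rank = 32 / 16
    },
}

# Sanity check: the reported scale matches alpha / rank from the table.
alpha, rank = 32, 16
assert config["lora_parameters"]["scale"] == alpha / rank
```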
## Training data lineage
Derived from the internal **eu-kiki / mascarade** curation. All upstream samples
are synthetic, permissively licensed, or generated from Apache-2.0 base resources.
See the [Ailiance-fr catalog](https://huggingface.co/Ailiance-fr) for related cards.
## Training metrics
Extracted from training log (`batch_eu_kiki_v2.log`):
| Metric | Value |
|---|---:|
| Final train loss | 0.603 |
| Final validation loss | 0.401 |
| Val loss reduction | 1.779 (2.180 → 0.401) |
| Iterations completed | 500 |
| Trainable parameters | 0.224% (279.708M / 125025.989M) |
> Validation loss is measured every 200 iterations on a held-out split of the
> training corpus (`val_batches=5`, `mlx-lm` LoRA trainer).
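As a quick sanity check, the headline figures above are internally consistent; the short sketch below recomputes them from the logged values.

```python
# Recompute the derived metrics from the values reported in the table above.
initial_val_loss, final_val_loss = 2.180, 0.401
print(f"val loss reduction: {initial_val_loss - final_val_loss:.3f}")  # 1.779

trainable_m, total_m = 279.708, 125025.989  # millions of parameters, as logged
print(f"trainable fraction: {100 * trainable_m / total_m:.3f}%")       # ~0.224%
```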
## Benchmark on production tasks
This LoRA has **not yet been evaluated** through the
[`electron-bench`](https://github.com/ailiance/ailiance-bench/blob/main) functional benchmark
pipeline. The current pipeline targets the `gemma-4-E4B` base only; support for
the **devstral** base is on the roadmap
([open issues](https://github.com/ailiance/ailiance-bench/issues)).
For a comparable reference matrix on a related domain (electronics, embedded,
KiCad), see the Gemma champions:
| Adapter | Highlights |
|---|---|
| [`Ailiance-fr/gemma-4-E4B-eukiki-lora`](https://huggingface.co/Ailiance-fr/gemma-4-E4B-eukiki-lora) | +55 P1-DSL, +42 P1-PCB, +25 SPICE, +38 P3 |
| [`Ailiance-fr/gemma-4-E4B-mascarade-lora`](https://huggingface.co/Ailiance-fr/gemma-4-E4B-mascarade-lora) | +48 P3 extraction |
Full base-vs-LoRA matrix: [`compare_base_vs_lora.md`](https://github.com/ailiance/ailiance-bench/blob/main/bench-results/compare_base_vs_lora.md).
## License chain
| Component | License |
|-----------------------------------|-------------------|
| Base model (`mistralai/Devstral-Small-2-24B-Instruct-2512`) | apache-2.0 |
| Training data: internal Ailiance curation (synthetic + permissive sources) | apache-2.0 |
| **LoRA adapter (this repo)** | **apache-2.0**|
_All upstream components are Apache 2.0 / MIT — LoRA inherits permissive terms._
## EU AI Act compliance
- **Article 53(1)(c)**: training data licenses preserved (per-dataset cards declare upstream licenses).
- **Article 53(1)(d)**: training data summary — see upstream dataset cards on Ailiance-fr.
- **GPAI Code of Practice (July 2025)**: base `mistralai/Devstral-Small-2-24B-Instruct-2512` released under apache-2.0.
- **No web scraping by Ailiance**, **no restrictively licensed data**, **no PII**.
- Upstream Stack Exchange content (where applicable) is CC-BY-SA-4.0 and propagates to this adapter.
## License
LoRA weights: **apache-2.0** — see License chain table above for derivation rationale.
## Citation
```bibtex
@misc{ailiance_devstral_cpp_2026,
author = {Ailiance},
title = {Ailiance — Devstral-Small-2-24B-Instruct cpp LoRA},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/Ailiance-fr/devstral-cpp-lora}
}
```
## Related
See the full [Ailiance-fr LoRA collection](https://huggingface.co/Ailiance-fr).
## Bench comparison (2026-05-11)
### Base model (Devstral-Small-2-24B-MLX-4bit) capability
| Task | Score | Notes |
|---|---:|---|
| GSM8K-CoT flex EM | **0.96** | W3 lm-eval-harness (--limit 100) |
| ARC-Easy acc / acc_norm | **0.80 / 0.75** | |
| MMLU-Pro Computer Science | **0.64** | |
Source: <https://github.com/ailiance/ailiance/tree/main/output/lm-eval-base-2026-05-11>
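The base-capability numbers above come from `lm-eval-harness` run against an MLX 4-bit build on Ailiance's own hardware ("W3"). The snippet below is a reproduction sketch using the harness's Python API with the standard `hf` backend and full-precision weights, which are assumptions on my part and may not reproduce the exact scores; the MMLU-Pro subtask name also varies across harness versions, so it is omitted here.

```python
# Reproduction sketch with lm-evaluation-harness's Python API.
# Backend and dtype are assumptions; the reported run used an MLX 4-bit build.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=mistralai/Devstral-Small-2-24B-Instruct-2512,dtype=bfloat16",
    tasks=["gsm8k_cot", "arc_easy"],
    limit=100,  # matches the --limit 100 noted in the table
)

for task, metrics in results["results"].items():
    print(task, metrics)
```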
### This LoRA (tuned) — bench PENDING
The tuned-model bench will include kicad-sch / iact-bench validators plus a W3 lm-eval delta. See the spec for
methodology:
<https://github.com/ailiance/ailiance-bench/blob/main/docs/superpowers/specs/2026-05-11-kicad-sch-gap-design.md>
## Upstream base model — official evaluations
This LoRA fine-tunes [`mistralai/Devstral-Small-2-24B-Instruct-2512`](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512),
Mistral's coding-specialist LLM. Headline software-engineering benchmarks
from the upstream model card:
| Benchmark | Devstral Small 2 (24B) | Devstral 2 (123B) | DeepSeek v3.2 (671B) | Claude Sonnet 4.5 |
|--------------------------|-----------------------:|------------------:|---------------------:|------------------:|
| **SWE Bench Verified** | **68.0 %** | 72.2 % | 73.1 % | 77.2 % |
| **SWE Bench Multilingual** | **55.7 %** | 61.3 % | 70.2 % | 68.0 % |
| **Terminal Bench 2** | **22.5 %** | 32.6 % | 46.4 % | 42.8 % |
(For reference, GPT-5.1 Codex High: 73.7 % SWE Verified · 52.8 % Terminal Bench 2.)
Devstral Small 2 (24B) is competitive with much larger open models on
SWE Bench Verified (e.g. it matches GLM-4.6 at 355B). The architecture
combines Llama 4-style RoPE scaling with Scalable-Softmax ([arXiv:2501.19399](https://arxiv.org/abs/2501.19399)).
**Source:** [official Devstral-Small-2-24B-Instruct-2512 model card](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512).
> **Reading these alongside this LoRA:** Devstral Small 2 is a strong
> coding base. This LoRA builds on that capability and adds cpp-focused
> specialization; its own effect on these benchmarks has not yet been
> measured (see the pending bench section above).