---
license: apache-2.0
base_model: mistralai/Devstral-Small-2-24B-Instruct-2512
tags:
- lora
- peft
- mlx
- ailiance
- eu-ai-act
- art-52
- art-53
- gpai-fine-tune
- pst-2025-07-24
language:
- en
- fr
library_name: peft
---

# devstral-python-lora

LoRA adapter for **mistralai/Devstral-Small-2-24B-Instruct-2512**, part of the [ailiance](https://github.com/ailiance/ailiance) project. Live demo: https://www.ailiance.fr.

> **EU AI Act compliance.** This card follows the **European Commission's
> *Template for the Public Summary of Training Content* for general-purpose
> AI models** (Art. 53(1)(d) of Regulation (EU) 2024/1689, published by the
> AI Office on 2025-07-24). Section numbering and field labels reproduce
> the official template. Where this card and the official template differ
> in wording, the **official template wins**; see the
> [AI Office page](https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models).

---
# 1. General information

## 1.1. Provider identification

| Field | Value |
|---|---|
| **Provider name and contact details** | Ailiance (Saillant Clément) · `clemsail` on Hugging Face · Issues: https://github.com/ailiance/ailiance/issues |
| **Authorised representative name and contact details** | Not applicable: provider is established within the European Union (France). |

## 1.2. Model identification

| Field | Value |
|---|---|
| **Versioned model name(s)** | `Ailiance-fr/devstral-python-lora` (this LoRA adapter, v0.4.2) |
| **Model dependencies** | This is a **fine-tune (LoRA, rank 16)** of the general-purpose AI model [`mistralai/Devstral-Small-2-24B-Instruct-2512`](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Refer to the base-model provider's PST for the underlying training summary. |
| **Date of placement of the model on the Union market** | 2026-05-06 |

## 1.3. Modalities, overall training data size and other characteristics

| Field | Value |
|---|---|
| **Modality** | ☑ Text ☐ Image ☐ Audio ☐ Video ☐ Other |
| **Training data size** (text bucket) | ☑ Less than 1 billion tokens ☐ 1 billion to 10 trillion tokens ☐ More than 10 trillion tokens |
| **Types of content** | Instruction-tuning pairs, technical text, source code, multilingual instruction templates (EU official languages where applicable). |
| **Approximate size in alternative units** | ≈ 0.6 M tokens (2 850 rows × ≈ 200 tokens/row, single-pass). |
| **Latest date of data acquisition / collection for model training** | 11/2024 (StarCoder2 Self-Instruct release). The model is **not** continuously trained on new data after this date. |
| **Linguistic characteristics of the overall training data** | English (primary, instruction language); French (system-prompt context). No other natural languages in training rows. |
| **Other relevant characteristics / additional comments** | LoRA fine-tune (rank 16, alpha 32, dropout 0.05); only the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) are trained. Per-record `_provenance` (source, SPDX licence, `record_idx`, `access_date`) attached at the system level (see [`docs/eu-ai-act-transparency.md`](https://github.com/ailiance/ailiance/blob/main/docs/eu-ai-act-transparency.md) §4.4). Tokenizer: inherited from the base model. |
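
For readers reproducing the adapter shape, a minimal PEFT `LoraConfig` mirroring the hyperparameters above might look like the following (illustrative sketch only; the adapter's own config file is authoritative):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=32,   # scaling numerator (effective scale = alpha / r = 2.0)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
```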

---

# 2. List of data sources

## 2.1. Publicly available datasets

**Have you used publicly available datasets to train the model?** ☑ Yes ☐ No

**Modality(ies) of the content covered:** ☑ Text ☐ Image ☐ Video ☐ Audio ☐ Other

**List of large publicly available datasets:**

| Dataset | URL | SPDX licence | Records | Notes |
|---|---|---|---:|---|
| StarCoder2 Self-Instruct (Python subset filtered by language keyword) | https://huggingface.co/datasets/bigcode/starcoder2-self-align | `Apache-2.0` | 2 850 | Public HF dataset; instruction-tuning pairs. |
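
To make the selection concrete, a sketch of how such a subset could be pulled (the column name `instruction` and the keyword filter are assumptions, not the project's exact pipeline; check the dataset schema on the Hub):

```python
from datasets import load_dataset

ds = load_dataset("bigcode/starcoder2-self-align", split="train")

# Keep rows whose instruction mentions Python (hypothetical keyword filter)
python_rows = ds.filter(lambda row: "python" in row["instruction"].lower())
print(len(python_rows))
```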

## 2.2. Private non-publicly available datasets obtained from third parties

### 2.2.1. Datasets commercially licensed by rightsholders or their representatives

**Have you concluded transactional commercial licensing agreement(s) with rightsholder(s) or with their representatives?** ☐ Yes ☑ No

_(N/A: no commercial licensing agreements concluded.)_

### 2.2.2. Private datasets obtained from other third parties

**Have you obtained private datasets from third parties that are not licensed as described in Section 2.2.1?** ☐ Yes ☑ No

_(N/A: no private third-party datasets obtained.)_

## 2.3. Data crawled and scraped from online sources

**Were crawlers used by the provider or on the provider's behalf?** ☐ Yes ☑ No

_(N/A: no crawler used.)_

## 2.4. User data

**Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model?** ☐ Yes ☑ No

**Was data collected from user interactions with the provider's other services or products used to train the model?** ☐ Yes ☑ No

_(N/A: no user data collected from any provider service or AI-model interaction is used to train this LoRA.)_

## 2.5. Synthetic data

**Was synthetic AI-generated data created by the provider or on their behalf to train the model?** ☐ Yes ☑ No

_(N/A: no synthetic AI-generated data was created by the provider or on their behalf to train this LoRA.)_

## 2.6. Other sources of data

**Have data sources other than those described in Sections 2.1 to 2.5 been used to train the model?** ☐ Yes ☑ No

_(N/A: no other data sources used.)_

---

# 3. Data processing aspects

## 3.1. Respect of reservation of rights from text and data mining exception or limitation

**Are you a Signatory to the Code of Practice for general-purpose AI models that includes commitments to respect reservations of rights from the TDM exception or limitation?** ☐ Yes ☑ No *(SME / individual provider; commitments equivalent in substance, see below.)*

**Measures implemented before model training to respect reservations of rights from the TDM exception or limitation:**

- **Public HF datasets (§2.1):** the selected dataset carries a permissive open licence (Apache-2.0); the project-wide SPDX matrix (Apache-2.0, MIT, CC-BY-*, BSD) is verified per source. The licences explicitly authorise instructional / model-training use for the rows actually selected.
- **Web-scraped sources (§2.3):** none are used for this adapter (see §2.3). At project level, prior to any collection the provider verifies `robots.txt`, `<meta name="robots" content="noai">`, `ai.txt`, and TDM-Reservation HTTP headers; any source returning a reservation under Article 4(3) of Directive (EU) 2019/790 is excluded from collection. Scraping is limited to authoritative vendor-controlled repositories (ESP-IDF, STM32Cube, Arduino, KiCad symbols/footprints) operating under permissive licences. A minimal sketch of this pre-collection check follows this list.
- **Vendor PDF datasheets (§2.2.2; none used for this adapter):** processed at project level under the EU DSM Directive Article 4 TDM exception. SHA-256 manifests and per-source legal-basis records are published in [`docs/pdf-compliance-report.md`](https://github.com/ailiance/ailiance/blob/main/docs/pdf-compliance-report.md).
- **Public copyright policy (Art. 53(1)(c)):** [`docs/eu-ai-act-transparency.md`](https://github.com/ailiance/ailiance/blob/main/docs/eu-ai-act-transparency.md). Removal requests are handled via the issue tracker on the source repository; the provider commits to removing disputed content within 30 days and re-training on the next release cycle.
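
A minimal sketch of the pre-collection opt-out check described in the second bullet (the user-agent string is illustrative; the `ai.txt` and `<meta>` checks are omitted for brevity):

```python
import urllib.robotparser
from urllib.parse import urlparse

import requests


def tdm_reserved(url: str, agent: str = "ailiance-crawler") -> bool:
    """Return True if the source signals a machine-readable TDM reservation."""
    origin = "{0.scheme}://{0.netloc}".format(urlparse(url))

    # robots.txt exclusion
    robots = urllib.robotparser.RobotFileParser(origin + "/robots.txt")
    robots.read()
    if not robots.can_fetch(agent, url):
        return True

    # TDM Reservation Protocol (TDMRep) HTTP header
    response = requests.head(url, allow_redirects=True, timeout=10)
    return response.headers.get("tdm-reservation") == "1"
```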

## 3.2. Removal of illegal content

**General description of measures taken:**

- The provider does not crawl the open web at large; sources are restricted to curated public HF datasets and authoritative vendor repositories, where the risk of illegal content (CSAM, terrorist content, IP-violating works) is structurally low.
- Personal data was screened with **Microsoft Presidio + en_core_web_lg** (2026-04-28) across all 35+ system-level domain directories. **One** email address detected in the unrelated `traduction-tech` corpus was redacted before training. Full report: `data/pii-scan-report.json`. (A sketch of this screening step follows this list.)
- No special-category data (GDPR Art. 9: health, religion, sexual orientation, etc.) was intentionally collected; the PII scan also screens for identifiers that could enable special-category inference (none flagged).
- Licence compatibility is enforced via a per-source SPDX matrix; works under non-permissive licences are excluded.
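
A rough illustration of the screening step (sketch only, not the project's exact pipeline; the default Presidio analyzer uses spaCy, so `en_core_web_lg` must be installed):

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()


def redact_pii(text: str) -> str:
    """Detect PII entities and replace each with its type placeholder."""
    findings = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(text=text, analyzer_results=findings).text


print(redact_pii("Contact me at jane.doe@example.com"))
# -> "Contact me at <EMAIL_ADDRESS>"
```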

## 3.3. Other information (optional)

- **Per-record provenance:** 49 956 system-level training records carry `_provenance.{source, license, record_idx, access_date}` fields, enabling per-record audit and removal (see the sketch after this list).
- **Compute footprint:** LoRA training updates ≈ 0.1–0.5 % of the base-model parameters. **Estimated training compute for this LoRA ≪ 10²⁵ FLOPs**, well below the systemic-risk threshold of EU AI Act Art. 51. No proprietary teacher model is used in deployed inference.
- **Risk classification:** Limited risk (transparency obligations, Art. 50). Not deployed in safety-critical contexts.
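
For concreteness, a hypothetical record shape and the removal operation these fields enable (all values illustrative):

```python
record = {
    "messages": [...],  # chat-format training turns
    "_provenance": {
        "source": "bigcode/starcoder2-self-align",
        "license": "Apache-2.0",
        "record_idx": 1234,
        "access_date": "2024-11-15",
    },
}


def drop_disputed(records: list[dict], source: str) -> list[dict]:
    """Per-record removal: drop every row traced back to a disputed source."""
    return [r for r in records if r["_provenance"]["source"] != source]
```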

---

# Appendix A – Performance evaluation (Art. 53(1)(a))

**HumanEval+** (EvalPlus official Linux scorer, 164 problems, greedy, 1 sample; scores are HumanEval / HumanEval+ pass@1): base 87.20 / 82.90 → +python 86.00 / 81.10. **Δ HE+ = −1.80 pts** vs base. Scored on `kx6tm-23` (Proxmox PVE 6.17). Full reproducer in [`eval/results/2026-05-04/devstral-python-fused-humanevalplus/rerun.sh`](https://github.com/ailiance/ailiance/blob/main/eval/results/2026-05-04/devstral-python-fused-humanevalplus/).

Full bench results, methodology, env.json, and rerun.sh per measurement:
[`eval/results/SUMMARY.md`](https://github.com/ailiance/ailiance/blob/main/eval/results/SUMMARY.md) ·
[`MODEL_CARD.md`](https://github.com/ailiance/ailiance/blob/main/MODEL_CARD.md).

---

# Appendix B – Usage

Load the adapter on top of the base model with `mlx_lm`:

```python
from huggingface_hub import snapshot_download
from mlx_lm import load

# Download the adapter weights (mlx_lm expects adapter_config.json
# alongside adapters.safetensors in the adapter directory)
adapter_path = snapshot_download("Ailiance-fr/devstral-python-lora")

# load() fetches the base model and applies the LoRA weights on top
model, tokenizer = load(
    "mistralai/Devstral-Small-2-24B-Instruct-2512",
    adapter_path=adapter_path,
)
```
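
A quick generation smoke test (the prompt is arbitrary; assumes the base tokenizer ships a chat template):

```python
from mlx_lm import generate

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```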

Or fuse and serve as a self-contained checkpoint:

```bash
python -m mlx_lm.fuse \
    --model mistralai/Devstral-Small-2-24B-Instruct-2512 \
    --adapter-path <adapter_path> \
    --save-path /tmp/devstral-python-lora-fused \
    --de-quantize
```

---

# Appendix C – Limitations and out-of-scope use

- Not for safety-critical decisions (medical, legal, structural, life-safety, biometric).
- Not for high-stakes individual decisions (hiring, credit, law enforcement); such use would re-classify the system as high-risk under EU AI Act Art. 6 and trigger additional obligations.
- Hallucination is present at typical instruction-tuned LLM levels; pair with a verifier or human-in-the-loop for factual outputs.
- The LoRA inherits all base-model limitations (training cutoff, language coverage, refusal patterns).

---

# Appendix D – Citation

```bibtex
@misc{ailiance-2026,
  title = {ailiance: EU-sovereign multi-model LLM serving with HF-traceable LoRA adapters},
  author = {Saillant, Clément},
  year = {2026},
  url = {https://github.com/ailiance/ailiance},
  note = {Live demo: https://www.ailiance.fr}
}
```

---

# Appendix E – Changelog

| Date | Card version | Change |
|---|---|---|
| 2026-05-06 | v0.4.0 | Initial HF release |
| 2026-05-06 | v0.4.1 | Self-contained EU AI Act card (per-adapter dataset table, PII statement, contact) |
| 2026-05-06 | v0.4.2 | PST-aligned (Commission template structure, Sections §1–4) |
| 2026-05-06 | **v0.4.3** | **PST-verbatim**: section labels and field names reproduced from the official Commission template (PDF 2025-07-24, English version). |

## Validated in `ailiance/ailiance-bench` v0.2

This model is referenced in the [Ailiance benchmark suite](https://github.com/ailiance/ailiance-bench)
(Phase 6 scoreboard, 7-task hardware-design evaluation).

See the full scoreboard:
[ailiance-bench README#scoreboard-lora-phase-6](https://github.com/ailiance/ailiance-bench#scoreboard-lora-phase-6--2026-05-11).