---
license: apache-2.0
base_model: mistralai/Devstral-Small-2-24B-Instruct-2512
tags:
- lora
- peft
- mlx
- ailiance
- eu-ai-act
- art-52
- art-53
- gpai-fine-tune
- pst-2025-07-24
language:
- en
- fr
library_name: peft
---

# devstral-python-lora

LoRA adapter for **mistralai/Devstral-Small-2-24B-Instruct-2512**, part of the [ailiance](https://github.com/ailiance/ailiance) project. Live demo: https://www.ailiance.fr.

> **EU AI Act compliance.** This card follows the **European Commission's
> *Template for the Public Summary of Training Content* for general-purpose
> AI models** (Art. 53(1)(d) of Regulation (EU) 2024/1689, published by the
> AI Office on 2025-07-24). Section numbering and field labels reproduce
> the official template. Where this card and the official template differ
> in wording, the **official template wins**; see the
> [AI Office page](https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models).

---
# 1. General information

## 1.1. Provider identification

| Field | Value |
|---|---|
| **Provider name and contact details** | Ailiance (Saillant Clément) · `clemsail` on Hugging Face · Issues: https://github.com/ailiance/ailiance/issues |
| **Authorised representative name and contact details** | Not applicable: provider is established within the European Union (France). |

## 1.2. Model identification

| Field | Value |
|---|---|
| **Versioned model name(s)** | `Ailiance-fr/devstral-python-lora` (this LoRA adapter, v0.4.2) |
| **Model dependencies** | This is a **fine-tune (LoRA, rank 16)** of the general-purpose AI model [`mistralai/Devstral-Small-2-24B-Instruct-2512`](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Refer to the base-model provider's PST for the underlying training summary. |
| **Date of placement of the model on the Union market** | 2026-05-06 |

## 1.3. Modalities, overall training data size and other characteristics

| Field | Value |
|---|---|
| **Modality** | ☑ Text ☐ Image ☐ Audio ☐ Video ☐ Other |
| **Training data size** (text bucket) | ☑ Less than 1 billion tokens ☐ 1 billion to 10 trillion tokens ☐ More than 10 trillion tokens |
| **Types of content** | Instruction-tuning pairs, technical text, source code, multilingual instruction templates (EU official languages where applicable). |
| **Approximate size in alternative units** | ≈ 0.6 M tokens (2 850 rows × ≈ 200 tokens/row, single-pass). |
| **Latest date of data acquisition / collection for model training** | 11/2024 (StarCoder2 Self-Instruct release). The model is **not** continuously trained on new data after this date. |
| **Linguistic characteristics of the overall training data** | English (primary, instruction language); French (system-prompt context). No other natural languages in training rows. |
| **Other relevant characteristics / additional comments** | LoRA fine-tune (rank 16, alpha 32, dropout 0.05); only the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) are trained. Per-record `_provenance` (source, SPDX licence, `record_idx`, `access_date`) attached at the system level (see [`docs/eu-ai-act-transparency.md`](https://github.com/ailiance/ailiance/blob/main/docs/eu-ai-act-transparency.md) §4.4). Tokenizer: inherited from the base model. |
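
For readers reproducing the adapter shape, a minimal PEFT `LoraConfig` mirroring the hyperparameters above might look like the following (illustrative sketch only; the adapter's own config file is authoritative):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,            # LoRA rank
    lora_alpha=32,   # scaling numerator (effective scale = alpha / r = 2.0)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
```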

---

# 2. List of data sources

## 2.1. Publicly available datasets

**Have you used publicly available datasets to train the model?** ☑ Yes ☐ No

**Modality(ies) of the content covered:** ☑ Text ☐ Image ☐ Video ☐ Audio ☐ Other

**List of large publicly available datasets:**

| Dataset | URL | SPDX licence | Records | Notes |
|---|---|---|---:|---|
| StarCoder2 Self-Instruct (Python subset filtered by language keyword) | https://huggingface.co/datasets/bigcode/starcoder2-self-align | `Apache-2.0` | 2 850 | Public HF dataset; instruction-tuning pairs. |
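
To make the selection concrete, a sketch of how such a subset could be pulled (the column name `instruction` and the keyword filter are assumptions, not the project's exact pipeline; check the dataset schema on the Hub):

```python
from datasets import load_dataset

ds = load_dataset("bigcode/starcoder2-self-align", split="train")

# Keep rows whose instruction mentions Python (hypothetical keyword filter)
python_rows = ds.filter(lambda row: "python" in row["instruction"].lower())
print(len(python_rows))
```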

## 2.2. Private non-publicly available datasets obtained from third parties

### 2.2.1. Datasets commercially licensed by rightsholders or their representatives

**Have you concluded transactional commercial licensing agreement(s) with rightsholder(s) or with their representatives?** ☐ Yes ☑ No

_(N/A: no commercial licensing agreements concluded.)_

### 2.2.2. Private datasets obtained from other third parties

**Have you obtained private datasets from third parties that are not licensed as described in Section 2.2.1?** ☐ Yes ☑ No

_(N/A: no private third-party datasets obtained.)_

## 2.3. Data crawled and scraped from online sources

**Were crawlers used by the provider or on the provider's behalf?** ☐ Yes ☑ No

_(N/A: no crawler used.)_

## 2.4. User data

**Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model?** ☐ Yes ☑ No

**Was data collected from user interactions with the provider's other services or products used to train the model?** ☐ Yes ☑ No

_(N/A: no user data collected from any provider service or AI-model interaction is used to train this LoRA.)_

## 2.5. Synthetic data

**Was synthetic AI-generated data created by the provider or on their behalf to train the model?** ☐ Yes ☑ No

_(N/A: no synthetic AI-generated data was created by the provider or on their behalf to train this LoRA.)_

## 2.6. Other sources of data

**Have data sources other than those described in Sections 2.1 to 2.5 been used to train the model?** ☐ Yes ☑ No

_(N/A: no other data sources used.)_

---

# 3. Data processing aspects

## 3.1. Respect of reservation of rights from text and data mining exception or limitation

**Are you a Signatory to the Code of Practice for general-purpose AI models that includes commitments to respect reservations of rights from the TDM exception or limitation?** ☐ Yes ☑ No *(SME / individual provider; commitments equivalent in substance, see below.)*

**Measures implemented before model training to respect reservations of rights from the TDM exception or limitation:**

- **Public HF datasets (§2.1):** the selected dataset carries a permissive open licence (Apache-2.0); the project-wide SPDX matrix (Apache-2.0, MIT, CC-BY-*, BSD) is verified per source. The licences explicitly authorise instructional / model-training use for the rows actually selected.
- **Web-scraped sources (§2.3):** none are used for this adapter (see §2.3). At project level, prior to any collection the provider verifies `robots.txt`, `<meta name="robots" content="noai">`, `ai.txt`, and TDM-Reservation HTTP headers; any source returning a reservation under Article 4(3) of Directive (EU) 2019/790 is excluded from collection. Scraping is limited to authoritative vendor-controlled repositories (ESP-IDF, STM32Cube, Arduino, KiCad symbols/footprints) operating under permissive licences. A minimal sketch of this pre-collection check follows this list.
- **Vendor PDF datasheets (§2.2.2; none used for this adapter):** processed at project level under the EU DSM Directive Article 4 TDM exception. SHA-256 manifests and per-source legal-basis records are published in [`docs/pdf-compliance-report.md`](https://github.com/ailiance/ailiance/blob/main/docs/pdf-compliance-report.md).
- **Public copyright policy (Art. 53(1)(c)):** [`docs/eu-ai-act-transparency.md`](https://github.com/ailiance/ailiance/blob/main/docs/eu-ai-act-transparency.md). Removal requests are handled via the issue tracker on the source repository; the provider commits to removing disputed content within 30 days and re-training on the next release cycle.
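
A minimal sketch of the pre-collection opt-out check described in the second bullet (the user-agent string is illustrative; the `ai.txt` and `<meta>` checks are omitted for brevity):

```python
import urllib.robotparser
from urllib.parse import urlparse

import requests


def tdm_reserved(url: str, agent: str = "ailiance-crawler") -> bool:
    """Return True if the source signals a machine-readable TDM reservation."""
    origin = "{0.scheme}://{0.netloc}".format(urlparse(url))

    # robots.txt exclusion
    robots = urllib.robotparser.RobotFileParser(origin + "/robots.txt")
    robots.read()
    if not robots.can_fetch(agent, url):
        return True

    # TDM Reservation Protocol (TDMRep) HTTP header
    response = requests.head(url, allow_redirects=True, timeout=10)
    return response.headers.get("tdm-reservation") == "1"
```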

## 3.2. Removal of illegal content

**General description of measures taken:**

- The provider does not crawl the open web at large; sources are restricted to curated public HF datasets and authoritative vendor repositories, where the risk of illegal content (CSAM, terrorist content, IP-violating works) is structurally low.
- Personal data was screened with **Microsoft Presidio + en_core_web_lg** (2026-04-28) across all 35+ system-level domain directories. **One** email address detected in the unrelated `traduction-tech` corpus was redacted before training. Full report: `data/pii-scan-report.json`. (A sketch of this screening step follows this list.)
- No special-category data (GDPR Art. 9: health, religion, sexual orientation, etc.) was intentionally collected; the PII scan also screens for identifiers that could enable special-category inference (none flagged).
- Licence compatibility is enforced via a per-source SPDX matrix; works under non-permissive licences are excluded.
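
A rough illustration of the screening step (sketch only, not the project's exact pipeline; the default Presidio analyzer uses spaCy, so `en_core_web_lg` must be installed):

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()


def redact_pii(text: str) -> str:
    """Detect PII entities and replace each with its type placeholder."""
    findings = analyzer.analyze(text=text, language="en")
    return anonymizer.anonymize(text=text, analyzer_results=findings).text


print(redact_pii("Contact me at jane.doe@example.com"))
# -> "Contact me at <EMAIL_ADDRESS>"
```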

## 3.3. Other information (optional)

- **Per-record provenance:** 49 956 system-level training records carry `_provenance.{source, license, record_idx, access_date}` fields, enabling per-record audit and removal (see the sketch after this list).
- **Compute footprint:** LoRA training updates ≈ 0.1–0.5 % of the base-model parameters. **Estimated training compute for this LoRA ≪ 10²⁵ FLOPs**, well below the systemic-risk threshold of EU AI Act Art. 51. No proprietary teacher model is used in deployed inference.
- **Risk classification:** Limited risk (transparency obligations, Art. 50). Not deployed in safety-critical contexts.
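
For concreteness, a hypothetical record shape and the removal operation these fields enable (all values illustrative):

```python
record = {
    "messages": [...],  # chat-format training turns
    "_provenance": {
        "source": "bigcode/starcoder2-self-align",
        "license": "Apache-2.0",
        "record_idx": 1234,
        "access_date": "2024-11-15",
    },
}


def drop_disputed(records: list[dict], source: str) -> list[dict]:
    """Per-record removal: drop every row traced back to a disputed source."""
    return [r for r in records if r["_provenance"]["source"] != source]
```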

---

# Appendix A – Performance evaluation (Art. 53(1)(a))

**HumanEval+** (EvalPlus official Linux scorer, 164 problems, greedy, 1 sample; scores are HumanEval / HumanEval+ pass@1): base 87.20 / 82.90 → +python 86.00 / 81.10. **Δ HE+ = −1.80 pts** vs base. Scored on `kx6tm-23` (Proxmox PVE 6.17). Full reproducer in [`eval/results/2026-05-04/devstral-python-fused-humanevalplus/rerun.sh`](https://github.com/ailiance/ailiance/blob/main/eval/results/2026-05-04/devstral-python-fused-humanevalplus/).

Full bench results, methodology, env.json, and rerun.sh per measurement:
[`eval/results/SUMMARY.md`](https://github.com/ailiance/ailiance/blob/main/eval/results/SUMMARY.md) ·
[`MODEL_CARD.md`](https://github.com/ailiance/ailiance/blob/main/MODEL_CARD.md).

---

# Appendix B – Usage

Load the adapter on top of the base model with `mlx_lm`:

```python
from huggingface_hub import snapshot_download
from mlx_lm import load

# Download the adapter weights (mlx_lm expects adapter_config.json
# alongside adapters.safetensors in the adapter directory)
adapter_path = snapshot_download("Ailiance-fr/devstral-python-lora")

# load() fetches the base model and applies the LoRA weights on top
model, tokenizer = load(
    "mistralai/Devstral-Small-2-24B-Instruct-2512",
    adapter_path=adapter_path,
)
```
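
A quick generation smoke test (the prompt is arbitrary; assumes the base tokenizer ships a chat template):

```python
from mlx_lm import generate

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function that merges two sorted lists."}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```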

Or fuse and serve as a self-contained checkpoint:

```bash
python -m mlx_lm.fuse \
    --model mistralai/Devstral-Small-2-24B-Instruct-2512 \
    --adapter-path <adapter_path> \
    --save-path /tmp/devstral-python-lora-fused \
    --de-quantize
```

---

# Appendix C – Limitations and out-of-scope use

- Not for safety-critical decisions (medical, legal, structural, life-safety, biometric).
- Not for high-stakes individual decisions (hiring, credit, law enforcement); such use would re-classify the system as high-risk under EU AI Act Art. 6 and trigger additional obligations.
- Hallucination is present at typical instruction-tuned LLM levels; pair with a verifier or human-in-the-loop for factual outputs.
- The LoRA inherits all base-model limitations (training cutoff, language coverage, refusal patterns).

---

# Appendix D – Citation

```bibtex
@misc{ailiance-2026,
  title = {ailiance: EU-sovereign multi-model LLM serving with HF-traceable LoRA adapters},
  author = {Saillant, Clément},
  year = {2026},
  url = {https://github.com/ailiance/ailiance},
  note = {Live demo: https://www.ailiance.fr}
}
```

---

# Appendix E – Changelog

| Date | Card version | Change |
|---|---|---|
| 2026-05-06 | v0.4.0 | Initial HF release |
| 2026-05-06 | v0.4.1 | Self-contained EU AI Act card (per-adapter dataset table, PII statement, contact) |
| 2026-05-06 | v0.4.2 | PST-aligned (Commission template structure, Sections §1–4) |
| 2026-05-06 | **v0.4.3** | **PST-verbatim**: section labels and field names reproduced from the official Commission template (PDF 2025-07-24, English version). |

## Validated in `ailiance/ailiance-bench` v0.2

This model is referenced in the [Ailiance benchmark suite](https://github.com/ailiance/ailiance-bench)
(Phase 6 scoreboard, 7-task hardware-design evaluation).

See the full scoreboard:
[ailiance-bench README#scoreboard-lora-phase-6](https://github.com/ailiance/ailiance-bench#scoreboard-lora-phase-6--2026-05-11).