---
license: apache-2.0
base_model: mistralai/Devstral-Small-2-24B-Instruct-2512
tags:
  - lora
  - peft
  - mlx
  - ailiance
  - eu-kiki
  - eu-ai-act
  - art-52
  - art-53
  - gpai-fine-tune
  - pst-2025-07-24
language:
  - en
library_name: peft
---

# devstral-cpp-lora

LoRA adapter for **mistralai/Devstral-Small-2-24B-Instruct-2512**, part of the [ailiance](https://github.com/L-electron-Rare/ailiance) project. Live demo: https://www.ailiance.fr.

> **EU AI Act compliance.** This card follows the **European Commission's
> *Template for the Public Summary of Training Content* for general-purpose
> AI models** (Art. 53(1)(d) of Regulation (EU) 2024/1689, published by the
> AI Office on 2025-07-24). Section numbering and field labels reproduce
> the official template. Where this card and the official template differ
> in wording, the **official template wins** — see the
> [AI Office page](https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models).

---

# 1. General information

## 1.1. Provider identification

| Field | Value |
|---|---|
| **Provider name and contact details** | L'Électron Rare (Saillant Clément) — `clemsail` on Hugging Face — Issues: https://github.com/L-electron-Rare/ailiance/issues |
| **Authorised representative name and contact details** | Not applicable — provider is established within the European Union (France). |

## 1.2. Model identification

| Field | Value |
|---|---|
| **Versioned model name(s)** | `Ailiance-fr/devstral-cpp-lora` (this LoRA adapter, v0.4.2) |
| **Model dependencies** | This is a **fine-tune (LoRA, rank 16)** of the general-purpose AI model [`mistralai/Devstral-Small-2-24B-Instruct-2512`](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Refer to the base-model provider's PST for the underlying training summary. |
| **Date of placement of the model on the Union market** | 2026-05-06 |

## 1.3. Modalities, overall training data size and other characteristics

| Field | Value |
|---|---|
| **Modality** | ☒ Text  ☐ Image  ☐ Audio  ☐ Video  ☐ Other |
| **Training data size** (text bucket) | ☒ Less than 1 billion tokens  ☐ 1 billion to 10 trillion tokens  ☐ More than 10 trillion tokens |
| **Types of content** | Instruction-tuning pairs, technical text, and source code. |
| **Approximate size in alternative units** | ≈ 0.6 M tokens (2 850 rows × ≈ 200 tokens/row). |
| **Latest date of data acquisition / collection for model training** | 10/2025 (last commit on scraped repos). The model is **not** continuously trained on new data after this date. |
| **Linguistic characteristics of the overall training data** | English (technical instruction language). No other natural languages. |
| **Other relevant characteristics / additional comments** | LoRA fine-tune (rank 16, alpha 32, dropout 0.05); only attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) are trained. Per-record `_provenance` (source, SPDX licence, `record_idx`, `access_date`) attached at the system level (see [`docs/eu-ai-act-transparency.md`](https://github.com/L-electron-Rare/ailiance/blob/main/docs/eu-ai-act-transparency.md) §4.4). Tokenizer: inherited from the base model. |
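
The per-record `_provenance` fields mentioned in the table can support a simple audit-and-removal pass. The following is a minimal sketch under stated assumptions, not the project's actual tooling: the helper names `has_full_provenance` and `drop_source` and the sample records (including their dates) are hypothetical; only the four field names come from this card.

```python
from typing import Any

# Field names as listed in this card; everything else below is illustrative.
PROVENANCE_FIELDS = ("source", "license", "record_idx", "access_date")


def has_full_provenance(record: dict[str, Any]) -> bool:
    """Return True when the record carries all four `_provenance` fields."""
    prov = record.get("_provenance", {})
    return all(field in prov for field in PROVENANCE_FIELDS)


def drop_source(records: list[dict[str, Any]], source: str) -> list[dict[str, Any]]:
    """Remove every record whose provenance points at a disputed source."""
    return [r for r in records if r.get("_provenance", {}).get("source") != source]


# Hypothetical sample records (values invented for the example).
records = [
    {"text": "...", "_provenance": {"source": "bigcode/commitpackft", "license": "MIT",
                                    "record_idx": 0, "access_date": "2025-10-01"}},
    {"text": "...", "_provenance": {"source": "espressif/esp-idf", "license": "Apache-2.0",
                                    "record_idx": 1, "access_date": "2025-10-01"}},
]

audited = [r for r in records if has_full_provenance(r)]
kept = drop_source(audited, "espressif/esp-idf")
```

Filtering on `source` keeps a removal request down to one list comprehension, which is the kind of per-record audit the provenance fields are meant to enable.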

---

# 2. List of data sources

## 2.1. Publicly available datasets

**Have you used publicly available datasets to train the model?** ☒ Yes  ☐ No

**Modality(ies) of the content covered:** ☒ Text  ☐ Image  ☐ Video  ☐ Audio  ☐ Other

**List of large publicly available datasets:**

| Dataset | URL | SPDX licence | Records | Notes |
|---|---|---|---:|---|
| CommitPackFT (C/C++ subset) | https://huggingface.co/datasets/bigcode/commitpackft | `MIT` | 1,500 | Public HF dataset; real-world commit message + diff pairs. |

## 2.2. Private non-publicly available datasets obtained from third parties

### 2.2.1. Datasets commercially licensed by rightsholders or their representatives

**Have you concluded transactional commercial licensing agreement(s) with rightsholder(s) or with their representatives?** ☐ Yes  ☒ No

_(N/A — no commercial licensing agreements concluded.)_

### 2.2.2. Private datasets obtained from other third parties

**Have you obtained private datasets from third parties that are not licensed as described in Section 2.2.1?** ☐ Yes  ☒ No

_(N/A — no private third-party datasets obtained.)_

## 2.3. Data crawled and scraped from online sources

**Were crawlers used by the provider or on its behalf?** ☒ Yes  ☐ No

**Crawler name(s) / identifier(s):** custom `huggingface_hub` + `requests` Python collectors operated by the provider.

**Purposes of the crawler(s):** Acquire authoritative vendor reference code for technical training (firmware examples, EDA libraries).

**General description of crawler behaviour:** Respects `robots.txt`, `meta robots noai`, `ai.txt`, and TDM-Reservation headers. Low QPS (≤ 1 req/s). Uses the authenticated GitHub API where available. Captchas, password-protected pages, and paywalls are not bypassed.
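
The `robots.txt` check and the ≤ 1 req/s limit described above could be combined roughly as follows. This is an illustrative sketch using only the Python standard library, not the provider's collector: `USER_AGENT`, `RateLimiter`, `build_parser`, and the example URLs are all hypothetical, and no network request is actually sent.

```python
import time
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-collector"  # hypothetical crawler identifier
MIN_INTERVAL_S = 1.0              # at most one request per second


def build_parser(robots_lines: list[str]) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


class RateLimiter:
    """Block until at least `min_interval` seconds separate two calls."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = None

    def wait(self) -> None:
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()


robots = build_parser(["User-agent: *", "Disallow: /private/"])
limiter = RateLimiter(MIN_INTERVAL_S)

for url in ["https://example.org/docs/a.c", "https://example.org/private/b.c"]:
    if not robots.can_fetch(USER_AGENT, url):
        continue    # excluded by robots.txt, never requested
    limiter.wait()  # throttle before any real request would be sent
    print("would fetch", url)
```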

**Period of data collection:** Mixed; per-source `access_date` fields logged. Latest collection date: 10/2025.

**Comprehensive description of the type of content and online sources crawled:** Three official vendor repositories (`espressif/esp-idf`, `STMicroelectronics/STM32CubeF4`, `arduino/Arduino`), scraped through the authenticated GitHub API at low QPS. `robots.txt` and rate limits were respected. A per-source SHA-256 manifest is stored in `data/scraped/<source>/manifest.json`. Collection relies on the TDM exception in Art. 4 of the EU DSM Directive.
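
A per-source SHA-256 manifest of the kind mentioned above can be produced with the standard library alone. The sketch below is an assumption about how such a manifest might look (a flat mapping of relative path to hex digest); the real schema of `manifest.json` is not documented in this card, and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(source_dir: Path) -> dict[str, str]:
    """Map each file's relative POSIX path to its SHA-256 digest."""
    return {
        p.relative_to(source_dir).as_posix(): sha256_of(p)
        for p in sorted(source_dir.rglob("*"))
        if p.is_file() and p.name != "manifest.json"
    }


def write_manifest(source_dir: Path) -> Path:
    """Write `manifest.json` next to the collected files and return its path."""
    target = source_dir / "manifest.json"
    target.write_text(json.dumps(build_manifest(source_dir), indent=2))
    return target
```

Calling `write_manifest(Path("data/scraped/<source>"))` with a concrete source directory would then record a digest for every collected file, so that later changes to the corpus are detectable.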

**Type of modality covered:** ☒ Text  ☐ Image  ☐ Video  ☐ Audio  ☐ Other

**Summary of the most relevant domain names crawled (top 5 % / max 1 000 — SME provider):**

- `https://github.com`: repositories `espressif/esp-idf` (SPDX `Apache-2.0`), `STMicroelectronics/STM32CubeF4` (SPDX `BSD-3-Clause`), and `arduino/Arduino` (SPDX `CC0-1.0`); ≈ 1,350 records in total.

## 2.4. User data

**Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model?** ☐ Yes  ☒ No

**Was data collected from user interactions with the provider's other services or products used to train the model?** ☐ Yes  ☒ No

_(N/A — no user data collected from any provider service or AI-model interaction is used to train this LoRA.)_

## 2.5. Synthetic data

**Was synthetic AI-generated data created by the provider or on their behalf to train the model?** ☐ Yes  ☒ No

_(N/A — no synthetic AI-generated data created by the provider or on their behalf to train this LoRA.)_

## 2.6. Other sources of data

**Have data sources other than those described in Sections 2.1 to 2.5 been used to train the model?** ☐ Yes  ☒ No

_(N/A — no other data sources used.)_

---

# 3. Data processing aspects

## 3.1. Respect of reservation of rights from text and data mining exception or limitation

**Are you a Signatory to the Code of Practice for general-purpose AI models that includes commitments to respect reservations of rights from the TDM exception or limitation?** ☐ Yes  ☒ No  *(SME / individual provider; commitments equivalent in substance, see below.)*

**Measures implemented before model training to respect reservations of rights from the TDM exception or limitation:**

- **Public HF datasets (§2.1):** all carry permissive open licences (Apache-2.0, MIT, CC-BY-*, BSD); SPDX matrix verified per-source. The licences explicitly authorise instructional / model-training use for the rows actually selected.
- **Web-scraped sources (§2.3):** prior to collection the provider verified `robots.txt`, `<meta name="robots" content="noai">`, `ai.txt`, and TDM-Reservation HTTP headers. Any source returning a reservation under Article 4(3) of Directive (EU) 2019/790 was excluded from collection. For this adapter, scraping was limited to the three authoritative vendor-controlled repositories listed in §2.3 (ESP-IDF, STM32Cube, Arduino), all operating under permissive licences.
- **Vendor PDF datasheets (§2.2.2 where present):** processed under the EU DSM Directive Article 4 TDM exception. SHA-256 manifests and per-source legal-basis records are published in [`docs/pdf-compliance-report.md`](https://github.com/L-electron-Rare/ailiance/blob/main/docs/pdf-compliance-report.md).
- **Public copyright policy (Art. 53(1)(c)):** [`docs/eu-ai-act-transparency.md`](https://github.com/L-electron-Rare/ailiance/blob/main/docs/eu-ai-act-transparency.md). Removal requests are handled via the issue tracker on the source repository; the provider commits to remove disputed content within 30 days and re-train on the next release cycle.
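
The machine-readable reservation checks listed above might be expressed as two small predicates. This is a hedged sketch, not the provider's code: it assumes the TDM Reservation Protocol convention in which the HTTP header `tdm-reservation: 1` signals reserved rights, and the function names and sample values are hypothetical.

```python
from collections.abc import Mapping


def tdm_rights_reserved(headers: Mapping[str, str]) -> bool:
    """Return True when a response opts out of text and data mining.

    Assumption: the `tdm-reservation` header carries "1" when rights
    are reserved. Header names are matched case-insensitively.
    """
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    return normalised.get("tdm-reservation") == "1"


def robots_meta_blocks_ai(content: str) -> bool:
    """Check the `content` value of a `<meta name="robots">` tag for `noai`."""
    directives = {token.strip().lower() for token in content.split(",")}
    return "noai" in directives


def may_collect(headers: Mapping[str, str], robots_meta: str = "") -> bool:
    """A source is collected only if neither signal expresses a reservation."""
    return not tdm_rights_reserved(headers) and not robots_meta_blocks_ai(robots_meta)


# Hypothetical responses: the first is collectable, the second is excluded.
print(may_collect({"Content-Type": "text/html"}))
print(may_collect({"TDM-Reservation": "1"}))
```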

## 3.2. Removal of illegal content

**General description of measures taken:**

- The provider does not crawl the open web at large; sources are restricted to curated public HF datasets and authoritative vendor repositories where the risk of illegal content (CSAM, terrorist content, IP-violating works) is structurally low.
- Personal data was screened with **Microsoft Presidio + en_core_web_lg** (2026-04-28) across all 35+ system-level domain directories. **One** email address detected in the unrelated `traduction-tech` corpus was redacted before training. Full report: `data/pii-scan-report.json`.
- No special-category data (GDPR Art. 9: health, religion, sexual orientation, etc.) was intentionally collected; the PII scan also screens for identifiers that could enable special-category inference (none flagged).
- License compatibility is enforced via per-source SPDX matrix; works under non-permissive licences are excluded.
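
The per-source SPDX check in the last bullet amounts to an allow-list filter. The sketch below is illustrative only: the allow-list contains just the SPDX identifiers named in this card's source tables, and the `example/gpl-project` entry is a hypothetical source added to show a rejection.

```python
# SPDX identifiers named in sections 2.1 and 2.3 of this card.
ALLOWED_SPDX = frozenset({"MIT", "Apache-2.0", "BSD-3-Clause", "CC0-1.0"})


def is_permitted(spdx_id: str) -> bool:
    """Accept a source only when its licence is on the allow-list."""
    return spdx_id in ALLOWED_SPDX


sources = {
    "bigcode/commitpackft": "MIT",
    "espressif/esp-idf": "Apache-2.0",
    "STMicroelectronics/STM32CubeF4": "BSD-3-Clause",
    "arduino/Arduino": "CC0-1.0",
    "example/gpl-project": "GPL-3.0-only",  # hypothetical, not a real source
}

accepted = {name for name, spdx in sources.items() if is_permitted(spdx)}
rejected = set(sources) - accepted
print(sorted(rejected))
```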

## 3.3. Other information (optional)

- **Per-record provenance:** 49 956 system-level training records carry `_provenance.{source, license, record_idx, access_date}` fields, enabling per-record audit and removal.
- **Compute footprint:** LoRA training updates ≈ 0.1–0.5 % of base-model parameters. **Estimated training compute for this LoRA ≪ 10²⁵ FLOPs**, well below the systemic-risk threshold of EU AI Act Art. 51. No proprietary teacher model is used in deployed inference.
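- **Risk classification:** Limited risk (transparency obligations, Art. 50 of Regulation (EU) 2024/1689; numbered Art. 52 in the Commission's original proposal). Not deployed in safety-critical contexts.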
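
The compute claim above can be sanity-checked with back-of-the-envelope arithmetic. The sketch uses the common 6·N·D rule of thumb for dense-transformer training, treated here as a conservative upper bound because LoRA freezes most weights. Two inputs are assumptions rather than facts from this card: `BASE_PARAMS` is read off the "24B" in the base-model name, and `EPOCHS` is a hypothetical value, since the card does not state the number of epochs.

```python
ROWS = 2_850                # 1,500 dataset rows + about 1,350 scraped records
TOKENS_PER_ROW = 200        # approximate figure from section 1.3
BASE_PARAMS = 24e9          # assumption: "24B" in the base-model name
EPOCHS = 3                  # hypothetical; not stated in this card
SYSTEMIC_RISK_FLOPS = 1e25  # threshold cited from EU AI Act Art. 51

tokens = ROWS * TOKENS_PER_ROW                        # about 0.57 M tokens
train_flops = 6 * BASE_PARAMS * tokens * EPOCHS       # 6 * N * D per epoch
margin = SYSTEMIC_RISK_FLOPS / train_flops

print(f"{tokens:,} tokens, about {train_flops:.2e} FLOPs, "
      f"{margin:.1e}x below the threshold")
```

Even if the epoch count were off by a factor of a hundred, the estimate would remain many orders of magnitude below the threshold.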

---

# Appendix A — Performance evaluation (Art. 53(1)(a))

**HumanEval** (custom Studio scorer; the EvalPlus extra tests were not run because their sandbox is Linux-only): base 87.20 → +cpp 85.98, a change of **−1.22 pts**. A rigorous HumanEval+ Δ requires re-scoring the samples on Linux (samples preserved at `eval/results/2026-05-04/devstral-cpp-fused-humanevalplus/`).

Full bench results, methodology, env.json, and rerun.sh per measurement:
[`eval/results/SUMMARY.md`](https://github.com/L-electron-Rare/ailiance/blob/main/eval/results/SUMMARY.md) ·
[`MODEL_CARD.md`](https://github.com/L-electron-Rare/ailiance/blob/main/MODEL_CARD.md).

---

# Appendix B — Usage

```python
from huggingface_hub import snapshot_download
from mlx_lm import load

# Download the base model and this adapter from the Hugging Face Hub.
base_path = snapshot_download("mistralai/Devstral-Small-2-24B-Instruct-2512")
adapter_path = snapshot_download("Ailiance-fr/devstral-cpp-lora")

# `load` wraps the target layers with LoRA and loads `adapters.safetensors`.
# Assumption: the adapter repo also ships the `adapter_config.json` written by
# mlx_lm at training time (rank, scale, dropout, number of layers).
model, tokenizer = load(base_path, adapter_path=adapter_path)
```

Or fuse and serve as a self-contained checkpoint:

```bash
python -m mlx_lm fuse \
    --model mistralai/Devstral-Small-2-24B-Instruct-2512 \
    --adapter-path <adapter_path> \
    --save-path /tmp/devstral-cpp-lora-fused \
    --dequantize
```

---

# Appendix C — Limitations and out-of-scope use

- Not for safety-critical decisions (medical, legal, structural, life-safety, biometric).
- Not for high-stakes individual decisions (hiring, credit, law enforcement): such uses would be classified as high-risk under EU AI Act Art. 6 and carry additional obligations.
- Hallucinations occur at rates typical of instruction-tuned LLMs; pair with a verifier or human-in-the-loop for factual outputs.
- LoRA inherits all base-model limitations (training cutoff, language coverage, refusal patterns).

---

# Appendix D — Citation

```bibtex
@misc{eu-kiki-2026,
  title  = {eu-kiki: EU-sovereign multi-model LLM serving with HF-traceable LoRA adapters},
  author = {Saillant, Clément},
  year   = {2026},
  url    = {https://github.com/L-electron-Rare/ailiance},
  note   = {Live demo: https://www.ailiance.fr}
}
```

---

# Appendix E — Changelog

| Date | Card version | Change |
|---|---|---|
| 2026-05-06 | v0.4.0 | Initial HF release |
| 2026-05-06 | v0.4.1 | Self-contained EU AI Act card (per-adapter dataset table, PII statement, contact) |
| 2026-05-06 | v0.4.2 | PST-aligned (Commission template structure, Sections §1–4) |
| 2026-05-06 | **v0.4.3** | **PST-verbatim** — section labels and field names reproduced from the official Commission template (PDF 2025-07-24, English version). |