---
license: apache-2.0
base_model: mistralai/Devstral-Small-2-24B-Instruct-2512
tags:
  - lora
  - peft
  - mlx
  - eu-kiki
  - eu-ai-act
  - art-52
  - art-53
  - gpai-fine-tune
  - pst-2025-07-24
language:
  - en
  - fr
library_name: peft
---

# eu-kiki-devstral-python-lora

LoRA adapter for **mistralai/Devstral-Small-2-24B-Instruct-2512**, part of the [eu-kiki](https://github.com/L-electron-Rare/eu-kiki) project. Live demo: https://ml.saillant.cc.

> **EU AI Act compliance.** This card follows the **European Commission's
> *Template for the Public Summary of Training Content* for general-purpose
> AI models** (Art. 53(1)(d) of Regulation (EU) 2024/1689, published by the
> AI Office on 2025-07-24). Section numbering and field labels reproduce
> the official template. Where this card and the official template differ
> in wording, the **official template wins** – see the
> [AI Office page](https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models).

---

# 1. General information

## 1.1. Provider identification

| Field | Value |
|---|---|
| **Provider name and contact details** | L'Électron Rare (Saillant Clément) – `clemsail` on Hugging Face – Issues: https://github.com/L-electron-Rare/eu-kiki/issues |
| **Authorised representative name and contact details** | Not applicable – provider is established within the European Union (France). |

## 1.2. Model identification

| Field | Value |
|---|---|
| **Versioned model name(s)** | `clemsail/eu-kiki-devstral-python-lora` (this LoRA adapter, v0.4.2) |
| **Model dependencies** | This is a **fine-tune (LoRA, rank 16)** of the general-purpose AI model [`mistralai/Devstral-Small-2-24B-Instruct-2512`](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Refer to the base-model provider's PST for the underlying training summary. |
| **Date of placement of the model on the Union market** | 2026-05-06 |

## 1.3. Modalities, overall training data size and other characteristics

| Field | Value |
|---|---|
| **Modality** | ☒ Text  ☐ Image  ☐ Audio  ☐ Video  ☐ Other |
| **Training data size** (text bucket) | ☒ Less than 1 billion tokens  ☐ 1 billion to 10 trillion tokens  ☐ More than 10 trillion tokens |
| **Types of content** | Instruction-tuning pairs, technical text, source code, multilingual instruction templates (EU official languages where applicable). |
| **Approximate size in alternative units** | ≈ 0.6 M tokens (2,850 rows × ≈ 200 tokens/row, single-pass). |
| **Latest date of data acquisition / collection for model training** | 11/2024 (StarCoder2 Self-Instruct release). The model is **not** continuously trained on new data after this date. |
| **Linguistic characteristics of the overall training data** | English (primary, instruction language); French (system-prompt context). No other natural languages in training rows. |
| **Other relevant characteristics / additional comments** | LoRA fine-tune (rank 16, alpha 32, dropout 0.05); only attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) are trained. Per-record `_provenance` (source, SPDX licence, `record_idx`, `access_date`) attached at the system level (see [`docs/eu-ai-act-transparency.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/docs/eu-ai-act-transparency.md) §4.4). Tokenizer: inherited from the base model. |
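
For reference, the LoRA hyperparameters listed above map onto a standard PEFT `LoraConfig` as follows. This is an illustrative sketch only: the adapter was trained with MLX, and the PEFT object is shown solely to document the equivalent configuration.

```python
from peft import LoraConfig

# Equivalent PEFT view of the adapter configuration described in the table above;
# nothing beyond the values stated on this card is implied.
lora_config = LoraConfig(
    r=16,                    # LoRA rank
    lora_alpha=32,           # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
```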

---

# 2. List of data sources

## 2.1. Publicly available datasets

**Have you used publicly available datasets to train the model?** ☒ Yes  ☐ No

**Modality(ies) of the content covered:** ☒ Text  ☐ Image  ☐ Video  ☐ Audio  ☐ Other

**List of large publicly available datasets:**

| Dataset | URL | SPDX licence | Records | Notes |
|---|---|---|---:|---|
| StarCoder2 Self-Instruct (Python subset filtered by language keyword) | https://huggingface.co/datasets/bigcode/starcoder2-self-align | `Apache-2.0` | 2,850 | Public HF dataset; instruction-tuning pairs. |
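
For transparency, the selection of the listed rows can be approximated as below. The split name and the keyword-based filter are assumptions made for illustration; the authoritative filtering script lives in the eu-kiki repository.

```python
from datasets import load_dataset

# Load the public dataset listed above and keep rows mentioning Python.
# The split name and the per-row keyword match are illustrative assumptions.
ds = load_dataset("bigcode/starcoder2-self-align", split="train")
python_rows = ds.filter(lambda row: "python" in str(row).lower())
print(f"{len(python_rows)} candidate rows before final selection")
```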

## 2.2. Private non-publicly available datasets obtained from third parties

### 2.2.1. Datasets commercially licensed by rightsholders or their representatives

**Have you concluded transactional commercial licensing agreement(s) with rightsholder(s) or with their representatives?** ☐ Yes  ☒ No

_(N/A – no commercial licensing agreements concluded.)_

### 2.2.2. Private datasets obtained from other third parties

**Have you obtained private datasets from third parties that are not licensed as described in Section 2.2.1?** ☐ Yes  ☒ No

_(N/A – no private third-party datasets obtained.)_

## 2.3. Data crawled and scraped from online sources

**Were crawlers used by the provider or on the provider's behalf?** ☐ Yes  ☒ No

_(N/A – no crawlers used.)_

## 2.4. User data

**Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model?** ☐ Yes  ☒ No

**Was data collected from user interactions with the provider's other services or products used to train the model?** ☐ Yes  ☒ No

_(N/A – no user data collected from any provider service or AI-model interaction is used to train this LoRA.)_

## 2.5. Synthetic data

**Was synthetic AI-generated data created by the provider or on their behalf to train the model?** ☐ Yes  ☒ No

_(N/A – no synthetic AI-generated data created by the provider or on their behalf to train this LoRA.)_

## 2.6. Other sources of data

**Have data sources other than those described in Sections 2.1 to 2.5 been used to train the model?** ☐ Yes  ☒ No

_(N/A – no other data sources used.)_

---

# 3. Data processing aspects

## 3.1. Respect of reservation of rights from text and data mining exception or limitation

**Are you a Signatory to the Code of Practice for general-purpose AI models that includes commitments to respect reservations of rights from the TDM exception or limitation?** ☐ Yes  ☒ No  *(SME / individual provider; commitments equivalent in substance, see below.)*

**Measures implemented before model training to respect reservations of rights from the TDM exception or limitation:**

- **Public HF datasets (§2.1):** all carry permissive open licences (Apache-2.0, MIT, CC-BY-*, BSD); SPDX matrix verified per-source. The licences explicitly authorise instructional / model-training use for the rows actually selected.
- **Web-scraped sources (§2.3, where present):** prior to collection the provider verified `robots.txt`, `<meta name="robots" content="noai">`, `ai.txt`, and TDM-Reservation HTTP headers (an illustrative pre-collection check is sketched after this list). Any source returning a reservation under Article 4(3) of Directive (EU) 2019/790 was excluded from collection. Scraping was limited to authoritative vendor-controlled repositories (ESP-IDF, STM32Cube, Arduino, KiCad symbols/footprints) operating under permissive licences.
- **Vendor PDF datasheets (§2.2.2 where present):** processed under the EU DSM Directive Article 4 TDM exception. SHA-256 manifests and per-source legal-basis records are published in [`docs/pdf-compliance-report.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/docs/pdf-compliance-report.md).
- **Public copyright policy (Art. 53(1)(c)):** [`docs/eu-ai-act-transparency.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/docs/eu-ai-act-transparency.md). Removal requests are handled via the issue tracker on the source repository; the provider commits to remove disputed content within 30 days and re-train on the next release cycle.
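
A minimal sketch of the pre-collection opt-out check described in the second bullet is given below. The header name (`tdm-reservation`), well-known paths and user-agent string follow common TDMRep / `ai.txt` conventions and are assumptions; the project pipeline may differ in detail.

```python
import urllib.robotparser

import requests


def tdm_reservation_signals(base_url: str, user_agent: str = "eu-kiki-collector") -> dict:
    """Probe machine-readable opt-out signals for a host before any collection."""
    signals = {}

    # robots.txt: is fetching allowed for our user agent at all?
    robots = urllib.robotparser.RobotFileParser(f"{base_url}/robots.txt")
    robots.read()
    signals["robots_txt_allows"] = robots.can_fetch(user_agent, base_url)

    # TDM Reservation Protocol HTTP header (assumed name: "tdm-reservation")
    head = requests.head(base_url, timeout=10, allow_redirects=True)
    signals["tdm_reservation_header"] = head.headers.get("tdm-reservation")

    # ai.txt / tdmrep.json well-known files (presence only; contents reviewed manually)
    for path in ("/ai.txt", "/.well-known/tdmrep.json"):
        signals[path] = requests.get(base_url + path, timeout=10).status_code == 200

    return signals
```

Any signal indicating a reservation leads to exclusion of the source, as stated above.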

## 3.2. Removal of illegal content

**General description of measures taken:**

- The provider does not crawl the open web at large; sources are restricted to curated public HF datasets and authoritative vendor repositories where the risk of illegal content (CSAM, terrorist content, IP-violating works) is structurally low.
- Personal data was screened with **Microsoft Presidio + en_core_web_lg** (2026-04-28) across all 35+ system-level domain directories (an illustrative screening sketch follows this list). **One** email address detected in the unrelated `traduction-tech` corpus was redacted before training. Full report: `data/pii-scan-report.json`.
- No special-category data (GDPR Art. 9: health, religion, sexual orientation, etc.) was intentionally collected; the PII scan also screens for identifiers that could enable special-category inference (none flagged).
- Licence compatibility is enforced via a per-source SPDX matrix; works under non-permissive licences are excluded.
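
The Presidio screening step can be approximated with the sketch below. The entity list and redaction strategy are illustrative; the project's own scanner and `data/pii-scan-report.json` remain authoritative.

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

analyzer = AnalyzerEngine()      # default spaCy pipeline (en_core_web_lg)
anonymizer = AnonymizerEngine()


def redact_pii(text: str) -> str:
    """Detect common PII entities in a training row and replace them with placeholders."""
    findings = analyzer.analyze(
        text=text,
        language="en",
        entities=["EMAIL_ADDRESS", "PHONE_NUMBER", "PERSON", "IBAN_CODE"],
    )
    return anonymizer.anonymize(text=text, analyzer_results=findings).text
```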

## 3.3. Other information (optional)

- **Per-record provenance:** 49,956 system-level training records carry `_provenance.{source, license, record_idx, access_date}` fields, enabling per-record audit and removal (an illustrative record follows this list).
- **Compute footprint:** LoRA training updates ≈ 0.1–0.5 % of base-model parameters. **Estimated training compute for this LoRA ≪ 10²⁵ FLOPs**, well below the systemic-risk threshold of EU AI Act Art. 51. No proprietary teacher model is used in deployed inference.
- **Risk classification:** Limited risk (Art. 52). Not deployed in safety-critical contexts.
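
An illustrative provenance record is shown below; the field names match the list above, while the concrete values are hypothetical.

```python
# Hypothetical _provenance entry attached to one training record
provenance = {
    "source": "bigcode/starcoder2-self-align",    # originating dataset
    "license": "Apache-2.0",                       # SPDX identifier
    "record_idx": 1742,                            # index in the source dataset (made up)
    "access_date": "2024-11-12",                   # date the row was retrieved (made up)
}
```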

---

# Appendix A – Performance evaluation (Art. 53(1)(a))

**HumanEval+** (EvalPlus official Linux scorer, 164 problems, greedy, 1 sample; scores are pass@1 on the base / plus test suites): base 87.20 / 82.90 → +python 86.00 / 81.10. **Δ HE+ = −1.80 pts** vs base. Scored on `kx6tm-23` (Proxmox PVE 6.17). Full reproducer in [`eval/results/2026-05-04/devstral-python-fused-humanevalplus/rerun.sh`](https://github.com/L-electron-Rare/eu-kiki/blob/main/eval/results/2026-05-04/devstral-python-fused-humanevalplus/).

Full bench results, methodology, env.json, and rerun.sh per measurement:
[`eval/results/SUMMARY.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/eval/results/SUMMARY.md) ·
[`MODEL_CARD.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/MODEL_CARD.md).

---

# Appendix B – Usage

```python
from mlx_lm import load
from mlx_lm.tuner.utils import linear_to_lora_layers
from huggingface_hub import snapshot_download

# Download the base model and this adapter from the Hugging Face Hub
base_path = snapshot_download("mistralai/Devstral-Small-2-24B-Instruct-2512")
adapter_path = snapshot_download("clemsail/eu-kiki-devstral-python-lora")

# Load the base model, add LoRA layers matching the training configuration
# (rank 16, alpha 32; adjust the config keys if your installed mlx-lm version
# expects a different LoRA config schema), then load the adapter weights on top
model, tokenizer = load(base_path)
linear_to_lora_layers(model, num_layers=32, config={"rank": 16, "alpha": 32})
model.load_weights(f"{adapter_path}/adapters.safetensors", strict=False)
```

Or fuse and serve as a self-contained checkpoint:

```bash
python -m mlx_lm fuse \
    --model mistralai/Devstral-Small-2-24B-Instruct-2512 \
    --adapter-path <adapter_path> \
    --save-path /tmp/eu-kiki-devstral-python-lora-fused \
    --dequantize
```
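
Once fused, the checkpoint can be queried directly with `mlx_lm`. A minimal sketch, with an illustrative prompt and generation length:

```python
from mlx_lm import load, generate

# Load the fused checkpoint produced by the command above
model, tokenizer = load("/tmp/eu-kiki-devstral-python-lora-fused")

# Build a chat-formatted prompt and generate a completion
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function that validates an SPDX licence identifier."}],
    add_generation_prompt=True,
    tokenize=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```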

---

# Appendix C – Limitations and out-of-scope use

- Not for safety-critical decisions (medical, legal, structural, life-safety, biometric).
- Not for high-stakes individual decisions (hiring, credit, law enforcement) – that would re-classify under EU AI Act Art. 6 high-risk and require additional obligations.
- Hallucination present at typical instruction-tuned LLM levels; pair with a verifier or human-in-the-loop for factual outputs.
- LoRA inherits all base-model limitations (training cutoff, language coverage, refusal patterns).

---

# Appendix D – Citation

```bibtex
@misc{eu-kiki-2026,
  title  = {eu-kiki: EU-sovereign multi-model LLM serving with HF-traceable LoRA adapters},
  author = {Saillant, Clément},
  year   = {2026},
  url    = {https://github.com/L-electron-Rare/eu-kiki},
  note   = {Live demo: https://ml.saillant.cc}
}
```

---

# Appendix E – Changelog

| Date | Card version | Change |
|---|---|---|
| 2026-05-06 | v0.4.0 | Initial HF release |
| 2026-05-06 | v0.4.1 | Self-contained EU AI Act card (per-adapter dataset table, PII statement, contact) |
| 2026-05-06 | v0.4.2 | PST-aligned (Commission template structure, Sections §1–4) |
| 2026-05-06 | **v0.4.3** | **PST-verbatim** – section labels and field names reproduced from the official Commission template (PDF 2025-07-24, English version). |