---
license: apache-2.0
base_model: mistralai/Devstral-Small-2-24B-Instruct-2512
tags:
  - lora
  - peft
  - mlx
  - eu-kiki
  - eu-ai-act
  - art-52
  - art-53
  - gpai-fine-tune
  - pst-aligned
language:
  - en
  - fr
library_name: peft
---

# eu-kiki-devstral-python-lora

LoRA adapter for **mistralai/Devstral-Small-2-24B-Instruct-2512**, part of the [eu-kiki](https://github.com/L-electron-Rare/eu-kiki) project — a 100 % EU-sovereign multi-model LLM serving pipeline.

> **EU AI Act compliance posture.** This model card is structured to follow the
> European Commission's *Public Summary Template* (PST) for the training content
> of general-purpose AI models, published by the AI Office under
> **Article 53(1)(d)** of Regulation (EU) 2024/1689. The structure below
> (Sections 1–4) maps directly to the PST. Where the official template wording
> differs from what is reproduced here, the **official template wins**;
> please consult the
> [AI Office page](https://digital-strategy.ec.europa.eu/en/policies/ai-office)
> for the canonical version. This card is **PST-aligned, not PST-verbatim**.

---

## Section 1 — General information about the model

| Field | Value |
|---|---|
| **Model name** | `eu-kiki-devstral-python-lora` |
| **Type** | LoRA adapter (parameter-efficient fine-tune) |
| **Base model** | [`mistralai/Devstral-Small-2-24B-Instruct-2512`](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512) |
| **Provider of the fine-tune** | L'Électron Rare (Saillant Clément), `clemsail` |
| **Provider contact** | https://github.com/L-electron-Rare/eu-kiki/issues |
| **Date of first public release** | 2026-05-06 |
| **Latest version date** | 2026-05-06 |
| **Modalities** | Text in / text out (no image, audio, or video) |
| **Languages of intended use** | English, French |
| **Risk classification (EU AI Act)** | Limited risk (Art. 52) |
| **Systemic-risk class (Art. 51 / 55)** | **Not applicable**: this is a LoRA fine-tune, not a foundation model trained with more than 10²⁵ FLOPs |
| **Foundation-model provider responsibility** | The base model provider remains the GPAI provider for the base; this card describes only the fine-tune delta |

---

## Section 2 — Description of training content

The following four categories follow the PST four-way classification of
training-content sources. **Empty categories are listed explicitly** so
absence is auditable.

### 2.1 Publicly available datasets

| Source | URL / Hub ID | SPDX licence | Records | Notes |
|---|---|---|---:|---|
| StarCoder2 Self-Instruct (Python subset) | https://huggingface.co/datasets/bigcode/starcoder2-self-align | `Apache-2.0` | 2,850 | Public HF dataset, Python instruction-tuning pairs |

### 2.2 Data obtained from third parties under licence

_No third-party-licensed data used._

### 2.3 Data collected through web scraping

_No web-scraped data used._

### 2.4 User-provided data and synthetic data

_No user-provided or synthetic data used._

---

## Section 3 — Aggregate description of training content

| Aggregate field | Value |
|---|---|
| **Total records used for this LoRA** | 2,850 |
| **Domain label in the eu-kiki router** | `python` |
| **Time-period of source data** | Mixed; per-source download dates logged in `_provenance` fields |
| **Modalities in training data** | Text only |
| **Languages in training data** | English, French |
| **Estimated total tokens** | ≈ 570,000 (heuristic 200 tokens / record average) |
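
The token estimate in the table is plain arithmetic on the record count. A minimal sketch, where the function name is illustrative and the 200 tokens per record figure is the card's own heuristic rather than a measured average:

```python
def estimate_tokens(n_records: int, avg_tokens_per_record: int = 200) -> int:
    """Heuristic token estimate: record count times an assumed average length."""
    return n_records * avg_tokens_per_record


estimated = estimate_tokens(2_850)
print(f"{estimated:,}")  # 570,000
```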

The full system-level inventory (all 35+ domains across 7 base models /
candidates, ≈ 82 K records, with per-source SPDX license, download dates,
and `n_used` counts) is published at
[`docs/eu-ai-act-transparency.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/docs/eu-ai-act-transparency.md)
§4.4. This adapter consumes a strict subset of that inventory.

---

## Section 4 — Other relevant elements

### 4.1 Copyright compliance and TDM opt-out (Art. 53(1)(c))

- **Public datasets (§2.1):** all carry permissive open-source licenses
  (Apache-2.0, MIT, CC-BY-*, BSD); SPDX matrix verified.
- **Third-party-licensed data (§2.2):** none used for this adapter.
  Elsewhere in the eu-kiki inventory, vendor datasheets are used under EU
  Directive 2019/790 (DSM Directive), **Article 4 (Text and Data Mining
  exception)**. Robots.txt was respected at collection time. SHA-256
  manifests are published at
  [`docs/pdf-compliance-report.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/docs/pdf-compliance-report.md).
- **Scraped data (§2.3):** none used for this adapter. Elsewhere in the
  inventory, opt-out signals (robots.txt `Disallow`,
  `<meta name="robots" content="noai">`, TDM Reservation headers,
  ai.txt) are honoured at collection time. Manifests are stored under
  `data/scraped/<source>/manifest.json` in the source repo.
- **Removal requests:** open an issue at the source repo URL above or
  contact the operator listed in §1. We commit to removing disputed
  content within 30 days and to re-training the adapter in the next
  release cycle.
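
As an illustration of honouring one of the opt-out signals listed above, the sketch below checks a URL against robots.txt rules with Python's standard `urllib.robotparser`. It is not the project's collector: the robots.txt content, the `example-collector` user agent, and the `may_collect` helper are hypothetical, and the other signals (the `noai` meta tag, TDM Reservation headers, ai.txt) would need separate checks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real collector would fetch it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_collect(robots_txt: str, url: str, user_agent: str = "example-collector") -> bool:
    """Return True if the given robots.txt rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_collect(ROBOTS_TXT, "https://example.com/docs/page.html"))     # True
print(may_collect(ROBOTS_TXT, "https://example.com/private/data.html"))  # False
```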

### 4.2 Quality and curation

- Per-record `_provenance` fields (source URL, SPDX license,
  `record_idx`, `access_date`) attached to 49,956 records across
  21 domains (system-level), enabling per-record audit and removal.
- Per-domain cap of ≤ 3 000 records applied to keep classes balanced
  across the routing surface.
- Synthetic data (when present) is explicitly marked `source: "synthetic"`
  in the row provenance.
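
To show how per-record provenance can support audit and removal, here is a minimal sketch. The `_provenance`, `record_idx`, and `access_date` names come from this card; the `source_url` and `spdx_license` key names, the sample rows, and the `drop_source` helper are assumptions for illustration and may not match the repository's actual schema.

```python
from typing import Any

# Hypothetical rows; only the `_provenance` wrapper, `record_idx` and
# `access_date` are names taken from the card.
records: list[dict[str, Any]] = [
    {
        "text": "example A",
        "_provenance": {
            "source_url": "https://example.org/dataset-a",
            "spdx_license": "Apache-2.0",
            "record_idx": 0,
            "access_date": "2026-04-01",
        },
    },
    {
        "text": "example B",
        "_provenance": {
            "source_url": "https://example.org/dataset-b",
            "spdx_license": "MIT",
            "record_idx": 0,
            "access_date": "2026-04-02",
        },
    },
]


def drop_source(rows: list[dict[str, Any]], disputed_url: str) -> list[dict[str, Any]]:
    """Keep only rows whose provenance does not point at the disputed source."""
    return [r for r in rows if r["_provenance"]["source_url"] != disputed_url]


kept = drop_source(records, "https://example.org/dataset-b")
print(len(kept))  # 1
```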

### 4.3 Personal data and PII (Art. 10 + Art. 53(1)(d))

Training data scanned with **Microsoft Presidio + en_core_web_lg**
(2026-04-28) across all 35+ domain directories. **One** email address
detected in the unrelated `traduction-tech` corpus was redacted before
training. **No high-signal PII** (email, phone, credit card, SSN, IBAN)
remains in the released adapters. Low-signal Presidio detections
(PERSON, LOCATION, DATE_TIME) are common false positives in technical
text and were left in place. Full report:
`data/pii-scan-report.json` in the source repo.
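
The redaction step can be pictured with a much simpler stand-in than Presidio. The sketch below uses a plain regular expression for email addresses only; it is not the scanner used for this card, the pattern is deliberately simplified, and the `<EMAIL>` placeholder and `redact_emails` helper are hypothetical.

```python
import re

# Simplified email pattern for illustration; it is not Presidio's recognizer
# and will miss or over-match edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_emails(text: str, placeholder: str = "<EMAIL>") -> tuple[str, int]:
    """Return the redacted text and the number of replacements made."""
    return EMAIL_RE.subn(placeholder, text)


redacted, n_hits = redact_emails("Contact jane.doe@example.com for the datasheet.")
print(redacted)  # Contact <EMAIL> for the datasheet.
print(n_hits)    # 1
```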

### 4.4 Special categories of personal data (GDPR Art. 9)

No special-category data (health, religion, sexual orientation, etc.)
was intentionally collected. The PII scan above also screens for
identifiers that could lead to special-category inference; none were
flagged.

### 4.5 Copyright opt-out registry

The provider tracks opt-outs via the Issues tracker on the source
repository. As of the release date, no removal requests have been received.

---

## Section 5 — Performance evaluation (Art. 53(1)(a))

**HumanEval+** (EvalPlus on Linux, 164 problems, greedy decoding, 1 sample per problem): base 87.20 / 82.90 → base fused with this `python` adapter 86.00 / 81.10. **Δ HE+ = −1.80 pts** vs base (second figure of each pair). Scored on `kx6tm-23` (Proxmox PVE 6.17, EvalPlus official sandbox).

Full bench results, methodology, env.json, and rerun.sh per measurement:
[`eval/results/SUMMARY.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/eval/results/SUMMARY.md) ·
[`MODEL_CARD.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/MODEL_CARD.md).
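
The reported delta can be re-derived from the scores quoted in this section. A minimal sketch using only those four numbers (the variable names are illustrative):

```python
# Scores as reported in this section (first / second figure of each pair).
base = (87.20, 82.90)
fused = (86.00, 81.10)

# Difference in percentage points, fused minus base, rounded to 2 decimals.
deltas = tuple(round(f - b, 2) for f, b in zip(fused, base))
print(deltas)  # (-1.2, -1.8)
```

The second value reproduces the −1.80 pts stated for HE+.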

---

## Section 6 — Training configuration

| Parameter | Value |
|---|---|
| Method | LoRA |
| Rank | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj` (attention only) |
| Precision | BF16 |
| Optimiser | AdamW |
| Learning rate | 1e-5 |
| Batch size × grad-accum | 1 × 4–8 |
| Framework | MLX (`mlx_lm` fork on Apple Silicon) |
| Hardware | Mac Studio M3 Ultra 512 GB unified memory |

### 6.1 Compute resources (Art. 53(1)(d))

LoRA training is parameter-efficient: the trainable adapter weights
amount to only ≈ 0.1–0.5 % of the base-model parameter count, and the
base weights stay frozen. **Estimated training compute ≪ 10²⁵ FLOPs**,
the systemic-risk threshold of Art. 51. Training ran on a single
Mac Studio M3 Ultra; no datacentre footprint. No proprietary teacher
model is used in deployed inference.
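
For intuition on why the adapter is small, the sketch below counts LoRA parameters for attention-only adaptation at rank 16. Each adapted linear layer of shape (d_in, d_out) adds rank × (d_in + d_out) weights. The layer dimensions and layer count are placeholders chosen for illustration, not values read from the base model's configuration, so the resulting fraction is indicative only.

```python
def lora_param_count(shapes: list[tuple[int, int]], rank: int, n_layers: int) -> int:
    """Count LoRA weights: each adapted (d_in, d_out) linear adds rank * (d_in + d_out)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * n_layers


# Placeholder dimensions for illustration only; they are NOT taken from the
# base model's config. Read the real values from its config.json.
HIDDEN, Q_OUT, KV_OUT = 5120, 4096, 1024
attention_shapes = [
    (HIDDEN, Q_OUT),   # q_proj
    (HIDDEN, KV_OUT),  # k_proj
    (HIDDEN, KV_OUT),  # v_proj
    (Q_OUT, HIDDEN),   # o_proj
]

n_lora = lora_param_count(attention_shapes, rank=16, n_layers=40)
fraction = n_lora / 24e9  # 24e9 approximates the "24B" in the model name
print(n_lora, f"{fraction:.3%}")
```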

---

## Section 7 — Usage

```python
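from mlx_lm import load
from mlx_lm.tuner.utils import linear_to_lora_layers
from huggingface_hub import snapshot_download

# Download (or reuse the local cache of) the base model and the adapter.
base_path = snapshot_download("mistralai/Devstral-Small-2-24B-Instruct-2512")
adapter_path = snapshot_download("clemsail/eu-kiki-devstral-python-lora")

# Load the base model, wrap its linear layers with LoRA modules
# (rank and alpha as in Section 6), then load the adapter weights.
model, tokenizer = load(base_path)
linear_to_lora_layers(model, num_layers=32, config={"rank": 16, "alpha": 32})
# strict=False: the adapter file holds only the LoRA weights, not the full model.
model.load_weights(f"{adapter_path}/adapters.safetensors", strict=False)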
```

Or fuse and serve as a self-contained checkpoint:

```bash
python -m mlx_lm fuse \
    --model mistralai/Devstral-Small-2-24B-Instruct-2512 \
    --adapter-path <adapter_path> \
    --save-path /tmp/eu-kiki-devstral-python-lora-fused \
    --dequantize
```

---

## Section 8 — Limitations and out-of-scope use

- **Not for safety-critical decisions** (medical, legal, structural,
  life-safety, biometric).
- **Not for high-stakes individual decisions** (hiring, credit, law
  enforcement) — that would re-classify under EU AI Act Art. 6
  high-risk and require additional obligations.
- **Hallucination present** at typical instruction-tuned LLM levels;
  pair with a verifier or human-in-the-loop for factual outputs.
- **LoRA inherits all base-model limitations**: training cutoff,
  language coverage, refusal patterns.

---

## Section 9 — Citation

```bibtex
@misc{eu-kiki-2026,
  title  = {eu-kiki: EU-sovereign multi-model LLM serving with HF-traceable LoRA adapters},
  author = {Saillant, Clément},
  year   = {2026},
  url    = {https://github.com/L-electron-Rare/eu-kiki},
  note   = {Live demo: https://ml.saillant.cc}
}
```

---

## Section 10 — Changelog

| Date | Card version | Change |
|---|---|---|
| 2026-05-06 | v0.4.1 | First HF release — Apache-2.0, EU AI Act self-contained model card |
| 2026-05-06 | v0.4.2 | Restructured to align with Commission Public Summary Template (PST) §1–4; explicit empty-category disclosure; opt-out registry section added |