clemsail committed on
Commit
70fec39
·
verified ·
1 Parent(s): 0ab975d

docs: PST-verbatim model card v0.4.3 (Commission template 2025-07-24)

Files changed (1)
  1. README.md +131 -152
README.md CHANGED
@@ -10,7 +10,7 @@ tags:
10
  - art-52
11
  - art-53
12
  - gpai-fine-tune
13
- - pst-aligned
14
  language:
15
  - en
16
  library_name: peft
@@ -18,180 +18,160 @@ library_name: peft
18
 
19
  # eu-kiki-devstral-cpp-lora
20
 
21
- LoRA adapter for **mistralai/Devstral-Small-2-24B-Instruct-2512**, part of the [eu-kiki](https://github.com/L-electron-Rare/eu-kiki) project — a 100 % EU-sovereign multi-model LLM serving pipeline.
22
 
23
- > **EU AI Act compliance posture.** This model card is structured to follow the
24
- > European Commission's *Public Summary Template* (PST) for the training content
25
- > of general-purpose AI models, published by the AI Office under
26
- > **Article 53(1)(d)** of Regulation (EU) 2024/1689. The structure below
27
- > (Sections 1–4) maps directly to the PST. Where the official template wording
28
- > differs from what is reproduced here, the **official template wins**;
29
- > please consult the
30
- > [AI Office page](https://digital-strategy.ec.europa.eu/en/policies/ai-office)
31
- > for the canonical version. This card is **PST-aligned, not PST-verbatim**.
32
 
33
  ---
34
 
35
- ## Section 1 — General information about the model
36
 
37
  | Field | Value |
38
  |---|---|
39
- | **Model name** | `eu-kiki-devstral-cpp-lora` |
40
- | **Type** | LoRA adapter (parameter-efficient fine-tune) |
41
- | **Base model** | [`mistralai/Devstral-Small-2-24B-Instruct-2512`](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512) |
42
- | **Provider of the fine-tune** | L'Électron Rare (Saillant Clément), `clemsail` |
43
- | **Provider contact** | https://github.com/L-electron-Rare/eu-kiki/issues |
44
- | **Date of first public release** | 2026-05-06 |
45
- | **Latest version date** | 2026-05-06 |
46
- | **Modalities** | Text in / text out (no image, audio, or video) |
47
- | **Languages of intended use** | English |
48
- | **Risk classification (EU AI Act)** | Limited risk (Art. 52) |
49
- | **Systemic-risk class (Art. 51 / 55)** | **Not applicable** — this is a LoRA fine-tune, not a foundation model > 10²⁵ FLOPs |
50
- | **Foundation-model provider responsibility** | The base model provider remains the GPAI provider for the base; this card describes only the fine-tune delta |
51
 
52
  ---
53
 
54
- ## Section 2 — Description of training content
55
 
56
- The following four categories follow the PST four-way classification of
57
- training-content sources. **Empty categories are listed explicitly** so
58
- absence is auditable.
59
 
60
- ### 2.1 Publicly available datasets
61
 
62
- | Source | URL / Hub ID | SPDX licence | Records | Notes |
63
  |---|---|---|---:|---|
64
- | CommitPackFT (C/C++ subset) | https://huggingface.co/datasets/bigcode/commitpackft | `MIT` | 1,500 | Public HF dataset, real-world commit pairs |
65
 
66
- ### 2.2 Data obtained from third parties under licence
67
 
68
- _No third-party-licensed data used._
69
 
70
- ### 2.3 Data collected through web scraping
71
 
72
- | Source | URL / Hub ID | SPDX licence | Records | Notes |
73
- |---|---|---|---:|---|
74
- | ESP-IDF examples | https://github.com/espressif/esp-idf | `Apache-2.0` | 700 | Official Espressif repo, scraped under DSM Art. 4 TDM, robots.txt verified |
75
- | STM32Cube examples | https://github.com/STMicroelectronics/STM32CubeF4 | `BSD-3-Clause` | 450 | Official STMicroelectronics repo, scraped under DSM Art. 4 TDM |
76
- | Arduino examples | https://github.com/arduino/Arduino | `CC0-1.0` | 200 | Official Arduino repo, scraped under DSM Art. 4 TDM |
77
 
78
- ### 2.4 User-provided data and synthetic data
79
 
80
- _No user-provided or synthetic data used._
81
 
82
- ---
83
 
84
- ## Section 3 — Aggregate description of training content
85
 
86
- | Aggregate field | Value |
87
- |---|---|
88
- | **Total records used for this LoRA** | 2,850 |
89
- | **Domain label in the eu-kiki router** | `cpp` |
90
- | **Time-period of source data** | Mixed; per-source download dates logged in `_provenance` fields |
91
- | **Modalities in training data** | Text only |
92
- | **Languages in training data** | English |
93
- | **Estimated total tokens** | ≈ 570,000 (heuristic 200 tokens / record average) |
94
-
95
- The full system-level inventory (all 35+ domains across 7 base models /
96
- candidates, ≈ 82 K records, with per-source SPDX license, download dates,
97
- and `n_used` counts) is published at
98
- [`docs/eu-ai-act-transparency.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/docs/eu-ai-act-transparency.md)
99
- §4.4. This adapter consumes a strict subset of that inventory.
100
 
101
- ---
102
 
103
- ## Section 4 — Other relevant elements
104
-
105
- ### 4.1 Copyright compliance and TDM opt-out (Art. 53(1)(c))
106
-
107
- - **Public datasets (§2.1):** all carry permissive open-source licenses
108
- (Apache-2.0, MIT, CC-BY-*, BSD); SPDX matrix verified.
109
- - **Third-party-licensed data (§2.2):** vendor datasheets used under EU
110
- Directive 2019/790 (DSM Directive) **Article 4 — Text and Data Mining
111
- exception**. Robots.txt respected at collection time. SHA-256 manifests
112
- published at
113
- [`docs/pdf-compliance-report.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/docs/pdf-compliance-report.md).
114
- - **Scraped data (§2.3):** opt-out signals (robots.txt `Disallow`,
115
- `<meta name="robots" content="noai">`, TDM Reservation headers,
116
- ai.txt) are honoured at collection time. Manifests under
117
- `data/scraped/<source>/manifest.json` in the source repo.
118
- - **Removal requests:** open an issue at the source repo URL above or
119
- contact the operator listed in §1. We commit to remove disputed
120
- content within 30 days and re-train the adapter on the next release
121
- cycle.
122
-
123
- ### 4.2 Quality and curation
124
-
125
- - Per-record `_provenance` fields (source URL, SPDX license,
126
- `record_idx`, `access_date`) attached to 49,956 records across
127
- 21 domains (system-level), enabling per-record audit and removal.
128
- - Per-domain cap of ≤ 3 000 records applied to keep classes balanced
129
- across the routing surface.
130
- - Synthetic data (when present) is explicitly marked `source: "synthetic"`
131
- in the row provenance.
132
-
133
- ### 4.3 Personal data and PII (Art. 10 + Art. 53(1)(d))
134
-
135
- Training data scanned with **Microsoft Presidio + en_core_web_lg**
136
- (2026-04-28) across all 35+ domain directories. **One** email address
137
- detected in the unrelated `traduction-tech` corpus was redacted before
138
- training. **No high-signal PII** (email, phone, credit card, SSN, IBAN)
139
- remains in the released adapters. Low-signal Presidio detections
140
- (PERSON, LOCATION, DATE_TIME) are common false positives in technical
141
- text and were left in place. Full report:
142
- `data/pii-scan-report.json` in the source repo.
143
-
144
- ### 4.4 Special categories of personal data (GDPR Art. 9)
145
-
146
- No special-category data (health, religion, sexual orientation, etc.)
147
- was intentionally collected. The PII scan above also screens for
148
- identifiers that could lead to special-category inference; none were
149
- flagged.
150
-
151
- ### 4.5 Copyright opt-out registry
152
-
153
- The provider tracks opt-outs via the Issues tracker on the source
154
- repository. As of release date no removal requests have been received.
155
 
156
  ---
157
 
158
- ## Section 5 — Performance evaluation (Art. 53(1)(a))
159
 
160
- **HumanEval** (custom Studio scorer, EvalPlus extra-tests not run — Linux-only sandbox): base 87.20 → +cpp 85.98 = **−1.22 pts**. Linux re-scoring required for rigorous Δ HE+.
161
 
162
- Full bench results, methodology, env.json, and rerun.sh per measurement:
163
- [`eval/results/SUMMARY.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/eval/results/SUMMARY.md) ·
164
- [`MODEL_CARD.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/MODEL_CARD.md).
165
 
166
  ---
167
 
168
- ## Section 6 — Training configuration
169
 
170
- | Parameter | Value |
171
- |---|---|
172
- | Method | LoRA |
173
- | Rank | 16 |
174
- | Alpha | 32 |
175
- | Dropout | 0.05 |
176
- | Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj` (attention only) |
177
- | Precision | BF16 |
178
- | Optimiser | AdamW |
179
- | Learning rate | 1e-5 |
180
- | Batch size × grad-accum | 1 × 4–8 |
181
- | Framework | MLX (`mlx_lm` fork on Apple Silicon) |
182
- | Hardware | Mac Studio M3 Ultra 512 GB unified memory |
183
-
184
- ### 6.1 Compute resources (Art. 53(1)(d))
185
-
186
- LoRA training is parameter-efficient: only ≈ 0.1–0.5 % of base-model
187
- parameters are updated. **Estimated training compute ≪ 10²⁵ FLOPs** —
188
- the systemic-risk threshold of Art. 51. Single-machine training on
189
- Mac Studio M3 Ultra; no datacentre footprint. No proprietary teacher
190
- model is used in deployed inference.
191
 
192
  ---
193
 
194
- ## Section 7 — Usage
195
 
196
  ```python
197
  from mlx_lm import load
@@ -218,21 +198,16 @@ python -m mlx_lm fuse \
218
 
219
  ---
220
 
221
- ## Section 8 — Limitations and out-of-scope use
222
 
223
- - **Not for safety-critical decisions** (medical, legal, structural,
224
- life-safety, biometric).
225
- - **Not for high-stakes individual decisions** (hiring, credit, law
226
- enforcement) β€” that would re-classify under EU AI Act Art. 6
227
- high-risk and require additional obligations.
228
- - **Hallucination present** at typical instruction-tuned LLM levels;
229
- pair with a verifier or human-in-the-loop for factual outputs.
230
- - **LoRA inherits all base-model limitations**: training cutoff,
231
- language coverage, refusal patterns.
232
 
233
  ---
234
 
235
- ## Section 9 — Citation
236
 
237
  ```bibtex
238
  @misc{eu-kiki-2026,
@@ -244,9 +219,13 @@ python -m mlx_lm fuse \
244
  }
245
  ```
246
 
247
- ## Section 10 — Changelog
248
 
249
  | Date | Card version | Change |
250
  |---|---|---|
251
- | 2026-05-06 | v0.4.1 | First HF release — Apache-2.0, EU AI Act self-contained model card |
252
- | 2026-05-06 | v0.4.2 | Restructured to align with Commission Public Summary Template (PST) §1–4; explicit empty-category disclosure; opt-out registry section added |
10
  - art-52
11
  - art-53
12
  - gpai-fine-tune
13
+ - pst-2025-07-24
14
  language:
15
  - en
16
  library_name: peft
 
18
 
19
  # eu-kiki-devstral-cpp-lora
20
 
21
+ LoRA adapter for **mistralai/Devstral-Small-2-24B-Instruct-2512**, part of the [eu-kiki](https://github.com/L-electron-Rare/eu-kiki) project. Live demo: https://ml.saillant.cc.
22
 
23
+ > **EU AI Act compliance.** This card follows the **European Commission's
24
+ > *Template for the Public Summary of Training Content* for general-purpose
25
+ > AI models** (Art. 53(1)(d) of Regulation (EU) 2024/1689, published by the
26
+ > AI Office on 2025-07-24). Section numbering and field labels reproduce
27
+ > the official template. Where this card and the official template differ
28
+ > in wording, the **official template wins** — see the
29
+ > [AI Office page](https://digital-strategy.ec.europa.eu/en/library/explanatory-notice-and-template-public-summary-training-content-general-purpose-ai-models).
30
 
31
  ---
32
 
33
+ # 1. General information
34
+
35
+ ## 1.1. Provider identification
36
+
37
+ | Field | Value |
38
+ |---|---|
39
+ | **Provider name and contact details** | L'Électron Rare (Saillant Clément) — `clemsail` on Hugging Face — Issues: https://github.com/L-electron-Rare/eu-kiki/issues |
40
+ | **Authorised representative name and contact details** | Not applicable — provider is established within the European Union (France). |
41
+
42
+ ## 1.2. Model identification
43
+
44
+ | Field | Value |
45
+ |---|---|
46
+ | **Versioned model name(s)** | `clemsail/eu-kiki-devstral-cpp-lora` (this LoRA adapter, v0.4.2) |
47
+ | **Model dependencies** | This is a **fine-tune (LoRA, rank 16)** of the general-purpose AI model [`mistralai/Devstral-Small-2-24B-Instruct-2512`](https://huggingface.co/mistralai/Devstral-Small-2-24B-Instruct-2512). Refer to the base-model provider's PST for the underlying training summary. |
48
+ | **Date of placement of the model on the Union market** | 2026-05-06 |
49
+
50
+ ## 1.3. Modalities, overall training data size and other characteristics
51
 
52
  | Field | Value |
53
  |---|---|
54
+ | **Modality** | ☒ Text ☐ Image ☐ Audio ☐ Video ☐ Other |
55
+ | **Training data size** (text bucket) | ☒ Less than 1 billion tokens ☐ 1 billion to 10 trillion tokens ☐ More than 10 trillion tokens |
56
+ | **Types of content** | Instruction-tuning pairs, technical text, source code, multilingual instruction templates (EU official languages where applicable). |
57
+ | **Approximate size in alternative units** | ≈ 0.6 M tokens (2 850 rows × ≈ 200 tokens/row). |
58
+ | **Latest date of data acquisition / collection for model training** | 10/2025 (last commit on scraped repos). The model is **not** continuously trained on new data after this date. |
59
+ | **Linguistic characteristics of the overall training data** | English (technical instruction language). No other natural languages. |
60
+ | **Other relevant characteristics / additional comments** | LoRA fine-tune (rank 16, alpha 32, dropout 0.05); only attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) are trained. Per-record `_provenance` (source, SPDX licence, `record_idx`, `access_date`) attached at the system level (see [`docs/eu-ai-act-transparency.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/docs/eu-ai-act-transparency.md) §4.4). Tokenizer: inherited from the base model. |
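The "approximate size" figure in the table above is plain arithmetic; a quick sketch, using the record count and per-record token average stated in this card:

```python
# Heuristic token estimate from the card: 2,850 records at ~200 tokens each.
records = 2850
avg_tokens_per_record = 200
total_tokens = records * avg_tokens_per_record
print(total_tokens)  # 570000, i.e. ~0.6 M tokens, well inside the "< 1 billion" bucket
```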
61
 
62
  ---
63
 
64
+ # 2. List of data sources
65
 
66
+ ## 2.1. Publicly available datasets
67
 
68
+ **Have you used publicly available datasets to train the model?** ☒ Yes ☐ No
69
 
70
+ **Modality(ies) of the content covered:** ☒ Text ☐ Image ☐ Video ☐ Audio ☐ Other
71
+
72
+ **List of large publicly available datasets:**
73
+
74
+ | Dataset | URL | SPDX licence | Records | Notes |
75
  |---|---|---|---:|---|
76
+ | CommitPackFT (C/C++ subset) | https://huggingface.co/datasets/bigcode/commitpackft | `MIT` | 1,500 | Public HF dataset; real-world commit message + diff pairs. |
77
 
78
+ ## 2.2. Private non-publicly available datasets obtained from third parties
79
 
80
+ ### 2.2.1. Datasets commercially licensed by rightsholders or their representatives
81
 
82
+ **Have you concluded transactional commercial licensing agreement(s) with rightsholder(s) or with their representatives?** ☐ Yes ☒ No
83
 
84
+ _(N/A — no commercial licensing agreements concluded.)_
85
 
86
+ ### 2.2.2. Private datasets obtained from other third parties
87
 
88
+ **Have you obtained private datasets from third parties that are not licensed as described in Section 2.2.1?** ☐ Yes ☒ No
89
 
90
+ _(N/A — no private third-party datasets obtained.)_
91
 
92
+ ## 2.3. Data crawled and scraped from online sources
93
 
94
+ **Were crawlers used by the provider or on its behalf?** ☒ Yes ☐ No
95
 
96
+ **Crawler name(s) / identifier(s):** custom `huggingface_hub` + `requests` Python collectors operated by the provider.
97
+
98
+ **Purposes of the crawler(s):** Acquire authoritative vendor reference code for technical training (firmware examples, EDA libraries).
99
+
100
+ **General description of crawler behaviour:** Respects `robots.txt`, `meta robots noai`, `ai.txt`, and TDM-Reservation headers. Low QPS (≤ 1 req/s). Authenticated GitHub API where available. Captchas, password-protected pages and paywalls not bypassed.
101
+
102
+ **Period of data collection:** Mixed; per-source `access_date` fields logged. Latest collection date: 10/2025.
103
+
104
+ **Comprehensive description of the type of content and online sources crawled:** Three official vendor repositories scraped via authenticated GitHub API at low QPS. Robots.txt and rate limits respected. Per-source SHA-256 manifest in `data/scraped/<source>/manifest.json`. Compliant with EU DSM Directive Art. 4 TDM exception.
105
+
106
+ **Type of modality covered:** ☒ Text ☐ Image ☐ Video ☐ Audio ☐ Other
107
+
108
+ **Summary of the most relevant domain names crawled (top 5 % / max 1 000 — SME provider):**
109
+
110
+ - `github.com` (espressif/esp-idf, STMicroelectronics/STM32CubeF4, arduino/Arduino) — SPDX: Apache-2.0 (ESP-IDF) / BSD-3-Clause (STM32Cube) / CC0-1.0 (Arduino); ≈ 1,350 records
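A minimal stdlib sketch of the pre-collection `robots.txt` gate described above. The `robots.txt` body, user-agent string, and URLs below are hypothetical; the actual collectors additionally check `noai` meta tags, `ai.txt`, and TDM-Reservation headers before fetching.

```python
from urllib import robotparser

rp = robotparser.RobotFileParser()
# Hypothetical robots.txt; the real collectors fetch this per host before scraping.
rp.parse("""\
User-agent: *
Disallow: /private/
""".splitlines())

# Allowed path: collected. Disallowed path: skipped entirely.
print(rp.can_fetch("eu-kiki-collector", "https://example.com/examples/blink.c"))   # True
print(rp.can_fetch("eu-kiki-collector", "https://example.com/private/internal"))   # False
```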
111
+
112
+ ## 2.4. User data
113
+
114
+ **Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model?** ☐ Yes ☒ No
115
 
116
+ **Was data collected from user interactions with the provider's other services or products used to train the model?** ☐ Yes ☒ No
117
+
118
+ _(N/A — no user data collected from any provider service or AI-model interaction is used to train this LoRA.)_
119
+
120
+ ## 2.5. Synthetic data
121
+
122
+ **Was synthetic AI-generated data created by the provider or on their behalf to train the model?** ☐ Yes ☒ No
123
+
124
+ _(N/A — no synthetic AI-generated data created by the provider or on their behalf to train this LoRA.)_
125
+
126
+ ## 2.6. Other sources of data
127
+
128
+ **Have data sources other than those described in Sections 2.1 to 2.5 been used to train the model?** ☐ Yes ☒ No
129
+
130
+ _(N/A — no other data sources used.)_
131
 
132
  ---
133
 
134
+ # 3. Data processing aspects
135
 
136
+ ## 3.1. Respect of reservation of rights from text and data mining exception or limitation
137
 
138
+ **Are you a Signatory to the Code of Practice for general-purpose AI models that includes commitments to respect reservations of rights from the TDM exception or limitation?** ☐ Yes ☒ No *(SME / individual provider; commitments equivalent in substance, see below.)*
139
+
140
+ **Measures implemented before model training to respect reservations of rights from the TDM exception or limitation:**
141
+
142
+ - **Public HF datasets (§2.1):** all carry permissive open licences (Apache-2.0, MIT, CC-BY-*, BSD); SPDX matrix verified per-source. The licences explicitly authorise instructional / model-training use for the rows actually selected.
143
+ - **Web-scraped sources (§2.3):** prior to collection the provider verified `robots.txt`, `<meta name="robots" content="noai">`, `ai.txt`, and TDM-Reservation HTTP headers. Any source returning a reservation under Article 4(3) of Directive (EU) 2019/790 was excluded from collection. Scraping was limited to authoritative vendor-controlled repositories (ESP-IDF, STM32Cube, Arduino, KiCad symbols/footprints) operating under permissive licences.
144
+ - **Vendor PDF datasheets (§2.2.2 where present):** processed under the EU DSM Directive Article 4 TDM exception. SHA-256 manifests and per-source legal-basis records are published in [`docs/pdf-compliance-report.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/docs/pdf-compliance-report.md).
145
+ - **Public copyright policy (Art. 53(1)(c)):** [`docs/eu-ai-act-transparency.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/docs/eu-ai-act-transparency.md). Removal requests are handled via the issue tracker on the source repository; the provider commits to remove disputed content within 30 days and re-train on the next release cycle.
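The per-source SPDX verification above amounts to an allowlist filter over the per-record `_provenance` fields. A minimal sketch; the allowlist contents and sample rows here are illustrative, not the project's actual matrix:

```python
# Illustrative permissive-licence allowlist (SPDX identifiers).
ALLOWED_SPDX = {"Apache-2.0", "MIT", "BSD-3-Clause", "CC0-1.0", "CC-BY-4.0"}

def licence_ok(record: dict) -> bool:
    """Keep only rows whose provenance licence is on the permissive allowlist."""
    return record["_provenance"]["license"] in ALLOWED_SPDX

rows = [
    {"text": "...", "_provenance": {"license": "MIT"}},
    {"text": "...", "_provenance": {"license": "GPL-3.0-only"}},
]
kept = [r for r in rows if licence_ok(r)]
print(len(kept))  # 1: the GPL-licensed row is excluded
```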
146
+
147
+ ## 3.2. Removal of illegal content
148
+
149
+ **General description of measures taken:**
150
+
151
+ - The provider does not crawl the open web at large; sources are restricted to curated public HF datasets and authoritative vendor repositories where the risk of illegal content (CSAM, terrorist content, IP-violating works) is structurally low.
152
+ - Personal data was screened with **Microsoft Presidio + en_core_web_lg** (2026-04-28) across all 35+ system-level domain directories. **One** email address detected in the unrelated `traduction-tech` corpus was redacted before training. Full report: `data/pii-scan-report.json`.
153
+ - No special-category data (GDPR Art. 9: health, religion, sexual orientation, etc.) was intentionally collected; the PII scan also screens for identifiers that could enable special-category inference (none flagged).
154
+ - License compatibility is enforced via per-source SPDX matrix; works under non-permissive licences are excluded.
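The production scan uses Presidio as stated above; purely as a stand-in illustration of the redaction step for high-signal email PII (the regex and placeholder token are hypothetical and far cruder than Presidio's recognisers):

```python
import re

# Crude email matcher for illustration only; Presidio's EmailRecognizer is
# what the actual pipeline relies on.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_emails(text: str) -> str:
    """Replace detected email addresses with a placeholder before training."""
    return EMAIL.sub("<EMAIL_REDACTED>", text)

print(redact_emails("Contact maintainer@example.com for the datasheet."))
# Contact <EMAIL_REDACTED> for the datasheet.
```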
155
+
156
+ ## 3.3. Other information (optional)
157
+
158
+ - **Per-record provenance:** 49 956 system-level training records carry `_provenance.{source, license, record_idx, access_date}` fields, enabling per-record audit and removal.
159
+ - **Compute footprint:** LoRA training updates ≈ 0.1–0.5 % of base-model parameters. **Estimated training compute for this LoRA ≪ 10²⁵ FLOPs**, well below the systemic-risk threshold of EU AI Act Art. 51. No proprietary teacher model is used in deployed inference.
160
+ - **Risk classification:** Limited risk (Art. 52). Not deployed in safety-critical contexts.
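To illustrate what per-record provenance enables, here is a hypothetical row following the `_provenance` schema described above, plus the filter that the 30-day removal commitment relies on (all values are illustrative, not real dataset contents):

```python
# Hypothetical training row; field names follow the card's _provenance schema,
# values are made up for illustration.
record = {
    "text": "fix(driver): handle SPI timeout",
    "_provenance": {
        "source": "https://huggingface.co/datasets/bigcode/commitpackft",
        "license": "MIT",
        "record_idx": 0,
        "access_date": "2025-10-01",
    },
}

def drop_source(records: list, source_url: str) -> list:
    """Honour a removal request: drop every row from the disputed source."""
    return [r for r in records if r["_provenance"]["source"] != source_url]

print(len(drop_source([record], record["_provenance"]["source"])))  # 0
```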
161
 
162
  ---
163
 
164
+ # Appendix A — Performance evaluation (Art. 53(1)(a))
165
 
166
+ **HumanEval** (custom Studio scorer; EvalPlus extra-tests not run — Linux-only sandbox): base 87.20 → +cpp 85.98 = **−1.22 pts**. For rigorous HumanEval+ Δ, sample re-scoring on Linux is required (samples preserved at `eval/results/2026-05-04/devstral-cpp-fused-humanevalplus/`).
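The −1.22 pts figure is straightforward subtraction of the two HumanEval scores quoted above:

```python
base_score = 87.20   # base model, HumanEval (custom Studio scorer)
fused_score = 85.98  # base model + cpp LoRA, same scorer
delta = round(fused_score - base_score, 2)
print(delta)  # -1.22
```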
167
+
168
+ Full bench results, methodology, env.json, and rerun.sh per measurement:
169
+ [`eval/results/SUMMARY.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/eval/results/SUMMARY.md) ·
170
+ [`MODEL_CARD.md`](https://github.com/L-electron-Rare/eu-kiki/blob/main/MODEL_CARD.md).
171
 
172
  ---
173
 
174
+ # Appendix B — Usage
175
 
176
  ```python
177
198
 
199
  ---
200
 
201
+ # Appendix C — Limitations and out-of-scope use
202
 
203
+ - Not for safety-critical decisions (medical, legal, structural, life-safety, biometric).
204
+ - Not for high-stakes individual decisions (hiring, credit, law enforcement) — that would re-classify under EU AI Act Art. 6 high-risk and require additional obligations.
205
+ - Hallucination present at typical instruction-tuned LLM levels; pair with a verifier or human-in-the-loop for factual outputs.
206
+ - LoRA inherits all base-model limitations (training cutoff, language coverage, refusal patterns).
207
 
208
  ---
209
 
210
+ # Appendix D — Citation
211
 
212
  ```bibtex
213
  @misc{eu-kiki-2026,
219
  }
220
  ```
221
 
222
+ ---
223
+
224
+ # Appendix E — Changelog
225
 
226
  | Date | Card version | Change |
227
  |---|---|---|
228
+ | 2026-05-06 | v0.4.0 | Initial HF release |
229
+ | 2026-05-06 | v0.4.1 | Self-contained EU AI Act card (per-adapter dataset table, PII statement, contact) |
230
+ | 2026-05-06 | v0.4.2 | PST-aligned (Commission template structure, Sections §1–4) |
231
+ | 2026-05-06 | **v0.4.3** | **PST-verbatim** — section labels and field names reproduced from the official Commission template (PDF 2025-07-24, English version). |