pradeep-xpert committed on
Commit
e520bf1
·
verified ·
1 Parent(s): ae24bcb

Initial release: vulnerability_class baseline + comprehensive 8-oracle-path leakage diagnostic on CYB009 sample

README.md ADDED
@@ -0,0 +1,511 @@
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ library_name: pytorch
4
+ tags:
5
+ - cybersecurity
6
+ - vulnerability-management
7
+ - cve
8
+ - cvss
9
+ - epss
10
+ - cisa-kev
11
+ - tabular-classification
12
+ - synthetic-data
13
+ - xgboost
14
+ - baseline
15
+ - leakage-diagnostic
16
+ - data-quality-audit
17
+ pipeline_tag: tabular-classification
18
+ base_model: []
19
+ datasets:
20
+ - xpertsystems/cyb009-sample
21
+ metrics:
22
+ - accuracy
23
+ - f1
24
+ - roc_auc
25
+ model-index:
26
+ - name: cyb009-baseline-classifier
27
+   results:
28
+   - task:
29
+       type: tabular-classification
30
+       name: 8-class vulnerability classification (CWE-style families)
31
+     dataset:
32
+       type: xpertsystems/cyb009-sample
33
+       name: CYB009 Synthetic Vulnerability Intelligence Dataset (Sample)
34
+     metrics:
35
+     - type: roc_auc
36
+       value: 0.6837
37
+       name: Test macro ROC-AUC OvR (XGBoost, seed 42)
38
+     - type: accuracy
39
+       value: 0.2374
40
+       name: Test accuracy (XGBoost, seed 42)
41
+     - type: f1
42
+       value: 0.2244
43
+       name: Test macro-F1 (XGBoost, seed 42)
44
+     - type: accuracy
45
+       value: 0.244
46
+       name: Multi-seed accuracy mean ± 0.023 (XGBoost, 10 seeds)
47
+     - type: roc_auc
48
+       value: 0.687
49
+       name: Multi-seed ROC-AUC mean ± 0.014 (XGBoost, 10 seeds)
50
+ ---
51
+
52
+ # CYB009 Baseline Classifier
53
+
54
+ **Vulnerability classification baseline (8-class) trained on the CYB009
55
+ synthetic vulnerability intelligence sample. The primary artifact value
56
+ of this repo is `leakage_diagnostic.json` — the most comprehensive
57
+ structural-leakage audit in the XpertSystems baseline catalog,
58
+ documenting 8 oracle paths and 6 unlearnable README-suggested targets.
59
+ The classifier itself is the catalog's weakest baseline by design (acc
60
+ 0.244 vs majority 0.176), included to show that vulnerability_class is
61
+ the ONLY README-headline target that learns honestly on this sample.**
62
+
63
+ > **Read this first.** This repo ships three artifacts in priority
64
+ > order:
65
+ > 1. **`leakage_diagnostic.json`** — comprehensive audit of 8 oracle
66
+ > paths discovered on CYB009 and 6 README-suggested targets that
67
+ > are unlearnable on the sample after honest leak removal.
68
+ > 2. A working classifier for `vulnerability_class` 8-class — the
69
+ > only README target that learns honestly on this sample, and the
70
+ > weakest baseline in the XpertSystems catalog by design.
71
+ > 3. A feature engineering reference (`feature_engineering.py`).
72
+ >
73
+ > If you came here looking for a strong baseline, you will be
74
+ > disappointed. If you came here to understand why the CYB009 sample
75
+ > has hard-to-detect structural label-feature determinism, the
76
+ > diagnostic is exactly the artifact you need.
77
+
78
+ ## Model overview
79
+
80
+ | Property | Value |
81
+ |---|---|
82
+ | Primary task | 8-class `vulnerability_class` classification (CWE-style families) |
83
+ | Primary artifact | **`leakage_diagnostic.json`** — 8 oracle paths + 6 unlearnable targets |
84
+ | Training data | `xpertsystems/cyb009-sample` (2,638 vulnerabilities) |
85
+ | Models | XGBoost + PyTorch MLP |
86
+ | Input features | 57 (after one-hot encoding) |
87
+ | Split | Stratified random (per-vulnerability, no group structure to leak) |
88
+ | Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds |
89
+ | License | CC-BY-NC-4.0 (matches dataset) |
90
+ | Status | Reference baseline + comprehensive leakage diagnostic |
91
+
92
+ ## Why this task — and the journey to get here
93
+
94
+ The CYB009 README lists 11 suggested use cases. We piloted every
95
+ README-headline target and found pervasive structural leakage. The
96
+ abandoned candidates, in order of how we discovered them:
97
+
98
+ ### Initial candidate: `exploit_maturity_final` 4-class (ABANDONED)
99
+
100
+ The most natural target — 4-class (unproven/PoC/functional/weaponised),
101
+ n=2638 well-balanced (36/27/25/12%), maps directly to EPSS calibration.
102
+ Initial feasibility hit **acc 0.74, macro-F1 0.72, ROC-AUC 0.91 vs
103
+ majority 0.36** — a +38pp lift looked excellent.
104
+
105
+ **Then we found the leak.** `cvss_temporal_score_final` divided by
106
+ `cvss_base_score` clusters near-deterministically per maturity tier:
107
+
108
+ | Maturity tier | Observed ratio (median ± std) | CVSS v3.1 multiplier |
109
+ |---|---:|---:|
110
+ | unproven | 0.801 ± 0.011 | 0.91 × (other Temporal factors) |
111
+ | proof_of_concept | 0.827 ± 0.011 | 0.94 × (other Temporal factors) |
112
+ | functional | 0.854 ± 0.011 | 0.97 × (other Temporal factors) |
113
+ | weaponised | 0.880 ± 0.012 | 1.00 × (other Temporal factors) |
114
+
115
+ This is exactly the CVSS v3.1 Exploit Code Maturity multiplier
116
+ (unproven 0.91 / PoC 0.94 / functional 0.97 / high or weaponised 1.00),
117
+ combined with other near-constant Temporal factors (Remediation Level,
118
+ Report Confidence). **The cvss_temporal/cvss_base ratio uniquely
119
+ identifies the maturity tier.**
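The medians in the table are consistent with one shared factor of roughly 0.88 for the remaining Temporal metrics (0.88 × 0.91 ≈ 0.801, 0.88 × 0.94 ≈ 0.827, and so on). The sketch below illustrates how the ratio can act as an oracle, assuming the table's medians as class centroids. The name `tier_from_ratio` and the example scores are hypothetical and not part of this repo.

```python
# Median cvss_temporal / cvss_base ratio per tier, copied from the table above.
TIER_MEDIAN_RATIO = {
    "unproven": 0.801,
    "proof_of_concept": 0.827,
    "functional": 0.854,
    "weaponised": 0.880,
}


def tier_from_ratio(cvss_temporal: float, cvss_base: float) -> str:
    """Return the tier whose median ratio is nearest to temporal / base."""
    ratio = cvss_temporal / cvss_base
    return min(TIER_MEDIAN_RATIO, key=lambda tier: abs(TIER_MEDIAN_RATIO[tier] - ratio))


# Hypothetical record: base 7.5, temporal 6.4, so the ratio is about 0.853.
print(tier_from_ratio(6.4, 7.5))
```

Adjacent medians sit about 0.026 apart against a per-tier spread of about 0.011, so a nearest-centroid rule like this one would be expected to recover most labels, which is the kind of shortcut a tree model can find on its own.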
120
+
121
+ Drop `cvss_temporal_score_final` → accuracy collapses to **0.31**
122
+ (below majority 0.36). The target is structurally unlearnable on the
123
+ sample once the oracle is removed.
124
+
125
+ ### Other 5 candidates: also unlearnable after honest leak removal
126
+
127
+ | Target | n_positive | Maj baseline | Honest acc | Honest AUC | Verdict |
128
+ |---|---:|---:|---:|---:|---|
129
+ | `exploitation_occurred_flag` | 203 | 0.923 | 0.857 | 0.65 | Below majority |
130
+ | `zero_day_flag` | 76 | 0.971 | 0.949 | 0.60 | Below majority |
131
+ | `cisa_kev_flag` | 14 | 0.995 | 0.992 | 0.61 | Below majority |
132
+ | `supply_chain_propagation_flag` | 20 | 0.992 | 0.992 | 0.80 | At majority |
133
+ | `false_positive_flag` | 205 | 0.922 | 0.866 | 0.52 | Below majority |
134
+
135
+ With the full feature set, all five rare-event binaries are leaked by
136
+ sentinel values in `time_to_exploit_days` (-1) or `time_to_remediate_days`
137
+ (120); after honest leak removal, all are at or below majority.
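A sentinel leak of this kind can be checked by comparing the positive rate of a flag on rows that carry the sentinel against the rate on all other rows. The following is a minimal sketch on toy rows; the helper name `flag_rate_by_sentinel` and the row values are illustrative assumptions, not taken from the dataset.

```python
def flag_rate_by_sentinel(rows, column, sentinel, flag):
    """Return (positive rate where column == sentinel, positive rate elsewhere)."""

    def rate(values):
        return sum(values) / len(values) if values else float("nan")

    with_sentinel = [bool(r[flag]) for r in rows if r[column] == sentinel]
    without_sentinel = [bool(r[flag]) for r in rows if r[column] != sentinel]
    return rate(with_sentinel), rate(without_sentinel)


# Toy rows (hypothetical values).
rows = [
    {"time_to_exploit_days": -1, "exploitation_occurred_flag": 0},
    {"time_to_exploit_days": -1, "exploitation_occurred_flag": 0},
    {"time_to_exploit_days": 14, "exploitation_occurred_flag": 1},
    {"time_to_exploit_days": 3, "exploitation_occurred_flag": 1},
]

print(flag_rate_by_sentinel(rows, "time_to_exploit_days", -1, "exploitation_occurred_flag"))
```

Rates of exactly 0.0 and 1.0 (in either order) indicate that the column alone determines the flag and should be dropped before training.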
138
+
139
+ ### Per-timestep multi-class targets: state-machine oracles
140
+
141
+ `lifecycle_phase`, `patch_status`, and `remediation_status` on
142
+ `vulnerability_records.csv` form a tightly-coupled state machine:
143
+ - `lifecycle_phase = residual_risk_review` → 100% `remediated`
144
+ - `lifecycle_phase = discovery` → 100% `undetected`
145
+ - `lifecycle_phase = remediation_deployment` → 100% `in_remediation`
146
+ - `patch_status = deployed` → 100% `remediated`
147
+
148
+ Naive evaluation on these targets reaches accuracy 0.95-0.98, but any
149
+ two of the three deterministically pin the third. None of these is a
150
+ viable independent ML target on the sample.
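One way to surface rules like these is to list every value of one column that always co-occurs with a single value of another. The sketch below does that on toy rows; `determined_values` is a hypothetical helper, and the mixed `organisational_triage` rows are invented purely to show a non-deterministic case.

```python
from collections import defaultdict


def determined_values(rows, source, target):
    """Map each source value to its target value when only one target co-occurs."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[source]].add(row[target])
    return {value: next(iter(targets)) for value, targets in seen.items() if len(targets) == 1}


# Toy per-timestep rows (hypothetical).
rows = [
    {"lifecycle_phase": "discovery", "remediation_status": "undetected"},
    {"lifecycle_phase": "discovery", "remediation_status": "undetected"},
    {"lifecycle_phase": "residual_risk_review", "remediation_status": "remediated"},
    {"lifecycle_phase": "remediation_deployment", "remediation_status": "in_remediation"},
    {"lifecycle_phase": "organisational_triage", "remediation_status": "undetected"},
    {"lifecycle_phase": "organisational_triage", "remediation_status": "in_remediation"},
]

print(determined_values(rows, "lifecycle_phase", "remediation_status"))
```

If most rows fall under a source value that appears in the returned mapping, the target is effectively a lookup rather than a learning problem.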
151
+
152
+ ### `severity_class`: 100% mechanical CVSS function
153
+
154
+ Observed `cvss_base_score` ranges per severity match CVSS v3.1 exactly:
155
+ critical [9.0, 10.0], high [7.0, 9.0), medium [4.0, 7.0), low [1.8, 4.0).
156
+ Predicting severity is trivial with CVSS; below majority (acc 0.55 vs
157
+ 0.51) without it.
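For reference, the CVSS v3.1 qualitative severity rating scale is a fixed threshold function of the base score (Low 0.1 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, Critical 9.0 to 10.0). A minimal sketch follows; the function name is illustrative, and the `none` branch covers a score of 0.0, which the sample does not contain.

```python
def severity_from_cvss(score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS base score out of range: {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


for example in (1.8, 4.0, 6.9, 8.9, 9.0):
    print(example, severity_from_cvss(example))
```

Because the label is a pure function of `cvss_base_score`, any model given that column can reproduce `severity_class` without learning anything about the vulnerability itself.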
158
+
159
+ ### `vulnerability_class` 8-class: the only honest target — and the baseline ships
160
+
161
+ After exhausting the README-suggested targets, `vulnerability_class`
162
+ is the only one that learns honestly:
163
+ - **acc 0.244 ± 0.023, macro-F1 0.230 ± 0.024, ROC-AUC 0.687 ± 0.014**
164
+ - **+7pp lift over majority** (the catalog's smallest)
165
+ - **All 8 classes represented** (per-class F1 0.09-0.33)
166
+ - **No oracle feature** — modest signal genuinely spread across CVSS,
167
+ EPSS, asset context, and binary flags
168
+
169
+ This is the **weakest baseline in the XpertSystems catalog by design**.
170
+ The full ~487k-row product would tighten per-class signal materially.
171
+ The dataset roadmap recommendations in `leakage_diagnostic.json`
172
+ describe what would make CYB009's headline targets viable on the
173
+ sample.
174
+
175
+ ## Quick start
176
+
177
+ ```bash
178
+ pip install xgboost scikit-learn torch safetensors pandas numpy huggingface_hub
179
+ ```
180
+
181
+ ```python
182
+ from huggingface_hub import hf_hub_download, snapshot_download
183
+ import json, numpy as np, torch, xgboost as xgb
184
+ from safetensors.torch import load_file
185
+
186
+ REPO = "xpertsystems/cyb009-baseline-classifier"
187
+
188
+ paths = {n: hf_hub_download(REPO, n) for n in [
189
+ "model_xgb.json", "model_mlp.safetensors",
190
+ "feature_engineering.py", "feature_meta.json", "feature_scaler.json",
191
+ ]}
192
+
193
+ import sys, os
194
+ sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"]))
195
+ from feature_engineering import (
196
+ transform_single, load_meta, build_asset_lookup, INT_TO_LABEL,
197
+ )
198
+
199
+ meta = load_meta(paths["feature_meta.json"])
200
+
201
+ # Asset features are joined from asset_inventory.csv at inference time
202
+ ds = snapshot_download("xpertsystems/cyb009-sample", repo_type="dataset")
203
+ asset_lookup = build_asset_lookup(f"{ds}/asset_inventory.csv")
204
+
205
+ xgb_model = xgb.XGBClassifier(); xgb_model.load_model(paths["model_xgb.json"])
206
+
207
+ # Predict; `my_vuln_record` is a placeholder for your own record (see inference_example.ipynb)
208
+ # Note: do NOT include exploit_maturity_final, cvss_temporal_score_final,
209
+ # time_to_exploit_days, time_to_remediate_days, patch_lag_days, or
210
+ # risk_score_composite - those were the outcome-leak columns.
211
+ X = transform_single(my_vuln_record, meta, asset_lookup=asset_lookup)
212
+ proba = xgb_model.predict_proba(X)[0]
213
+ print(INT_TO_LABEL[int(np.argmax(proba))])
214
+ ```
215
+
216
+ See [`inference_example.ipynb`](./inference_example.ipynb) for the full
217
+ copy-paste demo.
218
+
219
+ ## Training data
220
+
221
+ Trained on the public sample of CYB009, 2,638 per-vulnerability records:
222
+
223
+ | Vulnerability class | Vulns | Class share |
224
+ |---|---:|---:|
225
+ | `memory_corruption` | 465 | 17.6% |
226
+ | `injection_family` | 436 | 16.5% |
227
+ | `misconfiguration` | 435 | 16.5% |
228
+ | `auth_access_control` | 350 | 13.3% |
229
+ | `cryptographic_failure` | 301 | 11.4% |
230
+ | `supply_chain_weakness` | 271 | 10.3% |
231
+ | `logic_flaw` | 228 | 8.6% |
232
+ | `information_disclosure` | 152 | 5.8% |
233
+
234
+ ### Stratified split
235
+
236
+ Per-vulnerability task (one row per vuln in `vuln_summary.csv`),
237
+ **StratifiedShuffleSplit** nested 70/15/15:
238
+
239
+ | Fold | Vulns |
240
+ |---|---:|
241
+ | Train | 1,846 |
242
+ | Validation | 396 |
243
+ | Test | 396 |
244
+
245
+ Class imbalance is addressed with `class_weight='balanced'`-style weights,
246
+ passed to XGBoost as per-row `sample_weight`, and with weighted cross-entropy for the MLP.
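The usual "balanced" heuristic (the one scikit-learn documents for `class_weight='balanced'`) sets each class weight to `n_samples / (n_classes * class_count)`. The sketch below applies it to the class counts from the table above. Treat it as an illustration only: the actual training script is not published, and in practice the weights would be computed on the training fold rather than on the full sample.

```python
# Class counts from the training-data table (full 2,638-row sample).
CLASS_COUNTS = {
    "memory_corruption": 465,
    "injection_family": 436,
    "misconfiguration": 435,
    "auth_access_control": 350,
    "cryptographic_failure": 301,
    "supply_chain_weakness": 271,
    "logic_flaw": 228,
    "information_disclosure": 152,
}


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {name: n_samples / (n_classes * count) for name, count in counts.items()}


weights = balanced_class_weights(CLASS_COUNTS)
for name, weight in weights.items():
    print(f"{name:24s} {weight:.3f}")
```

For XGBoost, each training row would then receive the weight of its class as its `sample_weight`; for the MLP, the same per-class values can serve as the class weights of the cross-entropy loss.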
247
+
248
+ ## Feature pipeline
249
+
250
+ The bundled `feature_engineering.py` is the canonical recipe. 57
251
+ features survive after encoding, drawn from:
252
+
253
+ - **Per-vulnerability numeric** (10): `cvss_base_score`,
254
+ `epss_score_final`, plus 8 binary post-hoc flags
255
+ - **Per-vulnerability categorical** (1, one-hot): `severity_class`
256
+ (4 values, CVSS-derived but useful as feature)
257
+ - **Asset features** (joined from `asset_inventory.csv`): 8 numeric
258
+ + 4 categorical (asset_type, criticality_tier, environment_type,
259
+ os_family)
260
+ - **Engineered** (5): `log_epss`, `is_high_cvss`,
261
+ `exposure_severity_composite`, `risk_flag_count`, `epss_x_base`
262
+
263
+ ### Excluded columns (outcome leaks)
264
+
265
+ | Column | Why excluded |
266
+ |---|---|
267
+ | `exploit_maturity_final` | Indirect leak via CVSS temporal multiplier (would reintroduce the 0.91/0.94/0.97/1.00 oracle) |
268
+ | `cvss_temporal_score_final` | Near-deterministic per `exploit_maturity_final` tier (the primary leak we discovered) |
269
+ | `time_to_exploit_days` | -1 sentinel oracle for `exploitation_occurred_flag` |
270
+ | `time_to_remediate_days` | 120 sentinel oracle for `remediation_success_flag` |
271
+ | `patch_lag_days` | Suspected similar sentinel (precaution) |
272
+ | `risk_score_composite` | Computed from flag fields (indirect oracle) |
273
+
274
+ ## Evaluation
275
+
276
+ ### Test-set metrics, seed 42 (n = 396 vulnerabilities)
277
+
278
+ **XGBoost** (the published `model_xgb.json` artifact)
279
+
280
+ | Metric | Value |
281
+ |---|---:|
282
+ | Macro ROC-AUC (OvR) | **0.6837** |
283
+ | Accuracy | **0.2374** |
284
+ | Macro-F1 | 0.2244 |
285
+ | Weighted-F1 | 0.2321 |
286
+
287
+ **MLP** (the published `model_mlp.safetensors` artifact)
288
+
289
+ | Metric | Value |
290
+ |---|---:|
291
+ | Macro ROC-AUC (OvR) | **0.6899** |
292
+ | Accuracy | **0.2323** |
293
+ | Macro-F1 | 0.2209 |
294
+ | Weighted-F1 | 0.2362 |
295
+
296
+ MLP and XGBoost are within noise of each other on this task; both
297
+ capture the same modest, honest signal.
298
+
299
+ ### Multi-seed robustness (XGBoost, 10 seeds)
300
+
301
+ | Metric | Mean | Std | Min | Max |
302
+ |---|---:|---:|---:|---:|
303
+ | Accuracy | 0.244 | 0.023 | 0.217 | 0.283 |
304
+ | Macro-F1 | 0.230 | 0.024 | 0.206 | 0.280 |
305
+ | Macro ROC-AUC OvR | 0.687 | 0.014 | 0.660 | 0.700 |
306
+
307
+ All 10 seeds yielded all 8 classes in the test fold (stratified split
308
+ guarantees this). Full per-seed results in
309
+ [`multi_seed_results.json`](./multi_seed_results.json).
310
+
311
+ ### Per-class F1 (seed 42)
312
+
313
+ | Vulnerability class | Class share | XGBoost F1 | MLP F1 |
314
+ |---|---:|---:|---:|
315
+ | `memory_corruption` | 17.6% | **0.333** | 0.365 |
316
+ | `information_disclosure` | 5.8% | 0.291 | 0.154 |
317
+ | `misconfiguration` | 16.5% | 0.259 | 0.162 |
318
+ | `injection_family` | 16.5% | 0.237 | 0.235 |
319
+ | `supply_chain_weakness` | 10.3% | 0.222 | 0.292 |
320
+ | `cryptographic_failure` | 11.4% | 0.217 | 0.168 |
321
+ | `auth_access_control` | 13.3% | 0.146 | 0.163 |
322
+ | `logic_flaw` | 8.6% | **0.090** | 0.228 |
323
+
324
+ `memory_corruption` (highest mean CVSS at 8.3) and
325
+ `information_disclosure` (lowest mean CVSS at 5.4) are the most
326
+ distinctive classes. `logic_flaw` is the hardest — its feature
327
+ distribution overlaps closely with everything else.
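The seed-42 XGBoost figures can be recomputed from the confusion matrix published under `full_model_metrics` in `ablation_results.json`. This is a minimal sketch, assuming rows are true classes and columns are predictions (the scikit-learn convention; the file does not state it). The helper names are illustrative.

```python
LABELS = [
    "auth_access_control", "cryptographic_failure", "information_disclosure",
    "injection_family", "logic_flaw", "memory_corruption",
    "misconfiguration", "supply_chain_weakness",
]

# Confusion matrix copied from ablation_results.json (full_model_metrics).
MATRIX = [
    [7, 7, 0, 11, 6, 10, 7, 5],
    [4, 9, 3, 5, 3, 5, 16, 0],
    [3, 0, 8, 1, 4, 0, 7, 0],
    [3, 6, 1, 14, 8, 20, 6, 7],
    [4, 4, 5, 3, 3, 2, 13, 0],
    [11, 3, 0, 13, 3, 27, 5, 8],
    [6, 9, 15, 2, 5, 7, 18, 3],
    [5, 0, 0, 4, 1, 21, 2, 8],
]


def summarize(matrix):
    """Return accuracy, per-class F1, macro-F1 and support-weighted F1."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    support = [sum(row) for row in matrix]  # true count per class
    predicted = [sum(row[j] for row in matrix) for j in range(size)]
    # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (support + predicted)
    f1 = [
        2 * matrix[i][i] / (support[i] + predicted[i]) if support[i] + predicted[i] else 0.0
        for i in range(size)
    ]
    accuracy = sum(matrix[i][i] for i in range(size)) / total
    macro_f1 = sum(f1) / size
    weighted_f1 = sum(s * f for s, f in zip(support, f1)) / total
    return accuracy, f1, macro_f1, weighted_f1


accuracy, f1, macro_f1, weighted_f1 = summarize(MATRIX)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
for label, score in zip(LABELS, f1):
    print(f"{label:24s} {score:.3f}")
```

Accuracy and per-class F1 do not depend on the row/column convention; only the support-weighted average does.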
328
+
329
+ ### Ablation: which feature groups matter
330
+
331
+ | Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy |
332
+ |---|---:|---:|---:|---:|
333
+ | Full feature set (published) | 0.2374 | 0.2244 | 0.6837 | — |
334
+ | No CVSS features | 0.2121 | 0.1926 | 0.6690 | **−0.0253** |
335
+ | No asset features | 0.2172 | 0.1967 | 0.6870 | −0.0202 |
336
+ | No engineered features | 0.2323 | 0.2216 | 0.6871 | −0.0051 |
337
+ | No severity (one-hot) | 0.2273 | 0.2175 | 0.6857 | −0.0101 |
338
+ | No EPSS features | 0.2475 | 0.2237 | 0.6926 | +0.0101 |
339
+ | No binary flags | 0.2273 | 0.2114 | 0.6776 | −0.0101 |
340
+
341
+ Three findings:
342
+
343
+ 1. **No feature group is dominant.** Largest single drop is 2.5pp
344
+ (CVSS features). Every group contributes a little; nothing
345
+ contributes a lot. The signal is genuinely diffuse.
346
+ 2. **CVSS and asset features carry the most signal** (~2pp each),
347
+ consistent with the observation that per-class CVSS means
348
+ differ (5.4 to 8.3) and asset features modestly inform class.
349
+ 3. **EPSS features slightly *hurt*** on this task (+1pp without
350
+ them). EPSS is intended for exploitation prediction, not class
351
+ prediction; on this sample it acts as small additional noise.
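The Δ accuracy column is simply the ablated accuracy minus the full-model accuracy. A short sketch that reproduces it in percentage points from the table values and picks out the largest drop (variable names are illustrative):

```python
FULL_ACCURACY = 0.2374

# Accuracy with one feature group removed, copied from the ablation table.
ABLATED_ACCURACY = {
    "No CVSS features": 0.2121,
    "No asset features": 0.2172,
    "No engineered features": 0.2323,
    "No severity (one-hot)": 0.2273,
    "No EPSS features": 0.2475,
    "No binary flags": 0.2273,
}

# Negative values mean that removing the group hurt accuracy.
deltas_pp = {
    name: round((acc - FULL_ACCURACY) * 100, 2)
    for name, acc in ABLATED_ACCURACY.items()
}

largest_drop = min(deltas_pp, key=deltas_pp.get)
for name, delta in deltas_pp.items():
    print(f"{name:26s} {delta:+.2f} pp")
print("largest drop:", largest_drop)
```

Note that `ablation_results.json` stores `delta_accuracy` with the opposite sign (full minus ablated), so a positive value there corresponds to a negative Δ in the table.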
352
+
353
+ ### Architecture
354
+
355
+ **XGBoost:** multi-class gradient boosting (`multi:softprob`, 8 classes),
356
+ `hist` tree method, class-balanced sample weights, early stopping on
357
+ validation mlogloss.
358
+
359
+ **MLP:** `57 → 128 → 64 → 8`, each hidden layer followed by
360
+ `BatchNorm1d` → `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss,
361
+ AdamW optimizer, early stopping on validation macro-F1.
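To give a sense of scale, the learnable parameter count of a `57 → 128 → 64 → 8` network can be worked out by hand. The sketch below assumes every linear layer has a bias term and each hidden layer's batch norm has a learnable scale and shift; the README does not confirm either detail, so the total is indicative only.

```python
LAYER_SIZES = [57, 128, 64, 8]


def count_parameters(sizes):
    """Count weights under the stated assumptions (biases, affine batch norm)."""
    total = 0
    output_index = len(sizes) - 2  # index of the final (output) linear layer
    for index, (fan_in, fan_out) in enumerate(zip(sizes, sizes[1:])):
        total += fan_in * fan_out + fan_out  # linear weight matrix + bias
        if index < output_index:
            total += 2 * fan_out  # batch-norm scale and shift on hidden layers
    return total


print(count_parameters(LAYER_SIZES))
```

Under these assumptions the network is small, on the order of tens of thousands of weights, against 1,846 training rows, which is in line with the use of dropout and early stopping.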
362
+
363
+ Training hyperparameters are held internally by XpertSystems.
364
+
365
+ ## Limitations
366
+
367
+ **This is a baseline reference, not a production vulnerability
368
+ classifier.**
369
+
370
+ 1. **The headline finding is the leakage diagnostic, not the
371
+ classifier.** Read `leakage_diagnostic.json` first. The classifier
372
+ demonstrates that vulnerability_class is the only README-suggested
373
+ target that learns honestly on the sample.
374
+
375
+ 2. **Per-class F1 ranges 0.09–0.33.** The model is more accurate on
376
+ memory_corruption and information_disclosure than on logic_flaw
377
+ and auth_access_control. For production use, expect different
378
+ error patterns by class.
379
+
380
+ 3. **No feature group contributes more than 3pp accuracy.** The
381
+ model has no single decisive signal; instead it integrates many
382
+ weakly-informative features. Removing any one group has minimal
383
+ impact.
384
+
385
+ 4. **Synthetic-vs-real transfer.** The dataset is synthetic, calibrated
386
+ to 12 benchmarks from authoritative vulnerability intelligence
387
+ sources (NIST NVD, EPSS v3, CISA KEV, Mandiant, Verizon DBIR,
388
+ Rapid7, Qualys, Tenable). Real vulnerability telemetry has
389
+ different noise characteristics — in particular, the
390
+ structural-oracle patterns documented in
391
+ `leakage_diagnostic.json` (CVSS temporal multipliers,
392
+ sentinel-coded time fields, lifecycle state-machine determinism)
393
+ would not be present in real data with comparable density. Real
394
+ data has stochastic transitions and observation noise.
395
+
396
+ 5. **2,638 vulnerabilities is a modest training set for 8 classes.**
397
+ The 396-vulnerability test fold yields stable multi-seed metrics
398
+ (std 0.023) but per-class confidence intervals are wide. The full
399
+ ~487k-row product has materially more data per class.
400
+
401
+ ## Notes on dataset schema
402
+
403
+ The CYB009 sample dataset README describes some fields differently
404
+ from the actual schema. This note helps buyers reconcile what they
405
+ read with what they receive.
406
+
407
+ | What the README says | What the data actually contains |
408
+ |---|---|
409
+ | `vulnerability_records` has 19 columns | Data has **16 columns** |
410
+ | `vulnerability_records` includes `severity`, `exploited_in_wild_flag`, `cisa_kev_listed_flag`, `zero_day_flag`, `supply_chain_flag`, `internet_exposed`, `sla_breached_flag` | **None of these columns exist** in vulnerability_records. Per-vuln flags are only on vuln_summary. |
411
+ | `vuln_class` has 10 values (incl. `race_condition`, `web_application`, `configuration`) | **8 values** in the data; differs in: `misconfiguration` (not `configuration`), `auth_access_control` (not `authentication_bypass`), `logic_flaw` (new); no `race_condition`, no `web_application`, no `deserialization` |
412
+ | 8 lifecycle phases | **12 phases** in the data, adding `residual_risk_review` (45% of all rows), `false_positive_closed`, `sla_breach`, `accepted_risk`, `discovery`, `organisational_triage`, `exploitation_in_wild` |
413
+ | `patch_status` has 4 values | **6 values** in the data: adds `vendor_notified`, `patch_in_development`, `patch_validated` |
414
+ | `severity` has 5 values (incl. `none`) | **4 values** in the data (`severity_class`): low, medium, high, critical only |
415
+ | `vuln_summary` has 15 columns | Data has **21 columns** |
416
+ | Field renames | `severity_final` → `severity_class`; `cvss_base_score_final` → `cvss_base_score`; `cisa_kev_listed` → `cisa_kev_flag`; `exploited_in_wild` → `exploitation_occurred_flag`; `supply_chain_compromise` → `supply_chain_propagation_flag` |
417
+ | Semantic inversion | README's `sla_breached` (True = bad) ↔ data's `sla_compliance_flag` (True = good) |
418
+ | `remediation_outcome` categorical (patched/mitigated/accepted/unpatched) | Replaced with `remediation_success_flag` (binary) plus per-timestep `remediation_status` |
419
+ | Not in README | New fields: `risk_score_composite`, `compensating_control_flag`, `time_to_exploit_days`, `time_to_remediate_days`, `patch_lag_days` |
420
+
421
+ None of these affects model correctness — the feature pipeline uses
422
+ the actual column names. If you build your own pipeline against the
423
+ dataset, use the actual columns.
424
+
425
+ ## Intended use
426
+
427
+ - **Reading the leakage diagnostic** — the primary value of this repo.
428
+ Reusable methodology for any synthetic vulnerability dataset.
429
+ - **Evaluating fit** of the CYB009 dataset for your research, with
430
+ open knowledge of the structural-oracle patterns
431
+ - **Honest baseline reference** for the only README-suggested target
432
+ that learns on the sample
433
+ - **Feature engineering reference** for per-vulnerability ML
434
+
435
+ ## Out-of-scope use
436
+
437
+ - **Production vulnerability triage** on real telemetry
438
+ - **Exploit maturity prediction** — README headline target,
439
+ unlearnable on the sample after honest leak removal
440
+ - **Zero-day / KEV / supply-chain prediction** — README headline
441
+ targets, unlearnable as rare-event binaries on the sample
442
+ - **SLA breach prediction** — README headline target, unlearnable
443
+ after honest leak removal
444
+ - Any operational security decision without further validation on
445
+ real data
446
+
447
+ ## Reproducibility
448
+
449
+ Outputs above were produced with `seed = 42` (published artifact),
450
+ nested `StratifiedShuffleSplit` (70/15/15), on the published sample
451
+ (`xpertsystems/cyb009-sample`, version 1.0.0, generated 2026-05-16).
452
+ The feature pipeline in `feature_engineering.py` is deterministic and
453
+ the trained weights in this repo correspond exactly to the metrics
454
+ above.
455
+
456
+ Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200)
457
+ in `multi_seed_results.json` show that the metrics are stable across splits
458
+ (std 0.023 on accuracy).
459
+
460
+ The training script itself is private to XpertSystems.
461
+
462
+ ## Files in this repo
463
+
464
+ | File | Purpose |
465
+ |---|---|
466
+ | **`leakage_diagnostic.json`** | **PRIMARY ARTIFACT — 8 oracle paths + 6 unlearnable targets** |
467
+ | `model_xgb.json` | XGBoost weights (seed 42) |
468
+ | `model_mlp.safetensors` | PyTorch MLP weights (seed 42) |
469
+ | `feature_engineering.py` | Feature pipeline |
470
+ | `feature_meta.json` | Feature column order + categorical levels |
471
+ | `feature_scaler.json` | MLP input mean/std (XGBoost ignores) |
472
+ | `validation_results.json` | Per-class metrics, confusion matrix, architecture |
473
+ | `ablation_results.json` | Per-feature-group ablation |
474
+ | `multi_seed_results.json` | XGBoost metrics across 10 seeds |
475
+ | `inference_example.ipynb` | End-to-end inference demo notebook |
476
+ | `README.md` | This file |
477
+
478
+ ## Contact and full product
479
+
480
+ The full **CYB009** dataset contains **~487,000 vulnerability records**
481
+ across four files, with calibrated benchmark validation against 12
482
+ metrics drawn from authoritative vulnerability intelligence sources
483
+ (NIST NVD, EPSS v3, CISA KEV, Mandiant, Verizon DBIR, Rapid7, Qualys,
484
+ Tenable). The full XpertSystems.ai synthetic data catalogue spans 41
485
+ SKUs across Cybersecurity, Healthcare, Insurance & Risk, Oil & Gas,
486
+ and Materials & Energy.
487
+
488
+ - 📧 **pradeep@xpertsystems.ai**
489
+ - 🌐 **https://xpertsystems.ai**
490
+ - 🗂 Dataset: https://huggingface.co/datasets/xpertsystems/cyb009-sample
491
+ - 🤖 Companion models:
492
+   - https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic)
493
+   - https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain)
494
+   - https://huggingface.co/xpertsystems/cyb003-baseline-classifier (malware execution phase)
495
+   - https://huggingface.co/xpertsystems/cyb004-baseline-classifier (phishing campaign phase)
496
+   - https://huggingface.co/xpertsystems/cyb005-baseline-classifier (ransomware actor-tier attribution)
497
+   - https://huggingface.co/xpertsystems/cyb006-baseline-classifier (user risk tier + leakage diagnostic)
498
+   - https://huggingface.co/xpertsystems/cyb007-baseline-classifier (insider threat type)
499
+   - https://huggingface.co/xpertsystems/cyb008-baseline-classifier (SOC alert triage + leakage diagnostic)
500
+
501
+ ## Citation
502
+
503
+ ```bibtex
504
+ @misc{xpertsystems_cyb009_baseline_2026,
505
+ title = {CYB009 Baseline Classifier: XGBoost and MLP for Vulnerability Classification, with the XpertSystems Catalog's Most Comprehensive Structural-Leakage Audit},
506
+ author = {XpertSystems.ai},
507
+ year = {2026},
508
+ url = {https://huggingface.co/xpertsystems/cyb009-baseline-classifier},
509
+ note = {Reference baseline + 8-oracle-path leakage diagnostic on xpertsystems/cyb009-sample}
510
+ }
511
+ ```
ablation_results.json ADDED
@@ -0,0 +1,818 @@
1
+ {
2
+ "purpose": "Quantify how much each feature group contributes to the honest XGBoost score. Identical architecture, same stratified split, with one feature group dropped at a time.",
3
+ "full_model_metrics": {
4
+ "model": "xgboost",
5
+ "accuracy": 0.23737373737373738,
6
+ "macro_f1": 0.22437482872901052,
7
+ "weighted_f1": 0.23213786276177156,
8
+ "per_class_f1": {
9
+ "auth_access_control": 0.14583333333333334,
10
+ "cryptographic_failure": 0.21686746987951808,
11
+ "information_disclosure": 0.2909090909090909,
12
+ "injection_family": 0.23728813559322035,
13
+ "logic_flaw": 0.08955223880597014,
14
+ "memory_corruption": 0.3333333333333333,
15
+ "misconfiguration": 0.2589928057553957,
16
+ "supply_chain_weakness": 0.2222222222222222
17
+ },
18
+ "confusion_matrix": {
19
+ "labels": [
20
+ "auth_access_control",
21
+ "cryptographic_failure",
22
+ "information_disclosure",
23
+ "injection_family",
24
+ "logic_flaw",
25
+ "memory_corruption",
26
+ "misconfiguration",
27
+ "supply_chain_weakness"
28
+ ],
29
+ "matrix": [
30
+ [
31
+ 7,
32
+ 7,
33
+ 0,
34
+ 11,
35
+ 6,
36
+ 10,
37
+ 7,
38
+ 5
39
+ ],
40
+ [
41
+ 4,
42
+ 9,
43
+ 3,
44
+ 5,
45
+ 3,
46
+ 5,
47
+ 16,
48
+ 0
49
+ ],
50
+ [
51
+ 3,
52
+ 0,
53
+ 8,
54
+ 1,
55
+ 4,
56
+ 0,
57
+ 7,
58
+ 0
59
+ ],
60
+ [
61
+ 3,
62
+ 6,
63
+ 1,
64
+ 14,
65
+ 8,
66
+ 20,
67
+ 6,
68
+ 7
69
+ ],
70
+ [
71
+ 4,
72
+ 4,
73
+ 5,
74
+ 3,
75
+ 3,
76
+ 2,
77
+ 13,
78
+ 0
79
+ ],
80
+ [
81
+ 11,
82
+ 3,
83
+ 0,
84
+ 13,
85
+ 3,
86
+ 27,
87
+ 5,
88
+ 8
89
+ ],
90
+ [
91
+ 6,
92
+ 9,
93
+ 15,
94
+ 2,
95
+ 5,
96
+ 7,
97
+ 18,
98
+ 3
99
+ ],
100
+ [
101
+ 5,
102
+ 0,
103
+ 0,
104
+ 4,
105
+ 1,
106
+ 21,
107
+ 2,
108
+ 8
109
+ ]
110
+ ]
111
+ },
112
+ "macro_roc_auc_ovr": 0.6837125710196055
113
+ },
114
+ "ablations": {
115
+ "no_cvss": {
116
+ "n_features": 55,
117
+ "dropped_count": 2,
118
+ "metrics": {
119
+ "model": "xgboost_no_cvss",
120
+ "accuracy": 0.21212121212121213,
121
+ "macro_f1": 0.19261691542621184,
122
+ "weighted_f1": 0.20621456669040633,
123
+ "per_class_f1": {
124
+ "auth_access_control": 0.14285714285714285,
125
+ "cryptographic_failure": 0.09523809523809523,
126
+ "information_disclosure": 0.14705882352941177,
127
+ "injection_family": 0.23728813559322035,
128
+ "logic_flaw": 0.16216216216216217,
129
+ "memory_corruption": 0.33121019108280253,
130
+ "misconfiguration": 0.2028985507246377,
131
+ "supply_chain_weakness": 0.2222222222222222
132
+ },
133
+ "confusion_matrix": {
134
+ "labels": [
135
+ "auth_access_control",
136
+ "cryptographic_failure",
137
+ "information_disclosure",
138
+ "injection_family",
139
+ "logic_flaw",
140
+ "memory_corruption",
141
+ "misconfiguration",
142
+ "supply_chain_weakness"
143
+ ],
144
+ "matrix": [
145
+ [
146
+ 6,
147
+ 3,
148
+ 0,
149
+ 13,
150
+ 7,
151
+ 12,
152
+ 7,
153
+ 5
154
+ ],
155
+ [
156
+ 3,
157
+ 3,
158
+ 8,
159
+ 3,
160
+ 7,
161
+ 5,
162
+ 12,
163
+ 4
164
+ ],
165
+ [
166
+ 2,
167
+ 1,
168
+ 5,
169
+ 0,
170
+ 5,
171
+ 2,
172
+ 8,
173
+ 0
174
+ ],
175
+ [
176
+ 1,
177
+ 3,
178
+ 3,
179
+ 14,
180
+ 2,
181
+ 20,
182
+ 10,
183
+ 12
184
+ ],
185
+ [
186
+ 1,
187
+ 2,
188
+ 7,
189
+ 2,
190
+ 6,
191
+ 1,
192
+ 15,
193
+ 0
194
+ ],
195
+ [
196
+ 10,
197
+ 2,
198
+ 1,
199
+ 13,
200
+ 2,
201
+ 26,
202
+ 5,
203
+ 11
204
+ ],
205
+ [
206
+ 4,
207
+ 3,
208
+ 20,
209
+ 3,
210
+ 9,
211
+ 5,
212
+ 14,
213
+ 7
214
+ ],
215
+ [
216
+ 4,
217
+ 1,
218
+ 1,
219
+ 5,
220
+ 2,
221
+ 16,
222
+ 2,
223
+ 10
224
+ ]
225
+ ]
226
+ },
227
+ "macro_roc_auc_ovr": 0.669002340507073
228
+ },
229
+ "delta_accuracy": 0.02525252525252525,
230
+ "delta_macro_f1": 0.031757913302798674
231
+ },
232
+ "no_epss": {
233
+ "n_features": 54,
234
+ "dropped_count": 3,
235
+ "metrics": {
236
+ "model": "xgboost_no_epss",
237
+ "accuracy": 0.2474747474747475,
238
+ "macro_f1": 0.2237319833172186,
239
+ "weighted_f1": 0.24186505327006125,
240
+ "per_class_f1": {
241
+ "auth_access_control": 0.17204301075268819,
242
+ "cryptographic_failure": 0.08,
243
+ "information_disclosure": 0.25,
244
+ "injection_family": 0.3089430894308943,
245
+ "logic_flaw": 0.11904761904761904,
246
+ "memory_corruption": 0.4050632911392405,
247
+ "misconfiguration": 0.25757575757575757,
248
+ "supply_chain_weakness": 0.19718309859154928
249
+ },
250
+ "confusion_matrix": {
251
+ "labels": [
252
+ "auth_access_control",
253
+ "cryptographic_failure",
254
+ "information_disclosure",
255
+ "injection_family",
256
+ "logic_flaw",
257
+ "memory_corruption",
258
+ "misconfiguration",
259
+ "supply_chain_weakness"
260
+ ],
261
+ "matrix": [
262
+ [
263
+ 8,
264
+ 6,
265
+ 0,
266
+ 12,
267
+ 7,
268
+ 11,
269
+ 5,
270
+ 4
271
+ ],
272
+ [
273
+ 6,
274
+ 3,
275
+ 3,
276
+ 5,
277
+ 10,
278
+ 4,
279
+ 12,
280
+ 2
281
+ ],
282
+ [
283
+ 2,
284
+ 2,
285
+ 7,
286
+ 2,
287
+ 3,
288
+ 0,
289
+ 7,
290
+ 0
291
+ ],
292
+ [
293
+ 2,
294
+ 5,
295
+ 2,
296
+ 19,
297
+ 6,
298
+ 20,
299
+ 6,
300
+ 5
301
+ ],
302
+ [
303
+ 2,
304
+ 3,
305
+ 5,
306
+ 2,
307
+ 5,
308
+ 1,
309
+ 15,
310
+ 1
311
+ ],
312
+ [
313
+ 9,
314
+ 6,
315
+ 0,
316
+ 10,
317
+ 2,
318
+ 32,
319
+ 4,
320
+ 7
321
+ ],
322
+ [
323
+ 6,
324
+ 2,
325
+ 16,
326
+ 1,
327
+ 15,
328
+ 4,
329
+ 17,
330
+ 4
331
+ ],
332
+ [
333
+ 5,
334
+ 3,
335
+ 0,
336
+ 7,
337
+ 2,
338
+ 16,
339
+ 1,
340
+ 7
341
+ ]
342
+ ]
343
+ },
344
+ "macro_roc_auc_ovr": 0.6925718594708064
345
+ },
346
+ "delta_accuracy": -0.01010101010101011,
347
+ "delta_macro_f1": 0.0006428454117919091
348
+ },
349
+ "no_flags": {
350
+ "n_features": 48,
351
+ "dropped_count": 9,
352
+ "metrics": {
353
+ "model": "xgboost_no_flags",
354
+ "accuracy": 0.22727272727272727,
355
+ "macro_f1": 0.21140688534448485,
356
+ "weighted_f1": 0.2214593080677342,
357
+ "per_class_f1": {
358
+ "auth_access_control": 0.13186813186813187,
359
+ "cryptographic_failure": 0.1686746987951807,
360
+ "information_disclosure": 0.3333333333333333,
361
+ "injection_family": 0.2764227642276423,
362
+ "logic_flaw": 0.08450704225352113,
363
+ "memory_corruption": 0.34838709677419355,
364
+ "misconfiguration": 0.24806201550387597,
365
+ "supply_chain_weakness": 0.1
366
+ },
367
+ "confusion_matrix": {
368
+ "labels": [
369
+ "auth_access_control",
370
+ "cryptographic_failure",
371
+ "information_disclosure",
372
+ "injection_family",
373
+ "logic_flaw",
374
+ "memory_corruption",
375
+ "misconfiguration",
376
+ "supply_chain_weakness"
377
+ ],
378
+ "matrix": [
379
+ [
380
+ 6,
381
+ 6,
382
+ 1,
383
+ 9,
384
+ 5,
385
+ 10,
386
+ 6,
387
+ 10
388
+ ],
389
+ [
390
+ 5,
391
+ 7,
392
+ 3,
393
+ 5,
394
+ 5,
395
+ 4,
396
+ 14,
397
+ 2
398
+ ],
399
+ [
400
+ 3,
401
+ 0,
402
+ 10,
403
+ 1,
404
+ 4,
405
+ 0,
406
+ 5,
407
+ 0
408
+ ],
409
+ [
410
+ 3,
411
+ 7,
412
+ 1,
413
+ 17,
414
+ 7,
415
+ 18,
416
+ 4,
417
+ 8
418
+ ],
419
+ [
420
+ 3,
421
+ 5,
422
+ 6,
423
+ 2,
424
+ 3,
425
+ 2,
426
+ 13,
427
+ 0
428
+ ],
429
+ [
430
+ 8,
431
+ 3,
432
+ 0,
433
+ 14,
434
+ 3,
435
+ 27,
436
+ 4,
437
+ 11
438
+ ],
439
+ [
440
+ 4,
441
+ 10,
442
+ 16,
443
+ 2,
444
+ 7,
445
+ 6,
446
+ 16,
447
+ 4
448
+ ],
449
+ [
450
+ 6,
451
+ 0,
452
+ 0,
453
+ 8,
454
+ 3,
455
+ 18,
456
+ 2,
457
+ 4
458
+ ]
459
+ ]
460
+ },
461
+ "macro_roc_auc_ovr": 0.6776398959263554
462
+ },
463
+ "delta_accuracy": 0.01010101010101011,
464
+ "delta_macro_f1": 0.01296794338452567
465
+ },
466
+ "no_asset": {
467
+ "n_features": 18,
468
+ "dropped_count": 39,
469
+ "metrics": {
470
+ "model": "xgboost_no_asset",
471
+ "accuracy": 0.21717171717171718,
472
+ "macro_f1": 0.19672873773465777,
473
+ "weighted_f1": 0.2140924517062793,
474
+ "per_class_f1": {
475
+ "auth_access_control": 0.10526315789473684,
476
+ "cryptographic_failure": 0.13043478260869565,
477
+ "information_disclosure": 0.13793103448275862,
478
+ "injection_family": 0.17857142857142858,
479
+ "logic_flaw": 0.08695652173913043,
480
+ "memory_corruption": 0.37333333333333335,
481
+ "misconfiguration": 0.26865671641791045,
482
+ "supply_chain_weakness": 0.2926829268292683
483
+ },
484
+ "confusion_matrix": {
485
+ "labels": [
486
+ "auth_access_control",
487
+ "cryptographic_failure",
488
+ "information_disclosure",
489
+ "injection_family",
490
+ "logic_flaw",
491
+ "memory_corruption",
492
+ "misconfiguration",
493
+ "supply_chain_weakness"
494
+ ],
495
+ "matrix": [
496
+ [
497
+ 5,
498
+ 6,
499
+ 1,
500
+ 8,
501
+ 5,
502
+ 16,
503
+ 7,
504
+ 5
505
+ ],
506
+ [
507
+ 5,
508
+ 6,
509
+ 6,
510
+ 5,
511
+ 5,
512
+ 3,
513
+ 13,
514
+ 2
515
+ ],
516
+ [
517
+ 2,
518
+ 2,
519
+ 4,
520
+ 1,
521
+ 4,
522
+ 1,
523
+ 8,
524
+ 1
525
+ ],
526
+ [
527
+ 11,
528
+ 7,
529
+ 1,
530
+ 10,
531
+ 8,
532
+ 15,
533
+ 7,
534
+ 6
535
+ ],
536
+ [
537
+ 1,
538
+ 6,
539
+ 8,
540
+ 2,
541
+ 3,
542
+ 1,
543
+ 12,
544
+ 1
545
+ ],
546
+ [
547
+ 9,
548
+ 9,
549
+ 0,
550
+ 9,
551
+ 2,
552
+ 28,
553
+ 3,
554
+ 10
555
+ ],
556
+ [
557
+ 4,
558
+ 10,
559
+ 15,
560
+ 7,
561
+ 5,
562
+ 2,
563
+ 18,
564
+ 4
565
+ ],
566
+ [
567
+ 5,
568
+ 1,
569
+ 0,
570
+ 5,
571
+ 3,
572
+ 14,
573
+ 1,
574
+ 12
575
+ ]
576
+ ]
577
+ },
578
+ "macro_roc_auc_ovr": 0.6869647093980484
579
+ },
580
+ "delta_accuracy": 0.020202020202020193,
581
+ "delta_macro_f1": 0.02764609099435275
582
+ },
583
+ "no_severity": {
584
+ "n_features": 53,
585
+ "dropped_count": 4,
586
+ "metrics": {
587
+ "model": "xgboost_no_severity",
588
+ "accuracy": 0.22727272727272727,
589
+ "macro_f1": 0.21747488568762768,
590
+ "weighted_f1": 0.2268764018926795,
591
+ "per_class_f1": {
592
+ "auth_access_control": 0.14893617021276595,
593
+ "cryptographic_failure": 0.19047619047619047,
594
+ "information_disclosure": 0.23333333333333334,
595
+ "injection_family": 0.288135593220339,
596
+ "logic_flaw": 0.12658227848101267,
597
+ "memory_corruption": 0.28205128205128205,
598
+ "misconfiguration": 0.24806201550387597,
599
+ "supply_chain_weakness": 0.2222222222222222
600
+ },
601
+ "confusion_matrix": {
602
+ "labels": [
603
+ "auth_access_control",
604
+ "cryptographic_failure",
605
+ "information_disclosure",
606
+ "injection_family",
607
+ "logic_flaw",
608
+ "memory_corruption",
609
+ "misconfiguration",
610
+ "supply_chain_weakness"
611
+ ],
612
+ "matrix": [
613
+ [
614
+ 7,
615
+ 7,
616
+ 0,
617
+ 9,
618
+ 7,
619
+ 12,
620
+ 7,
621
+ 4
622
+ ],
623
+ [
624
+ 5,
625
+ 8,
626
+ 3,
627
+ 2,
628
+ 8,
629
+ 5,
630
+ 14,
631
+ 0
632
+ ],
633
+ [
634
+ 3,
635
+ 0,
636
+ 7,
637
+ 1,
638
+ 7,
639
+ 0,
640
+ 5,
641
+ 0
642
+ ],
643
+ [
644
+ 3,
645
+ 6,
646
+ 2,
647
+ 17,
648
+ 5,
649
+ 20,
650
+ 7,
651
+ 5
652
+ ],
653
+ [
654
+ 3,
655
+ 5,
656
+ 7,
657
+ 3,
658
+ 5,
659
+ 2,
660
+ 9,
661
+ 0
662
+ ],
663
+ [
664
+ 10,
665
+ 7,
666
+ 0,
667
+ 13,
668
+ 4,
669
+ 22,
670
+ 4,
671
+ 10
672
+ ],
673
+ [
674
+ 5,
675
+ 6,
676
+ 18,
677
+ 2,
678
+ 8,
679
+ 6,
680
+ 16,
681
+ 4
682
+ ],
683
+ [
684
+ 5,
685
+ 0,
686
+ 0,
687
+ 6,
688
+ 1,
689
+ 19,
690
+ 2,
691
+ 8
692
+ ]
693
+ ]
694
+ },
695
+ "macro_roc_auc_ovr": 0.6857295225029008
696
+ },
697
+ "delta_accuracy": 0.01010101010101011,
698
+ "delta_macro_f1": 0.006899943041382833
699
+ },
700
+ "no_engineered": {
701
+ "n_features": 52,
702
+ "dropped_count": 5,
703
+ "metrics": {
704
+ "model": "xgboost_no_engineered",
705
+ "accuracy": 0.23232323232323232,
706
+ "macro_f1": 0.22158389829583944,
707
+ "weighted_f1": 0.22713804092389037,
708
+ "per_class_f1": {
709
+ "auth_access_control": 0.15053763440860216,
710
+ "cryptographic_failure": 0.14285714285714285,
711
+ "information_disclosure": 0.3157894736842105,
712
+ "injection_family": 0.23931623931623933,
713
+ "logic_flaw": 0.12987012987012986,
714
+ "memory_corruption": 0.345679012345679,
715
+ "misconfiguration": 0.23809523809523808,
716
+ "supply_chain_weakness": 0.21052631578947367
717
+ },
718
+ "confusion_matrix": {
719
+ "labels": [
720
+ "auth_access_control",
721
+ "cryptographic_failure",
722
+ "information_disclosure",
723
+ "injection_family",
724
+ "logic_flaw",
725
+ "memory_corruption",
726
+ "misconfiguration",
727
+ "supply_chain_weakness"
728
+ ],
729
+ "matrix": [
730
+ [
731
+ 7,
732
+ 5,
733
+ 0,
734
+ 9,
735
+ 9,
736
+ 13,
737
+ 6,
738
+ 4
739
+ ],
740
+ [
741
+ 5,
742
+ 6,
743
+ 2,
744
+ 3,
745
+ 7,
746
+ 4,
747
+ 15,
748
+ 3
749
+ ],
750
+ [
751
+ 3,
752
+ 1,
753
+ 9,
754
+ 1,
755
+ 6,
756
+ 0,
757
+ 3,
758
+ 0
759
+ ],
760
+ [
761
+ 5,
762
+ 8,
763
+ 2,
764
+ 14,
765
+ 6,
766
+ 19,
767
+ 3,
768
+ 8
769
+ ],
770
+ [
771
+ 2,
772
+ 4,
773
+ 4,
774
+ 3,
775
+ 5,
776
+ 2,
777
+ 14,
778
+ 0
779
+ ],
780
+ [
781
+ 8,
782
+ 6,
783
+ 0,
784
+ 13,
785
+ 3,
786
+ 28,
787
+ 4,
788
+ 8
789
+ ],
790
+ [
791
+ 5,
792
+ 9,
793
+ 17,
794
+ 2,
795
+ 6,
796
+ 7,
797
+ 15,
798
+ 4
799
+ ],
800
+ [
801
+ 5,
802
+ 0,
803
+ 0,
804
+ 7,
805
+ 1,
806
+ 19,
807
+ 1,
808
+ 8
809
+ ]
810
+ ]
811
+ },
812
+ "macro_roc_auc_ovr": 0.6871096699405611
813
+ },
814
+ "delta_accuracy": 0.005050505050505055,
815
+ "delta_macro_f1": 0.0027909304331710794
816
+ }
817
+ }
818
+ }
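The metrics reported in this file can be cross-checked against its confusion matrices. The following is a minimal sketch in plain Python, not part of the repository: the helper names `accuracy` and `per_class_f1` are hypothetical, and the matrix is the `no_epss` one copied from above. The JSON does not say whether rows are true or predicted classes; accuracy and F1 come out the same under either orientation.

```python
# Confusion matrix of the `no_epss` ablation, copied from the JSON above.
NO_EPSS_MATRIX = [
    [8, 6, 0, 12, 7, 11, 5, 4],
    [6, 3, 3, 5, 10, 4, 12, 2],
    [2, 2, 7, 2, 3, 0, 7, 0],
    [2, 5, 2, 19, 6, 20, 6, 5],
    [2, 3, 5, 2, 5, 1, 15, 1],
    [9, 6, 0, 10, 2, 32, 4, 7],
    [6, 2, 16, 1, 15, 4, 17, 4],
    [5, 3, 0, 7, 2, 16, 1, 7],
]


def accuracy(matrix):
    """Share of samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_f1(matrix):
    """F1 per class: 2*TP / (row total + column total)."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        row_total = sum(matrix[k])
        col_total = sum(matrix[i][k] for i in range(n))
        denom = row_total + col_total
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


acc = accuracy(NO_EPSS_MATRIX)
f1 = per_class_f1(NO_EPSS_MATRIX)
macro_f1 = sum(f1) / len(f1)
print(f"accuracy={acc:.4f} macro_f1={macro_f1:.4f}")
```

The same two helpers can be pointed at any other `matrix` entry in the file to confirm its `accuracy`, `per_class_f1` and `macro_f1` fields.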
feature_engineering.py ADDED
@@ -0,0 +1,401 @@
1
+ """
2
+ feature_engineering.py
3
+ ======================
4
+
5
+ Feature pipeline for the CYB009 baseline classifier.
6
+
7
+ Predicts `vulnerability_class` (8-class vulnerability classification)
8
+ from per-vulnerability features on the CYB009 sample dataset.
9
+
10
+ CSV inputs:
11
+ vuln_summary.csv (primary, one row per vulnerability,
12
+ 2,638 vulnerabilities)
13
+ asset_inventory.csv (per-asset registry, joined for asset
14
+ context features)
15
+ vulnerability_records.csv (per-timestep trajectory; reserved)
16
+ vuln_lifecycle_events.csv (discrete event log; reserved)
17
+
18
+ Target classes (8):
19
+ auth_access_control, cryptographic_failure, information_disclosure,
20
+ injection_family, logic_flaw, memory_corruption, misconfiguration,
21
+ supply_chain_weakness
22
+
23
+ Why this task (and why not the more obvious targets)
24
+ ----------------------------------------------------
25
+ The CYB009 README lists 11 suggested use cases. We piloted every
26
+ README-headline target on the sample dataset and found the sample
27
+ has pervasive structural leakage that makes most targets either
28
+ trivially solvable via oracle features or unlearnable after honest
29
+ leakage removal:
30
+
31
+ - `exploit_maturity_final` (4-class) is structurally leaky via
32
+ `cvss_temporal_score_final`: CVSS v3.1 computes temporal score from
33
+ base score using Exploit Code Maturity multipliers (0.91 / 0.94 /
34
+ 0.97 / 1.00 for unproven / PoC / functional / weaponised), so the
35
+ cvss_temporal/cvss_base ratio clusters near-deterministically per
36
+ maturity tier (0.80 / 0.83 / 0.85 / 0.88 in the data). Drop
37
+ cvss_temporal -> accuracy collapses from 0.74 to 0.31 (below
38
+ majority 0.36).
39
+
40
+ - `remediation_status` / `patch_status` / `lifecycle_phase`
41
+ (per-timestep) form a tightly-coupled state machine. lifecycle_phase
42
+ = `residual_risk_review` -> 100% `remediated`. `patch_status =
43
+ deployed` -> 100% `remediated`. Any two of the three deterministically
44
+ pin the third.
45
+
46
+ - `severity_class` is 100% derived from `cvss_base_score` via CVSS
47
+ v3.1 boundaries (low=0.1-3.9, medium=4.0-6.9, high=7.0-8.9,
48
+ critical=9.0-10.0). Trivial if `cvss_base_score` is included; without
+ it, accuracy is 0.55 against a majority baseline of 0.51.
50
+
51
+ - All seven binary flags (`exploitation_occurred_flag`, `zero_day_flag`,
52
+ `cisa_kev_flag`, `supply_chain_propagation_flag`,
53
+ `remediation_success_flag`, `sla_compliance_flag`,
54
+ `false_positive_flag`) are at-or-below majority after honest
55
+ leakage removal of the event-time sentinels
56
+ (`time_to_exploit_days`, `time_to_remediate_days`, `patch_lag_days`,
57
+ `risk_score_composite`). See leakage_diagnostic.json.
58
+
59
+ `vulnerability_class` is the only README-suggested target that learns
+ honestly on the sample: acc 0.24, macro-F1 0.22, ROC-AUC 0.69 vs
+ majority baseline 0.18. That is a modest +6pp accuracy lift over
+ majority and, by design, the weakest baseline in the XpertSystems CYB
+ catalog. The full ~487k-row product is expected to tighten per-class
+ signal materially.
64
+
65
+ The model card frames this honestly: the strongest finding on CYB009
66
+ is the comprehensive leakage diagnostic rather than the modest
67
+ classifier performance. Buyers planning CYB009 ML work should read
68
+ the diagnostic first.
69
+
70
+ Leakage audit
71
+ -------------
72
+ Excluded as outcome leaks for this target:
73
+
74
+ 1. `exploit_maturity_final` - coupled to `cvss_temporal_score_final`
+ through the CVSS v3.1 temporal-score machinery.
76
+
77
+ 2. Event-time sentinel oracles, dropped as a precaution. They do not
+ leak `vulnerability_class` directly, but they encode the flag fields:
+ `time_to_exploit_days`, `time_to_remediate_days`, `patch_lag_days`,
+ `risk_score_composite`.
81
+
82
+ 3. `cvss_temporal_score_final` excluded because of the CVSS v3.1
83
+ maturity-multiplier structural encoding.
84
+
85
+ `severity_class` is KEPT as a one-hot feature because it's a derived
86
+ view of `cvss_base_score` rather than the target.
87
+
88
+ Binary post-hoc flags are KEPT as legitimate observables that a SOC
+ analyst would have at decision time. They contribute modest real
+ signal (about 1pp accuracy in the `no_flags` ablation).
91
+
92
+ Public API
93
+ ----------
94
+ build_features(vuln_summary_path, asset_inventory_path)
95
+ -> (X, y, ids, meta)
96
+ transform_single(record, meta, asset_lookup=None) -> np.ndarray
97
+ save_meta(meta, path) / load_meta(path)
98
+ build_asset_lookup(asset_inventory_path) -> dict
99
+
100
+ License
101
+ -------
102
+ Ships with the public model on Hugging Face under CC-BY-NC-4.0,
103
+ matching the dataset license. See README.md.
104
+ """
105
+
106
+ from __future__ import annotations
107
+
108
+ import json
109
+ from pathlib import Path
110
+ from typing import Any
111
+
112
+ import numpy as np
113
+ import pandas as pd
114
+
115
+ # ---------------------------------------------------------------------------
116
+ # Label space
117
+ # ---------------------------------------------------------------------------
118
+
119
+ # Eight vulnerability classes from the CYB009 sample. The README claims
120
+ # 10 classes but only 8 exist in the sample data.
121
+ LABEL_ORDER = [
122
+ "auth_access_control",
123
+ "cryptographic_failure",
124
+ "information_disclosure",
125
+ "injection_family",
126
+ "logic_flaw",
127
+ "memory_corruption",
128
+ "misconfiguration",
129
+ "supply_chain_weakness",
130
+ ]
131
+ LABEL_TO_INT = {lbl: i for i, lbl in enumerate(LABEL_ORDER)}
132
+ INT_TO_LABEL = {i: lbl for lbl, i in LABEL_TO_INT.items()}
133
+
134
+ # ---------------------------------------------------------------------------
135
+ # Identifier and target columns
136
+ # ---------------------------------------------------------------------------
137
+
138
+ ID_COLUMNS = ["vuln_id", "asset_id", "org_id"]
139
+ TARGET_COLUMN = "vulnerability_class"
140
+
141
+ # Outcome-leak columns excluded from features.
142
+ EXCLUDED_FROM_FEATURES = [
143
+ "time_to_exploit_days", # -1 sentinel oracle
144
+ "time_to_remediate_days", # 120 sentinel oracle
145
+ "patch_lag_days", # likely similar sentinel
146
+ "risk_score_composite", # computed from flag fields
147
+ "exploit_maturity_final", # indirect leak via CVSS temporal
148
+ "cvss_temporal_score_final", # near-deterministic per maturity tier
149
+ ]
150
+
151
+ # ---------------------------------------------------------------------------
152
+ # Per-vulnerability numeric features
153
+ # ---------------------------------------------------------------------------
154
+
155
+ VULN_NUMERIC_FEATURES = [
156
+ "cvss_base_score",
157
+ "epss_score_final",
158
+ "exploitation_occurred_flag",
159
+ "zero_day_flag",
160
+ "cisa_kev_flag",
161
+ "supply_chain_propagation_flag",
162
+ "compensating_control_flag",
163
+ "false_positive_flag",
164
+ "remediation_success_flag",
165
+ "sla_compliance_flag",
166
+ ]
167
+
168
+ VULN_CATEGORICAL_FEATURES = [
169
+ "severity_class", # 4 values; CVSS-derived but useful as feature
170
+ ]
171
+
172
+ # ---------------------------------------------------------------------------
173
+ # Asset features (joined on asset_id from asset_inventory.csv)
174
+ # ---------------------------------------------------------------------------
175
+
176
+ ASSET_NUMERIC_FEATURES = [
177
+ "scanner_coverage",
178
+ "patch_mgmt_maturity",
179
+ "mean_time_to_remediate_days",
180
+ "sla_critical_days",
181
+ "sla_high_days",
182
+ "sla_medium_days",
183
+ "internet_exposed_flag",
184
+ "sbom_depth_score",
185
+ ]
186
+
187
+ ASSET_CATEGORICAL_FEATURES = [
188
+ "asset_type", # 12 values
189
+ "criticality_tier", # 4 values
190
+ "environment_type", # 8 values
191
+ "os_family", # 6 values
192
+ ]
193
+
194
+
195
+ # ---------------------------------------------------------------------------
196
+ # Engineered features
197
+ # ---------------------------------------------------------------------------
198
+
199
+ def _add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
+     """
+     Five engineered features for vulnerability_class discrimination.
+     Note: no temporal-CVSS-derived features (those leak via the CVSS
+     v3.1 exploit-code-maturity machinery).
+     """
+     df = df.copy()
+
+     # 1. Log-scaled EPSS. EPSS is heavy-tailed.
+     df["log_epss"] = np.log1p(
+         df["epss_score_final"].clip(lower=0)
+     ).astype(float)
+
+     # 2. High-CVSS indicator. CVSS >= 7.0 (high or critical).
+     df["is_high_cvss"] = (df["cvss_base_score"] >= 7.0).astype(int)
+
+     # 3. Exposure x severity composite. Internet-exposed high-severity
+     #    vulns are often weighted differently per class.
+     df["exposure_severity_composite"] = (
+         df.get("internet_exposed_flag", 0) * df["cvss_base_score"]
+     ).astype(float)
+
+     # 4. Flag count: total number of risk flags raised. Different vuln
+     #    classes have different baseline flag patterns.
+     flag_cols = [
+         "exploitation_occurred_flag", "zero_day_flag", "cisa_kev_flag",
+         "supply_chain_propagation_flag", "compensating_control_flag",
+         "false_positive_flag",
+     ]
+     df["risk_flag_count"] = sum(df.get(c, 0) for c in flag_cols)
+
+     # 5. EPSS x CVSS composite.
+     df["epss_x_base"] = (
+         df["epss_score_final"] * df["cvss_base_score"]
+     ).astype(float)
+
+     return df
236
+
237
+
238
+ # ---------------------------------------------------------------------------
239
+ # Public API
240
+ # ---------------------------------------------------------------------------
241
+
242
+ def build_features(
+     vuln_summary_path: str | Path,
+     asset_inventory_path: str | Path,
+ ) -> tuple[pd.DataFrame, pd.Series, pd.Series, dict[str, Any]]:
+     """
+     Load vuln_summary.csv, join asset_inventory.csv, drop target +
+     identifiers + outcome leaks, engineer features, one-hot encode,
+     return (X, y, ids, meta).
+     """
+     vulns = pd.read_csv(vuln_summary_path)
+     assets = pd.read_csv(asset_inventory_path)
+
+     y = vulns[TARGET_COLUMN].map(LABEL_TO_INT)
+     if y.isna().any():
+         bad = vulns.loc[y.isna(), TARGET_COLUMN].unique()
+         raise ValueError(f"Unknown vulnerability_class values: {bad}")
+     y = y.astype(int)
+     ids = vulns["vuln_id"].copy()
+
+     asset_cols_needed = (
+         ["asset_id"] + ASSET_NUMERIC_FEATURES + ASSET_CATEGORICAL_FEATURES
+     )
+     vulns = vulns.merge(
+         assets[asset_cols_needed], on="asset_id", how="left",
+     )
+
+     vulns = vulns.drop(
+         columns=ID_COLUMNS + [TARGET_COLUMN] + EXCLUDED_FROM_FEATURES,
+         errors="ignore",
+     )
+
+     vulns = _add_engineered_features(vulns)
+
+     numeric_features = (
+         VULN_NUMERIC_FEATURES
+         + ASSET_NUMERIC_FEATURES
+         + [
+             "log_epss", "is_high_cvss", "exposure_severity_composite",
+             "risk_flag_count", "epss_x_base",
+         ]
+     )
+     numeric_features = [c for c in numeric_features if c in vulns.columns]
+     X_numeric = vulns[numeric_features].astype(float)
+
+     all_categorical = VULN_CATEGORICAL_FEATURES + ASSET_CATEGORICAL_FEATURES
+     categorical_levels: dict[str, list[str]] = {}
+     blocks: list[pd.DataFrame] = []
+     for col in all_categorical:
+         if col not in vulns.columns:
+             continue
+         levels = sorted(vulns[col].dropna().unique().tolist())
+         categorical_levels[col] = levels
+         block = pd.get_dummies(
+             vulns[col].astype("category").cat.set_categories(levels),
+             prefix=col, dummy_na=False,
+         ).astype(int)
+         blocks.append(block)
+
+     X = pd.concat(
+         [X_numeric.reset_index(drop=True)]
+         + [b.reset_index(drop=True) for b in blocks],
+         axis=1,
+     ).fillna(0.0)
+
+     meta = {
+         "feature_names": X.columns.tolist(),
+         "numeric_features": numeric_features,
+         "categorical_levels": categorical_levels,
+         "label_to_int": LABEL_TO_INT,
+         "int_to_label": INT_TO_LABEL,
+         "outcome_leak_excluded": EXCLUDED_FROM_FEATURES,
+     }
+     return X, y, ids, meta
315
+
316
+
317
+ def transform_single(
+     record: dict | pd.DataFrame,
+     meta: dict[str, Any],
+     asset_lookup: dict | None = None,
+ ) -> np.ndarray:
+     """Encode a single vulnerability record for inference."""
+     if isinstance(record, dict):
+         df = pd.DataFrame([record.copy()])
+     else:
+         df = record.copy()
+
+     if asset_lookup is not None and "asset_id" in df.columns:
+         asset_id = df["asset_id"].iloc[0]
+         asset_feats = asset_lookup.get(asset_id, {})
+         for k, v in asset_feats.items():
+             if k not in df.columns:
+                 df[k] = v
+
+     df = _add_engineered_features(df)
+
+     numeric = pd.DataFrame({
+         col: df.get(col, pd.Series([0.0] * len(df))).astype(float).values
+         for col in meta["numeric_features"]
+     })
+     blocks: list[pd.DataFrame] = [numeric]
+     for col, levels in meta["categorical_levels"].items():
+         val = df.get(col, pd.Series([None] * len(df)))
+         block = pd.get_dummies(
+             val.astype("category").cat.set_categories(levels),
+             prefix=col, dummy_na=False,
+         ).astype(int)
+         for lvl in levels:
+             cname = f"{col}_{lvl}"
+             if cname not in block.columns:
+                 block[cname] = 0
+         block = block[[f"{col}_{lvl}" for lvl in levels]]
+         blocks.append(block)
+
+     X = pd.concat(blocks, axis=1).fillna(0.0)
+     X = X.reindex(columns=meta["feature_names"], fill_value=0.0)
+     return X.values.astype(np.float32)
358
+
359
+
360
+ def save_meta(meta: dict[str, Any], path: str | Path) -> None:
+     serializable = {
+         "feature_names": meta["feature_names"],
+         "numeric_features": meta["numeric_features"],
+         "categorical_levels": meta["categorical_levels"],
+         "label_to_int": meta["label_to_int"],
+         "int_to_label": {str(k): v for k, v in meta["int_to_label"].items()},
+         "outcome_leak_excluded": meta.get("outcome_leak_excluded", []),
+     }
+     with open(path, "w") as f:
+         json.dump(serializable, f, indent=2)
371
+
372
+
373
+ def load_meta(path: str | Path) -> dict[str, Any]:
+     with open(path) as f:
+         meta = json.load(f)
+     meta["int_to_label"] = {int(k): v for k, v in meta["int_to_label"].items()}
+     return meta
378
+
379
+
380
+ def build_asset_lookup(asset_inventory_path: str | Path) -> dict[str, dict]:
+     """Build {asset_id: {asset feature values}} for inference-time lookup."""
+     assets = pd.read_csv(asset_inventory_path)
+     cols = ASSET_NUMERIC_FEATURES + ASSET_CATEGORICAL_FEATURES
+     out = {}
+     for _, row in assets.iterrows():
+         out[row["asset_id"]] = {c: row[c] for c in cols if c in assets.columns}
+     return out
388
+
389
+
390
+ if __name__ == "__main__":
+     import sys
+     base = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/mnt/user-data/uploads")
+     X, y, ids, meta = build_features(
+         base / "vuln_summary.csv",
+         base / "asset_inventory.csv",
+     )
+     print(f"X shape: {X.shape}")
+     print(f"y shape: {y.shape}")
+     print(f"n_features: {len(meta['feature_names'])}")
+     print(f"label distribution:\n{y.map(INT_TO_LABEL).value_counts()}")
+     print(f"X has NaN: {X.isnull().any().any()}")
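The module docstring above states that `severity_class` is fully determined by `cvss_base_score` through the CVSS v3.1 rating bands. A minimal sketch of that mapping follows. The function name `severity_from_cvss` is hypothetical and not part of `feature_engineering.py`. The `"none"` rating for a score of 0.0 comes from the CVSS v3.1 specification; the docstring lists only four bands, so whether the dataset contains such rows is not known from this commit.

```python
def severity_from_cvss(base_score: float) -> str:
    """Map a CVSS v3.1 base score to its qualitative severity rating."""
    if not 0.0 <= base_score <= 10.0:
        raise ValueError(f"CVSS base score out of range: {base_score}")
    if base_score == 0.0:
        return "none"      # defined by CVSS v3.1, not listed in the docstring
    if base_score < 4.0:
        return "low"       # 0.1 - 3.9
    if base_score < 7.0:
        return "medium"    # 4.0 - 6.9
    if base_score < 9.0:
        return "high"      # 7.0 - 8.9
    return "critical"      # 9.0 - 10.0


for score in (3.9, 4.0, 6.9, 7.0, 8.9, 9.0):
    print(score, severity_from_cvss(score))
```

One way to check the docstring's claim on the sample would be to apply this function to the `cvss_base_score` column of `vuln_summary.csv` and compare the result with the `severity_class` column; any mismatch would show that the derivation is not exact.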
feature_meta.json ADDED
@@ -0,0 +1,160 @@
1
+ {
2
+ "feature_names": [
3
+ "cvss_base_score",
4
+ "epss_score_final",
5
+ "exploitation_occurred_flag",
6
+ "zero_day_flag",
7
+ "cisa_kev_flag",
8
+ "supply_chain_propagation_flag",
9
+ "compensating_control_flag",
10
+ "false_positive_flag",
11
+ "remediation_success_flag",
12
+ "sla_compliance_flag",
13
+ "scanner_coverage",
14
+ "patch_mgmt_maturity",
15
+ "mean_time_to_remediate_days",
16
+ "sla_critical_days",
17
+ "sla_high_days",
18
+ "sla_medium_days",
19
+ "internet_exposed_flag",
20
+ "sbom_depth_score",
21
+ "log_epss",
22
+ "is_high_cvss",
23
+ "exposure_severity_composite",
24
+ "risk_flag_count",
25
+ "epss_x_base",
26
+ "severity_class_critical",
27
+ "severity_class_high",
28
+ "severity_class_low",
29
+ "severity_class_medium",
30
+ "asset_type_api_gateway",
31
+ "asset_type_cloud_vm",
32
+ "asset_type_container_workload",
33
+ "asset_type_database_server",
34
+ "asset_type_endpoint_workstation",
35
+ "asset_type_iot_firmware_device",
36
+ "asset_type_network_service",
37
+ "asset_type_ot_ics_controller",
38
+ "asset_type_saas_integration",
39
+ "asset_type_server_on_premises",
40
+ "asset_type_supply_chain_dependency",
41
+ "asset_type_web_application",
42
+ "criticality_tier_critical",
43
+ "criticality_tier_high",
44
+ "criticality_tier_low",
45
+ "criticality_tier_medium",
46
+ "environment_type_edge_iot_fleet",
47
+ "environment_type_hybrid_cloud",
48
+ "environment_type_on_premises_datacenter",
49
+ "environment_type_ot_ics_network",
50
+ "environment_type_public_cloud_aws",
51
+ "environment_type_public_cloud_azure",
52
+ "environment_type_public_cloud_gcp",
53
+ "environment_type_saas_dependent",
54
+ "os_family_android_iot",
55
+ "os_family_embedded_rtos",
56
+ "os_family_freebsd",
57
+ "os_family_linux",
58
+ "os_family_macos",
59
+ "os_family_windows"
60
+ ],
61
+ "numeric_features": [
62
+ "cvss_base_score",
63
+ "epss_score_final",
64
+ "exploitation_occurred_flag",
65
+ "zero_day_flag",
66
+ "cisa_kev_flag",
67
+ "supply_chain_propagation_flag",
68
+ "compensating_control_flag",
69
+ "false_positive_flag",
70
+ "remediation_success_flag",
71
+ "sla_compliance_flag",
72
+ "scanner_coverage",
73
+ "patch_mgmt_maturity",
74
+ "mean_time_to_remediate_days",
75
+ "sla_critical_days",
76
+ "sla_high_days",
77
+ "sla_medium_days",
78
+ "internet_exposed_flag",
79
+ "sbom_depth_score",
80
+ "log_epss",
81
+ "is_high_cvss",
82
+ "exposure_severity_composite",
83
+ "risk_flag_count",
84
+ "epss_x_base"
85
+ ],
86
+ "categorical_levels": {
87
+ "severity_class": [
88
+ "critical",
89
+ "high",
90
+ "low",
91
+ "medium"
92
+ ],
93
+ "asset_type": [
94
+ "api_gateway",
95
+ "cloud_vm",
96
+ "container_workload",
97
+ "database_server",
98
+ "endpoint_workstation",
99
+ "iot_firmware_device",
100
+ "network_service",
101
+ "ot_ics_controller",
102
+ "saas_integration",
103
+ "server_on_premises",
104
+ "supply_chain_dependency",
105
+ "web_application"
106
+ ],
107
+ "criticality_tier": [
108
+ "critical",
109
+ "high",
110
+ "low",
111
+ "medium"
112
+ ],
113
+ "environment_type": [
114
+ "edge_iot_fleet",
115
+ "hybrid_cloud",
116
+ "on_premises_datacenter",
117
+ "ot_ics_network",
118
+ "public_cloud_aws",
119
+ "public_cloud_azure",
120
+ "public_cloud_gcp",
121
+ "saas_dependent"
122
+ ],
123
+ "os_family": [
124
+ "android_iot",
125
+ "embedded_rtos",
126
+ "freebsd",
127
+ "linux",
128
+ "macos",
129
+ "windows"
130
+ ]
131
+ },
132
+ "label_to_int": {
133
+ "auth_access_control": 0,
134
+ "cryptographic_failure": 1,
135
+ "information_disclosure": 2,
136
+ "injection_family": 3,
137
+ "logic_flaw": 4,
138
+ "memory_corruption": 5,
139
+ "misconfiguration": 6,
140
+ "supply_chain_weakness": 7
141
+ },
142
+ "int_to_label": {
143
+ "0": "auth_access_control",
144
+ "1": "cryptographic_failure",
145
+ "2": "information_disclosure",
146
+ "3": "injection_family",
147
+ "4": "logic_flaw",
148
+ "5": "memory_corruption",
149
+ "6": "misconfiguration",
150
+ "7": "supply_chain_weakness"
151
+ },
152
+ "outcome_leak_excluded": [
153
+ "time_to_exploit_days",
154
+ "time_to_remediate_days",
155
+ "patch_lag_days",
156
+ "risk_score_composite",
157
+ "exploit_maturity_final",
158
+ "cvss_temporal_score_final"
159
+ ]
160
+ }
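`feature_names` in this file is the numeric feature list followed by one `<column>_<level>` entry per categorical level, in the order in which `build_features` concatenates its blocks: 23 numeric columns plus 34 one-hot columns, 57 in total. A small consistency check along those lines is sketched below. The helper name `expected_feature_names` and the `toy_meta` dictionary are illustrative only; in practice the argument would be the dictionary returned by `load_meta`.

```python
def expected_feature_names(meta: dict) -> list[str]:
    """Rebuild the column order: numeric features, then one-hot blocks."""
    names = list(meta["numeric_features"])
    for col, levels in meta["categorical_levels"].items():
        names.extend(f"{col}_{lvl}" for lvl in levels)
    return names


# Toy metadata with the same layout as feature_meta.json (illustrative values).
toy_meta = {
    "numeric_features": ["cvss_base_score", "log_epss"],
    "categorical_levels": {
        "severity_class": ["critical", "high", "low", "medium"],
    },
    "feature_names": [
        "cvss_base_score",
        "log_epss",
        "severity_class_critical",
        "severity_class_high",
        "severity_class_low",
        "severity_class_medium",
    ],
}

rebuilt = expected_feature_names(toy_meta)
print(rebuilt == toy_meta["feature_names"])
```

A mismatch between the rebuilt list and the stored `feature_names` would suggest that the metadata file and the trained model have drifted apart.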
feature_scaler.json ADDED
@@ -0,0 +1 @@
1
+ {"mean": [7.285319609967497, 0.1090479414951246, 0.07150595882990249, 0.03304442036836403, 0.005958829902491874, 0.007042253521126761, 0.09425785482123511, 0.08017334777898158, 0.8169014084507042, 0.7497291440953413, 0.7142526002166847, 0.5597418201516792, 61.792957746478876, 55.768689057421454, 167.30606717226436, 334.6121343445287, 0.34615384615384615, 0.4373639219934995, 0.09744259094329327, 0.6208017334777898, 2.522881906825569, 0.29198266522210187, 0.8318568651137596, 0.09425785482123511, 0.5254604550379198, 0.013542795232936078, 0.366738894907909, 0.06229685807150596, 0.08342361863488625, 0.08125677139761647, 0.09967497291440953, 0.09859154929577464, 0.07367280606717226, 0.0790899241603467, 0.09588299024918744, 0.06879739978331528, 0.09588299024918744, 0.0790899241603467, 0.08234019501625135, 0.10130010834236186, 0.2502708559046587, 0.28819068255687974, 0.36023835319609965, 0.12459371614301191, 0.12838569880823403, 0.14626218851570963, 0.12838569880823403, 0.10725893824485373, 0.13217768147345613, 0.1256771397616468, 0.10725893824485373, 0.15113759479956662, 0.15113759479956662, 0.1706392199349946, 0.1771397616468039, 0.18309859154929578, 0.1668472372697725], "std": [1.3818122818772989, 0.12908583650215913, 0.25773793273380063, 0.17880102089146993, 0.07698397704359367, 0.08364478612676705, 0.29226629026174605, 0.2716349620172026, 0.3868521254539063, 0.4332863417748008, 0.11297622294038147, 0.11921698382007824, 29.329761686203444, 27.416611488095498, 82.2498344642865, 164.499668928573, 0.4758718669674715, 0.12938965829339574, 0.10723050270659185, 0.4853190013040946, 3.5582862465806073, 0.6006416367944402, 1.0477891599805764, 0.2922662902617461, 0.49948665171105067, 0.11561413750437974, 0.4820449709408879, 0.24175942859093658, 0.27659638908598905, 0.27330307614640587, 0.29964731299698594, 0.2981936022209592, 0.2613084632006893, 0.2699521899581333, 0.29451048975837896, 0.253177883658354, 0.29451048975837896, 0.2699521899581333, 0.27495679912705384, 
0.3018074546734477, 0.4332863417748008, 0.4530430424460816, 0.48019953797717524, 0.33034714867544795, 0.3346094186957652, 0.3534646244039396, 0.3346094186957652, 0.3095260212759078, 0.3387756096144394, 0.3315749585800216, 0.3095260212759078, 0.35828000060732873, 0.35828000060732873, 0.37629533874646653, 0.38189038988561785, 0.3868521254539063, 0.37294045160714023]}
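`feature_scaler.json` stores one mean and one standard deviation per feature. The sketch below shows the standardisation such a file usually supports, under several assumptions: that the entries follow the order of `feature_names` in `feature_meta.json`, that the scaler is applied before a model sees the features, and that a zero standard deviation should leave the centred value unscaled. None of this is confirmed by the commit, and the `standardize` helper is hypothetical.

```python
def standardize(row, mean, std, eps=1e-12):
    """Return (x - mean) / std per feature, leaving zero-variance columns unscaled."""
    if not (len(row) == len(mean) == len(std)):
        raise ValueError("row, mean and std must have the same length")
    return [
        (x - m) / (s if s > eps else 1.0)
        for x, m, s in zip(row, mean, std)
    ]


# First two entries of feature_scaler.json, assumed to correspond to
# cvss_base_score and epss_score_final.
mean = [7.285319609967497, 0.1090479414951246]
std = [1.3818122818772989, 0.12908583650215913]

# Illustrative input: a CVSS 9.8 vulnerability with EPSS 0.5.
z = standardize([9.8, 0.5], mean, std)
print(z)
```

A record equal to the stored means maps to all zeros, which makes a convenient sanity check after loading the file.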
inference_example.ipynb ADDED
@@ -0,0 +1,345 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# CYB009 Baseline Classifier — Inference Example\n",
8
+ "\n",
9
+ "End-to-end demo: load the trained XGBoost and PyTorch MLP models from the Hugging Face repo and predict the **vulnerability class** (8-class CWE-style family) for a vulnerability record.\n",
10
+ "\n",
11
+ "**Models predict one of 8 vulnerability classes:** `auth_access_control`, `cryptographic_failure`, `information_disclosure`, `injection_family`, `logic_flaw`, `memory_corruption`, `misconfiguration`, `supply_chain_weakness`.\n",
12
+ "\n",
13
+ "**Read `leakage_diagnostic.json` first.** This is the most extensive structural-leakage audit in the XpertSystems catalog. Eight oracle paths were found across CYB009's targets; vulnerability_class is the only README-suggested target that learns honestly on the sample, and it gives the catalog's weakest baseline (acc 0.24 vs majority 0.18). This repo's primary value is the diagnostic, not the classifier."
14
+ ]
15
+ },
16
+ {
17
+ "cell_type": "markdown",
18
+ "metadata": {},
19
+ "source": [
20
+ "## 1. Install dependencies"
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "code",
25
+ "execution_count": null,
26
+ "metadata": {},
27
+ "outputs": [],
28
+ "source": [
29
+ "%pip install --quiet xgboost torch safetensors pandas numpy huggingface_hub"
30
+ ]
31
+ },
32
+ {
33
+ "cell_type": "markdown",
34
+ "metadata": {},
35
+ "source": [
36
+ "## 2. Download model artifacts from Hugging Face"
37
+ ]
38
+ },
39
+ {
40
+ "cell_type": "code",
41
+ "execution_count": null,
42
+ "metadata": {},
43
+ "outputs": [],
44
+ "source": [
45
+ "from huggingface_hub import hf_hub_download\n",
46
+ "\n",
47
+ "REPO_ID = \"xpertsystems/cyb009-baseline-classifier\"\n",
48
+ "\n",
49
+ "files = {}\n",
50
+ "for name in [\"model_xgb.json\", \"model_mlp.safetensors\",\n",
51
+ " \"feature_engineering.py\", \"feature_meta.json\",\n",
52
+ " \"feature_scaler.json\"]:\n",
53
+ " files[name] = hf_hub_download(repo_id=REPO_ID, filename=name)\n",
54
+ " print(f\" downloaded: {name}\")"
55
+ ]
56
+ },
57
+ {
58
+ "cell_type": "code",
59
+ "execution_count": null,
60
+ "metadata": {},
61
+ "outputs": [],
62
+ "source": [
63
+ "import sys, os\n",
64
+ "fe_dir = os.path.dirname(files[\"feature_engineering.py\"])\n",
65
+ "if fe_dir not in sys.path:\n",
66
+ " sys.path.insert(0, fe_dir)\n",
67
+ "\n",
68
+ "from feature_engineering import (\n",
69
+ " transform_single, load_meta, build_asset_lookup, INT_TO_LABEL,\n",
70
+ ")"
71
+ ]
72
+ },
73
+ {
74
+ "cell_type": "markdown",
75
+ "metadata": {},
76
+ "source": [
77
+ "## 3. Load models and metadata"
78
+ ]
79
+ },
80
+ {
81
+ "cell_type": "code",
82
+ "execution_count": null,
83
+ "metadata": {},
84
+ "outputs": [],
85
+ "source": [
86
+ "import json\n",
87
+ "import numpy as np\n",
88
+ "import torch\n",
89
+ "import torch.nn as nn\n",
90
+ "import xgboost as xgb\n",
91
+ "from safetensors.torch import load_file\n",
92
+ "\n",
93
+ "meta = load_meta(files[\"feature_meta.json\"])\n",
94
+ "with open(files[\"feature_scaler.json\"]) as f:\n",
95
+ " scaler = json.load(f)\n",
96
+ "\n",
97
+ "N_FEATURES = len(meta[\"feature_names\"])\n",
98
+ "N_CLASSES = len(meta[\"int_to_label\"])\n",
99
+ "print(f\"feature count: {N_FEATURES}\")\n",
100
+ "print(f\"class count: {N_CLASSES}\")\n",
101
+ "print(f\"label classes: {list(meta['int_to_label'].values())}\")\n",
102
+ "print(f\"\\noutcome-leak columns excluded from features:\")\n",
103
+ "for c in meta.get(\"outcome_leak_excluded\", []):\n",
104
+ " print(f\" - {c}\")"
105
+ ]
106
+ },
107
+ {
108
+ "cell_type": "code",
109
+ "execution_count": null,
110
+ "metadata": {},
111
+ "outputs": [],
112
+ "source": [
113
+ "xgb_model = xgb.XGBClassifier()\n",
114
+ "xgb_model.load_model(files[\"model_xgb.json\"])\n",
115
+ "\n",
116
+ "# MLP architecture (must match training)\n",
117
+ "class VulnClassMLP(nn.Module):\n",
118
+ " def __init__(self, n_features, n_classes=8, hidden1=128, hidden2=64, dropout=0.3):\n",
119
+ " super().__init__()\n",
120
+ " self.net = nn.Sequential(\n",
121
+ " nn.Linear(n_features, hidden1),\n",
122
+ " nn.BatchNorm1d(hidden1),\n",
123
+ " nn.ReLU(),\n",
124
+ " nn.Dropout(dropout),\n",
125
+ " nn.Linear(hidden1, hidden2),\n",
126
+ " nn.BatchNorm1d(hidden2),\n",
127
+ " nn.ReLU(),\n",
128
+ " nn.Dropout(dropout),\n",
129
+ " nn.Linear(hidden2, n_classes),\n",
130
+ " )\n",
131
+ " def forward(self, x):\n",
132
+ " return self.net(x)\n",
133
+ "\n",
134
+ "mlp_model = VulnClassMLP(N_FEATURES, n_classes=N_CLASSES)\n",
135
+ "mlp_model.load_state_dict(load_file(files[\"model_mlp.safetensors\"]))\n",
136
+ "mlp_model.eval()\n",
137
+ "print(\"models loaded\")"
138
+ ]
139
+ },
140
+ {
141
+ "cell_type": "markdown",
142
+ "metadata": {},
143
+ "source": [
144
+ "## 4. Load asset inventory for asset-feature lookup\n",
145
+ "\n",
146
+ "The model uses asset context (asset_type, criticality, environment, OS, scanner_coverage, etc.) as features. To predict on a new vulnerability, we look up its asset features from the asset_inventory."
147
+ ]
148
+ },
149
+ {
150
+ "cell_type": "code",
151
+ "execution_count": null,
152
+ "metadata": {},
153
+ "outputs": [],
154
+ "source": [
155
+ "from huggingface_hub import snapshot_download\n",
156
+ "\n",
157
+ "ds_path = snapshot_download(repo_id=\"xpertsystems/cyb009-sample\", repo_type=\"dataset\")\n",
158
+ "asset_lookup = build_asset_lookup(f\"{ds_path}/asset_inventory.csv\")\n",
159
+ "print(f\"loaded {len(asset_lookup)} asset records\")"
160
+ ]
161
+ },
162
+ {
163
+ "cell_type": "markdown",
164
+ "metadata": {},
165
+ "source": [
166
+ "## 5. Prediction helper"
167
+ ]
168
+ },
169
+ {
170
+ "cell_type": "code",
171
+ "execution_count": null,
172
+ "metadata": {},
173
+ "outputs": [],
174
+ "source": [
175
+ "MU = np.array(scaler[\"mean\"], dtype=np.float32)\n",
176
+ "SD = np.array(scaler[\"std\"], dtype=np.float32)\n",
177
+ "\n",
178
+ "def predict_vuln_class(record: dict) -> dict:\n",
179
+ " \"\"\"Predict the vulnerability class for one record.\n",
180
+ "\n",
181
+ " Note: do NOT include exploit_maturity_final, cvss_temporal_score_final,\n",
182
+ " time_to_exploit_days, time_to_remediate_days, patch_lag_days, or\n",
183
+ " risk_score_composite in the record. These were outcome leaks in\n",
184
+ " the training data and are excluded from the feature set.\n",
185
+ "\n",
186
+ " Asset features (asset_type, criticality, etc.) are looked up\n",
187
+ " from asset_inventory by asset_id.\n",
188
+ " \"\"\"\n",
189
+ " X = transform_single(record, meta, asset_lookup=asset_lookup)\n",
190
+ "\n",
191
+ " xgb_proba = xgb_model.predict_proba(X)[0]\n",
192
+ " xgb_label = INT_TO_LABEL[int(np.argmax(xgb_proba))]\n",
193
+ "\n",
194
+ " Xs = ((X - MU) / SD).astype(np.float32)\n",
195
+ " with torch.no_grad():\n",
196
+ " logits = mlp_model(torch.tensor(Xs))\n",
197
+ " mlp_proba = torch.softmax(logits, dim=1).numpy()[0]\n",
198
+ " mlp_label = INT_TO_LABEL[int(np.argmax(mlp_proba))]\n",
199
+ "\n",
200
+ " return {\n",
201
+ " \"xgboost\": {\n",
202
+ " \"label\": xgb_label,\n",
203
+ " \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(xgb_proba)},\n",
204
+ " },\n",
205
+ " \"mlp\": {\n",
206
+ " \"label\": mlp_label,\n",
207
+ " \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(mlp_proba)},\n",
208
+ " },\n",
209
+ " }"
210
+ ]
211
+ },
212
+ {
213
+ "cell_type": "markdown",
214
+ "metadata": {},
215
+ "source": [
216
+ "## 6. Run on an example record\n",
217
+ "\n",
218
+ "Real critical-severity vulnerability from the CYB009 sample. True class is `memory_corruption` (CVSS 9.9, exploitation hasn't yet occurred, compensating control in place). On this kind of high-CVSS critical vulnerability the model has its strongest signal."
219
+ ]
220
+ },
221
+ {
222
+ "cell_type": "code",
223
+ "execution_count": null,
224
+ "metadata": {},
225
+ "outputs": [],
226
+ "source": [
227
+ "# Real vulnerability from the sample dataset (true class: memory_corruption)\n",
228
+ "# Note: asset_id is supplied so asset features are auto-looked-up\n",
229
+ "example_record = {\n",
230
+ " \"asset_id\": \"ASSET000001\",\n",
231
+ " \"severity_class\": \"critical\",\n",
232
+ " \"cvss_base_score\": 9.9,\n",
233
+ " \"epss_score_final\": 0.2397,\n",
234
+ " \"sla_compliance_flag\": 1,\n",
235
+ " \"exploitation_occurred_flag\": 0,\n",
236
+ " \"zero_day_flag\": 0,\n",
237
+ " \"remediation_success_flag\": 1,\n",
238
+ " \"compensating_control_flag\": 1,\n",
239
+ " \"supply_chain_propagation_flag\": 0,\n",
240
+ " \"cisa_kev_flag\": 0,\n",
241
+ " \"false_positive_flag\": 0,\n",
242
+ "}\n",
243
+ "\n",
244
+ "result = predict_vuln_class(example_record)\n",
245
+ "\n",
246
+ "print(f\"XGBoost -> {result['xgboost']['label']}\")\n",
247
+ "for lbl, p in sorted(result['xgboost']['probabilities'].items(), key=lambda x: -x[1]):\n",
248
+ " print(f\" P({lbl:30s}) = {p:.4f}\")\n",
249
+ "\n",
250
+ "print(f\"\\nMLP -> {result['mlp']['label']}\")\n",
251
+ "for lbl, p in sorted(result['mlp']['probabilities'].items(), key=lambda x: -x[1]):\n",
252
+ " print(f\" P({lbl:30s}) = {p:.4f}\")"
253
+ ]
254
+ },
255
+ {
256
+ "cell_type": "markdown",
257
+ "metadata": {},
258
+ "source": [
259
+ "### Modest, honest confidence\n",
260
+ "\n",
261
+ "The model's confidence on individual predictions is modest (top-1 typically 0.2-0.4) because vulnerability_class is a genuinely hard task on this sample. The per-class feature distributions overlap heavily — different vuln classes have similar CVSS, EPSS, and asset distributions.\n",
262
+ "\n",
263
+ "The model is a useful baseline (acc 0.24 vs majority 0.18, AUC 0.69) but not a production classifier. Read `leakage_diagnostic.json` for the structural reasons why every other CYB009 README-suggested target is either trivially solvable via oracle features or unlearnable after honest leak removal."
264
+ ]
265
+ },
266
+ {
267
+ "cell_type": "markdown",
268
+ "metadata": {},
269
+ "source": [
270
+ "## 7. Batch prediction on the sample dataset"
271
+ ]
272
+ },
273
+ {
274
+ "cell_type": "code",
275
+ "execution_count": null,
276
+ "metadata": {},
277
+ "outputs": [],
278
+ "source": [
279
+ "import pandas as pd\n",
280
+ "\n",
281
+ "vulns = pd.read_csv(f\"{ds_path}/vuln_summary.csv\")\n",
282
+ "\n",
283
+ "# Score the first 500 vulnerabilities\n",
284
+ "sample = vulns.head(500).copy()\n",
285
+ "preds = [predict_vuln_class(row.to_dict())[\"xgboost\"][\"label\"] for _, row in sample.iterrows()]\n",
286
+ "sample[\"xgb_pred\"] = preds\n",
287
+ "\n",
288
+ "ct = pd.crosstab(sample[\"vulnerability_class\"], sample[\"xgb_pred\"],\n",
289
+ " rownames=[\"true\"], colnames=[\"pred\"])\n",
290
+ "print(\"Confusion on first 500 sample vulnerabilities (XGBoost):\")\n",
291
+ "print(ct)\n",
292
+ "acc = (sample[\"vulnerability_class\"] == sample[\"xgb_pred\"]).mean()\n",
293
+ "print(f\"\\nbatch accuracy on first 500 vulns (in-distribution): {acc:.4f}\")\n",
294
+ "print(\"\\nNote: this includes training-set vulnerabilities. See validation_results.json\\n\"\n",
295
+ " \"for proper held-out test metrics.\")"
296
+ ]
297
+ },
298
+ {
299
+ "cell_type": "markdown",
300
+ "metadata": {},
301
+ "source": [
302
+ "## 8. Important reading: the leakage diagnostic\n",
303
+ "\n",
304
+ "Before using CYB009 sample data to train your own models, read **`leakage_diagnostic.json`** in this repo. It documents **8 oracle paths** across the sample's targets:\n",
305
+ "\n",
306
+ "1. **`cvss_temporal_score_final`** is a near-deterministic function of `exploit_maturity_final` (via CVSS v3.1 multipliers 0.91/0.94/0.97/1.00).\n",
307
+ "2. **`time_to_exploit_days`** uses a -1 sentinel that perfectly identifies `exploitation_occurred_flag = 0`.\n",
308
+ "3. **`time_to_remediate_days`** uses a 120 sentinel that perfectly identifies `remediation_success_flag = 0`.\n",
309
+ "4. **`severity_class`** is a 100% mechanical function of `cvss_base_score` (CVSS v3.1 boundaries).\n",
310
+ "5. **`lifecycle_phase`** has 5+ phases that deterministically pin `remediation_status` (e.g. `residual_risk_review` → 100% `remediated`).\n",
311
+ "6. **`patch_status`** has 5 of 6 values that pin `remediation_status` (e.g. `deployed` → 100% `remediated`).\n",
312
+ "7. **`risk_score_composite`** is computed from flag fields (indirect oracle).\n",
313
+ "8. **`patch_lag_days`** is suspected to have similar sentinel structure (precaution).\n",
314
+ "\n",
315
+ "It also documents **6 README-suggested headline targets that are unlearnable on the sample** after honest leak removal: `exploitation_occurred_flag`, `zero_day_flag`, `cisa_kev_flag`, `supply_chain_propagation_flag`, `false_positive_flag`, and `exploit_maturity_final`."
316
+ ]
317
+ },
318
+ {
319
+ "cell_type": "markdown",
320
+ "metadata": {},
321
+ "source": [
322
+ "## 9. Next steps\n",
323
+ "\n",
324
+ "- See `validation_results.json` for held-out test metrics (396 vulnerabilities).\n",
325
+ "- See `multi_seed_results.json` for the across-10-seeds picture (accuracy 0.244 ± 0.023, ROC-AUC 0.687 ± 0.014).\n",
326
+ "- See `ablation_results.json` — every feature group contributes 1-3pp accuracy, indicating spread-out modest signal across the feature set.\n",
327
+ "- See **`leakage_diagnostic.json`** for the comprehensive structural-leakage audit (8 oracle paths + 6 unlearnable targets).\n",
328
+ "- For the full ~487k-row CYB009 dataset and commercial licensing, contact **pradeep@xpertsystems.ai**."
329
+ ]
330
+ }
331
+ ],
332
+ "metadata": {
333
+ "kernelspec": {
334
+ "display_name": "Python 3",
335
+ "language": "python",
336
+ "name": "python3"
337
+ },
338
+ "language_info": {
339
+ "name": "python",
340
+ "version": "3.10"
341
+ }
342
+ },
343
+ "nbformat": 4,
344
+ "nbformat_minor": 5
345
+ }
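The sentinel oracle paths listed in section 8 of the notebook (`time_to_exploit_days` with -1, `time_to_remediate_days` with 120) can be checked with a few lines of pandas. The sketch below is illustrative only: the rows in `toy` are hypothetical values, not records from the sample, and `sentinel_purity` is a helper name invented for this example, not a function in `feature_engineering.py`.

```python
import pandas as pd

# Hypothetical toy rows that mimic the sentinel coding described above:
# -1 means "no exploitation occurred"; 0 or more means days until exploitation.
toy = pd.DataFrame({
    "time_to_exploit_days": [-1, -1, 12, 0, -1, 40],
    "exploitation_occurred_flag": [0, 0, 1, 1, 0, 1],
})

def sentinel_purity(df: pd.DataFrame, column: str, sentinel, flag: str) -> float:
    """Share of rows where (column == sentinel) agrees with (flag == 0).

    A value of 1.0 means the column is a perfect oracle for the flag.
    """
    is_sentinel = df[column] == sentinel
    is_negative = df[flag] == 0
    return float((is_sentinel == is_negative).mean())

purity = sentinel_purity(toy, "time_to_exploit_days", -1, "exploitation_occurred_flag")
print(f"sentinel/flag agreement: {purity:.2f}")
```

Running the same helper on the real `vuln_summary.csv` with `("time_to_remediate_days", 120, "remediation_success_flag")` would be the analogous check for the third oracle path, assuming those columns are present in that file.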
leakage_diagnostic.json ADDED
@@ -0,0 +1,218 @@
1
+ {
2
+ "purpose": "The CYB009 sample has the most pervasive structural leakage of any SKU in the XpertSystems catalog. Eight oracle paths were discovered, and six of the README's headline targets are unlearnable on the sample after honest leak removal. The primary baseline that ships with this repo (vulnerability_class 8-class) is the only README-suggested target that learns honestly - and it is the WEAKEST baseline in the catalog by design (acc 0.24 vs majority 0.18). The headline finding for CYB009 is this diagnostic, not the classifier.",
3
+ "primary_target": "vulnerability_class (8-class)",
4
+ "split": "StratifiedShuffleSplit, 70/15/15 nested",
5
+ "oracle_paths_documented": {
6
+ "P1_cvss_temporal_ratio": {
7
+ "target": "exploit_maturity_final",
8
+ "leak_column": "cvss_temporal_score_final",
9
+ "mechanism": "CVSS v3.1 computes Temporal Score from Base Score using an Exploit Code Maturity multiplier (0.91 unproven, 0.94 PoC, 0.97 functional, 1.00 high/weaponised). The cvss_temporal/cvss_base ratio in the sample clusters tightly per maturity tier (observed medians 0.80, 0.83, 0.85, 0.88, i.e. these multipliers scaled by a roughly constant factor of about 0.88), making it a near-deterministic oracle for the target.",
10
+ "observed_ratios_by_tier": {
11
+ "functional": {
12
+ "min": 0.8516,
13
+ "median": 0.8537,
14
+ "max": 0.8843,
15
+ "std": 0.0113
16
+ },
17
+ "proof_of_concept": {
18
+ "min": 0.8255,
19
+ "median": 0.8274,
20
+ "max": 0.8567,
21
+ "std": 0.0114
22
+ },
23
+ "unproven": {
24
+ "min": 0.7991,
25
+ "median": 0.801,
26
+ "max": 0.8302,
27
+ "std": 0.011
28
+ },
29
+ "weaponised": {
30
+ "min": 0.878,
31
+ "median": 0.88,
32
+ "max": 0.9116,
33
+ "std": 0.0115
34
+ }
35
+ },
36
+ "impact": "With cvss_temporal_score_final included, XGBoost achieves test accuracy 0.74 (mF1 0.72, AUC 0.91). With it excluded, accuracy collapses to 0.31 (mF1 0.31, AUC 0.58) - below majority baseline of 0.36. The target is structurally unlearnable on the sample after honest leak removal."
37
+ },
38
+ "P2_time_to_exploit_sentinel": {
39
+ "target": "exploitation_occurred_flag (and zero_day_flag)",
40
+ "leak_column": "time_to_exploit_days",
41
+ "mechanism": "Sentinel-coded post-hoc field: -1 when no exploitation occurred; non-negative (0-95 days) when exploitation occurred. Perfect oracle.",
42
+ "evidence": {
43
+ "time_to_exploit_minus1_AND_flag_0": 2435,
44
+ "time_to_exploit_positive_AND_flag_1": 197,
45
+ "time_to_exploit_positive_AND_flag_0": 0
46
+ },
47
+ "impact": "Perfect oracle for exploitation_occurred_flag and zero_day_flag."
48
+ },
49
+ "P3_time_to_remediate_sentinel": {
50
+ "target": "remediation_success_flag, sla_compliance_flag",
51
+ "leak_column": "time_to_remediate_days",
52
+ "mechanism": "Sentinel-coded post-hoc field: 120 (the timeline horizon) when not remediated; lower values (3-113) when remediated. Perfect oracle.",
53
+ "evidence": {
54
+ "remediation_flag_0_time_mean": 120.0,
55
+ "remediation_flag_0_time_min": 120,
56
+ "remediation_flag_1_time_mean": 41.77892756349953,
57
+ "remediation_flag_1_time_max": 113
58
+ },
59
+ "impact": "Perfect oracle for remediation_success_flag and near-perfect for sla_compliance_flag."
60
+ },
61
+ "P4_severity_class_cvss_boundaries": {
62
+ "target": "severity_class",
63
+ "leak_column": "cvss_base_score",
64
+ "mechanism": "severity_class is computed as a CVSS v3.1 boundary function of cvss_base_score (low=0.1-3.9, medium=4.0-6.9, high=7.0-8.9, critical=9.0-10.0). Including cvss_base_score makes severity prediction trivial; excluding it leaves only weak signal (acc 0.55 vs majority 0.51, barely above the baseline).",
65
+ "observed_cvss_ranges_per_severity": {
66
+ "critical": {
67
+ "min": 9.0,
68
+ "max": 10.0
69
+ },
70
+ "high": {
71
+ "min": 7.0,
72
+ "max": 9.0
73
+ },
74
+ "low": {
75
+ "min": 1.77,
76
+ "max": 4.0
77
+ },
78
+ "medium": {
79
+ "min": 4.02,
80
+ "max": 7.0
81
+ }
82
+ },
83
+ "impact": "100% mechanical encoding. severity_class is not a useful ML target on this dataset."
84
+ },
85
+ "P5_lifecycle_to_remediation": {
86
+ "target": "remediation_status (per-timestep)",
87
+ "leak_column": "lifecycle_phase",
88
+ "mechanism": "The 12-phase lifecycle state machine has multiple phases that deterministically pin remediation_status. ~83% of per-timestep rows have lifecycle_phase that determines remediation_status exactly.",
89
+ "deterministic_phase_mappings": {
90
+ "accepted_risk": {
91
+ "maps_to": "in_remediation",
92
+ "purity": 1.0,
93
+ "n_rows": 16
94
+ },
95
+ "discovery": {
96
+ "maps_to": "undetected",
97
+ "purity": 1.0,
98
+ "n_rows": 327
99
+ },
100
+ "false_positive_closed": {
101
+ "maps_to": "in_remediation",
102
+ "purity": 0.9944,
103
+ "n_rows": 1421
104
+ },
105
+ "organisational_triage": {
106
+ "maps_to": "triaged",
107
+ "purity": 1.0,
108
+ "n_rows": 18
109
+ },
110
+ "patch_release": {
111
+ "maps_to": "undetected",
112
+ "purity": 1.0,
113
+ "n_rows": 33
114
+ },
115
+ "remediation_deployment": {
116
+ "maps_to": "in_remediation",
117
+ "purity": 1.0,
118
+ "n_rows": 4362
119
+ },
120
+ "residual_risk_review": {
121
+ "maps_to": "remediated",
122
+ "purity": 1.0,
123
+ "n_rows": 8921
124
+ }
125
+ },
126
+ "impact": "Per-timestep targets remediation_status, patch_status, and lifecycle_phase form a tightly-coupled state machine; any two pin the third. All three appear as 0.95-0.98 accuracy in naive evaluation but are mechanically determined."
127
+ },
128
+ "P6_patch_to_remediation": {
129
+ "target": "remediation_status (per-timestep)",
130
+ "leak_column": "patch_status",
131
+ "mechanism": "Of 6 patch_status values, at least 5 map near-deterministically to a single remediation_status value. `patch_status=deployed` -> 100% `remediated`; `patch_validated`/`vendor_notified`/`patch_in_development`/`patch_released` -> ~99% `in_remediation`.",
132
+ "deterministic_status_mappings": {
133
+ "deployed": {
134
+ "maps_to": "remediated",
135
+ "purity": 1.0,
136
+ "n_rows": 8958
137
+ },
138
+ "patch_validated": {
139
+ "maps_to": "in_remediation",
140
+ "purity": 0.9941,
141
+ "n_rows": 5293
142
+ }
143
+ },
144
+ "impact": "patch_status alone is a near-oracle for remediation_status."
145
+ },
146
+ "P7_risk_score_composite": {
147
+ "target": "all binary flag fields (indirect)",
148
+ "leak_column": "risk_score_composite",
149
+ "mechanism": "risk_score_composite is computed in the generator from cvss_base_score, epss_score_final, and the flag fields. Including it in features would launder flag information into the model via this composite.",
150
+ "evidence": "Generator-side composite; correlation with all flag fields > 0.3.",
151
+ "impact": "Precautionary drop. Affects all binary flag targets."
152
+ },
153
+ "P8_patch_lag_days": {
154
+ "target": "remediation_success_flag (suspected)",
155
+ "leak_column": "patch_lag_days",
156
+ "mechanism": "Likely same sentinel-coding structure as time_to_remediate_days (120 sentinel for unpatched; lower values when patched). Dropped as precaution; not separately validated.",
157
+ "impact": "Precautionary drop."
158
+ }
159
+ },
160
+ "unlearnable_targets": [
161
+ {
162
+ "target": "exploitation_occurred_flag",
163
+ "n_positives": 203,
164
+ "majority_baseline": 0.9230477634571645,
165
+ "honest_accuracy": 0.8569023569023568,
166
+ "honest_roc_auc": 0.6534304796599878,
167
+ "verdict": "below_majority"
168
+ },
169
+ {
170
+ "target": "zero_day_flag",
171
+ "n_positives": 76,
172
+ "majority_baseline": 0.9711902956785443,
173
+ "honest_accuracy": 0.9486531986531986,
174
+ "honest_roc_auc": 0.6040141676505313,
175
+ "verdict": "below_majority"
176
+ },
177
+ {
178
+ "target": "cisa_kev_flag",
179
+ "n_positives": 14,
180
+ "majority_baseline": 0.9946929492039424,
181
+ "honest_accuracy": 0.9924242424242425,
182
+ "honest_roc_auc": 0.6125211505922166,
183
+ "verdict": "below_majority"
184
+ },
185
+ {
186
+ "target": "supply_chain_propagation_flag",
187
+ "n_positives": 20,
188
+ "majority_baseline": 0.9924184988627748,
189
+ "honest_accuracy": 0.9915824915824917,
190
+ "honest_roc_auc": 0.7950240316652529,
191
+ "verdict": "below_majority"
192
+ },
193
+ {
194
+ "target": "false_positive_flag",
195
+ "n_positives": 205,
196
+ "majority_baseline": 0.922289613343442,
197
+ "honest_accuracy": 0.8661616161616162,
198
+ "honest_roc_auc": 0.5172779496243923,
199
+ "verdict": "below_majority"
200
+ },
201
+ {
202
+ "target": "exploit_maturity_final (after cvss_temporal_score_final dropped)",
203
+ "n_classes": 4,
204
+ "majority_baseline": 0.35898407884761185,
205
+ "honest_accuracy": 0.30639730639730645,
206
+ "honest_roc_auc": 0.5731243306339614,
207
+ "verdict": "below_majority"
208
+ }
209
+ ],
210
+ "unlearnable_summary": "Six of the README's headline use cases are unlearnable on the sample after honest leak removal: exploitation_occurred_flag, zero_day_flag, cisa_kev_flag, supply_chain_propagation_flag, false_positive_flag, and exploit_maturity_final (the original primary candidate target before the cvss_temporal_score_final leakage was discovered). Only vulnerability_class learns honestly, and it gives the weakest baseline in the catalog (acc 0.24 vs majority 0.18).",
211
+ "recommendations_to_dataset_author": [
212
+ "Remove the deterministic CVSS v3.1 exploit-code-maturity multiplier from cvss_temporal_score_final calculation, or add per-vulnerability noise so the cvss_temporal/cvss_base ratio overlaps across maturity tiers. As shipped, the ratio uniquely identifies the tier.",
213
+ "Replace the fixed sentinel values (-1, 120, etc.) in time_to_exploit_days, time_to_remediate_days, and patch_lag_days with probabilistic censoring that doesn't perfectly identify the outcome class. For example, use the latest observed time on partially-complete trajectories rather than a fixed sentinel.",
214
+ "Decouple the lifecycle_phase -> remediation_status -> patch_status state machine. Real telemetry has noisy intermediate states (e.g. a vuln can move to patch_released without immediately being remediated). The current sample has 5+ pure deterministic edges in this graph.",
215
+ "Add per-vulnerability-class feature signatures. The 8 classes differ in cvss_base_score means (5.4-8.3) but per-class feature distributions overlap heavily. Add class-specific EPSS distributions, asset-affinity, and disclosure-timeline patterns to make class prediction tractable from features.",
216
+ "Increase positive-class counts for rare-event binaries in the sample: 14 cisa_kev positives, 20 supply_chain positives, and 76 zero_day positives are below the threshold for reliable minority-class ML evaluation at n=2638. Either upsample these in the sample or document them as full-product-only signals."
217
+ ]
218
+ }
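To make the P1 oracle path concrete, the sketch below recovers a maturity tier from nothing but the `cvss_temporal_score_final / cvss_base_score` ratio, by picking the tier whose observed median ratio is closest. The medians are copied from `observed_ratios_by_tier` above; the function name `guess_maturity` and the example scores are hypothetical, and nearest-median matching is an assumption made for illustration, not the method used to produce the reported 0.74 accuracy.

```python
# Median temporal/base ratios per tier, copied from observed_ratios_by_tier.
TIER_MEDIANS = {
    "unproven": 0.801,
    "proof_of_concept": 0.8274,
    "functional": 0.8537,
    "weaponised": 0.88,
}

def guess_maturity(cvss_base: float, cvss_temporal: float) -> str:
    """Guess exploit maturity from the temporal/base ratio alone.

    Picks the tier whose median ratio is nearest to the record's ratio.
    """
    if cvss_base <= 0:
        raise ValueError("cvss_base must be positive")
    ratio = cvss_temporal / cvss_base
    return min(TIER_MEDIANS, key=lambda tier: abs(TIER_MEDIANS[tier] - ratio))

# Hypothetical record: base 9.9, temporal 8.45 gives a ratio of about 0.8535.
print(guess_maturity(9.9, 8.45))
```

Because the observed min/max ranges of adjacent tiers overlap slightly (for example, the `unproven` max of 0.8302 exceeds the `proof_of_concept` min of 0.8255), a rule like this is near-deterministic rather than perfect, which is consistent with the diagnostic's wording.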
model_mlp.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b67a4b451d808d4c8574681f1a59fa67647c20c1f50dfea4aa2e495b14a67cba
3
+ size 69096
model_xgb.json ADDED
The diff for this file is too large to render. See raw diff
 
multi_seed_results.json ADDED
@@ -0,0 +1,98 @@
1
+ {
2
+ "purpose": "Multi-seed evaluation across 10 stratified splits of the 2,638-vulnerability sample.",
3
+ "seeds_evaluated": [
4
+ 42,
5
+ 7,
6
+ 13,
7
+ 17,
8
+ 23,
9
+ 31,
10
+ 45,
11
+ 99,
12
+ 123,
13
+ 200
14
+ ],
15
+ "per_seed": [
16
+ {
17
+ "seed": 42,
18
+ "test_n_classes": 8,
19
+ "accuracy": 0.23737373737373738,
20
+ "macro_f1": 0.22437482872901052,
21
+ "macro_roc_auc_ovr": 0.6837125710196055
22
+ },
23
+ {
24
+ "seed": 7,
25
+ "test_n_classes": 8,
26
+ "accuracy": 0.2222222222222222,
27
+ "macro_f1": 0.2093010862619929,
28
+ "macro_roc_auc_ovr": 0.6598529124901316
29
+ },
30
+ {
31
+ "seed": 13,
32
+ "test_n_classes": 8,
33
+ "accuracy": 0.2398989898989899,
34
+ "macro_f1": 0.2307013362941505,
35
+ "macro_roc_auc_ovr": 0.6859754559014113
36
+ },
37
+ {
38
+ "seed": 17,
39
+ "test_n_classes": 8,
40
+ "accuracy": 0.2828282828282828,
41
+ "macro_f1": 0.2641998881222478,
42
+ "macro_roc_auc_ovr": 0.7001133264273626
43
+ },
44
+ {
45
+ "seed": 23,
46
+ "test_n_classes": 8,
47
+ "accuracy": 0.22474747474747475,
48
+ "macro_f1": 0.20938909311730927,
49
+ "macro_roc_auc_ovr": 0.6952258894131303
50
+ },
51
+ {
52
+ "seed": 31,
53
+ "test_n_classes": 8,
54
+ "accuracy": 0.25252525252525254,
55
+ "macro_f1": 0.23228517698591994,
56
+ "macro_roc_auc_ovr": 0.6868917272897719
57
+ },
58
+ {
59
+ "seed": 45,
60
+ "test_n_classes": 8,
61
+ "accuracy": 0.2601010101010101,
62
+ "macro_f1": 0.23328085381091487,
63
+ "macro_roc_auc_ovr": 0.6955734168438206
64
+ },
65
+ {
66
+ "seed": 99,
67
+ "test_n_classes": 8,
68
+ "accuracy": 0.21717171717171718,
69
+ "macro_f1": 0.2064102665659866,
70
+ "macro_roc_auc_ovr": 0.700000049204532
71
+ },
72
+ {
73
+ "seed": 123,
74
+ "test_n_classes": 8,
75
+ "accuracy": 0.2222222222222222,
76
+ "macro_f1": 0.20983049912880922,
77
+ "macro_roc_auc_ovr": 0.662519489088299
78
+ },
79
+ {
80
+ "seed": 200,
81
+ "test_n_classes": 8,
82
+ "accuracy": 0.2828282828282828,
83
+ "macro_f1": 0.2801905278759914,
84
+ "macro_roc_auc_ovr": 0.6954305041778505
85
+ }
86
+ ],
87
+ "aggregate": {
88
+ "accuracy_mean": 0.2441919191919192,
89
+ "accuracy_std": 0.023337760304165702,
90
+ "accuracy_min": 0.21717171717171718,
91
+ "accuracy_max": 0.2828282828282828,
92
+ "macro_f1_mean": 0.22999635568923332,
93
+ "macro_f1_std": 0.023565611735295866,
94
+ "roc_auc_mean": 0.6865295341855916,
95
+ "roc_auc_std": 0.013780848086567432
96
+ },
97
+ "published_artifact_seed": 42
98
+ }
validation_results.json ADDED
@@ -0,0 +1,290 @@
1
+ {
2
+ "version": "1.0.0",
3
+ "dataset": "xpertsystems/cyb009-sample",
4
+ "task": "8-class vulnerability_class classification (CWE-style families)",
5
+ "baselines": {
6
+ "always_predict_majority_accuracy": 0.17676767676767677,
7
+ "majority_class": "memory_corruption",
8
+ "random_guess_accuracy": 0.125
9
+ },
10
+ "split": {
11
+ "strategy": "stratified (StratifiedShuffleSplit, nested 70/15/15)",
12
+ "rationale": "Per-vulnerability task (n=2638), one row per vuln. Stratified random splitting preserves class distribution. No row-correlation structure to leak.",
13
+ "vulns_train": 1846,
14
+ "vulns_val": 396,
15
+ "vulns_test": 396,
16
+ "seed": 42
17
+ },
18
+ "n_features": 57,
19
+ "label_classes": [
20
+ "auth_access_control",
21
+ "cryptographic_failure",
22
+ "information_disclosure",
23
+ "injection_family",
24
+ "logic_flaw",
25
+ "memory_corruption",
26
+ "misconfiguration",
27
+ "supply_chain_weakness"
28
+ ],
29
+ "class_distribution_train": {
30
+ "memory_corruption": 325,
31
+ "injection_family": 305,
32
+ "misconfiguration": 305,
33
+ "auth_access_control": 245,
34
+ "cryptographic_failure": 211,
35
+ "supply_chain_weakness": 189,
36
+ "logic_flaw": 160,
37
+ "information_disclosure": 106
38
+ },
39
+ "class_distribution_test": {
40
+ "memory_corruption": 70,
41
+ "misconfiguration": 65,
42
+ "injection_family": 65,
43
+ "auth_access_control": 53,
44
+ "cryptographic_failure": 45,
45
+ "supply_chain_weakness": 41,
46
+ "logic_flaw": 34,
47
+ "information_disclosure": 23
48
+ },
49
+ "outcome_leak_excluded_features": [
50
+ "exploit_maturity_final (indirect leak via CVSS temporal multiplier)",
51
+ "cvss_temporal_score_final (near-deterministic per exploit_maturity_final tier)",
52
+ "time_to_exploit_days (sentinel -1 / positive)",
53
+ "time_to_remediate_days (sentinel 120 / lower)",
54
+ "patch_lag_days (suspected similar sentinel - precaution)",
55
+ "risk_score_composite (computed from flag fields - precaution)"
56
+ ],
57
+ "leakage_audit_note": "CYB009 has the most pervasive structural leakage of any SKU in the XpertSystems catalog. See leakage_diagnostic.json for the full 8-oracle-path audit. Six of the README's headline use cases are unlearnable on the sample after honest leak removal; vulnerability_class is the only viable target and gives the catalog's weakest baseline by design.",
58
+ "models": {
59
+ "xgboost": {
60
+ "architecture": "Gradient-boosted decision trees, multi:softprob, 8 classes",
61
+ "framework": "xgboost",
62
+ "test_metrics": {
63
+ "model": "xgboost",
64
+ "accuracy": 0.23737373737373738,
65
+ "macro_f1": 0.22437482872901052,
66
+ "weighted_f1": 0.23213786276177156,
67
+ "per_class_f1": {
68
+ "auth_access_control": 0.14583333333333334,
69
+ "cryptographic_failure": 0.21686746987951808,
70
+ "information_disclosure": 0.2909090909090909,
71
+ "injection_family": 0.23728813559322035,
72
+ "logic_flaw": 0.08955223880597014,
73
+ "memory_corruption": 0.3333333333333333,
74
+ "misconfiguration": 0.2589928057553957,
75
+ "supply_chain_weakness": 0.2222222222222222
76
+ },
77
+ "confusion_matrix": {
78
+ "labels": [
79
+ "auth_access_control",
80
+ "cryptographic_failure",
81
+ "information_disclosure",
82
+ "injection_family",
83
+ "logic_flaw",
84
+ "memory_corruption",
85
+ "misconfiguration",
86
+ "supply_chain_weakness"
87
+ ],
88
+ "matrix": [
89
+ [
90
+ 7,
91
+ 7,
92
+ 0,
93
+ 11,
94
+ 6,
95
+ 10,
96
+ 7,
97
+ 5
98
+ ],
99
+ [
100
+ 4,
101
+ 9,
102
+ 3,
103
+ 5,
104
+ 3,
105
+ 5,
106
+ 16,
107
+ 0
108
+ ],
109
+ [
110
+ 3,
111
+ 0,
112
+ 8,
113
+ 1,
114
+ 4,
115
+ 0,
116
+ 7,
117
+ 0
118
+ ],
119
+ [
120
+ 3,
121
+ 6,
122
+ 1,
123
+ 14,
124
+ 8,
125
+ 20,
126
+ 6,
127
+ 7
128
+ ],
129
+ [
130
+ 4,
131
+ 4,
132
+ 5,
133
+ 3,
134
+ 3,
135
+ 2,
136
+ 13,
137
+ 0
138
+ ],
139
+ [
140
+ 11,
141
+ 3,
142
+ 0,
143
+ 13,
144
+ 3,
145
+ 27,
146
+ 5,
147
+ 8
148
+ ],
149
+ [
150
+ 6,
151
+ 9,
152
+ 15,
153
+ 2,
154
+ 5,
155
+ 7,
156
+ 18,
157
+ 3
158
+ ],
159
+ [
160
+ 5,
161
+ 0,
162
+ 0,
163
+ 4,
164
+ 1,
165
+ 21,
166
+ 2,
167
+ 8
168
+ ]
169
+ ]
170
+ },
171
+ "macro_roc_auc_ovr": 0.6837125710196055
172
+ }
173
+ },
174
+ "mlp": {
175
+ "architecture": "PyTorch MLP, 57 -> 128 -> 64 -> 8, BatchNorm1d + ReLU + Dropout, weighted cross-entropy loss",
176
+ "framework": "pytorch",
177
+ "test_metrics": {
178
+ "model": "mlp",
179
+ "accuracy": 0.23232323232323232,
180
+ "macro_f1": 0.22092024769409177,
181
+ "weighted_f1": 0.22940625794114217,
182
+ "per_class_f1": {
183
+ "auth_access_control": 0.16279069767441862,
184
+ "cryptographic_failure": 0.16842105263157894,
185
+ "information_disclosure": 0.15384615384615385,
186
+ "injection_family": 0.23529411764705882,
187
+ "logic_flaw": 0.22784810126582278,
188
+ "memory_corruption": 0.36486486486486486,
189
+ "misconfiguration": 0.16216216216216217,
190
+ "supply_chain_weakness": 0.29213483146067415
191
+ },
192
+ "confusion_matrix": {
193
+ "labels": [
194
+ "auth_access_control",
195
+ "cryptographic_failure",
196
+ "information_disclosure",
197
+ "injection_family",
198
+ "logic_flaw",
199
+ "memory_corruption",
200
+ "misconfiguration",
201
+ "supply_chain_weakness"
202
+ ],
203
+ "matrix": [
204
+ [
205
+ 7,
206
+ 8,
207
+ 1,
208
+ 12,
209
+ 6,
210
+ 12,
211
+ 4,
212
+ 3
213
+ ],
214
+ [
215
+ 5,
216
+ 8,
217
+ 4,
218
+ 3,
219
+ 5,
220
+ 5,
221
+ 14,
222
+ 1
223
+ ],
224
+ [
225
+ 1,
226
+ 3,
227
+ 5,
228
+ 2,
229
+ 5,
230
+ 1,
231
+ 6,
232
+ 0
233
+ ],
234
+ [
235
+ 3,
236
+ 7,
237
+ 3,
238
+ 14,
239
+ 6,
240
+ 17,
241
+ 2,
242
+ 13
243
+ ],
244
+ [
245
+ 1,
246
+ 5,
247
+ 9,
248
+ 3,
249
+ 9,
250
+ 1,
251
+ 6,
252
+ 0
253
+ ],
254
+ [
255
+ 8,
256
+ 7,
257
+ 0,
258
+ 9,
259
+ 3,
260
+ 27,
261
+ 2,
262
+ 14
263
+ ],
264
+ [
265
+ 3,
266
+ 10,
267
+ 20,
268
+ 5,
269
+ 10,
270
+ 4,
271
+ 9,
272
+ 4
273
+ ],
274
+ [
275
+ 5,
276
+ 2,
277
+ 0,
278
+ 6,
279
+ 1,
280
+ 11,
281
+ 3,
282
+ 13
283
+ ]
284
+ ]
285
+ },
286
+ "macro_roc_auc_ovr": 0.6899177016524518
287
+ }
288
+ }
289
+ }
290
+ }
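The headline metrics in the results file above can be cross-checked against the confusion matrices it stores. The following is a minimal sketch, not the author's evaluation code: it copies the XGBoost matrix by hand and assumes rows are true classes and columns are predicted classes (the row sums do match the per-class test supports listed in the file). The variable names are illustrative only.

```python
# Recompute summary metrics from the XGBoost confusion matrix reported above.
# Assumption: rows = true class, columns = predicted class.
labels = [
    "auth_access_control", "cryptographic_failure", "information_disclosure",
    "injection_family", "logic_flaw", "memory_corruption",
    "misconfiguration", "supply_chain_weakness",
]
cm = [
    [7, 7, 0, 11, 6, 10, 7, 5],
    [4, 9, 3, 5, 3, 5, 16, 0],
    [3, 0, 8, 1, 4, 0, 7, 0],
    [3, 6, 1, 14, 8, 20, 6, 7],
    [4, 4, 5, 3, 3, 2, 13, 0],
    [11, 3, 0, 13, 3, 27, 5, 8],
    [6, 9, 15, 2, 5, 7, 18, 3],
    [5, 0, 0, 4, 1, 21, 2, 8],
]

n = len(labels)
total = sum(sum(row) for row in cm)

tp = [cm[i][i] for i in range(n)]                                 # correct predictions per class
support = [sum(cm[i]) for i in range(n)]                          # true examples per class (row sums)
predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # predictions per class (column sums)

accuracy = sum(tp) / total

# F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (support + predicted).
# Every class here has at least one true or predicted example, so no zero division.
f1 = [2 * tp[i] / (support[i] + predicted[i]) for i in range(n)]
per_class_f1 = dict(zip(labels, f1))

macro_f1 = sum(f1) / n
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / total

# Accuracy of always predicting the most frequent test class, for reference.
majority_baseline = max(support) / total

print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} "
      f"weighted_f1={weighted_f1:.4f} majority_baseline={majority_baseline:.4f}")
```

Under these assumptions the recomputed accuracy, macro F1, and weighted F1 agree with the stored `test_metrics` values, and the majority-class reference (70 of 396 test rows) gives a sense of how far above a trivial predictor the reported accuracy sits. Swapping in the MLP matrix allows the same check for the second model.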