Initial release: vulnerability_class baseline + comprehensive 8-oracle-path leakage diagnostic on CYB009 sample

Files changed (11)

README.md +511 -0
ablation_results.json +818 -0
feature_engineering.py +401 -0
feature_meta.json +160 -0
feature_scaler.json +1 -0
inference_example.ipynb +345 -0
leakage_diagnostic.json +218 -0
model_mlp.safetensors +3 -0
model_xgb.json +0 -0
multi_seed_results.json +98 -0
validation_results.json +290 -0

README.md ADDED

	@@ -0,0 +1,511 @@

+---
+license: cc-by-nc-4.0
+library_name: pytorch
+tags:
+  - cybersecurity
+  - vulnerability-management
+  - cve
+  - cvss
+  - epss
+  - cisa-kev
+  - tabular-classification
+  - synthetic-data
+  - xgboost
+  - baseline
+  - leakage-diagnostic
+  - data-quality-audit
+pipeline_tag: tabular-classification
+base_model: []
+datasets:
+  - xpertsystems/cyb009-sample
+metrics:
+  - accuracy
+  - f1
+  - roc_auc
+model-index:
+  - name: cyb009-baseline-classifier
+    results:
+      - task:
+          type: tabular-classification
+          name: 8-class vulnerability classification (CWE-style families)
+        dataset:
+          type: xpertsystems/cyb009-sample
+          name: CYB009 Synthetic Vulnerability Intelligence Dataset (Sample)
+        metrics:
+          - type: roc_auc
+            value: 0.6837
+            name: Test macro ROC-AUC OvR (XGBoost, seed 42)
+          - type: accuracy
+            value: 0.2374
+            name: Test accuracy (XGBoost, seed 42)
+          - type: f1
+            value: 0.2244
+            name: Test macro-F1 (XGBoost, seed 42)
+          - type: accuracy
+            value: 0.244
+            name: Multi-seed accuracy mean ± 0.023 (XGBoost, 10 seeds)
+          - type: roc_auc
+            value: 0.687
+            name: Multi-seed ROC-AUC mean ± 0.014 (XGBoost, 10 seeds)
+---
+# CYB009 Baseline Classifier
+**Vulnerability classification baseline (8-class) trained on the CYB009
+synthetic vulnerability intelligence sample. The primary artifact value
+of this repo is `leakage_diagnostic.json` — the most comprehensive
+structural-leakage audit in the XpertSystems baseline catalog,
+documenting 8 oracle paths and 6 unlearnable README-suggested targets.
+The classifier itself is the catalog's weakest baseline by design (acc
+0.244 vs majority 0.176), included to show that vulnerability_class is
+the ONLY README-headline target that learns honestly on this sample.**
+> **Read this first.** This repo ships three artifacts in priority
+> order:
+> 1. **`leakage_diagnostic.json`** — comprehensive audit of 8 oracle
+>    paths discovered on CYB009 and 6 README-suggested targets that
+>    are unlearnable on the sample after honest leak removal.
+> 2. A working classifier for `vulnerability_class` 8-class — the
+>    only README target that learns honestly on this sample, and the
+>    weakest baseline in the XpertSystems catalog by design.
+> 3. A feature engineering reference (`feature_engineering.py`).
+>
+> If you came here looking for a strong baseline, you will be
+> disappointed. If you came here to understand why the CYB009 sample
+> has hard-to-detect structural label-feature determinism, the
+> diagnostic is exactly the artifact you need.
+## Model overview
+| Property | Value |
+|---|---|
+| Primary task | 8-class `vulnerability_class` classification (CWE-style families) |
+| Primary artifact | **`leakage_diagnostic.json`** — 8 oracle paths + 6 unlearnable targets |
+| Training data | `xpertsystems/cyb009-sample` (2,638 vulnerabilities) |
+| Models | XGBoost + PyTorch MLP |
+| Input features | 57 (after one-hot encoding) |
+| Split | Stratified random (per-vulnerability, no group structure to leak) |
+| Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds |
+| License | CC-BY-NC-4.0 (matches dataset) |
+| Status | Reference baseline + comprehensive leakage diagnostic |
+## Why this task — and the journey to get here
+The CYB009 README lists 11 suggested use cases. We piloted every
+README-headline target and found pervasive structural leakage. The
+abandoned candidates, in order of how we discovered them:
+### Initial candidate: `exploit_maturity_final` 4-class (ABANDONED)
+The most natural target — 4-class (unproven/PoC/functional/weaponised),
+n=2638 well-balanced (36/27/25/12%), maps directly to EPSS calibration.
+Initial feasibility hit **acc 0.74, macro-F1 0.72, ROC-AUC 0.91 vs
+majority 0.36** — a +38pp lift looked excellent.
+**Then we found the leak.** `cvss_temporal_score_final` divided by
+`cvss_base_score` clusters near-deterministically per maturity tier:
+| Maturity tier | Observed ratio (median ± std) | CVSS v3.1 multiplier |
+|---|---:|---:|
+| unproven | 0.801 ± 0.011 | 0.91 × (other Temporal factors) |
+| proof_of_concept | 0.827 ± 0.011 | 0.94 × (other Temporal factors) |
+| functional | 0.854 ± 0.011 | 0.97 × (other Temporal factors) |
+| weaponised | 0.880 ± 0.012 | 1.00 × (other Temporal factors) |
+This is exactly the CVSS v3.1 Exploit Code Maturity multiplier
+(unproven 0.91 / PoC 0.94 / functional 0.97 / high or weaponised 1.00),
+combined with other near-constant Temporal factors (Remediation Level,
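+Report Confidence). **The cvss_temporal/cvss_base ratio identifies the
+maturity tier almost uniquely:** adjacent tier medians sit about 0.026
+apart, more than two standard deviations (0.011 to 0.012).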
+Drop `cvss_temporal_score_final` → accuracy collapses to **0.31**
+(below majority 0.36). The target is structurally unlearnable on the
+sample once the oracle is removed.
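A quick arithmetic check makes the leak concrete. The sketch below divides each observed median ratio by its CVSS v3.1 Exploit Code Maturity multiplier (both copied from the table above), then guesses the tier from a single ratio by nearest median. The helper names and the nearest-median rule are illustrative assumptions, not the audit code behind `leakage_diagnostic.json`.

```python
# Observed median of cvss_temporal_score_final / cvss_base_score per tier
# (values copied from the table above).
TIER_MEDIAN_RATIO = {
    "unproven": 0.801,
    "proof_of_concept": 0.827,
    "functional": 0.854,
    "weaponised": 0.880,
}

# CVSS v3.1 Exploit Code Maturity multipliers (also from the table above).
ECM_MULTIPLIER = {
    "unproven": 0.91,
    "proof_of_concept": 0.94,
    "functional": 0.97,
    "weaponised": 1.00,
}


def implied_other_factor(tier):
    """Residual Temporal factor left after removing the maturity multiplier."""
    return TIER_MEDIAN_RATIO[tier] / ECM_MULTIPLIER[tier]


def guess_maturity(cvss_base_score, cvss_temporal_score_final):
    """Hypothetical oracle: pick the tier whose median ratio is closest."""
    ratio = cvss_temporal_score_final / cvss_base_score
    return min(TIER_MEDIAN_RATIO, key=lambda t: abs(TIER_MEDIAN_RATIO[t] - ratio))


other_factor = {t: round(implied_other_factor(t), 3) for t in TIER_MEDIAN_RATIO}
print(other_factor)
print(guess_maturity(7.5, 6.4))  # ratio 0.853, closest to the functional median
```

All four tiers imply the same residual factor of about 0.88, which is what one would expect if Remediation Level and Report Confidence were close to constant across the sample.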
+### Other 5 candidates: also unlearnable after honest leak removal
+| Target | n_positive | Maj baseline | Honest acc | Honest AUC | Verdict |
+|---|---:|---:|---:|---:|---|
+| `exploitation_occurred_flag` | 203 | 0.923 | 0.857 | 0.65 | Below majority |
+| `zero_day_flag` | 76 | 0.971 | 0.949 | 0.60 | Below majority |
+| `cisa_kev_flag` | 14 | 0.995 | 0.992 | 0.61 | Below majority |
+| `supply_chain_propagation_flag` | 20 | 0.992 | 0.992 | 0.80 | At majority |
+| `false_positive_flag` | 205 | 0.922 | 0.866 | 0.52 | Below majority |
+At full features, all five rare-event binaries are given away by a
+sentinel value in `time_to_exploit_days` (-1) or
+`time_to_remediate_days` (120); after honest leak removal, all are at
+or below majority.
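A sentinel oracle of this kind can be audited with a few lines. The sketch below compares the label rate where a column equals its sentinel against the rate everywhere else. The function name is hypothetical, the toy rows are made up for illustration (they are not CYB009 data), and reading -1 as "no exploitation observed" is an assumption.

```python
def sentinel_audit(rows, column, sentinel, label):
    """Return (label rate where column == sentinel, label rate elsewhere)."""
    at_sentinel = [r[label] for r in rows if r[column] == sentinel]
    elsewhere = [r[label] for r in rows if r[column] != sentinel]

    def rate(values):
        return sum(values) / len(values) if values else float("nan")

    return rate(at_sentinel), rate(elsewhere)


# Made-up toy rows, for illustration only.
TOY_ROWS = [
    {"time_to_exploit_days": -1, "exploitation_occurred_flag": False},
    {"time_to_exploit_days": -1, "exploitation_occurred_flag": False},
    {"time_to_exploit_days": 14, "exploitation_occurred_flag": True},
    {"time_to_exploit_days": 3, "exploitation_occurred_flag": True},
    {"time_to_exploit_days": -1, "exploitation_occurred_flag": False},
]

at_rate, off_rate = sentinel_audit(
    TOY_ROWS, "time_to_exploit_days", -1, "exploitation_occurred_flag"
)
print(at_rate, off_rate)  # 0.0 1.0 on the toy rows
```

Rates of exactly 0 and 1 on the two sides of the sentinel mean the column encodes the label outright, so it has to be dropped before any honest evaluation.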
+### Per-timestep multi-class targets: state-machine oracles
+`lifecycle_phase`, `patch_status`, and `remediation_status` on
+`vulnerability_records.csv` form a tightly-coupled state machine:
+- `lifecycle_phase = residual_risk_review` → 100% `remediated`
+- `lifecycle_phase = discovery` → 100% `undetected`
+- `lifecycle_phase = remediation_deployment` → 100% `in_remediation`
+- `patch_status = deployed` → 100% `remediated`
+Naive evaluation on these targets reaches accuracy 0.95-0.98, but any
+two of the three deterministically pin the third. None of these is a
+viable independent ML target on the sample.
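One way to surface this kind of determinism is a conditional purity check: for each value of one column, measure how often the most common value of the other column occurs. The sketch below is a minimal version under stated assumptions: `conditional_purity` is a hypothetical helper, and the toy rows (including `toy_mixed_phase`) are invented to contrast an oracle with a non-oracle case.

```python
from collections import Counter, defaultdict


def conditional_purity(rows, feature, target):
    """Map each feature value to (most common target value, its share)."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[target]] += 1
    purity = {}
    for value, counts in by_value.items():
        top_target, top_n = counts.most_common(1)[0]
        purity[value] = (top_target, top_n / sum(counts.values()))
    return purity


# Toy rows: the first three phases follow the mappings listed above;
# "toy_mixed_phase" is invented only to show a non-deterministic case.
TOY_ROWS = [
    {"lifecycle_phase": "discovery", "remediation_status": "undetected"},
    {"lifecycle_phase": "discovery", "remediation_status": "undetected"},
    {"lifecycle_phase": "remediation_deployment", "remediation_status": "in_remediation"},
    {"lifecycle_phase": "residual_risk_review", "remediation_status": "remediated"},
    {"lifecycle_phase": "residual_risk_review", "remediation_status": "remediated"},
    {"lifecycle_phase": "toy_mixed_phase", "remediation_status": "undetected"},
    {"lifecycle_phase": "toy_mixed_phase", "remediation_status": "in_remediation"},
]

purity = conditional_purity(TOY_ROWS, "lifecycle_phase", "remediation_status")
for phase, (status, share) in purity.items():
    print(f"{phase:24s} -> {status:15s} {share:.0%}")
```

A share of 100% for a feature value marks a deterministic path from feature to target; running the same check in both directions for each pair of columns shows whether the three fields pin one another.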
+### `severity_class`: 100% mechanical CVSS function
+Observed `cvss_base_score` ranges per severity match CVSS v3.1 exactly:
+critical [9.0, 10.0], high [7.0, 9.0], medium [4.0, 7.0], low [1.8, 4.0].
+Predicting severity is trivial with CVSS; below majority (acc 0.55 vs
+0.51) without it.
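The mechanical mapping fits in a few comparisons. The sketch below uses the lower bounds of the observed ranges as thresholds; treating each band as closed on the left and open on the right is an assumption, since the ranges above share their endpoints.

```python
def severity_from_cvss(cvss_base_score):
    """Map a CVSS base score to severity_class using the observed bands."""
    if not 0.0 <= cvss_base_score <= 10.0:
        raise ValueError("CVSS base scores lie in [0.0, 10.0]")
    if cvss_base_score >= 9.0:
        return "critical"
    if cvss_base_score >= 7.0:
        return "high"
    if cvss_base_score >= 4.0:
        return "medium"
    return "low"


for score in (1.8, 4.0, 6.9, 7.0, 9.0, 10.0):
    print(score, severity_from_cvss(score))
```

Because the label is a pure function of one input column, a classifier given `cvss_base_score` only rediscovers these three thresholds.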
+### `vulnerability_class` 8-class: the only honest target — and the baseline ships
+After exhausting the README-suggested targets, `vulnerability_class`
+is the only one that learns honestly:
+- **acc 0.244 ± 0.023, macro-F1 0.230 ± 0.024, ROC-AUC 0.687 ± 0.014**
+- **+7pp lift over majority** (the catalog's smallest)
+- **All 8 classes represented** (per-class F1 0.09-0.33)
+- **No oracle feature** — modest signal genuinely spread across CVSS,
+  EPSS, asset context, and binary flags
+This is the **weakest baseline in the XpertSystems catalog by design**.
+The full ~487k-row product should tighten per-class signal materially.
+The dataset roadmap recommendations in `leakage_diagnostic.json`
+describe what would make CYB009's headline targets viable on the
+sample.
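The majority baseline and the lift quoted above follow directly from the class counts in this card. The sketch below recomputes them; the variable names are illustrative.

```python
# Class counts from the "Training data" table of this card.
CLASS_COUNTS = {
    "memory_corruption": 465,
    "injection_family": 436,
    "misconfiguration": 435,
    "auth_access_control": 350,
    "cryptographic_failure": 301,
    "supply_chain_weakness": 271,
    "logic_flaw": 228,
    "information_disclosure": 152,
}

total = sum(CLASS_COUNTS.values())
majority_class = max(CLASS_COUNTS, key=CLASS_COUNTS.get)
majority_acc = CLASS_COUNTS[majority_class] / total   # always predict the largest class
uniform_acc = 1 / len(CLASS_COUNTS)                   # uniform random guess over 8 classes

MULTI_SEED_ACC = 0.244                                # multi-seed mean reported above
lift_pp = (MULTI_SEED_ACC - majority_acc) * 100

print(f"n = {total}, majority = {majority_class} ({majority_acc:.3f})")
print(f"uniform guess = {uniform_acc:.3f}, lift over majority = {lift_pp:.1f}pp")
```

The lift comes out a little under 7 percentage points, roughly three multi-seed standard deviations (0.023) above the majority rate.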
+## Quick start
+```bash
+pip install xgboost scikit-learn torch safetensors pandas huggingface_hub
+```
+```python
+import json
+import os
+import sys
+import numpy as np
+import torch  # torch and safetensors are only needed for the MLP weights
+import xgboost as xgb
+from huggingface_hub import hf_hub_download, snapshot_download
+from safetensors.torch import load_file
+REPO = "xpertsystems/cyb009-baseline-classifier"
+# Download the model weights and the feature pipeline from the Hub
+paths = {n: hf_hub_download(REPO, n) for n in [
+    "model_xgb.json", "model_mlp.safetensors",
+    "feature_engineering.py", "feature_meta.json", "feature_scaler.json",
+]}
+# Make the downloaded feature_engineering.py importable
+sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"]))
+from feature_engineering import (
+    transform_single, load_meta, build_asset_lookup, INT_TO_LABEL,
+)
+meta = load_meta(paths["feature_meta.json"])
+# Asset features are joined from asset_inventory.csv at inference time
+ds = snapshot_download("xpertsystems/cyb009-sample", repo_type="dataset")
+asset_lookup = build_asset_lookup(f"{ds}/asset_inventory.csv")
+xgb_model = xgb.XGBClassifier()
+xgb_model.load_model(paths["model_xgb.json"])
+# Predict (see inference_example.ipynb for the full pattern).
+# my_vuln_record is a placeholder for your own input record.
+# Note: do NOT include exploit_maturity_final, cvss_temporal_score_final,
+# time_to_exploit_days, time_to_remediate_days, patch_lag_days, or
+# risk_score_composite - those were the outcome-leak columns.
+X = transform_single(my_vuln_record, meta, asset_lookup=asset_lookup)
+proba = xgb_model.predict_proba(X)[0]
+print(INT_TO_LABEL[int(np.argmax(proba))])
+```
+See [`inference_example.ipynb`](./inference_example.ipynb) for the full
+copy-paste demo.
+## Training data
+Trained on the public sample of CYB009, 2,638 per-vulnerability records:
+| Vulnerability class | Vulns | Class share |
+|---|---:|---:|
+| `memory_corruption` | 465 | 17.6% |
+| `injection_family` | 436 | 16.5% |
+| `misconfiguration` | 435 | 16.5% |
+| `auth_access_control` | 350 | 13.3% |
+| `cryptographic_failure` | 301 | 11.4% |
+| `supply_chain_weakness` | 271 | 10.3% |
+| `logic_flaw` | 228 | 8.6% |
+| `information_disclosure` | 152 | 5.8% |
+### Stratified split
+Per-vulnerability task (one row per vuln in `vuln_summary.csv`),
+**StratifiedShuffleSplit** nested 70/15/15:
+| Fold | Vulns |
+|---|---:|
+| Train | 1,846 |
+| Validation | 396 |
+| Test | 396 |
+Class imbalance addressed with `class_weight='balanced'` (XGBoost
+`sample_weight`) and weighted cross-entropy (MLP).
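The fold sizes and the balanced weights can be reproduced without any ML library. The sketch below is a dependency-free stand-in, not scikit-learn's `StratifiedShuffleSplit`: it draws 396 test rows and then 396 validation rows proportionally per class (largest-remainder rounding), and computes balanced class weights as `n_train / (n_classes * class_count)`. The row identifiers are synthetic placeholders and `stratified_take` is a hypothetical helper.

```python
import random
from collections import Counter

CLASS_COUNTS = {
    "memory_corruption": 465, "injection_family": 436, "misconfiguration": 435,
    "auth_access_control": 350, "cryptographic_failure": 301,
    "supply_chain_weakness": 271, "logic_flaw": 228, "information_disclosure": 152,
}


def stratified_take(pool_by_class, n_take, rng):
    """Remove n_take rows from the pools, proportionally to class size."""
    total = sum(len(pool) for pool in pool_by_class.values())
    quotas = {c: n_take * len(pool) / total for c, pool in pool_by_class.items()}
    take = {c: int(q) for c, q in quotas.items()}
    # Hand out the rows lost to rounding by largest fractional remainder.
    leftover = n_take - sum(take.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - take[c], reverse=True)
    for c in by_remainder[:leftover]:
        take[c] += 1
    taken = []
    for c, pool in pool_by_class.items():
        rng.shuffle(pool)
        taken.extend(pool.pop() for _ in range(take[c]))
    return taken


rng = random.Random(42)
# Synthetic row ids of the form (class_name, index); not real CYB009 rows.
pools = {c: [(c, i) for i in range(n)] for c, n in CLASS_COUNTS.items()}

test = stratified_take(pools, 396, rng)
val = stratified_take(pools, 396, rng)
train = [row for pool in pools.values() for row in pool]

# Balanced class weights computed on the training fold only.
train_counts = {c: len(pool) for c, pool in pools.items()}
class_weight = {
    c: len(train) / (len(train_counts) * n) for c, n in train_counts.items()
}

print(len(train), len(val), len(test))
print({c: round(w, 2) for c, w in class_weight.items()})
```

With these weights every class contributes the same total weight to the loss, so the rarest class (`information_disclosure`) is weighted roughly three times as heavily per row as the most common one.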
+## Feature pipeline
+The bundled `feature_engineering.py` is the canonical recipe. 57
+features survive after encoding, drawn from:
+- **Per-vulnerability numeric** (10): `cvss_base_score`,
+  `epss_score_final`, plus 8 binary post-hoc flags
+- **Per-vulnerability categorical** (1, one-hot): `severity_class`
+  (4 values, CVSS-derived but useful as feature)
+- **Asset features** (joined from `asset_inventory.csv`): 8 numeric
+  + 4 categorical (asset_type, criticality_tier, environment_type,
+  os_family)
+- **Engineered** (5): `log_epss`, `is_high_cvss`,
+  `exposure_severity_composite`, `risk_flag_count`, `epss_x_base`
+### Excluded columns (outcome leaks)
+| Column | Why excluded |
+|---|---|
+| `exploit_maturity_final` | Indirect leak via CVSS temporal multiplier (would reintroduce the 0.91/0.94/0.97/1.00 oracle) |
+| `cvss_temporal_score_final` | Near-deterministic per `exploit_maturity_final` tier (the primary leak we discovered) |
+| `time_to_exploit_days` | -1 sentinel oracle for `exploitation_occurred_flag` |
+| `time_to_remediate_days` | 120 sentinel oracle for `remediation_success_flag` |
+| `patch_lag_days` | Suspected similar sentinel (precaution) |
+| `risk_score_composite` | Computed from flag fields (indirect oracle) |
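When building a custom pipeline, a simple guard keeps these columns out of the feature set. The sketch below filters a record represented as a plain dict; `strip_leak_columns` and the example values are hypothetical, and the bundled `feature_engineering.py` may handle exclusion differently.

```python
# The six outcome-leak columns listed in the table above.
LEAK_COLUMNS = frozenset({
    "exploit_maturity_final",
    "cvss_temporal_score_final",
    "time_to_exploit_days",
    "time_to_remediate_days",
    "patch_lag_days",
    "risk_score_composite",
})


def strip_leak_columns(record):
    """Return a copy of the record without any outcome-leak column."""
    return {k: v for k, v in record.items() if k not in LEAK_COLUMNS}


# Made-up example record, for illustration only.
example = {
    "cvss_base_score": 7.5,
    "epss_score_final": 0.12,
    "cvss_temporal_score_final": 6.4,
    "time_to_exploit_days": -1,
}
clean = strip_leak_columns(example)
print(sorted(clean))  # ['cvss_base_score', 'epss_score_final']
```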
+## Evaluation
+### Test-set metrics, seed 42 (n = 396 vulnerabilities)
+**XGBoost** (the published `model_xgb.json` artifact)
+| Metric | Value |
+|---|---:|
+| Macro ROC-AUC (OvR) | **0.6837** |
+| Accuracy | **0.2374** |
+| Macro-F1 | 0.2244 |
+| Weighted-F1 | 0.2321 |
+**MLP** (the published `model_mlp.safetensors` artifact)
+| Metric | Value |
+|---|---:|
+| Macro ROC-AUC (OvR) | **0.6899** |
+| Accuracy | **0.2323** |
+| Macro-F1 | 0.2209 |
+| Weighted-F1 | 0.2362 |
+MLP and XGBoost are within noise of each other on this task; both
+capture the same modest honest signal.
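The XGBoost headline numbers can be rechecked from the seed-42 confusion matrix stored under `full_model_metrics` in `ablation_results.json`. The sketch below recomputes accuracy, macro-F1, and weighted-F1 with the standard definitions, assuming rows are true labels and columns are predictions (the scikit-learn convention).

```python
LABELS = [
    "auth_access_control", "cryptographic_failure", "information_disclosure",
    "injection_family", "logic_flaw", "memory_corruption",
    "misconfiguration", "supply_chain_weakness",
]

# Seed-42 XGBoost confusion matrix copied from ablation_results.json.
MATRIX = [
    [7, 7, 0, 11, 6, 10, 7, 5],
    [4, 9, 3, 5, 3, 5, 16, 0],
    [3, 0, 8, 1, 4, 0, 7, 0],
    [3, 6, 1, 14, 8, 20, 6, 7],
    [4, 4, 5, 3, 3, 2, 13, 0],
    [11, 3, 0, 13, 3, 27, 5, 8],
    [6, 9, 15, 2, 5, 7, 18, 3],
    [5, 0, 0, 4, 1, 21, 2, 8],
]

n = sum(sum(row) for row in MATRIX)
support = [sum(row) for row in MATRIX]          # true count per class
predicted = [sum(col) for col in zip(*MATRIX)]  # predicted count per class
true_pos = [MATRIX[i][i] for i in range(len(LABELS))]

# F1 = 2*TP / (true count + predicted count), per class.
f1 = [
    2 * tp / (s + p) if (s + p) else 0.0
    for tp, s, p in zip(true_pos, support, predicted)
]

accuracy = sum(true_pos) / n
macro_f1 = sum(f1) / len(f1)
weighted_f1 = sum(s * f for s, f in zip(support, f1)) / n

print(f"n={n} acc={accuracy:.4f} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
for label, score in zip(LABELS, f1):
    print(f"{label:24s} {score:.3f}")
```

The recomputed values agree with the `accuracy`, `macro_f1`, `weighted_f1`, and `per_class_f1` entries stored alongside the matrix.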
+### Multi-seed robustness (XGBoost, 10 seeds)
+| Metric | Mean | Std | Min | Max |
+|---|---:|---:|---:|---:|
+| Accuracy | 0.244 | 0.023 | 0.217 | 0.283 |
+| Macro-F1 | 0.230 | 0.024 | 0.206 | 0.280 |
+| Macro ROC-AUC OvR | 0.687 | 0.014 | 0.660 | 0.700 |
+All 10 seeds yielded all 8 classes in the test fold (stratified split
+guarantees this). Full per-seed results in
+[`multi_seed_results.json`](./multi_seed_results.json).
+### Per-class F1 (seed 42)
+| Vulnerability class | Class share | XGBoost F1 | MLP F1 |
+|---|---:|---:|---:|
+| `memory_corruption` | 17.6% | **0.333** | 0.365 |
+| `information_disclosure` | 5.8% | 0.291 | 0.154 |
+| `misconfiguration` | 16.5% | 0.259 | 0.162 |
+| `injection_family` | 16.5% | 0.237 | 0.235 |
+| `supply_chain_weakness` | 10.3% | 0.222 | 0.292 |
+| `cryptographic_failure` | 11.4% | 0.217 | 0.168 |
+| `auth_access_control` | 13.3% | 0.146 | 0.163 |
+| `logic_flaw` | 8.6% | **0.090** | 0.228 |
+`memory_corruption` (highest mean CVSS at 8.3) and
+`information_disclosure` (lowest mean CVSS at 5.4) are the most
+distinctive classes. `logic_flaw` is the hardest — its feature
+distribution overlaps closely with everything else.
+### Ablation: which feature groups matter
+| Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy |
+|---|---:|---:|---:|---:|
+| Full feature set (published) | 0.2374 | 0.2244 | 0.6837 | — |
+| No CVSS features | 0.2121 | 0.1926 | 0.6690 | **−0.0253** |
+| No asset features | 0.2172 | 0.1967 | 0.6870 | −0.0202 |
+| No engineered features | 0.2323 | 0.2216 | 0.6871 | −0.0051 |
+| No severity (one-hot) | 0.2273 | 0.2175 | 0.6857 | −0.0101 |
+| No EPSS features | 0.2475 | 0.2237 | 0.6926 | +0.0101 |
+| No binary flags | 0.2273 | 0.2114 | 0.6776 | −0.0101 |
+Three findings:
+1. **No feature group is dominant.** Largest single drop is 2.5pp
+   (CVSS features). Every group contributes a little; nothing
+   contributes a lot. The signal is genuinely diffuse.
+2. **CVSS and asset features carry the most signal** (~2pp each),
+   consistent with the observation that per-class CVSS means
+   differ (5.4 to 8.3) and asset features modestly inform class.
+3. **EPSS features slightly *hurt*** on this task (+1pp without
+   them). EPSS is intended for exploitation prediction, not class
+   prediction; on this sample it acts as small additional noise.
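The Δ accuracy column is just the ablated accuracy minus the full-model accuracy. The sketch below recomputes it from the table and ranks the groups by how much their removal hurts; the dictionary keys are illustrative labels. Note that `ablation_results.json` stores `delta_accuracy` with the opposite sign (full minus ablated).

```python
FULL_ACCURACY = 0.2374

# Ablated test accuracies copied from the table above.
ABLATED_ACCURACY = {
    "no_cvss": 0.2121,
    "no_asset": 0.2172,
    "no_engineered": 0.2323,
    "no_severity": 0.2273,
    "no_epss": 0.2475,
    "no_flags": 0.2273,
}

# Negative delta: removing the group hurts. Positive: removing it helps.
delta = {k: round(v - FULL_ACCURACY, 4) for k, v in ABLATED_ACCURACY.items()}
ranked = sorted(delta, key=delta.get)

for name in ranked:
    print(f"{name:14s} {delta[name]:+.4f}")
```

No group moves accuracy by more than about 2.5 percentage points in either direction, which on a 396-row test fold corresponds to roughly ten predictions.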
+### Architecture
+**XGBoost:** multi-class gradient boosting (`multi:softprob`, 8 classes),
+`hist` tree method, class-balanced sample weights, early stopping on
+validation mlogloss.
+**MLP:** `57 → 128 → 64 → 8`, each hidden layer followed by
+`BatchNorm1d` → `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss,
+AdamW optimizer, early stopping on validation macro-F1.
+Training hyperparameters are held internally by XpertSystems.
+## Limitations
+**This is a baseline reference, not a production vulnerability
+classifier.**
+1. **The headline finding is the leakage diagnostic, not the
+   classifier.** Read `leakage_diagnostic.json` first. The classifier
+   demonstrates that vulnerability_class is the only README-suggested
+   target that learns honestly on the sample.
+2. **Per-class F1 ranges 0.09–0.33.** The model scores higher F1 on
+   memory_corruption and information_disclosure than on logic_flaw
+   and auth_access_control. For production use, expect different
+   error patterns by class.
+3. **No feature group contributes more than 3pp accuracy.** The
+   model has no single decisive signal; instead it integrates many
+   weakly-informative features. Removing any one group has minimal
+   impact.
+4. **Synthetic-vs-real transfer.** The dataset is synthetic, calibrated
+   to 12 benchmarks from authoritative vulnerability intelligence
+   sources (NIST NVD, EPSS v3, CISA KEV, Mandiant, Verizon DBIR,
+   Rapid7, Qualys, Tenable). Real vulnerability telemetry has
+   different noise characteristics — in particular, the
+   structural-oracle patterns documented in
+   `leakage_diagnostic.json` (CVSS temporal multipliers,
+   sentinel-coded time fields, lifecycle state-machine determinism)
+   would not be present in real data with comparable density. Real
+   data has stochastic transitions and observation noise.
+5. **2,638 vulnerabilities is a modest training set for 8 classes.**
+   The 396-vulnerability test fold yields stable multi-seed metrics
+   (std 0.023) but per-class confidence intervals are wide. The full
+   ~487k-row product has materially more data per class.
+## Notes on dataset schema
+The CYB009 sample dataset README describes some fields differently
+from the actual schema. This note helps buyers reconcile what they
+read with what they receive.
+| What the README says | What the data actually contains |
+|---|---|
+| `vulnerability_records` has 19 columns | Data has **16 columns** |
+| `vulnerability_records` includes `severity`, `exploited_in_wild_flag`, `cisa_kev_listed_flag`, `zero_day_flag`, `supply_chain_flag`, `internet_exposed`, `sla_breached_flag` | **None of these columns exist** in vulnerability_records. Per-vuln flags are only on vuln_summary. |
+| `vuln_class` has 10 values (incl. `race_condition`, `web_application`, `configuration`) | **8 values** in the data; differs in: `misconfiguration` (not `configuration`), `auth_access_control` (not `authentication_bypass`), `logic_flaw` (new); no `race_condition`, no `web_application`, no `deserialization` |
+| 8 lifecycle phases | **12 phases** in the data, adding `residual_risk_review` (45% of all rows), `false_positive_closed`, `sla_breach`, `accepted_risk`, `discovery`, `organisational_triage`, `exploitation_in_wild` |
+| `patch_status` has 4 values | **6 values** in the data: adds `vendor_notified`, `patch_in_development`, `patch_validated` |
+| `severity` has 5 values (incl. `none`) | **4 values** in the data (`severity_class`): low, medium, high, critical only |
+| `vuln_summary` has 15 columns | Data has **21 columns** |
+| Field renames | `severity_final` → `severity_class`; `cvss_base_score_final` → `cvss_base_score`; `cisa_kev_listed` → `cisa_kev_flag`; `exploited_in_wild` → `exploitation_occurred_flag`; `supply_chain_compromise` → `supply_chain_propagation_flag` |
+| Semantic inversion | README's `sla_breached` (True = bad) ↔ data's `sla_compliance_flag` (True = good) |
+| `remediation_outcome` categorical (patched/mitigated/accepted/unpatched) | Replaced with `remediation_success_flag` (binary) plus per-timestep `remediation_status` |
+| Not in README | New fields: `risk_score_composite`, `compensating_control_flag`, `time_to_exploit_days`, `time_to_remediate_days`, `patch_lag_days` |
+None of these affects model correctness — the feature pipeline uses
+the actual column names. If you build your own pipeline against the
+dataset, use the actual columns.
+## Intended use
+- **Reading the leakage diagnostic** — the primary value of this repo.
+  Reusable methodology for any synthetic vulnerability dataset.
+- **Evaluating fit** of the CYB009 dataset for your research, with
+  open knowledge of the structural-oracle patterns
+- **Honest baseline reference** for the only README-suggested target
+  that learns on the sample
+- **Feature engineering reference** for per-vulnerability ML
+## Out-of-scope use
+- **Production vulnerability triage** on real telemetry
+- **Exploit maturity prediction** — README headline target,
+  unlearnable on the sample after honest leak removal
+- **Zero-day / KEV / supply-chain prediction** — README headline
+  targets, unlearnable as rare-event binaries on the sample
+- **SLA breach prediction** — README headline target, unlearnable
+  after honest leak removal
+- Any operational security decision without further validation on
+  real data
+## Reproducibility
+Outputs above were produced with `seed = 42` (published artifact),
+nested `StratifiedShuffleSplit` (70/15/15), on the published sample
+(`xpertsystems/cyb009-sample`, version 1.0.0, generated 2026-05-16).
+The feature pipeline in `feature_engineering.py` is deterministic and
+the trained weights in this repo correspond exactly to the metrics
+above.
+Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200)
+in `multi_seed_results.json` show stable performance across splits
+(std 0.023 on accuracy).
+The training script itself is private to XpertSystems.
+## Files in this repo
+| File | Purpose |
+|---|---|
+| **`leakage_diagnostic.json`** | **PRIMARY ARTIFACT — 8 oracle paths + 6 unlearnable targets** |
+| `model_xgb.json` | XGBoost weights (seed 42) |
+| `model_mlp.safetensors` | PyTorch MLP weights (seed 42) |
+| `feature_engineering.py` | Feature pipeline |
+| `feature_meta.json` | Feature column order + categorical levels |
+| `feature_scaler.json` | MLP input mean/std (XGBoost ignores) |
+| `validation_results.json` | Per-class metrics, confusion matrix, architecture |
+| `ablation_results.json` | Per-feature-group ablation |
+| `multi_seed_results.json` | XGBoost metrics across 10 seeds |
+| `inference_example.ipynb` | End-to-end inference demo notebook |
+| `README.md` | This file |
+## Contact and full product
+The full **CYB009** dataset contains **~487,000 vulnerability records**
+across four files, with calibrated benchmark validation against 12
+metrics drawn from authoritative vulnerability intelligence sources
+(NIST NVD, EPSS v3, CISA KEV, Mandiant, Verizon DBIR, Rapid7, Qualys,
+Tenable). The full XpertSystems.ai synthetic data catalogue spans 41
+SKUs across Cybersecurity, Healthcare, Insurance & Risk, Oil & Gas,
+and Materials & Energy.
+- 📧 **pradeep@xpertsystems.ai**
+- 🌐 **https://xpertsystems.ai**
+- 🗂 Dataset: https://huggingface.co/datasets/xpertsystems/cyb009-sample
+- 🤖 Companion models:
+  - https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic)
+  - https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain)
+  - https://huggingface.co/xpertsystems/cyb003-baseline-classifier (malware execution phase)
+  - https://huggingface.co/xpertsystems/cyb004-baseline-classifier (phishing campaign phase)
+  - https://huggingface.co/xpertsystems/cyb005-baseline-classifier (ransomware actor-tier attribution)
+  - https://huggingface.co/xpertsystems/cyb006-baseline-classifier (user risk tier + leakage diagnostic)
+  - https://huggingface.co/xpertsystems/cyb007-baseline-classifier (insider threat type)
+  - https://huggingface.co/xpertsystems/cyb008-baseline-classifier (SOC alert triage + leakage diagnostic)
+## Citation
+```bibtex
+@misc{xpertsystems_cyb009_baseline_2026,
+  title  = {CYB009 Baseline Classifier: XGBoost and MLP for Vulnerability Classification, with the XpertSystems Catalog's Most Comprehensive Structural-Leakage Audit},
+  author = {XpertSystems.ai},
+  year   = {2026},
+  url    = {https://huggingface.co/xpertsystems/cyb009-baseline-classifier},
+  note   = {Reference baseline + 8-oracle-path leakage diagnostic on xpertsystems/cyb009-sample}
+}
+```

ablation_results.json ADDED

	@@ -0,0 +1,818 @@

+{
+  "purpose": "Quantify how much each feature group contributes to the honest XGBoost score. Identical architecture, same stratified split, with one feature group dropped at a time.",
+  "full_model_metrics": {
+    "model": "xgboost",
+    "accuracy": 0.23737373737373738,
+    "macro_f1": 0.22437482872901052,
+    "weighted_f1": 0.23213786276177156,
+    "per_class_f1": {
+      "auth_access_control": 0.14583333333333334,
+      "cryptographic_failure": 0.21686746987951808,
+      "information_disclosure": 0.2909090909090909,
+      "injection_family": 0.23728813559322035,
+      "logic_flaw": 0.08955223880597014,
+      "memory_corruption": 0.3333333333333333,
+      "misconfiguration": 0.2589928057553957,
+      "supply_chain_weakness": 0.2222222222222222
+    },
+    "confusion_matrix": {
+      "labels": [
+        "auth_access_control",
+        "cryptographic_failure",
+        "information_disclosure",
+        "injection_family",
+        "logic_flaw",
+        "memory_corruption",
+        "misconfiguration",
+        "supply_chain_weakness"
+      ],
+      "matrix": [
+        [
+          7,
+          7,
+          0,
+          11,
+          6,
+          10,
+          7,
+          5
+        ],
+        [
+          4,
+          9,
+          3,
+          5,
+          3,
+          5,
+          16,
+          0
+        ],
+        [
+          3,
+          0,
+          8,
+          1,
+          4,
+          0,
+          7,
+          0
+        ],
+        [
+          3,
+          6,
+          1,
+          14,
+          8,
+          20,
+          6,
+          7
+        ],
+        [
+          4,
+          4,
+          5,
+          3,
+          3,
+          2,
+          13,
+          0
+        ],
+        [
+          11,
+          3,
+          0,
+          13,
+          3,
+          27,
+          5,
+          8
+        ],
+        [
+          6,
+          9,
+          15,
+          2,
+          5,
+          7,
+          18,
+          3
+        ],
+        [
+          5,
+          0,
+          0,
+          4,
+          1,
+          21,
+          2,
+          8
+        ]
+      ]
+    },
+    "macro_roc_auc_ovr": 0.6837125710196055
+  },
+  "ablations": {
+    "no_cvss": {
+      "n_features": 55,
+      "dropped_count": 2,
+      "metrics": {
+        "model": "xgboost_no_cvss",
+        "accuracy": 0.21212121212121213,
+        "macro_f1": 0.19261691542621184,
+        "weighted_f1": 0.20621456669040633,
+        "per_class_f1": {
+          "auth_access_control": 0.14285714285714285,
+          "cryptographic_failure": 0.09523809523809523,
+          "information_disclosure": 0.14705882352941177,
+          "injection_family": 0.23728813559322035,
+          "logic_flaw": 0.16216216216216217,
+          "memory_corruption": 0.33121019108280253,
+          "misconfiguration": 0.2028985507246377,
+          "supply_chain_weakness": 0.2222222222222222
+        },
+        "confusion_matrix": {
+          "labels": [
+            "auth_access_control",
+            "cryptographic_failure",
+            "information_disclosure",
+            "injection_family",
+            "logic_flaw",
+            "memory_corruption",
+            "misconfiguration",
+            "supply_chain_weakness"
+          ],
+          "matrix": [
+            [
+              6,
+              3,
+              0,
+              13,
+              7,
+              12,
+              7,
+              5
+            ],
+            [
+              3,
+              3,
+              8,
+              3,
+              7,
+              5,
+              12,
+              4
+            ],
+            [
+              2,
+              1,
+              5,
+              0,
+              5,
+              2,
+              8,
+              0
+            ],
+            [
+              1,
+              3,
+              3,
+              14,
+              2,
+              20,
+              10,
+              12
+            ],
+            [
+              1,
+              2,
+              7,
+              2,
+              6,
+              1,
+              15,
+              0
+            ],
+            [
+              10,
+              2,
+              1,
+              13,
+              2,
+              26,
+              5,
+              11
+            ],
+            [
+              4,
+              3,
+              20,
+              3,
+              9,
+              5,
+              14,
+              7
+            ],
+            [
+              4,
+              1,
+              1,
+              5,
+              2,
+              16,
+              2,
+              10
+            ]
+          ]
+        },
+        "macro_roc_auc_ovr": 0.669002340507073
+      },
+      "delta_accuracy": 0.02525252525252525,
+      "delta_macro_f1": 0.031757913302798674
+    },
+    "no_epss": {
+      "n_features": 54,
+      "dropped_count": 3,
+      "metrics": {
+        "model": "xgboost_no_epss",
+        "accuracy": 0.2474747474747475,
+        "macro_f1": 0.2237319833172186,
+        "weighted_f1": 0.24186505327006125,
+        "per_class_f1": {
+          "auth_access_control": 0.17204301075268819,
+          "cryptographic_failure": 0.08,
+          "information_disclosure": 0.25,
+          "injection_family": 0.3089430894308943,
+          "logic_flaw": 0.11904761904761904,
+          "memory_corruption": 0.4050632911392405,
+          "misconfiguration": 0.25757575757575757,
+          "supply_chain_weakness": 0.19718309859154928
+        },
+        "confusion_matrix": {
+          "labels": [
+            "auth_access_control",
+            "cryptographic_failure",
+            "information_disclosure",
+            "injection_family",
+            "logic_flaw",
+            "memory_corruption",
+            "misconfiguration",
+            "supply_chain_weakness"
+          ],
+          "matrix": [
+            [
+              8,
+              6,
+              0,
+              12,
+              7,
+              11,
+              5,
+              4
+            ],
+            [
+              6,
+              3,
+              3,
+              5,
+              10,
+              4,
+              12,
+              2
+            ],
+            [
+              2,
+              2,
+              7,
+              2,
+              3,
+              0,
+              7,
+              0
+            ],
+            [
+              2,
+              5,
+              2,
+              19,
+              6,
+              20,
+              6,
+              5
+            ],
+            [
+              2,
+              3,
+              5,
+              2,
+              5,
+              1,
+              15,
+              1
+            ],
+            [
+              9,
+              6,
+              0,
+              10,
+              2,
+              32,
+              4,
+              7
+            ],
+            [
+              6,
+              2,
+              16,
+              1,
+              15,
+              4,
+              17,
+              4
+            ],
+            [
+              5,
+              3,
+              0,
+              7,
+              2,
+              16,
+              1,
+              7
+            ]
+          ]
+        },
+        "macro_roc_auc_ovr": 0.6925718594708064
+      },
+      "delta_accuracy": -0.01010101010101011,
+      "delta_macro_f1": 0.0006428454117919091
+    },
+    "no_flags": {
+      "n_features": 48,
+      "dropped_count": 9,
+      "metrics": {
+        "model": "xgboost_no_flags",
+        "accuracy": 0.22727272727272727,
+        "macro_f1": 0.21140688534448485,
+        "weighted_f1": 0.2214593080677342,
+        "per_class_f1": {
+          "auth_access_control": 0.13186813186813187,
+          "cryptographic_failure": 0.1686746987951807,
+          "information_disclosure": 0.3333333333333333,
+          "injection_family": 0.2764227642276423,
+          "logic_flaw": 0.08450704225352113,
+          "memory_corruption": 0.34838709677419355,
+          "misconfiguration": 0.24806201550387597,
+          "supply_chain_weakness": 0.1
+        },
+        "confusion_matrix": {
+          "labels": [
+            "auth_access_control",
+            "cryptographic_failure",
+            "information_disclosure",
+            "injection_family",
+            "logic_flaw",
+            "memory_corruption",
+            "misconfiguration",
+            "supply_chain_weakness"
+          ],
+          "matrix": [
+            [
+              6,
+              6,
+              1,
+              9,
+              5,
+              10,
+              6,
+              10
+            ],
+            [
+              5,
+              7,
+              3,
+              5,
+              5,
+              4,
+              14,
+              2
+            ],
+            [
+              3,
+              0,
+              10,
+              1,
+              4,
+              0,
+              5,
+              0
+            ],
+            [
+              3,
+              7,
+              1,
+              17,
+              7,
+              18,
+              4,
+              8
+            ],
+            [
+              3,
+              5,
+              6,
+              2,
+              3,
+              2,
+              13,
+              0
+            ],
+            [
+              8,
+              3,
+              0,
+              14,
+              3,
+              27,
+              4,
+              11
+            ],
+            [
+              4,
+              10,
+              16,
+              2,
+              7,
+              6,
+              16,
+              4
+            ],
+            [
+              6,
+              0,
+              0,
+              8,
+              3,
+              18,
+              2,
+              4
+            ]
+          ]
+        },
+        "macro_roc_auc_ovr": 0.6776398959263554
+      },
+      "delta_accuracy": 0.01010101010101011,
+      "delta_macro_f1": 0.01296794338452567
+    },
+    "no_asset": {
+      "n_features": 18,
+      "dropped_count": 39,
+      "metrics": {
+        "model": "xgboost_no_asset",
+        "accuracy": 0.21717171717171718,
+        "macro_f1": 0.19672873773465777,
+        "weighted_f1": 0.2140924517062793,
+        "per_class_f1": {
+          "auth_access_control": 0.10526315789473684,
+          "cryptographic_failure": 0.13043478260869565,
+          "information_disclosure": 0.13793103448275862,
+          "injection_family": 0.17857142857142858,
+          "logic_flaw": 0.08695652173913043,
+          "memory_corruption": 0.37333333333333335,
+          "misconfiguration": 0.26865671641791045,
+          "supply_chain_weakness": 0.2926829268292683
+        },
+        "confusion_matrix": {
+          "labels": [
+            "auth_access_control",
+            "cryptographic_failure",
+            "information_disclosure",
+            "injection_family",
+            "logic_flaw",
+            "memory_corruption",
+            "misconfiguration",
+            "supply_chain_weakness"
+          ],
+          "matrix": [
+            [
+              5,
+              6,
+              1,
+              8,
+              5,
+              16,
+              7,
+              5
+            ],
+            [
+              5,
+              6,
+              6,
+              5,
+              5,
+              3,
+              13,
+              2
+            ],
+            [
+              2,
+              2,
+              4,
+              1,
+              4,
+              1,
+              8,
+              1
+            ],
+            [
+              11,
+              7,
+              1,
+              10,
+              8,
+              15,
+              7,
+              6
+            ],
+            [
+              1,
+              6,
+              8,
+              2,
+              3,
+              1,
+              12,
+              1
+            ],
+            [
+              9,
+              9,
+              0,
+              9,
+              2,
+              28,
+              3,
+              10
+            ],
+            [
+              4,
+              10,
+              15,
+              7,
+              5,
+              2,
+              18,
+              4
+            ],
+            [
+              5,
+              1,
+              0,
+              5,
+              3,
+              14,
+              1,
+              12
+            ]
+          ]
+        },
+        "macro_roc_auc_ovr": 0.6869647093980484
+      },
+      "delta_accuracy": 0.020202020202020193,
+      "delta_macro_f1": 0.02764609099435275
+    },
+    "no_severity": {
+      "n_features": 53,
+      "dropped_count": 4,
+      "metrics": {
+        "model": "xgboost_no_severity",
+        "accuracy": 0.22727272727272727,
+        "macro_f1": 0.21747488568762768,
+        "weighted_f1": 0.2268764018926795,
+        "per_class_f1": {
+          "auth_access_control": 0.14893617021276595,
+          "cryptographic_failure": 0.19047619047619047,
+          "information_disclosure": 0.23333333333333334,
+          "injection_family": 0.288135593220339,
+          "logic_flaw": 0.12658227848101267,
+          "memory_corruption": 0.28205128205128205,
+          "misconfiguration": 0.24806201550387597,
+          "supply_chain_weakness": 0.2222222222222222
+        },
+        "confusion_matrix": {
+          "labels": [
+            "auth_access_control",
+            "cryptographic_failure",
+            "information_disclosure",
+            "injection_family",
+            "logic_flaw",
+            "memory_corruption",
+            "misconfiguration",
+            "supply_chain_weakness"
+          ],
+          "matrix": [
+            [
+              7,
+              7,
+              0,
+              9,
+              7,
+              12,
+              7,
+              4
+            ],
+            [
+              5,
+              8,
+              3,
+              2,
+              8,
+              5,
+              14,
+              0
+            ],
+            [
+              3,
+              0,
+              7,
+              1,
+              7,
+              0,
+              5,
+              0
+            ],
+            [
+              3,
+              6,
+              2,
+              17,
+              5,
+              20,
+              7,
+              5
+            ],
+            [
+              3,
+              5,
+              7,
+              3,
+              5,
+              2,
+              9,
+              0
+            ],
+            [
+              10,
+              7,
+              0,
+              13,
+              4,
+              22,
+              4,
+              10
+            ],
+            [
+              5,
+              6,
+              18,
+              2,
+              8,
+              6,
+              16,
+              4
+            ],
+            [
+              5,
+              0,
+              0,
+              6,
+              1,
+              19,
+              2,
+              8
+            ]
+          ]
+        },
+        "macro_roc_auc_ovr": 0.6857295225029008
+      },
+      "delta_accuracy": 0.01010101010101011,
+      "delta_macro_f1": 0.006899943041382833
+    },
+    "no_engineered": {
+      "n_features": 52,
+      "dropped_count": 5,
+      "metrics": {
+        "model": "xgboost_no_engineered",
+        "accuracy": 0.23232323232323232,
+        "macro_f1": 0.22158389829583944,
+        "weighted_f1": 0.22713804092389037,
+        "per_class_f1": {
+          "auth_access_control": 0.15053763440860216,
+          "cryptographic_failure": 0.14285714285714285,
+          "information_disclosure": 0.3157894736842105,
+          "injection_family": 0.23931623931623933,
+          "logic_flaw": 0.12987012987012986,
+          "memory_corruption": 0.345679012345679,
+          "misconfiguration": 0.23809523809523808,
+          "supply_chain_weakness": 0.21052631578947367
+        },
+        "confusion_matrix": {
+          "labels": [
+            "auth_access_control",
+            "cryptographic_failure",
+            "information_disclosure",
+            "injection_family",
+            "logic_flaw",
+            "memory_corruption",
+            "misconfiguration",
+            "supply_chain_weakness"
+          ],
+          "matrix": [
+            [
+              7,
+              5,
+              0,
+              9,
+              9,
+              13,
+              6,
+              4
+            ],
+            [
+              5,
+              6,
+              2,
+              3,
+              7,
+              4,
+              15,
+              3
+            ],
+            [
+              3,
+              1,
+              9,
+              1,
+              6,
+              0,
+              3,
+              0
+            ],
+            [
+              5,
+              8,
+              2,
+              14,
+              6,
+              19,
+              3,
+              8
+            ],
+            [
+              2,
+              4,
+              4,
+              3,
+              5,
+              2,
+              14,
+              0
+            ],
+            [
+              8,
+              6,
+              0,
+              13,
+              3,
+              28,
+              4,
+              8
+            ],
+            [
+              5,
+              9,
+              17,
+              2,
+              6,
+              7,
+              15,
+              4
+            ],
+            [
+              5,
+              0,
+              0,
+              7,
+              1,
+              19,
+              1,
+              8
+            ]
+          ]
+        },
+        "macro_roc_auc_ovr": 0.6871096699405611
+      },
+      "delta_accuracy": 0.005050505050505055,
+      "delta_macro_f1": 0.0027909304331710794
+    }
+  }
+}
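
The ablation entries above share one shape (`n_features`, `dropped_count`, `metrics`, `delta_accuracy`, `delta_macro_f1`). A minimal sketch of ranking feature groups by accuracy delta follows. The sign convention (full-model accuracy minus ablated accuracy, so a positive delta means the dropped group was helping) is inferred from the numbers and from the 0.2374 seed-42 accuracy in the model card; `rank_ablations` and `delta_is_consistent` are hypothetical helpers, not part of the repo. The inline dict copies rounded values from the file rather than assuming the name of the parent key.

```python
from __future__ import annotations

# Values copied from the ablation entries above (rounded to 4 d.p.).
ABLATIONS = {
    "no_flags": {"accuracy": 0.2273, "delta_accuracy": 0.0101},
    "no_asset": {"accuracy": 0.2172, "delta_accuracy": 0.0202},
    "no_severity": {"accuracy": 0.2273, "delta_accuracy": 0.0101},
    "no_engineered": {"accuracy": 0.2323, "delta_accuracy": 0.0051},
}
FULL_MODEL_ACCURACY = 0.2374  # seed-42 XGBoost test accuracy from the model card


def rank_ablations(ablations: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Sort feature groups by how much accuracy is lost when they are dropped."""
    return sorted(
        ((name, entry["delta_accuracy"]) for name, entry in ablations.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


def delta_is_consistent(entry: dict[str, float], full_accuracy: float,
                        tol: float = 1e-3) -> bool:
    """Check the assumed convention: delta = full accuracy - ablated accuracy."""
    return abs((full_accuracy - entry["accuracy"]) - entry["delta_accuracy"]) < tol


ranking = rank_ablations(ABLATIONS)
for name, delta in ranking:
    print(f"{name:15s} delta_accuracy={delta:+.4f}")
```

With a 396-row test split, a delta of 0.0101 corresponds to four predictions, so small differences between groups are best read alongside the multi-seed spread rather than on their own.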

feature_engineering.py ADDED Viewed

	@@ -0,0 +1,401 @@

+"""
+feature_engineering.py
+======================
+Feature pipeline for the CYB009 baseline classifier.
+Predicts `vulnerability_class` (8-class vulnerability classification)
+from per-vulnerability features on the CYB009 sample dataset.
+CSV inputs:
+    vuln_summary.csv          (primary, one row per vulnerability,
+                               2,638 vulnerabilities)
+    asset_inventory.csv       (per-asset registry, joined for asset
+                               context features)
+    vulnerability_records.csv (per-timestep trajectory; reserved)
+    vuln_lifecycle_events.csv (discrete event log; reserved)
+Target classes (8):
+    auth_access_control, cryptographic_failure, information_disclosure,
+    injection_family, logic_flaw, memory_corruption, misconfiguration,
+    supply_chain_weakness
+Why this task (and why not the more obvious targets)
+----------------------------------------------------
+The CYB009 README lists 11 suggested use cases. We piloted every
+README-headline target on the sample dataset and found the sample
+has pervasive structural leakage that makes most targets either
+trivially solvable via oracle features or unlearnable after honest
+leakage removal:
+- `exploit_maturity_final` (4-class) is structurally leaky via
+  `cvss_temporal_score_final`: CVSS v3.1 computes temporal score from
+  base score using Exploit Code Maturity multipliers (0.91 / 0.94 /
+  0.97 / 1.00 for unproven / PoC / functional / weaponised), so the
+  cvss_temporal/cvss_base ratio clusters near-deterministically per
+  maturity tier (0.80 / 0.83 / 0.85 / 0.88 in the data; the ratios sit
+  below the raw multipliers presumably because the temporal score also
+  applies Remediation Level and Report Confidence multipliers). Drop
+  cvss_temporal -> accuracy collapses from 0.74 to 0.31 (below
+  majority 0.36).
+- `remediation_status` / `patch_status` / `lifecycle_phase`
+  (per-timestep) form a tightly coupled state machine.
+  `lifecycle_phase = residual_risk_review` -> 100% `remediated`.
+  `patch_status = deployed` -> 100% `remediated`. Any two of the
+  three deterministically pin the third.
+- `severity_class` is 100% derived from `cvss_base_score` via CVSS
+  v3.1 boundaries (low=0.1-3.9, medium=4.0-6.9, high=7.0-8.9,
+  critical=9.0-10.0). Trivial if cvss_base is included; close to the
+  majority baseline (acc 0.55 vs majority 0.51) without it.
+- All seven binary flags (`exploitation_occurred_flag`, `zero_day_flag`,
+  `cisa_kev_flag`, `supply_chain_propagation_flag`,
+  `remediation_success_flag`, `sla_compliance_flag`,
+  `false_positive_flag`) are at-or-below majority after honest
+  leakage removal of the event-time sentinels
+  (`time_to_exploit_days`, `time_to_remediate_days`, `patch_lag_days`,
+  `risk_score_composite`). See leakage_diagnostic.json.
+`vulnerability_class` is the only README-suggested target that learns
+honestly on the sample: acc 0.24, macro-F1 0.22, ROC-AUC 0.69, against
+a majority-class accuracy of 0.18. That is a modest +6pp accuracy lift
+over majority, the weakest baseline in the XpertSystems CYB catalog by
+design. The full ~487k-row product is expected to tighten per-class
+signal.
+The model card frames this honestly: the strongest finding on CYB009
+is the comprehensive leakage diagnostic rather than the modest
+classifier performance. Buyers planning CYB009 ML work should read
+the diagnostic first.
+Leakage audit
+-------------
+Excluded as outcome leaks for this target:
+1. `exploit_maturity_final` - tied to `cvss_temporal_score_final`
+   (item 3) through the CVSS v3.1 temporal-score machinery.
+2. Event-time sentinel oracles, dropped as a precaution (they do not
+   leak vulnerability_class directly, only indirectly via the flag
+   fields): `time_to_exploit_days`, `time_to_remediate_days`,
+   `patch_lag_days`, `risk_score_composite`.
+3. `cvss_temporal_score_final` - excluded because of the CVSS v3.1
+   maturity-multiplier structural encoding.
+`severity_class` is KEPT as a one-hot feature because it's a derived
+view of `cvss_base_score` rather than the target.
+Binary post-hoc flags are KEPT as legitimate observables that a SOC
+analyst would have at decision time. They contribute modest real
+signal (a few pp accuracy).
+Public API
+----------
+    build_features(vuln_summary_path, asset_inventory_path)
+        -> (X, y, ids, meta)
+    transform_single(record, meta, asset_lookup=None) -> np.ndarray
+    save_meta(meta, path) / load_meta(path)
+    build_asset_lookup(asset_inventory_path) -> dict
+License
+-------
+Ships with the public model on Hugging Face under CC-BY-NC-4.0,
+matching the dataset license. See README.md.
+"""
+from __future__ import annotations
+import json
+from pathlib import Path
+from typing import Any
+import numpy as np
+import pandas as pd
+# ---------------------------------------------------------------------------
+# Label space
+# ---------------------------------------------------------------------------
+# Eight vulnerability classes from the CYB009 sample. The README claims
+# 10 classes but only 8 exist in the sample data.
+LABEL_ORDER = [
+    "auth_access_control",
+    "cryptographic_failure",
+    "information_disclosure",
+    "injection_family",
+    "logic_flaw",
+    "memory_corruption",
+    "misconfiguration",
+    "supply_chain_weakness",
+]
+LABEL_TO_INT = {lbl: i for i, lbl in enumerate(LABEL_ORDER)}
+INT_TO_LABEL = {i: lbl for lbl, i in LABEL_TO_INT.items()}
+# ---------------------------------------------------------------------------
+# Identifier and target columns
+# ---------------------------------------------------------------------------
+ID_COLUMNS = ["vuln_id", "asset_id", "org_id"]
+TARGET_COLUMN = "vulnerability_class"
+# Outcome-leak columns excluded from features.
+EXCLUDED_FROM_FEATURES = [
+    "time_to_exploit_days",        # -1 sentinel oracle
+    "time_to_remediate_days",      # 120 sentinel oracle
+    "patch_lag_days",              # likely similar sentinel
+    "risk_score_composite",        # computed from flag fields
+    "exploit_maturity_final",      # indirect leak via CVSS temporal
+    "cvss_temporal_score_final",   # near-deterministic per maturity tier
+]
+# ---------------------------------------------------------------------------
+# Per-vulnerability numeric features
+# ---------------------------------------------------------------------------
+VULN_NUMERIC_FEATURES = [
+    "cvss_base_score",
+    "epss_score_final",
+    "exploitation_occurred_flag",
+    "zero_day_flag",
+    "cisa_kev_flag",
+    "supply_chain_propagation_flag",
+    "compensating_control_flag",
+    "false_positive_flag",
+    "remediation_success_flag",
+    "sla_compliance_flag",
+]
+VULN_CATEGORICAL_FEATURES = [
+    "severity_class",   # 4 values; CVSS-derived but useful as feature
+]
+# ---------------------------------------------------------------------------
+# Asset features (joined on asset_id from asset_inventory.csv)
+# ---------------------------------------------------------------------------
+ASSET_NUMERIC_FEATURES = [
+    "scanner_coverage",
+    "patch_mgmt_maturity",
+    "mean_time_to_remediate_days",
+    "sla_critical_days",
+    "sla_high_days",
+    "sla_medium_days",
+    "internet_exposed_flag",
+    "sbom_depth_score",
+]
+ASSET_CATEGORICAL_FEATURES = [
+    "asset_type",          # 12 values
+    "criticality_tier",    # 4 values
+    "environment_type",    # 8 values
+    "os_family",           # 6 values
+]
+# ---------------------------------------------------------------------------
+# Engineered features
+# ---------------------------------------------------------------------------
+def _add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
+    """
+    Five engineered features for vulnerability_class discrimination.
+    Note: no temporal-CVSS-derived features (those leak via the CVSS
+    v3.1 exploit-code-maturity machinery).
+    """
+    df = df.copy()
+    # 1. Log-scaled EPSS. EPSS is heavy-tailed.
+    df["log_epss"] = np.log1p(
+        df["epss_score_final"].clip(lower=0)
+    ).astype(float)
+    # 2. High-CVSS indicator. CVSS >= 7.0 (high or critical).
+    df["is_high_cvss"] = (df["cvss_base_score"] >= 7.0).astype(int)
+    # 3. Exposure x severity composite. Internet-exposed high-severity
+    #    vulns are often weighted differently per class.
+    df["exposure_severity_composite"] = (
+        df.get("internet_exposed_flag", 0) * df["cvss_base_score"]
+    ).astype(float)
+    # 4. Flag count: total number of risk flags raised. Different vuln
+    #    classes have different baseline flag patterns.
+    flag_cols = [
+        "exploitation_occurred_flag", "zero_day_flag", "cisa_kev_flag",
+        "supply_chain_propagation_flag", "compensating_control_flag",
+        "false_positive_flag",
+    ]
+    df["risk_flag_count"] = sum(df.get(c, 0) for c in flag_cols)
+    # 5. EPSS x CVSS composite.
+    df["epss_x_base"] = (
+        df["epss_score_final"] * df["cvss_base_score"]
+    ).astype(float)
+    return df
+# ---------------------------------------------------------------------------
+# Public API
+# ---------------------------------------------------------------------------
+def build_features(
+    vuln_summary_path: str | Path,
+    asset_inventory_path: str | Path,
+) -> tuple[pd.DataFrame, pd.Series, pd.Series, dict[str, Any]]:
+    """
+    Load vuln_summary.csv, join asset_inventory.csv, drop target +
+    identifiers + outcome leaks, engineer features, one-hot encode,
+    return (X, y, ids, meta).
+    """
+    vulns = pd.read_csv(vuln_summary_path)
+    assets = pd.read_csv(asset_inventory_path)
+    y = vulns[TARGET_COLUMN].map(LABEL_TO_INT)
+    if y.isna().any():
+        bad = vulns.loc[y.isna(), TARGET_COLUMN].unique()
+        raise ValueError(f"Unknown vulnerability_class values: {bad}")
+    y = y.astype(int)
+    ids = vulns["vuln_id"].copy()
+    asset_cols_needed = (
+        ["asset_id"] + ASSET_NUMERIC_FEATURES + ASSET_CATEGORICAL_FEATURES
+    )
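+    # validate="many_to_one" raises if asset_inventory.csv has duplicate
+    # asset_id rows, which would otherwise duplicate vulnerability rows
+    # and silently misalign X with y and ids.
+    vulns = vulns.merge(
+        assets[asset_cols_needed], on="asset_id", how="left",
+        validate="many_to_one",
+    )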
+    vulns = vulns.drop(
+        columns=ID_COLUMNS + [TARGET_COLUMN] + EXCLUDED_FROM_FEATURES,
+        errors="ignore",
+    )
+    vulns = _add_engineered_features(vulns)
+    numeric_features = (
+        VULN_NUMERIC_FEATURES
+        + ASSET_NUMERIC_FEATURES
+        + [
+            "log_epss", "is_high_cvss", "exposure_severity_composite",
+            "risk_flag_count", "epss_x_base",
+        ]
+    )
+    numeric_features = [c for c in numeric_features if c in vulns.columns]
+    X_numeric = vulns[numeric_features].astype(float)
+    all_categorical = VULN_CATEGORICAL_FEATURES + ASSET_CATEGORICAL_FEATURES
+    categorical_levels: dict[str, list[str]] = {}
+    blocks: list[pd.DataFrame] = []
+    for col in all_categorical:
+        if col not in vulns.columns:
+            continue
+        levels = sorted(vulns[col].dropna().unique().tolist())
+        categorical_levels[col] = levels
+        block = pd.get_dummies(
+            vulns[col].astype("category").cat.set_categories(levels),
+            prefix=col, dummy_na=False,
+        ).astype(int)
+        blocks.append(block)
+    X = pd.concat(
+        [X_numeric.reset_index(drop=True)]
+        + [b.reset_index(drop=True) for b in blocks],
+        axis=1,
+    ).fillna(0.0)
+    meta = {
+        "feature_names": X.columns.tolist(),
+        "numeric_features": numeric_features,
+        "categorical_levels": categorical_levels,
+        "label_to_int": LABEL_TO_INT,
+        "int_to_label": INT_TO_LABEL,
+        "outcome_leak_excluded": EXCLUDED_FROM_FEATURES,
+    }
+    return X, y, ids, meta
+def transform_single(
+    record: dict | pd.DataFrame,
+    meta: dict[str, Any],
+    asset_lookup: dict | None = None,
+) -> np.ndarray:
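+    """
+    Encode a single vulnerability record for inference.
+    `record` must contain `cvss_base_score` and `epss_score_final`;
+    other missing columns default to 0 (numeric) or an all-zero one-hot
+    block (categorical). Asset features are looked up from the first
+    row's `asset_id` only.
+    """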
+    if isinstance(record, dict):
+        df = pd.DataFrame([record.copy()])
+    else:
+        df = record.copy()
+    if asset_lookup is not None and "asset_id" in df.columns:
+        asset_id = df["asset_id"].iloc[0]
+        asset_feats = asset_lookup.get(asset_id, {})
+        for k, v in asset_feats.items():
+            if k not in df.columns:
+                df[k] = v
+    df = _add_engineered_features(df)
+    numeric = pd.DataFrame({
+        col: df.get(col, pd.Series([0.0] * len(df))).astype(float).values
+        for col in meta["numeric_features"]
+    })
+    blocks: list[pd.DataFrame] = [numeric]
+    for col, levels in meta["categorical_levels"].items():
+        val = df.get(col, pd.Series([None] * len(df)))
+        block = pd.get_dummies(
+            val.astype("category").cat.set_categories(levels),
+            prefix=col, dummy_na=False,
+        ).astype(int)
+        for lvl in levels:
+            cname = f"{col}_{lvl}"
+            if cname not in block.columns:
+                block[cname] = 0
+        block = block[[f"{col}_{lvl}" for lvl in levels]]
+        blocks.append(block)
+    X = pd.concat(blocks, axis=1).fillna(0.0)
+    X = X.reindex(columns=meta["feature_names"], fill_value=0.0)
+    return X.values.astype(np.float32)
+def save_meta(meta: dict[str, Any], path: str | Path) -> None:
+    """Write feature metadata to JSON (int_to_label keys stored as strings)."""
+    serializable = {
+        "feature_names": meta["feature_names"],
+        "numeric_features": meta["numeric_features"],
+        "categorical_levels": meta["categorical_levels"],
+        "label_to_int": meta["label_to_int"],
+        "int_to_label": {str(k): v for k, v in meta["int_to_label"].items()},
+        "outcome_leak_excluded": meta.get("outcome_leak_excluded", []),
+    }
+    with open(path, "w") as f:
+        json.dump(serializable, f, indent=2)
+def load_meta(path: str | Path) -> dict[str, Any]:
+    """Load feature metadata and restore int_to_label keys to int."""
+    with open(path) as f:
+        meta = json.load(f)
+    meta["int_to_label"] = {int(k): v for k, v in meta["int_to_label"].items()}
+    return meta
+def build_asset_lookup(asset_inventory_path: str | Path) -> dict[str, dict]:
+    """Build {asset_id: {asset feature values}} for inference-time lookup."""
+    assets = pd.read_csv(asset_inventory_path)
+    cols = ASSET_NUMERIC_FEATURES + ASSET_CATEGORICAL_FEATURES
+    out = {}
+    for _, row in assets.iterrows():
+        out[row["asset_id"]] = {c: row[c] for c in cols if c in assets.columns}
+    return out
+if __name__ == "__main__":
+    import sys
+    base = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/mnt/user-data/uploads")
+    X, y, ids, meta = build_features(
+        base / "vuln_summary.csv",
+        base / "asset_inventory.csv",
+    )
+    print(f"X shape: {X.shape}")
+    print(f"y shape: {y.shape}")
+    print(f"n_features: {len(meta['feature_names'])}")
+    print(f"label distribution:\n{y.map(INT_TO_LABEL).value_counts()}")
+    print(f"X has NaN: {X.isnull().any().any()}")
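
The first leakage path in the module docstring can be illustrated with a few lines of arithmetic. The sketch below assumes the observed ratio tiers (0.80 / 0.83 / 0.85 / 0.88) are the quoted maturity multipliers scaled by one roughly constant residual factor of 0.88. That factor, the tier label strings, and `infer_maturity` are assumptions for illustration, not values or functions taken from the dataset or the repo.

```python
# Exploit Code Maturity multipliers quoted in the module docstring.
MATURITY_MULTIPLIER = {
    "unproven": 0.91,
    "poc": 0.94,
    "functional": 0.97,
    "weaponised": 1.00,
}
# Assumption: a single residual factor (0.88) explains the observed tiers.
RESIDUAL_FACTOR = 0.88


def expected_ratio(tier: str) -> float:
    """Expected cvss_temporal / cvss_base ratio for a maturity tier."""
    return MATURITY_MULTIPLIER[tier] * RESIDUAL_FACTOR


def infer_maturity(cvss_base: float, cvss_temporal: float) -> str:
    """Pick the tier whose expected ratio is closest to the observed one."""
    ratio = cvss_temporal / cvss_base
    return min(MATURITY_MULTIPLIER, key=lambda t: abs(expected_ratio(t) - ratio))


for tier in MATURITY_MULTIPLIER:
    print(f"{tier:11s} expected ratio = {expected_ratio(tier):.2f}")
print(infer_maturity(cvss_base=7.5, cvss_temporal=6.2))
```

If the ratio alone recovers the tier this way, a model given both CVSS columns is effectively doing a lookup, which is consistent with the accuracy collapse the docstring reports once `cvss_temporal_score_final` is dropped.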

feature_meta.json ADDED Viewed

	@@ -0,0 +1,160 @@

+{
+  "feature_names": [
+    "cvss_base_score",
+    "epss_score_final",
+    "exploitation_occurred_flag",
+    "zero_day_flag",
+    "cisa_kev_flag",
+    "supply_chain_propagation_flag",
+    "compensating_control_flag",
+    "false_positive_flag",
+    "remediation_success_flag",
+    "sla_compliance_flag",
+    "scanner_coverage",
+    "patch_mgmt_maturity",
+    "mean_time_to_remediate_days",
+    "sla_critical_days",
+    "sla_high_days",
+    "sla_medium_days",
+    "internet_exposed_flag",
+    "sbom_depth_score",
+    "log_epss",
+    "is_high_cvss",
+    "exposure_severity_composite",
+    "risk_flag_count",
+    "epss_x_base",
+    "severity_class_critical",
+    "severity_class_high",
+    "severity_class_low",
+    "severity_class_medium",
+    "asset_type_api_gateway",
+    "asset_type_cloud_vm",
+    "asset_type_container_workload",
+    "asset_type_database_server",
+    "asset_type_endpoint_workstation",
+    "asset_type_iot_firmware_device",
+    "asset_type_network_service",
+    "asset_type_ot_ics_controller",
+    "asset_type_saas_integration",
+    "asset_type_server_on_premises",
+    "asset_type_supply_chain_dependency",
+    "asset_type_web_application",
+    "criticality_tier_critical",
+    "criticality_tier_high",
+    "criticality_tier_low",
+    "criticality_tier_medium",
+    "environment_type_edge_iot_fleet",
+    "environment_type_hybrid_cloud",
+    "environment_type_on_premises_datacenter",
+    "environment_type_ot_ics_network",
+    "environment_type_public_cloud_aws",
+    "environment_type_public_cloud_azure",
+    "environment_type_public_cloud_gcp",
+    "environment_type_saas_dependent",
+    "os_family_android_iot",
+    "os_family_embedded_rtos",
+    "os_family_freebsd",
+    "os_family_linux",
+    "os_family_macos",
+    "os_family_windows"
+  ],
+  "numeric_features": [
+    "cvss_base_score",
+    "epss_score_final",
+    "exploitation_occurred_flag",
+    "zero_day_flag",
+    "cisa_kev_flag",
+    "supply_chain_propagation_flag",
+    "compensating_control_flag",
+    "false_positive_flag",
+    "remediation_success_flag",
+    "sla_compliance_flag",
+    "scanner_coverage",
+    "patch_mgmt_maturity",
+    "mean_time_to_remediate_days",
+    "sla_critical_days",
+    "sla_high_days",
+    "sla_medium_days",
+    "internet_exposed_flag",
+    "sbom_depth_score",
+    "log_epss",
+    "is_high_cvss",
+    "exposure_severity_composite",
+    "risk_flag_count",
+    "epss_x_base"
+  ],
+  "categorical_levels": {
+    "severity_class": [
+      "critical",
+      "high",
+      "low",
+      "medium"
+    ],
+    "asset_type": [
+      "api_gateway",
+      "cloud_vm",
+      "container_workload",
+      "database_server",
+      "endpoint_workstation",
+      "iot_firmware_device",
+      "network_service",
+      "ot_ics_controller",
+      "saas_integration",
+      "server_on_premises",
+      "supply_chain_dependency",
+      "web_application"
+    ],
+    "criticality_tier": [
+      "critical",
+      "high",
+      "low",
+      "medium"
+    ],
+    "environment_type": [
+      "edge_iot_fleet",
+      "hybrid_cloud",
+      "on_premises_datacenter",
+      "ot_ics_network",
+      "public_cloud_aws",
+      "public_cloud_azure",
+      "public_cloud_gcp",
+      "saas_dependent"
+    ],
+    "os_family": [
+      "android_iot",
+      "embedded_rtos",
+      "freebsd",
+      "linux",
+      "macos",
+      "windows"
+    ]
+  },
+  "label_to_int": {
+    "auth_access_control": 0,
+    "cryptographic_failure": 1,
+    "information_disclosure": 2,
+    "injection_family": 3,
+    "logic_flaw": 4,
+    "memory_corruption": 5,
+    "misconfiguration": 6,
+    "supply_chain_weakness": 7
+  },
+  "int_to_label": {
+    "0": "auth_access_control",
+    "1": "cryptographic_failure",
+    "2": "information_disclosure",
+    "3": "injection_family",
+    "4": "logic_flaw",
+    "5": "memory_corruption",
+    "6": "misconfiguration",
+    "7": "supply_chain_weakness"
+  },
+  "outcome_leak_excluded": [
+    "time_to_exploit_days",
+    "time_to_remediate_days",
+    "patch_lag_days",
+    "risk_score_composite",
+    "exploit_maturity_final",
+    "cvss_temporal_score_final"
+  ]
+}
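
The ordering of `feature_names` follows from `build_features`: numeric columns first, then one one-hot block per categorical column, each column named `{column}_{level}` with levels sorted. A minimal sketch of rebuilding the expected column list from a metadata dict is shown here; `expected_feature_names` is a hypothetical helper, and `meta_excerpt` is a shortened copy of the file above used only for illustration.

```python
def expected_feature_names(meta: dict) -> list:
    """Numeric features first, then `{column}_{level}` for each categorical level."""
    names = list(meta["numeric_features"])
    # Dict order is preserved by json.load, so blocks follow the file's order.
    for column, levels in meta["categorical_levels"].items():
        names.extend(f"{column}_{level}" for level in levels)
    return names


# Shortened excerpt of feature_meta.json (two numeric columns, one categorical).
meta_excerpt = {
    "numeric_features": ["cvss_base_score", "epss_score_final"],
    "categorical_levels": {
        "severity_class": ["critical", "high", "low", "medium"],
    },
}

print(expected_feature_names(meta_excerpt))
```

On the full file, comparing this list with `meta["feature_names"]` is a cheap sanity check before running inference against a retrained or edited metadata file.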

feature_scaler.json ADDED Viewed

	@@ -0,0 +1 @@

+ {"mean": [7.285319609967497, 0.1090479414951246, 0.07150595882990249, 0.03304442036836403, 0.005958829902491874, 0.007042253521126761, 0.09425785482123511, 0.08017334777898158, 0.8169014084507042, 0.7497291440953413, 0.7142526002166847, 0.5597418201516792, 61.792957746478876, 55.768689057421454, 167.30606717226436, 334.6121343445287, 0.34615384615384615, 0.4373639219934995, 0.09744259094329327, 0.6208017334777898, 2.522881906825569, 0.29198266522210187, 0.8318568651137596, 0.09425785482123511, 0.5254604550379198, 0.013542795232936078, 0.366738894907909, 0.06229685807150596, 0.08342361863488625, 0.08125677139761647, 0.09967497291440953, 0.09859154929577464, 0.07367280606717226, 0.0790899241603467, 0.09588299024918744, 0.06879739978331528, 0.09588299024918744, 0.0790899241603467, 0.08234019501625135, 0.10130010834236186, 0.2502708559046587, 0.28819068255687974, 0.36023835319609965, 0.12459371614301191, 0.12838569880823403, 0.14626218851570963, 0.12838569880823403, 0.10725893824485373, 0.13217768147345613, 0.1256771397616468, 0.10725893824485373, 0.15113759479956662, 0.15113759479956662, 0.1706392199349946, 0.1771397616468039, 0.18309859154929578, 0.1668472372697725], "std": [1.3818122818772989, 0.12908583650215913, 0.25773793273380063, 0.17880102089146993, 0.07698397704359367, 0.08364478612676705, 0.29226629026174605, 0.2716349620172026, 0.3868521254539063, 0.4332863417748008, 0.11297622294038147, 0.11921698382007824, 29.329761686203444, 27.416611488095498, 82.2498344642865, 164.499668928573, 0.4758718669674715, 0.12938965829339574, 0.10723050270659185, 0.4853190013040946, 3.5582862465806073, 0.6006416367944402, 1.0477891599805764, 0.2922662902617461, 0.49948665171105067, 0.11561413750437974, 0.4820449709408879, 0.24175942859093658, 0.27659638908598905, 0.27330307614640587, 0.29964731299698594, 0.2981936022209592, 0.2613084632006893, 0.2699521899581333, 0.29451048975837896, 0.253177883658354, 0.29451048975837896, 0.2699521899581333, 0.27495679912705384, 
0.3018074546734477, 0.4332863417748008, 0.4530430424460816, 0.48019953797717524, 0.33034714867544795, 0.3346094186957652, 0.3534646244039396, 0.3346094186957652, 0.3095260212759078, 0.3387756096144394, 0.3315749585800216, 0.3095260212759078, 0.35828000060732873, 0.35828000060732873, 0.37629533874646653, 0.38189038988561785, 0.3868521254539063, 0.37294045160714023]}
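
`feature_scaler.json` stores one `mean` and one `std` value per feature, apparently in `feature_names` order (the first mean, about 7.29, is on the CVSS base score scale). A minimal sketch of z-score standardization with those arrays follows. That the MLP expects inputs scaled this way is an assumption, `standardize` is a hypothetical helper, and the three-element scaler reuses the first three entries of the file, rounded, purely for illustration.

```python
import numpy as np


def standardize(x: np.ndarray, scaler: dict, eps: float = 1e-12) -> np.ndarray:
    """Apply (x - mean) / std column-wise; eps guards against a zero std."""
    mean = np.asarray(scaler["mean"], dtype=np.float32)
    std = np.asarray(scaler["std"], dtype=np.float32)
    if x.shape[-1] != mean.shape[0]:
        raise ValueError(f"expected {mean.shape[0]} features, got {x.shape[-1]}")
    return (x - mean) / np.maximum(std, eps)


# First three entries of feature_scaler.json, rounded to 4 d.p. for illustration.
scaler_excerpt = {
    "mean": [7.2853, 0.1090, 0.0715],
    "std": [1.3818, 0.1291, 0.2577],
}
row = np.array([[7.2853, 0.1090, 1.0]], dtype=np.float32)
z = standardize(row, scaler_excerpt)
print(z)
```

The shape check is there because a scaler and a feature matrix built from different metadata versions would otherwise broadcast or fail in less obvious ways.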

inference_example.ipynb ADDED Viewed

	@@ -0,0 +1,345 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# CYB009 Baseline Classifier — Inference Example\n",
+    "\n",
+    "End-to-end demo: load the trained XGBoost and PyTorch MLP models from the Hugging Face repo and predict the **vulnerability class** (8-class CWE-style family) for a vulnerability record.\n",
+    "\n",
+    "**Models predict one of 8 vulnerability classes:** `auth_access_control`, `cryptographic_failure`, `information_disclosure`, `injection_family`, `logic_flaw`, `memory_corruption`, `misconfiguration`, `supply_chain_weakness`.\n",
+    "\n",
+    "**Read `leakage_diagnostic.json` first.** This is the most extensive structural-leakage audit in the XpertSystems catalog. Eight oracle paths were found across CYB009's targets; vulnerability_class is the only README-suggested target that learns honestly on the sample, and it gives the catalog's weakest baseline (acc 0.24 vs majority 0.18). The primary artifact value of this repo is the diagnostic, not the classifier."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 1. Install dependencies"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%pip install --quiet xgboost torch safetensors pandas numpy huggingface_hub"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2. Download model artifacts from Hugging Face"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from huggingface_hub import hf_hub_download\n",
+    "\n",
+    "REPO_ID = \"xpertsystems/cyb009-baseline-classifier\"\n",
+    "\n",
+    "files = {}\n",
+    "for name in [\"model_xgb.json\", \"model_mlp.safetensors\",\n",
+    "             \"feature_engineering.py\", \"feature_meta.json\",\n",
+    "             \"feature_scaler.json\"]:\n",
+    "    files[name] = hf_hub_download(repo_id=REPO_ID, filename=name)\n",
+    "    print(f\"  downloaded: {name}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sys, os\n",
+    "fe_dir = os.path.dirname(files[\"feature_engineering.py\"])\n",
+    "if fe_dir not in sys.path:\n",
+    "    sys.path.insert(0, fe_dir)\n",
+    "\n",
+    "from feature_engineering import (\n",
+    "    transform_single, load_meta, build_asset_lookup, INT_TO_LABEL,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 3. Load models and metadata"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import numpy as np\n",
+    "import torch\n",
+    "import torch.nn as nn\n",
+    "import xgboost as xgb\n",
+    "from safetensors.torch import load_file\n",
+    "\n",
+    "meta = load_meta(files[\"feature_meta.json\"])\n",
+    "with open(files[\"feature_scaler.json\"]) as f:\n",
+    "    scaler = json.load(f)\n",
+    "\n",
+    "N_FEATURES = len(meta[\"feature_names\"])\n",
+    "N_CLASSES = len(meta[\"int_to_label\"])\n",
+    "print(f\"feature count: {N_FEATURES}\")\n",
+    "print(f\"class count:   {N_CLASSES}\")\n",
+    "print(f\"label classes: {list(meta['int_to_label'].values())}\")\n",
+    "print(\"\\noutcome-leak columns excluded from features:\")\n",
+    "for c in meta.get(\"outcome_leak_excluded\", []):\n",
+    "    print(f\"  - {c}\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "xgb_model = xgb.XGBClassifier()\n",
+    "xgb_model.load_model(files[\"model_xgb.json\"])\n",
+    "\n",
+    "# MLP architecture (must match training)\n",
+    "class VulnClassMLP(nn.Module):\n",
+    "    def __init__(self, n_features, n_classes=8, hidden1=128, hidden2=64, dropout=0.3):\n",
+    "        super().__init__()\n",
+    "        self.net = nn.Sequential(\n",
+    "            nn.Linear(n_features, hidden1),\n",
+    "            nn.BatchNorm1d(hidden1),\n",
+    "            nn.ReLU(),\n",
+    "            nn.Dropout(dropout),\n",
+    "            nn.Linear(hidden1, hidden2),\n",
+    "            nn.BatchNorm1d(hidden2),\n",
+    "            nn.ReLU(),\n",
+    "            nn.Dropout(dropout),\n",
+    "            nn.Linear(hidden2, n_classes),\n",
+    "        )\n",
+    "    def forward(self, x):\n",
+    "        return self.net(x)\n",
+    "\n",
+    "mlp_model = VulnClassMLP(N_FEATURES, n_classes=N_CLASSES)\n",
+    "mlp_model.load_state_dict(load_file(files[\"model_mlp.safetensors\"]))\n",
+    "mlp_model.eval()\n",
+    "print(\"models loaded\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 4. Load asset inventory for asset-feature lookup\n",
+    "\n",
+    "The model uses asset context (asset_type, criticality, environment, OS, scanner_coverage, etc.) as features. To predict on a new vulnerability, we look up its asset features from the asset_inventory."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from huggingface_hub import snapshot_download\n",
+    "\n",
+    "ds_path = snapshot_download(repo_id=\"xpertsystems/cyb009-sample\", repo_type=\"dataset\")\n",
+    "asset_lookup = build_asset_lookup(f\"{ds_path}/asset_inventory.csv\")\n",
+    "print(f\"loaded {len(asset_lookup)} asset records\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 5. Prediction helper"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "MU = np.array(scaler[\"mean\"], dtype=np.float32)\n",
+    "SD = np.array(scaler[\"std\"],  dtype=np.float32)\n",
+    "\n",
+    "def predict_vuln_class(record: dict) -> dict:\n",
+    "    \"\"\"Predict the vulnerability class for one record.\n",
+    "\n",
+    "    Note: do NOT include exploit_maturity_final, cvss_temporal_score_final,\n",
+    "    time_to_exploit_days, time_to_remediate_days, patch_lag_days, or\n",
+    "    risk_score_composite in the record. These were outcome leaks in\n",
+    "    the training data and are excluded from the feature set.\n",
+    "\n",
+    "    Asset features (asset_type, criticality, etc.) are looked up\n",
+    "    from asset_inventory by asset_id.\n",
+    "    \"\"\"\n",
+    "    X = transform_single(record, meta, asset_lookup=asset_lookup)\n",
+    "\n",
+    "    xgb_proba = xgb_model.predict_proba(X)[0]\n",
+    "    xgb_label = INT_TO_LABEL[int(np.argmax(xgb_proba))]\n",
+    "\n",
+    "    Xs = ((X - MU) / SD).astype(np.float32)\n",
+    "    with torch.no_grad():\n",
+    "        logits = mlp_model(torch.tensor(Xs))\n",
+    "        mlp_proba = torch.softmax(logits, dim=1).numpy()[0]\n",
+    "    mlp_label = INT_TO_LABEL[int(np.argmax(mlp_proba))]\n",
+    "\n",
+    "    return {\n",
+    "        \"xgboost\": {\n",
+    "            \"label\": xgb_label,\n",
+    "            \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(xgb_proba)},\n",
+    "        },\n",
+    "        \"mlp\": {\n",
+    "            \"label\": mlp_label,\n",
+    "            \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(mlp_proba)},\n",
+    "        },\n",
+    "    }"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 6. Run on an example record\n",
+    "\n",
+    "Real critical-severity vulnerability from the CYB009 sample. True class is `memory_corruption` (CVSS 9.9, exploitation hasn't yet occurred, compensating control in place). On this kind of high-CVSS critical vulnerability the model has its strongest signal."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Real vulnerability from the sample dataset (true class: memory_corruption)\n",
+    "# Note: asset_id is supplied so asset features are auto-looked-up\n",
+    "example_record = {\n",
+    "    \"asset_id\": \"ASSET000001\",\n",
+    "    \"severity_class\": \"critical\",\n",
+    "    \"cvss_base_score\": 9.9,\n",
+    "    \"epss_score_final\": 0.2397,\n",
+    "    \"sla_compliance_flag\": 1,\n",
+    "    \"exploitation_occurred_flag\": 0,\n",
+    "    \"zero_day_flag\": 0,\n",
+    "    \"remediation_success_flag\": 1,\n",
+    "    \"compensating_control_flag\": 1,\n",
+    "    \"supply_chain_propagation_flag\": 0,\n",
+    "    \"cisa_kev_flag\": 0,\n",
+    "    \"false_positive_flag\": 0,\n",
+    "}\n",
+    "\n",
+    "result = predict_vuln_class(example_record)\n",
+    "\n",
+    "print(f\"XGBoost  ->  {result['xgboost']['label']}\")\n",
+    "for lbl, p in sorted(result['xgboost']['probabilities'].items(), key=lambda x: -x[1]):\n",
+    "    print(f\"    P({lbl:30s}) = {p:.4f}\")\n",
+    "\n",
+    "print(f\"\\nMLP      ->  {result['mlp']['label']}\")\n",
+    "for lbl, p in sorted(result['mlp']['probabilities'].items(), key=lambda x: -x[1]):\n",
+    "    print(f\"    P({lbl:30s}) = {p:.4f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Modest, honest confidence\n",
+    "\n",
+    "The model's confidence on individual predictions is modest (top-1 typically 0.2-0.4) because vulnerability_class is a genuinely hard task on this sample. The per-class feature distributions overlap heavily — different vuln classes have similar CVSS, EPSS, and asset distributions.\n",
+    "\n",
+    "The model is a useful baseline (acc 0.24 vs majority 0.18, AUC 0.69) but not a production classifier. Read `leakage_diagnostic.json` for the structural reasons why every other CYB009 README-suggested target is either trivially solvable via oracle features or unlearnable after honest leak removal."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 7. Batch prediction on the sample dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "\n",
+    "vulns = pd.read_csv(f\"{ds_path}/vuln_summary.csv\")\n",
+    "\n",
+    "# Score the first 500 vulnerabilities\n",
+    "sample = vulns.head(500).copy()\n",
+    "preds = [predict_vuln_class(row.to_dict())[\"xgboost\"][\"label\"] for _, row in sample.iterrows()]\n",
+    "sample[\"xgb_pred\"] = preds\n",
+    "\n",
+    "ct = pd.crosstab(sample[\"vulnerability_class\"], sample[\"xgb_pred\"],\n",
+    "                 rownames=[\"true\"], colnames=[\"pred\"])\n",
+    "print(\"Confusion on first 500 sample vulnerabilities (XGBoost):\")\n",
+    "print(ct)\n",
+    "acc = (sample[\"vulnerability_class\"] == sample[\"xgb_pred\"]).mean()\n",
+    "print(f\"\\nbatch accuracy on first 500 vulns (in-distribution): {acc:.4f}\")\n",
+    "print(\"\\nNote: this includes training-set vulnerabilities. See validation_results.json\\n\"\n",
+    "      \"for proper held-out test metrics.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 8. Important reading: the leakage diagnostic\n",
+    "\n",
+    "Before using CYB009 sample data to train your own models, read **`leakage_diagnostic.json`** in this repo. It documents **8 oracle paths** across the sample's targets:\n",
+    "\n",
+    "1. **`cvss_temporal_score_final`** is a near-deterministic function of `exploit_maturity_final` (via CVSS v3.1 multipliers 0.91/0.94/0.97/1.00).\n",
+    "2. **`time_to_exploit_days`** uses a -1 sentinel that perfectly identifies `exploitation_occurred_flag = 0`.\n",
+    "3. **`time_to_remediate_days`** uses a 120 sentinel that perfectly identifies `remediation_success_flag = 0`.\n",
+    "4. **`severity_class`** is a 100% mechanical function of `cvss_base_score` (CVSS v3.1 boundaries).\n",
+    "5. **`lifecycle_phase`** has 5+ phases that deterministically pin `remediation_status` (e.g. `residual_risk_review` → 100% `remediated`).\n",
+    "6. **`patch_status`** has 5 of 6 values that pin `remediation_status` (e.g. `deployed` → 100% `remediated`).\n",
+    "7. **`risk_score_composite`** is computed from flag fields (indirect oracle).\n",
+    "8. **`patch_lag_days`** is suspected to have similar sentinel structure (precaution).\n",
+    "\n",
+    "It also documents **6 README-suggested headline targets that are unlearnable on the sample** after honest leak removal: `exploitation_occurred_flag`, `zero_day_flag`, `cisa_kev_flag`, `supply_chain_propagation_flag`, `false_positive_flag`, and `exploit_maturity_final`."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 9. Next steps\n",
+    "\n",
+    "- See `validation_results.json` for held-out test metrics (396 vulnerabilities).\n",
+    "- See `multi_seed_results.json` for the across-10-seeds picture (accuracy 0.244 ± 0.023, ROC-AUC 0.687 ± 0.014).\n",
+    "- See `ablation_results.json` — every feature group contributes 1-3pp accuracy, indicating spread-out modest signal across the feature set.\n",
+    "- See **`leakage_diagnostic.json`** for the comprehensive structural-leakage audit (8 oracle paths + 6 unlearnable targets).\n",
+    "- For the full ~487k-row CYB009 dataset and commercial licensing, contact **pradeep@xpertsystems.ai**."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.10"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
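The sentinel-coded columns in items 2 and 3 of the notebook's leakage list can be checked with a small purity test. The sketch below is an illustration only and is not part of the commit: `sentinel_purity` is a hypothetical helper, the toy rows are invented, and it assumes each record is a plain dict keyed by the column and flag names used in the notebook.

```python
from collections import Counter

def sentinel_purity(rows, column, sentinel, flag):
    """Count flag values for rows at the sentinel and for rows off it.

    A perfect oracle shows up as two counters that each hold a single,
    different flag value.
    """
    at_sentinel = Counter(r[flag] for r in rows if r[column] == sentinel)
    off_sentinel = Counter(r[flag] for r in rows if r[column] != sentinel)
    return dict(at_sentinel), dict(off_sentinel)

# Invented toy rows (not taken from the CYB009 sample).
toy_rows = [
    {"time_to_exploit_days": -1, "exploitation_occurred_flag": 0},
    {"time_to_exploit_days": -1, "exploitation_occurred_flag": 0},
    {"time_to_exploit_days": 12, "exploitation_occurred_flag": 1},
    {"time_to_exploit_days": 0, "exploitation_occurred_flag": 1},
]

at_s, off_s = sentinel_purity(
    toy_rows, "time_to_exploit_days", -1, "exploitation_occurred_flag"
)
# The column is an oracle when each side maps to exactly one flag value.
is_oracle = len(at_s) == 1 and len(off_s) == 1
print(at_s, off_s, is_oracle)
```

Running the same check with `time_to_remediate_days`, sentinel `120`, and `remediation_success_flag` would cover item 3, assuming those columns are present in the records.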

leakage_diagnostic.json ADDED

	@@ -0,0 +1,218 @@

+{
+  "purpose": "CYB009 sample has the most pervasive structural leakage of any SKU in the XpertSystems catalog. Eight oracle paths were discovered, and six of the README's headline targets are unlearnable on the sample after honest leak removal. The primary baseline that ships with this repo (vulnerability_class 8-class) is the only README-suggested target that learns honestly - and it is the WEAKEST baseline in the catalog by design (acc 0.24 vs majority 0.18). The headline finding for CYB009 is this diagnostic, not the classifier.",
+  "primary_target": "vulnerability_class (8-class)",
+  "split": "StratifiedShuffleSplit, 70/15/15 nested",
+  "oracle_paths_documented": {
+    "P1_cvss_temporal_ratio": {
+      "target": "exploit_maturity_final",
+      "leak_column": "cvss_temporal_score_final",
+      "mechanism": "CVSS v3.1 computes Temporal Score from Base Score using an Exploit Code Maturity multiplier (0.91 unproven, 0.94 PoC, 0.97 functional, 1.00 high/weaponised). The cvss_temporal/cvss_base ratio in the sample clusters tightly per maturity tier, with tier medians (0.80 / 0.83 / 0.85 / 0.88) sitting at roughly 0.88 times these multipliers, making it a near-deterministic oracle for the target.",
+      "observed_ratios_by_tier": {
+        "functional": {
+          "min": 0.8516,
+          "median": 0.8537,
+          "max": 0.8843,
+          "std": 0.0113
+        },
+        "proof_of_concept": {
+          "min": 0.8255,
+          "median": 0.8274,
+          "max": 0.8567,
+          "std": 0.0114
+        },
+        "unproven": {
+          "min": 0.7991,
+          "median": 0.801,
+          "max": 0.8302,
+          "std": 0.011
+        },
+        "weaponised": {
+          "min": 0.878,
+          "median": 0.88,
+          "max": 0.9116,
+          "std": 0.0115
+        }
+      },
+      "impact": "With cvss_temporal_score_final included, XGBoost achieves test accuracy 0.74 (mF1 0.72, AUC 0.91). With it excluded, accuracy collapses to 0.31 (mF1 0.31, AUC 0.58) - below majority baseline of 0.36. The target is structurally unlearnable on the sample after honest leak removal."
+    },
+    "P2_time_to_exploit_sentinel": {
+      "target": "exploitation_occurred_flag (and zero_day_flag)",
+      "leak_column": "time_to_exploit_days",
+      "mechanism": "Sentinel-coded post-hoc field: -1 when no exploitation occurred; non-negative (0-95 days) when exploitation occurred. Perfect oracle.",
+      "evidence": {
+        "time_to_exploit_minus1_AND_flag_0": 2435,
+        "time_to_exploit_positive_AND_flag_1": 197,
+        "time_to_exploit_positive_AND_flag_0": 0
+      },
+      "impact": "Perfect oracle for exploitation_occurred_flag and zero_day_flag."
+    },
+    "P3_time_to_remediate_sentinel": {
+      "target": "remediation_success_flag, sla_compliance_flag",
+      "leak_column": "time_to_remediate_days",
+      "mechanism": "Sentinel-coded post-hoc field: 120 (the timeline horizon) when not remediated; lower values (3-113) when remediated. Perfect oracle.",
+      "evidence": {
+        "remediation_flag_0_time_mean": 120.0,
+        "remediation_flag_0_time_min": 120,
+        "remediation_flag_1_time_mean": 41.77892756349953,
+        "remediation_flag_1_time_max": 113
+      },
+      "impact": "Perfect oracle for remediation_success_flag and near-perfect for sla_compliance_flag."
+    },
+    "P4_severity_class_cvss_boundaries": {
+      "target": "severity_class",
+      "leak_column": "cvss_base_score",
+      "mechanism": "severity_class is computed as a CVSS v3.1 boundary function of cvss_base_score (low=0.1-3.9, medium=4.0-6.9, high=7.0-8.9, critical=9.0-10.0). Including cvss_base_score makes severity prediction trivial; excluding it leaves only weak signal (acc 0.55 vs majority 0.51, barely above baseline).",
+      "observed_cvss_ranges_per_severity": {
+        "critical": {
+          "min": 9.0,
+          "max": 10.0
+        },
+        "high": {
+          "min": 7.0,
+          "max": 9.0
+        },
+        "low": {
+          "min": 1.77,
+          "max": 4.0
+        },
+        "medium": {
+          "min": 4.02,
+          "max": 7.0
+        }
+      },
+      "impact": "100% mechanical encoding. severity_class is not a useful ML target on this dataset."
+    },
+    "P5_lifecycle_to_remediation": {
+      "target": "remediation_status (per-timestep)",
+      "leak_column": "lifecycle_phase",
+      "mechanism": "The 12-phase lifecycle state machine has multiple phases that deterministically pin remediation_status. ~83% of per-timestep rows have lifecycle_phase that determines remediation_status exactly.",
+      "deterministic_phase_mappings": {
+        "accepted_risk": {
+          "maps_to": "in_remediation",
+          "purity": 1.0,
+          "n_rows": 16
+        },
+        "discovery": {
+          "maps_to": "undetected",
+          "purity": 1.0,
+          "n_rows": 327
+        },
+        "false_positive_closed": {
+          "maps_to": "in_remediation",
+          "purity": 0.9944,
+          "n_rows": 1421
+        },
+        "organisational_triage": {
+          "maps_to": "triaged",
+          "purity": 1.0,
+          "n_rows": 18
+        },
+        "patch_release": {
+          "maps_to": "undetected",
+          "purity": 1.0,
+          "n_rows": 33
+        },
+        "remediation_deployment": {
+          "maps_to": "in_remediation",
+          "purity": 1.0,
+          "n_rows": 4362
+        },
+        "residual_risk_review": {
+          "maps_to": "remediated",
+          "purity": 1.0,
+          "n_rows": 8921
+        }
+      },
+      "impact": "Per-timestep targets remediation_status, patch_status, and lifecycle_phase form a tightly-coupled state machine; any two pin the third. All three appear as 0.95-0.98 accuracy in naive evaluation but are mechanically determined."
+    },
+    "P6_patch_to_remediation": {
+      "target": "remediation_status (per-timestep)",
+      "leak_column": "patch_status",
+      "mechanism": "Of 6 patch_status values, at least 5 map near-deterministically to a single remediation_status value. `patch_status=deployed` -> 100% `remediated`; `patch_validated`/`vendor_notified`/`patch_in_development`/`patch_released` -> ~99% `in_remediation`.",
+      "deterministic_status_mappings": {
+        "deployed": {
+          "maps_to": "remediated",
+          "purity": 1.0,
+          "n_rows": 8958
+        },
+        "patch_validated": {
+          "maps_to": "in_remediation",
+          "purity": 0.9941,
+          "n_rows": 5293
+        }
+      },
+      "impact": "patch_status alone is a near-oracle for remediation_status."
+    },
+    "P7_risk_score_composite": {
+      "target": "all binary flag fields (indirect)",
+      "leak_column": "risk_score_composite",
+      "mechanism": "risk_score_composite is computed in the generator from cvss_base_score, epss_score_final, and the flag fields. Including it in features would launder flag information into the model via this composite.",
+      "evidence": "Generator-side composite; correlation with all flag fields > 0.3.",
+      "impact": "Precautionary drop. Affects all binary flag targets."
+    },
+    "P8_patch_lag_days": {
+      "target": "remediation_success_flag (suspected)",
+      "leak_column": "patch_lag_days",
+      "mechanism": "Likely same sentinel-coding structure as time_to_remediate_days (120 sentinel for unpatched; lower values when patched). Dropped as precaution; not separately validated.",
+      "impact": "Precautionary drop."
+    }
+  },
+  "unlearnable_targets": [
+    {
+      "target": "exploitation_occurred_flag",
+      "n_positives": 203,
+      "majority_baseline": 0.9230477634571645,
+      "honest_accuracy": 0.8569023569023568,
+      "honest_roc_auc": 0.6534304796599878,
+      "verdict": "below_majority"
+    },
+    {
+      "target": "zero_day_flag",
+      "n_positives": 76,
+      "majority_baseline": 0.9711902956785443,
+      "honest_accuracy": 0.9486531986531986,
+      "honest_roc_auc": 0.6040141676505313,
+      "verdict": "below_majority"
+    },
+    {
+      "target": "cisa_kev_flag",
+      "n_positives": 14,
+      "majority_baseline": 0.9946929492039424,
+      "honest_accuracy": 0.9924242424242425,
+      "honest_roc_auc": 0.6125211505922166,
+      "verdict": "below_majority"
+    },
+    {
+      "target": "supply_chain_propagation_flag",
+      "n_positives": 20,
+      "majority_baseline": 0.9924184988627748,
+      "honest_accuracy": 0.9915824915824917,
+      "honest_roc_auc": 0.7950240316652529,
+      "verdict": "below_majority"
+    },
+    {
+      "target": "false_positive_flag",
+      "n_positives": 205,
+      "majority_baseline": 0.922289613343442,
+      "honest_accuracy": 0.8661616161616162,
+      "honest_roc_auc": 0.5172779496243923,
+      "verdict": "below_majority"
+    },
+    {
+      "target": "exploit_maturity_final (after cvss_temporal_score_final dropped)",
+      "n_classes": 4,
+      "majority_baseline": 0.35898407884761185,
+      "honest_accuracy": 0.30639730639730645,
+      "honest_roc_auc": 0.5731243306339614,
+      "verdict": "below_majority"
+    }
+  ],
+  "unlearnable_summary": "Six of the README's headline use cases are unlearnable on the sample after honest leak removal: exploitation_occurred_flag, zero_day_flag, cisa_kev_flag, supply_chain_propagation_flag, false_positive_flag, and exploit_maturity_final (the original primary candidate target before the cvss_temporal_score_final leakage was discovered). Only vulnerability_class learns honestly, and it gives the weakest baseline in the catalog (acc 0.24 vs majority 0.18).",
+  "recommendations_to_dataset_author": [
+    "Remove the deterministic CVSS v3.1 exploit-code-maturity multiplier from cvss_temporal_score_final calculation, or add per-vulnerability noise so the cvss_temporal/cvss_base ratio overlaps across maturity tiers. As shipped, the ratio uniquely identifies the tier.",
+    "Replace -1 / 120 / etc. sentinel values in time_to_exploit_days, time_to_remediate_days, and patch_lag_days with probabilistic censoring that doesn't perfectly identify the outcome class. For example, use the latest observed time on partially-complete trajectories rather than a fixed sentinel.",
+    "Decouple the lifecycle_phase -> remediation_status -> patch_status state machine. Real telemetry has noisy intermediate states (e.g. a vuln can move to patch_released without immediately being remediated). The current sample has 5+ pure deterministic edges in this graph.",
+    "Add per-vulnerability-class feature signatures. The 8 classes differ in cvss_base_score means (5.4-8.3) but per-class feature distributions overlap heavily. Add class-specific EPSS distributions, asset-affinity, and disclosure-timeline patterns to make class prediction tractable from features.",
+    "Increase positive-class counts for rare-event binaries in the sample: 14 cisa_kev positives, 20 supply_chain positives, and 76 zero_day positives are below the threshold for reliable minority-class ML evaluation at n=2638. Either upsample these in the sample or document them as full-product-only signals."
+  ]
+}
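The P1 oracle path can be illustrated with a nearest-median lookup on the temporal/base ratio. This is a minimal sketch under stated assumptions, not the audit's own code: `guess_maturity_tier` is a hypothetical helper, and the only data used are the tier medians listed under `observed_ratios_by_tier`.

```python
# Tier medians copied from observed_ratios_by_tier in leakage_diagnostic.json.
TIER_MEDIANS = {
    "unproven": 0.801,
    "proof_of_concept": 0.8274,
    "functional": 0.8537,
    "weaponised": 0.88,
}

def guess_maturity_tier(cvss_temporal, cvss_base):
    """Hypothetical helper: return the tier whose median ratio is closest."""
    ratio = cvss_temporal / cvss_base
    return min(TIER_MEDIANS, key=lambda tier: abs(TIER_MEDIANS[tier] - ratio))

# Dividing each median by the weaponised median recovers the nominal
# CVSS v3.1 multipliers (0.91 / 0.94 / 0.97 / 1.00) after rounding.
implied = {
    tier: round(median / TIER_MEDIANS["weaponised"], 2)
    for tier, median in TIER_MEDIANS.items()
}

print(guess_maturity_tier(8.0, 10.0))
print(implied)
```

Because the reported min/max ranges of adjacent tiers overlap slightly (for example `unproven` max 0.8302 against `proof_of_concept` min 0.8255), a lookup like this is near-deterministic rather than exact, which is consistent with the 0.74 accuracy reported when the column is included.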

model_mlp.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b67a4b451d808d4c8574681f1a59fa67647c20c1f50dfea4aa2e495b14a67cba
+size 69096

model_xgb.json ADDED

The diff for this file is too large to render. See raw diff

multi_seed_results.json ADDED

	@@ -0,0 +1,98 @@

+{
+  "purpose": "Multi-seed evaluation across 10 stratified splits of the 2,638-vulnerability sample.",
+  "seeds_evaluated": [
+    42,
+    7,
+    13,
+    17,
+    23,
+    31,
+    45,
+    99,
+    123,
+    200
+  ],
+  "per_seed": [
+    {
+      "seed": 42,
+      "test_n_classes": 8,
+      "accuracy": 0.23737373737373738,
+      "macro_f1": 0.22437482872901052,
+      "macro_roc_auc_ovr": 0.6837125710196055
+    },
+    {
+      "seed": 7,
+      "test_n_classes": 8,
+      "accuracy": 0.2222222222222222,
+      "macro_f1": 0.2093010862619929,
+      "macro_roc_auc_ovr": 0.6598529124901316
+    },
+    {
+      "seed": 13,
+      "test_n_classes": 8,
+      "accuracy": 0.2398989898989899,
+      "macro_f1": 0.2307013362941505,
+      "macro_roc_auc_ovr": 0.6859754559014113
+    },
+    {
+      "seed": 17,
+      "test_n_classes": 8,
+      "accuracy": 0.2828282828282828,
+      "macro_f1": 0.2641998881222478,
+      "macro_roc_auc_ovr": 0.7001133264273626
+    },
+    {
+      "seed": 23,
+      "test_n_classes": 8,
+      "accuracy": 0.22474747474747475,
+      "macro_f1": 0.20938909311730927,
+      "macro_roc_auc_ovr": 0.6952258894131303
+    },
+    {
+      "seed": 31,
+      "test_n_classes": 8,
+      "accuracy": 0.25252525252525254,
+      "macro_f1": 0.23228517698591994,
+      "macro_roc_auc_ovr": 0.6868917272897719
+    },
+    {
+      "seed": 45,
+      "test_n_classes": 8,
+      "accuracy": 0.2601010101010101,
+      "macro_f1": 0.23328085381091487,
+      "macro_roc_auc_ovr": 0.6955734168438206
+    },
+    {
+      "seed": 99,
+      "test_n_classes": 8,
+      "accuracy": 0.21717171717171718,
+      "macro_f1": 0.2064102665659866,
+      "macro_roc_auc_ovr": 0.700000049204532
+    },
+    {
+      "seed": 123,
+      "test_n_classes": 8,
+      "accuracy": 0.2222222222222222,
+      "macro_f1": 0.20983049912880922,
+      "macro_roc_auc_ovr": 0.662519489088299
+    },
+    {
+      "seed": 200,
+      "test_n_classes": 8,
+      "accuracy": 0.2828282828282828,
+      "macro_f1": 0.2801905278759914,
+      "macro_roc_auc_ovr": 0.6954305041778505
+    }
+  ],
+  "aggregate": {
+    "accuracy_mean": 0.2441919191919192,
+    "accuracy_std": 0.023337760304165702,
+    "accuracy_min": 0.21717171717171718,
+    "accuracy_max": 0.2828282828282828,
+    "macro_f1_mean": 0.22999635568923332,
+    "macro_f1_std": 0.023565611735295866,
+    "roc_auc_mean": 0.6865295341855916,
+    "roc_auc_std": 0.013780848086567432
+  },
+  "published_artifact_seed": 42
+}
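The `aggregate` block can be reproduced from the `per_seed` accuracies with the standard library. The sketch below is a reader-side check, not code from the commit; the accuracies are the per-seed values rounded to six decimals, and the use of the population standard deviation (ddof=0) is an inference from the fact that it matches the reported `accuracy_std`.

```python
from statistics import fmean, pstdev, stdev

# Per-seed test accuracies from multi_seed_results.json, in file order
# (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200), rounded to 6 decimals.
accuracies = [
    0.237374, 0.222222, 0.239899, 0.282828, 0.224747,
    0.252525, 0.260101, 0.217172, 0.222222, 0.282828,
]

acc_mean = fmean(accuracies)
acc_std = pstdev(accuracies)        # population std (ddof=0)
acc_std_sample = stdev(accuracies)  # sample std (ddof=1), for comparison

print(f"{acc_mean:.4f} +/- {acc_std:.4f}")
```

The sample standard deviation comes out slightly larger (about 0.025), so anyone comparing against these numbers should use the same convention.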

validation_results.json ADDED

	@@ -0,0 +1,290 @@

+{
+  "version": "1.0.0",
+  "dataset": "xpertsystems/cyb009-sample",
+  "task": "8-class vulnerability_class classification (CWE-style families)",
+  "baselines": {
+    "always_predict_majority_accuracy": 0.17676767676767677,
+    "majority_class": "memory_corruption",
+    "random_guess_accuracy": 0.125
+  },
+  "split": {
+    "strategy": "stratified (StratifiedShuffleSplit, nested 70/15/15)",
+    "rationale": "Per-vulnerability task (n=2638), one row per vuln. Stratified random splitting preserves class distribution. No row-correlation structure to leak.",
+    "vulns_train": 1846,
+    "vulns_val": 396,
+    "vulns_test": 396,
+    "seed": 42
+  },
+  "n_features": 57,
+  "label_classes": [
+    "auth_access_control",
+    "cryptographic_failure",
+    "information_disclosure",
+    "injection_family",
+    "logic_flaw",
+    "memory_corruption",
+    "misconfiguration",
+    "supply_chain_weakness"
+  ],
+  "class_distribution_train": {
+    "memory_corruption": 325,
+    "injection_family": 305,
+    "misconfiguration": 305,
+    "auth_access_control": 245,
+    "cryptographic_failure": 211,
+    "supply_chain_weakness": 189,
+    "logic_flaw": 160,
+    "information_disclosure": 106
+  },
+  "class_distribution_test": {
+    "memory_corruption": 70,
+    "misconfiguration": 65,
+    "injection_family": 65,
+    "auth_access_control": 53,
+    "cryptographic_failure": 45,
+    "supply_chain_weakness": 41,
+    "logic_flaw": 34,
+    "information_disclosure": 23
+  },
+  "outcome_leak_excluded_features": [
+    "exploit_maturity_final (indirect leak via CVSS temporal multiplier)",
+    "cvss_temporal_score_final (near-deterministic per exploit_maturity_final tier)",
+    "time_to_exploit_days (sentinel -1 / positive)",
+    "time_to_remediate_days (sentinel 120 / lower)",
+    "patch_lag_days (suspected similar sentinel - precaution)",
+    "risk_score_composite (computed from flag fields - precaution)"
+  ],
+  "leakage_audit_note": "CYB009 has the most pervasive structural leakage of any SKU in the XpertSystems catalog. See leakage_diagnostic.json for the full 8-oracle-path audit. Six of the README's headline use cases are unlearnable on the sample after honest leak removal; vulnerability_class is the only viable target and gives the catalog's weakest baseline by design.",
+  "models": {
+    "xgboost": {
+      "architecture": "Gradient-boosted decision trees, multi:softprob, 8 classes",
+      "framework": "xgboost",
+      "test_metrics": {
+        "model": "xgboost",
+        "accuracy": 0.23737373737373738,
+        "macro_f1": 0.22437482872901052,
+        "weighted_f1": 0.23213786276177156,
+        "per_class_f1": {
+          "auth_access_control": 0.14583333333333334,
+          "cryptographic_failure": 0.21686746987951808,
+          "information_disclosure": 0.2909090909090909,
+          "injection_family": 0.23728813559322035,
+          "logic_flaw": 0.08955223880597014,
+          "memory_corruption": 0.3333333333333333,
+          "misconfiguration": 0.2589928057553957,
+          "supply_chain_weakness": 0.2222222222222222
+        },
+        "confusion_matrix": {
+          "labels": [
+            "auth_access_control",
+            "cryptographic_failure",
+            "information_disclosure",
+            "injection_family",
+            "logic_flaw",
+            "memory_corruption",
+            "misconfiguration",
+            "supply_chain_weakness"
+          ],
+          "matrix": [
+            [
+              7,
+              7,
+              0,
+              11,
+              6,
+              10,
+              7,
+              5
+            ],
+            [
+              4,
+              9,
+              3,
+              5,
+              3,
+              5,
+              16,
+              0
+            ],
+            [
+              3,
+              0,
+              8,
+              1,
+              4,
+              0,
+              7,
+              0
+            ],
+            [
+              3,
+              6,
+              1,
+              14,
+              8,
+              20,
+              6,
+              7
+            ],
+            [
+              4,
+              4,
+              5,
+              3,
+              3,
+              2,
+              13,
+              0
+            ],
+            [
+              11,
+              3,
+              0,
+              13,
+              3,
+              27,
+              5,
+              8
+            ],
+            [
+              6,
+              9,
+              15,
+              2,
+              5,
+              7,
+              18,
+              3
+            ],
+            [
+              5,
+              0,
+              0,
+              4,
+              1,
+              21,
+              2,
+              8
+            ]
+          ]
+        },
+        "macro_roc_auc_ovr": 0.6837125710196055
+      }
+    },
+    "mlp": {
+      "architecture": "PyTorch MLP, 57 -> 128 -> 64 -> 8, BatchNorm1d + ReLU + Dropout, weighted cross-entropy loss",
+      "framework": "pytorch",
+      "test_metrics": {
+        "model": "mlp",
+        "accuracy": 0.23232323232323232,
+        "macro_f1": 0.22092024769409177,
+        "weighted_f1": 0.22940625794114217,
+        "per_class_f1": {
+          "auth_access_control": 0.16279069767441862,
+          "cryptographic_failure": 0.16842105263157894,
+          "information_disclosure": 0.15384615384615385,
+          "injection_family": 0.23529411764705882,
+          "logic_flaw": 0.22784810126582278,
+          "memory_corruption": 0.36486486486486486,
+          "misconfiguration": 0.16216216216216217,
+          "supply_chain_weakness": 0.29213483146067415
+        },
+        "confusion_matrix": {
+          "labels": [
+            "auth_access_control",
+            "cryptographic_failure",
+            "information_disclosure",
+            "injection_family",
+            "logic_flaw",
+            "memory_corruption",
+            "misconfiguration",
+            "supply_chain_weakness"
+          ],
+          "matrix": [
+            [
+              7,
+              8,
+              1,
+              12,
+              6,
+              12,
+              4,
+              3
+            ],
+            [
+              5,
+              8,
+              4,
+              3,
+              5,
+              5,
+              14,
+              1
+            ],
+            [
+              1,
+              3,
+              5,
+              2,
+              5,
+              1,
+              6,
+              0
+            ],
+            [
+              3,
+              7,
+              3,
+              14,
+              6,
+              17,
+              2,
+              13
+            ],
+            [
+              1,
+              5,
+              9,
+              3,
+              9,
+              1,
+              6,
+              0
+            ],
+            [
+              8,
+              7,
+              0,
+              9,
+              3,
+              27,
+              2,
+              14
+            ],
+            [
+              3,
+              10,
+              20,
+              5,
+              10,
+              4,
+              9,
+              4
+            ],
+            [
+              5,
+              2,
+              0,
+              6,
+              1,
+              11,
+              3,
+              13
+            ]
+          ]
+        },
+        "macro_roc_auc_ovr": 0.6899177016524518
+      }
+    }
+  }
+}