pradeep-xpert committed · Commit e2c4702 (verified) · 1 Parent(s): 87139e3

Initial release: attack_lifecycle_phase 5-class baseline + 11-oracle-path leakage diagnostic

README.md ADDED
@@ -0,0 +1,483 @@
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ library_name: pytorch
4
+ tags:
5
+ - cybersecurity
6
+ - siem
7
+ - security-logs
8
+ - mitre-attack
9
+ - apt
10
+ - tabular-classification
11
+ - synthetic-data
12
+ - xgboost
13
+ - baseline
14
+ - leakage-diagnostic
15
+ pipeline_tag: tabular-classification
16
+ base_model: []
17
+ datasets:
18
+ - xpertsystems/cyb010-sample
19
+ metrics:
20
+ - accuracy
21
+ - f1
22
+ - roc_auc
23
+ model-index:
24
+ - name: cyb010-baseline-classifier
25
+ results:
26
+ - task:
27
+ type: tabular-classification
28
+ name: 5-class attack lifecycle phase classification
29
+ dataset:
30
+ type: xpertsystems/cyb010-sample
31
+ name: CYB010 Synthetic Security Event Log Dataset (Sample)
32
+ metrics:
33
+ - type: roc_auc
34
+ value: 0.9904
35
+ name: Test macro ROC-AUC OvR (XGBoost, seed 42)
36
+ - type: accuracy
37
+ value: 0.9493
38
+ name: Test accuracy (XGBoost, seed 42)
39
+ - type: f1
40
+ value: 0.7781
41
+ name: Test macro-F1 (XGBoost, seed 42)
42
+ - type: accuracy
43
+ value: 0.936
44
+ name: Multi-seed accuracy mean ± 0.007 (XGBoost, 10 seeds)
45
+ - type: roc_auc
46
+ value: 0.988
47
+ name: Multi-seed ROC-AUC mean ± 0.001 (XGBoost, 10 seeds)
48
+ ---
49
+
50
+ # CYB010 Baseline Classifier
51
+
52
+ **Attack lifecycle phase classifier (5-class) trained on the CYB010
53
+ synthetic security event log sample. Predicts which of 5 attack phases
54
+ (`benign_background` / `initial_access` / `lateral_movement` /
55
+ `persistence_establishment` / `exfiltration_or_impact`) a security
56
+ event belongs to, from per-event features. It also ships
+ `leakage_diagnostic.json`, which documents 11 oracle paths discovered
+ across the dataset's targets and 2 README-suggested targets that are
+ unlearnable on the sample once the leaking columns are removed.**
60
+
61
+ > **Read this first.** This repo ships two related artifacts:
62
+ > (1) a working baseline classifier for `attack_lifecycle_phase` (the
63
+ > dataset's headline target), and (2) `leakage_diagnostic.json`
64
+ > documenting 11 separate oracle paths plus 2 unlearnable targets.
65
+ > Both files matter; the diagnostic is required reading for anyone
66
+ > evaluating CYB010 for SIEM ML work.
67
+
68
+ ## Model overview
69
+
70
+ | Property | Value |
71
+ |---|---|
72
+ | Primary task | 5-class `attack_lifecycle_phase` classification |
73
+ | Secondary artifact | `leakage_diagnostic.json` — 11 oracle paths + 2 unlearnable targets |
74
+ | Training data | `xpertsystems/cyb010-sample` (21,896 events / 500 incidents) |
75
+ | Models | XGBoost + PyTorch MLP |
76
+ | Input features | 87 (after one-hot encoding) |
77
+ | Split | **Group-aware** (GroupShuffleSplit on `incident_id`) |
78
+ | Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds |
79
+ | License | CC-BY-NC-4.0 (matches dataset) |
80
+ | Status | Reference baseline + comprehensive leakage diagnostic |
81
+
82
+ ## Why this task — and what was dropped
83
+
84
+ The CYB010 README's central concept is the "5-phase attack lifecycle
+ state machine", and `attack_lifecycle_phase` is the dataset's headline
+ target. We piloted six candidate formulations (five targets, with
+ `threat_actor_profile` tried both with and without the benign class)
+ and found:
87
+
88
+ - **`attack_lifecycle_phase` 5-class**: strongest honest result.
89
+ Acc 0.936 ± 0.007, ROC-AUC 0.988 ± 0.001 (multi-seed). All 5 classes
90
+ represented, per-class F1 range 0.48–1.00.
91
+
92
+ - **`threat_actor_profile` 5-class**: works at acc 0.84 but per-class
93
+ F1 reveals it's almost entirely driven by `benign_user` separation
94
+ (F1 1.00 vs F1 0.17-0.69 for the 4 malicious classes). The 4-class
95
+ malicious-only formulation is below majority (acc 0.55 vs 0.61).
96
+
97
+ - **`label_true_positive` binary on alerts**: documented as a secondary
98
+ finding. Has 7 oracle features; honest acc 0.80, AUC 0.89 after
99
+ dropping all of them.
100
+
101
+ - **`mitre_tactic` 14-class**: reaches acc 0.90 but macro-F1 is only
+ 0.37. The accuracy comes from the dominant benign class (57% of
+ events), not from separating the tactics.
103
+
104
+ - **`event_class` 12-class**: unlearnable (acc 0.35 vs majority 0.42).
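The "vs majority" comparisons in the list above refer to the majority-class baseline: the accuracy obtained by always predicting the most frequent class. The following is a minimal sketch of that calculation using the phase counts from the Training data table; the helper name `majority_baseline` is illustrative and not part of this repo.

```python
def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


# Event counts per phase in the CYB010 sample (from the Training data table).
phase_counts = {
    "benign_background": 12448,
    "exfiltration_or_impact": 6205,
    "initial_access": 1674,
    "lateral_movement": 968,
    "persistence_establishment": 601,
}

baseline = majority_baseline(phase_counts)
print(f"majority baseline accuracy: {baseline:.3f}")
```

A target is only worth reporting if the model clears this bar on the same split, which is the check that `event_class` and the 4-class `threat_actor_profile` formulation fail.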
105
+
106
+ ### Six oracle columns dropped from the phase task
107
+
108
+ CYB010 encodes the benign vs malicious distinction explicitly in
109
+ multiple columns. Each is a perfect or near-perfect oracle for the
110
+ `benign_background` phase:
111
+
112
+ | Column | Oracle relationship |
113
+ |---|---|
114
+ | `mitre_tactic` | `=="benign"` ↔ `benign_background` phase (12,448/12,448, perfect) |
115
+ | `mitre_technique_id` | Perfect ATT&CK-by-design oracle for `mitre_tactic` (54/54 techniques → single tactic) |
116
+ | `label_malicious` | `==False` ↔ `benign_background` (perfect) |
117
+ | `threat_actor_id` | `=="NONE"` ↔ `benign_background` (perfect) |
118
+ | `threat_actor_profile` | `=="benign_user"` ↔ `benign_background` (perfect) |
119
+ | `event_type` | Many values phase-specific (`c2_beacon_outbound` → 100% `exfiltration_or_impact`) |
120
+
121
+ With these six columns present, a plain XGBoost trivially separates
122
+ benign vs malicious. The published baseline trains with all six
123
+ excluded.
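One way to test a column for the two-way ("↔") relationships in the table above is sketched here. This is an illustration only, not the audit code behind `leakage_diagnostic.json`; the function names and the toy rows (including the actor ID `TA-1`) are invented for the example.

```python
from collections import defaultdict


def is_two_way_oracle(rows, column, value, target, cls):
    """True when `column == value` holds for exactly the rows where `target == cls`."""
    return all((r[column] == value) == (r[target] == cls) for r in rows)


def value_purity(rows, column, target):
    """For each value of `column`, the share of its rows in the most common target class."""
    counts = defaultdict(lambda: defaultdict(int))
    for r in rows:
        counts[r[column]][r[target]] += 1
    return {
        v: max(by_class.values()) / sum(by_class.values())
        for v, by_class in counts.items()
    }


# Toy rows, invented for illustration only (not CYB010 data).
T = "attack_lifecycle_phase"
rows = [
    {"threat_actor_id": "NONE", "severity_level": "low", T: "benign_background"},
    {"threat_actor_id": "NONE", "severity_level": "high", T: "benign_background"},
    {"threat_actor_id": "TA-1", "severity_level": "low", T: "initial_access"},
    {"threat_actor_id": "TA-1", "severity_level": "high", T: "lateral_movement"},
]

print(is_two_way_oracle(rows, "threat_actor_id", "NONE", T, "benign_background"))
print(value_purity(rows, "severity_level", T))
```

A column whose sentinel value passes the two-way check gives the label away and should be dropped; a column whose purities sit well below 1.0 overlaps across classes and can reasonably be kept as a feature.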
124
+
125
+ Two model artifacts are published. They are designed to be used
126
+ together:
127
+
128
+ - `model_xgb.json` — gradient-boosted trees (slightly higher F1)
129
+ - `model_mlp.safetensors` — PyTorch MLP
130
+
131
+ ## Quick start
132
+
133
+ ```bash
134
+ pip install xgboost torch safetensors pandas huggingface_hub
135
+ ```
136
+
137
+ ```python
138
+ import json
+ import os
+ import sys
+
+ import numpy as np
+ import torch
+ import xgboost as xgb
+ from huggingface_hub import hf_hub_download, snapshot_download
+ from safetensors.torch import load_file
+
+ REPO = "xpertsystems/cyb010-baseline-classifier"
+
+ # Download the model and feature-pipeline artifacts (cached locally).
+ paths = {n: hf_hub_download(REPO, n) for n in [
+     "model_xgb.json", "model_mlp.safetensors",
+     "feature_engineering.py", "feature_meta.json", "feature_scaler.json",
+ ]}
+
+ # feature_engineering.py ships in the repo; put its folder on sys.path.
+ sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"]))
+ from feature_engineering import (
+     transform_single, load_meta, build_host_lookup, INT_TO_LABEL,
+ )
+
+ meta = load_meta(paths["feature_meta.json"])
+
+ # Host features are joined from host_inventory.csv at inference time
+ ds = snapshot_download("xpertsystems/cyb010-sample", repo_type="dataset")
+ host_lookup = build_host_lookup(f"{ds}/host_inventory.csv")
+
+ xgb_model = xgb.XGBClassifier()
+ xgb_model.load_model(paths["model_xgb.json"])
+
+ # Predict (see inference_example.ipynb for the full pattern).
+ # `my_event` is a placeholder: supply one raw event record yourself.
+ # Note: do NOT include mitre_tactic, mitre_technique_id, label_malicious,
+ # threat_actor_id, threat_actor_profile, or event_type - those were the
+ # oracle columns.
+ X = transform_single(my_event, meta, host_lookup=host_lookup)
+ proba = xgb_model.predict_proba(X)[0]
+ print(INT_TO_LABEL[int(np.argmax(proba))])
170
+ ```
171
+
172
+ See [`inference_example.ipynb`](./inference_example.ipynb) for the full
173
+ copy-paste demo.
174
+
175
+ ## Training data
176
+
177
+ Trained on the public sample of CYB010, 21,896 per-event records:
178
+
179
+ | Phase | Events | Class share |
180
+ |---|---:|---:|
181
+ | `benign_background` | 12,448 | 56.9% |
182
+ | `exfiltration_or_impact` | 6,205 | 28.3% |
183
+ | `initial_access` | 1,674 | 7.6% |
184
+ | `lateral_movement` | 968 | 4.4% |
185
+ | `persistence_establishment` | 601 | 2.7% |
186
+
187
+ ### Group-aware split by incident_id
188
+
189
+ 500 incidents × ~44 events each. Events from the same incident share
190
+ host, threat actor, and phase trajectory — so train/test contamination
191
+ is a real risk with random splitting. The baseline uses
192
+ **GroupShuffleSplit** on `incident_id` (nested 70/15/15):
193
+
194
+ | Fold | Events | Incidents |
195
+ |---|---:|---:|
196
+ | Train | 14,697 | ~350 |
197
+ | Validation | 3,473 | ~75 |
198
+ | Test | 3,726 | ~75 |
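The property a group-aware split guarantees is that no `incident_id` appears in more than one fold. Below is a stdlib-only sketch of that idea. It is not scikit-learn's `GroupShuffleSplit` implementation and not the private training script; the function name `group_split` and the toy incident IDs are assumptions made for illustration.

```python
import random


def group_split(groups, fractions=(0.70, 0.15, 0.15), seed=42):
    """Assign whole groups to train/val/test; return three lists of row indices."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    n_train = round(fractions[0] * len(unique))
    n_val = round(fractions[1] * len(unique))
    fold_of = {}
    for i, g in enumerate(unique):
        fold_of[g] = 0 if i < n_train else (1 if i < n_train + n_val else 2)
    folds = ([], [], [])
    for idx, g in enumerate(groups):
        folds[fold_of[g]].append(idx)
    return folds


# Toy data: 20 hypothetical incidents with 3 events each.
incident_ids = [f"INC-{i:03d}" for i in range(20) for _ in range(3)]
train_idx, val_idx, test_idx = group_split(incident_ids)

train_groups = {incident_ids[i] for i in train_idx}
test_groups = {incident_ids[i] for i in test_idx}
print(len(train_idx), len(val_idx), len(test_idx))
print("shared incidents:", train_groups & test_groups)
```

Because folds are sized by incident count rather than event count, event totals per fold only approximate 70/15/15, which is consistent with the approximate incident counts in the table.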
199
+
200
+ All 10 multi-seed evaluations yielded all 5 classes in the test fold.
+ Class imbalance is addressed with balanced class weights (the
+ `class_weight='balanced'` heuristic), passed to XGBoost as per-row
+ `sample_weight` and to the MLP as a weighted cross-entropy loss.
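For reference, the usual "balanced" heuristic sets each class weight to `n_samples / (n_classes * class_count)`. The sketch below applies it to the full-sample counts from the table above. Treat this as an assumption about the formula's inputs: the actual pipeline may compute weights on the training fold only, and the helper names are illustrative.

```python
def balanced_class_weights(class_counts):
    """weight_c = n_samples / (n_classes * count_c)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_samples / (n_classes * n) for c, n in class_counts.items()}


def sample_weights(labels, weights):
    """Expand per-class weights into one weight per row."""
    return [weights[y] for y in labels]


# Full-sample phase counts (assumption: the real weights may use the train fold).
phase_counts = {
    "benign_background": 12448,
    "exfiltration_or_impact": 6205,
    "initial_access": 1674,
    "lateral_movement": 968,
    "persistence_establishment": 601,
}

weights = balanced_class_weights(phase_counts)
for phase, w in weights.items():
    print(f"{phase:28s} {w:.3f}")
```

With these weights every class contributes the same total weight to the loss, so the 601 `persistence_establishment` events count as much in aggregate as the 12,448 benign ones.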
203
+
204
+ ## Feature pipeline
205
+
206
+ The bundled `feature_engineering.py` is the canonical recipe. 87
207
+ features survive after encoding, drawn from:
208
+
209
+ - **Per-event numeric** (5): `source_port`, `dest_port`,
210
+ `cvss_score_analogue`, `label_log_tampered`, `label_false_positive`
211
+ - **Per-event categorical** (3, one-hot): `event_class` (12 values),
212
+ `log_source_type` (8 values), `severity_level` (5 values)
213
+ - **Host features** (joined from `host_inventory.csv`): 3 numeric +
214
+ 7 categorical (os_type, host_role, network_segment, defender_posture,
215
+ criticality_rating, cloud_provider, siem_platform)
216
+ - **Engineered** (9): `hour_of_day`, `is_off_hours`, `is_weekend`,
217
+ `log_cvss`, `is_high_cvss`, `is_well_known_port`, `is_dynamic_port`,
218
+ `is_outbound_web`, `risk_composite`
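To make the engineered features concrete, here is a hedged sketch of how most of them could be derived from a raw event. Every threshold and definition below is an assumption chosen for illustration (business hours, the CVSS cut-off, which port column each flag reads); the bundled `feature_engineering.py` remains the canonical recipe. `risk_composite` is omitted because its formula is not described here.

```python
import math
from datetime import datetime


def engineer_features(event):
    """Derive illustrative timing, CVSS, and port features from one raw event dict."""
    ts = datetime.fromisoformat(event["timestamp"])
    cvss = float(event["cvss_score_analogue"])
    src, dst = int(event["source_port"]), int(event["dest_port"])
    return {
        "hour_of_day": ts.hour,
        "is_off_hours": int(ts.hour < 8 or ts.hour >= 18),  # assumed 08:00-18:00 workday
        "is_weekend": int(ts.weekday() >= 5),               # Saturday=5, Sunday=6
        "log_cvss": math.log1p(cvss),                       # assumed log(1 + cvss)
        "is_high_cvss": int(cvss >= 7.0),                   # assumed threshold
        "is_well_known_port": int(dst < 1024),              # IANA well-known range 0-1023
        "is_dynamic_port": int(src >= 49152),               # IANA dynamic range 49152-65535
        "is_outbound_web": int(dst in (80, 443)),           # assumed definition
    }


# Hypothetical event: a Saturday-night HTTPS connection.
event = {
    "timestamp": "2024-01-06T23:15:00",
    "source_port": 51514,
    "dest_port": 443,
    "cvss_score_analogue": 8.1,
}
features = engineer_features(event)
print(features)
```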
219
+
220
+ ### Partial-oracle features kept as legitimate observables
221
+
222
+ `event_class` (max purity 0.87, mean 0.72 across phases) is the
223
+ strongest non-oracle feature. C2 beacon traffic (`event_class =
224
+ network_flow`) is 65% exfiltration phase but also 29% benign and 6%
225
+ other phases — real overlap, not deterministic encoding. Kept.
226
+
227
+ `severity_level` and `cvss_score_analogue` correlate strongly with
228
+ phase (high-severity events skew toward exfil and initial_access) but
229
+ with substantial overlap. Kept.
230
+
231
+ `label_log_tampered` is a real observable — APTs tamper more than
232
+ script_kiddies — but is not phase-deterministic. Kept.
233
+
234
+ ## Evaluation
235
+
236
+ ### Test-set metrics, seed 42 (n = 3,726 events from ~75 test incidents)
237
+
238
+ **XGBoost** (the published `model_xgb.json` artifact)
239
+
240
+ | Metric | Value |
241
+ |---|---:|
242
+ | Macro ROC-AUC (OvR) | **0.9904** |
243
+ | Accuracy | **0.9493** |
244
+ | Macro-F1 | 0.7781 |
245
+ | Weighted-F1 | 0.9522 |
246
+
247
+ **MLP** (the published `model_mlp.safetensors` artifact)
248
+
249
+ | Metric | Value |
250
+ |---|---:|
251
+ | Macro ROC-AUC (OvR) | **0.9861** |
252
+ | Accuracy | **0.9412** |
253
+ | Macro-F1 | 0.7534 |
254
+ | Weighted-F1 | 0.9396 |
255
+
256
+ XGBoost slightly outperforms MLP on this task (acc 0.949 vs 0.941,
257
+ macro-F1 0.778 vs 0.753). The gap is consistent across seeds.
258
+
259
+ ### Multi-seed robustness (XGBoost, 10 seeds)
260
+
261
+ | Metric | Mean | Std | Min | Max |
262
+ |---|---:|---:|---:|---:|
263
+ | Accuracy | 0.936 | 0.007 | 0.923 | 0.949 |
264
+ | Macro-F1 | 0.759 | 0.015 | 0.741 | 0.781 |
265
+ | Macro ROC-AUC OvR | 0.988 | 0.001 | 0.986 | 0.990 |
266
+
267
+ **Tightest ROC-AUC std in the XpertSystems catalog** (0.001). All 10 seeds yielded
268
+ all 5 classes in the test fold. Full per-seed results in
269
+ [`multi_seed_results.json`](./multi_seed_results.json).
270
+
271
+ ### Per-class F1 (seed 42)
272
+
273
+ | Phase | Class share | XGBoost F1 | MLP F1 |
274
+ |---|---:|---:|---:|
275
+ | `benign_background` | 56.9% | **0.998** | 0.994 |
276
+ | `exfiltration_or_impact` | 28.3% | **0.987** | 0.981 |
277
+ | `initial_access` | 7.6% | 0.720 | 0.651 |
278
+ | `persistence_establishment` | 2.7% | 0.703 | 0.690 |
279
+ | `lateral_movement` | 4.4% | **0.483** | 0.451 |
280
+
281
+ The two largest classes (`benign_background` and `exfiltration_or_impact`)
282
+ are nearly perfectly separable — `benign_background` because the
283
+ non-oracle features (severity, CVSS, log_source) still cleanly separate
284
+ non-malicious traffic, and `exfiltration_or_impact` because it's
285
+ dominated by network_flow events (C2 beacons). The three middle
286
+ classes overlap substantially in feature space; `lateral_movement` is
287
+ the hardest (F1 0.48) because lateral movement events look similar to
288
+ initial_access events at the per-event level. A sequence model that
289
+ considers event ordering within an incident would likely do better
290
+ than the per-event baseline.
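The per-class and aggregate scores can be recomputed from the seed-42 XGBoost confusion matrix stored under `full_model_metrics` in `ablation_results.json`. The sketch below assumes rows are true labels and columns are predictions, an orientation that reproduces the `per_class_f1`, `accuracy`, `macro_f1`, and `weighted_f1` values stored in that file. The helper name is illustrative.

```python
LABELS = [
    "benign_background", "initial_access", "lateral_movement",
    "persistence_establishment", "exfiltration_or_impact",
]

# Rows = true class, columns = predicted class (assumed orientation).
CONFUSION = [
    [2078, 6, 0, 0, 0],
    [4, 172, 65, 6, 0],
    [0, 38, 72, 6, 2],
    [0, 11, 22, 58, 0],
    [0, 4, 21, 4, 1157],
]


def per_class_f1(matrix):
    """F1 per class: 2*TP / (actual count + predicted count)."""
    scores = []
    for i, row in enumerate(matrix):
        tp = row[i]
        actual = sum(row)
        predicted = sum(r[i] for r in matrix)
        scores.append(2 * tp / (actual + predicted) if actual + predicted else 0.0)
    return scores


f1 = per_class_f1(CONFUSION)
total = sum(sum(row) for row in CONFUSION)
accuracy = sum(CONFUSION[i][i] for i in range(len(CONFUSION))) / total
macro_f1 = sum(f1) / len(f1)
weighted_f1 = sum(s * sum(row) for s, row in zip(f1, CONFUSION)) / total

for name, score in zip(LABELS, f1):
    print(f"{name:28s} F1 = {score:.3f}")
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
```

The matrix also shows where `lateral_movement` loses its F1: 38 of its 118 test events are predicted as `initial_access`, and 65 `initial_access` events are predicted as `lateral_movement`.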
291
+
292
+ ### Ablation: which feature groups matter
293
+
294
+ | Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy | Δ macro-F1 |
295
+ |---|---:|---:|---:|---:|---:|
296
+ | Full feature set (published) | 0.9493 | 0.7781 | 0.9904 | — | — |
297
+ | No `event_class` | 0.9206 | 0.5969 | 0.9723 | **−0.0287** | **−0.181** |
298
+ | No CVSS features | 0.9383 | 0.7475 | 0.9812 | −0.0110 | −0.031 |
299
+ | No `log_source_type` | 0.9469 | 0.7655 | 0.9902 | −0.0024 | −0.013 |
300
+ | No engineered features | 0.9471 | 0.7655 | 0.9903 | −0.0022 | −0.013 |
301
+ | No ports | 0.9463 | 0.7621 | 0.9903 | −0.0030 | −0.016 |
302
+ | No `severity_level` | 0.9479 | 0.7688 | 0.9902 | −0.0014 | −0.009 |
303
+ | No tamper flags | 0.9469 | 0.7657 | 0.9905 | −0.0024 | −0.012 |
304
+ | No timing | 0.9501 | 0.7730 | 0.9907 | +0.0008 | −0.005 |
305
+ | No host features | 0.9522 | 0.7828 | 0.9917 | +0.0029 | +0.005 |
306
+
307
+ Three findings:
308
+
309
+ 1. **`event_class` is the dominant signal** (drops 18pp macro-F1 when
310
+ removed). Phase prediction without it loses most discrimination
311
+ between the middle classes.
312
+ 2. **CVSS features are second-strongest** (drops 3pp F1). Captures
313
+ severity information that complements event_class.
314
+ 3. **Host features and timing add modest noise.** The model performs
315
+ marginally *better* without host features (+0.3pp accuracy), and
316
+ timing features contribute essentially nothing. Kept in the
317
+ pipeline as documented baseline reference.
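The same ranking can be read directly from `ablation_results.json`. The sketch below uses the keys visible in that file (`ablations`, `dropped_count`, `delta_accuracy`, `delta_macro_f1`) on a hand-copied excerpt with deltas rounded to four decimals. Note the sign convention in the file: each delta is full model minus ablated model, so a positive value means the model got worse without the group, which is the opposite sign from the Δ columns in the table above. The function name is illustrative.

```python
def rank_ablations(results):
    """Sort ablations by macro-F1 lost when the group is removed (largest loss first)."""
    rows = [
        (name, entry["dropped_count"], entry["delta_accuracy"], entry["delta_macro_f1"])
        for name, entry in results["ablations"].items()
    ]
    return sorted(rows, key=lambda r: r[3], reverse=True)


# Excerpt of ablation_results.json, rounded; in practice load the file with json.load.
results = {
    "ablations": {
        "no_host": {"dropped_count": 48, "delta_accuracy": -0.0030, "delta_macro_f1": -0.0047},
        "no_event_class": {"dropped_count": 12, "delta_accuracy": 0.0287, "delta_macro_f1": 0.1812},
        "no_cvss": {"dropped_count": 3, "delta_accuracy": 0.0110, "delta_macro_f1": 0.0305},
    }
}

ranking = rank_ablations(results)
for name, dropped, d_acc, d_f1 in ranking:
    print(f"{name:16s} dropped={dropped:2d} d_acc={d_acc:+.4f} d_macro_f1={d_f1:+.4f}")
```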
318
+
319
+ ### Architecture
320
+
321
+ **XGBoost:** multi-class gradient boosting (`multi:softprob`, 5 classes),
322
+ `hist` tree method, class-balanced sample weights, early stopping on
323
+ validation mlogloss.
324
+
325
+ **MLP:** `87 → 128 → 64 → 5`, each hidden layer followed by `BatchNorm1d`
326
+ → `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss, AdamW optimizer,
327
+ early stopping on validation macro-F1.
328
+
329
+ Training hyperparameters are held internally by XpertSystems.
330
+
331
+ ## Limitations
332
+
333
+ **This is a baseline reference, not a production phase classifier.**
334
+
335
+ 1. **The leakage diagnostic is required reading.** Six oracle columns
336
+ for the phase task and seven for the alert TP task are documented
337
+ in `leakage_diagnostic.json`. If you use CYB010 sample data for
338
+ your own training, you MUST drop these or your model will learn
339
+ the oracles instead of the task.
340
+
341
+ 2. **`lateral_movement` F1 0.48 is the weakest class.** The 968-event
342
+ sample with substantial overlap to `initial_access` makes this
343
+ class hard. A sequence model that considers event ordering within
344
+ incidents would likely do better than per-event classification.
345
+
346
+ 3. **`threat_actor_profile` 4-class (malicious-only) is unlearnable
347
+ on this sample** (acc 0.55 vs majority 0.61). The 5-class
348
+ formulation with benign included works only because benign_user
349
+ separation is structurally trivial.
350
+
351
+ 4. **`event_class` 12-class is unlearnable on this sample** (acc 0.35
352
+ vs majority 0.42). event_class is a structural property of the
353
+ event itself, not something to predict from other features.
354
+
355
+ 5. **Synthetic-vs-real transfer.** The dataset is synthetic, calibrated
+ to 6 benchmark metrics drawn from SANS / IBM / Mandiant / Verizon / CISA / MITRE
357
+ ATT&CK Evaluations / Splunk. Real SIEM telemetry has different noise
358
+ characteristics — and in particular, the explicit `mitre_tactic ==
359
+ "benign"` marker and `threat_actor_id == "NONE"` benign sentinel
360
+ would not be present in real data. Real telemetry has implicit
361
+ benign-vs-malicious distinctions that emerge from event content.
362
+ Do not assume metrics transfer end-to-end.
363
+
364
+ 6. **21,896 events / 500 incidents is a modest training set.** The
365
+ 3,726-event / ~75-incident test fold yields stable multi-seed
366
+ metrics (std 0.007 on accuracy) but per-class confidence intervals
367
+ widen for the smallest classes (lateral_movement, persistence).
368
+
369
+ ## Notes on dataset schema
370
+
371
+ The CYB010 sample dataset README describes some fields differently
372
+ from the actual schema. The model was trained on the actual schema;
373
+ this note helps buyers reconcile what they read with what they receive.
374
+
375
+ | What the README says | What the data actually contains |
376
+ |---|---|
377
+ | `security_events` has 16 columns | Data has **23 columns** |
378
+ | Field renames in `security_events` | `timestamp_utc` → `timestamp`, `user` → `user_id`, `log_format` → `log_source_type` |
379
+ | README missing from `security_events` | `event_class`, `severity_level`, `label_malicious`, `label_log_tampered`, `threat_actor_id`, `cvss_score_analogue` are in data but not documented |
380
+ | README claims `command_line` / `process_name` / `is_off_hours` columns | Not present in `security_events` (off-hours derived from timestamp in pipeline) |
381
+ | `alert_records` has 9 columns | Data has **21 columns** |
382
+ | Field renames in `alert_records` | `alert_severity` → `severity_level`, `detection_rule` → `alert_rule_name` |
383
+ | README's `triage_outcome` (categorical) | Replaced by `label_true_positive` / `label_false_positive` (mirror booleans) |
384
+ | README's `ioc_matched` | Not present in `alert_records` |
385
+ | README missing from `alert_records` | `correlated_chain_length`, `time_to_detect_seconds`, `suppression_reason`, `analyst_triage_priority` are in data but not documented |
386
+ | `incident_summary` has 8 columns | Data has **24 columns** |
387
+ | `host_inventory` has 6 columns | Data has **15 columns** |
388
+ | `threat_actor_profile` has 4 values | Data has **5 values** (adds `benign_user` at 57% of events) |
389
+ | `attack_lifecycle_phase` 5-phase malicious lifecycle | Data adds `benign_background` as a phase value (57% of events) — so the lifecycle is 5-class with benign included |
390
+ | README says MITRE ATT&CK v14 with 50 techniques | Data has 54 unique technique IDs across 14 tactics + benign |
391
+
392
+ None of these affects model correctness — the feature pipeline uses
393
+ the actual column names. If you build your own pipeline against the
394
+ dataset, use the actual columns.
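If you already have code written against the field names in the dataset README, a small lookup like the following may help translate them. The rename pairs come from the table above; which file each pair belongs to is inferred from the row order, and the helper name is hypothetical.

```python
# README name -> actual column name, per the table above.
SECURITY_EVENTS_RENAMES = {
    "timestamp_utc": "timestamp",
    "user": "user_id",
    "log_format": "log_source_type",
}
ALERT_RECORDS_RENAMES = {
    "alert_severity": "severity_level",
    "detection_rule": "alert_rule_name",
}


def to_actual_columns(readme_columns, renames):
    """Map README field names to the names found in the data; unknown names pass through."""
    return [renames.get(name, name) for name in readme_columns]


print(to_actual_columns(["timestamp_utc", "user", "event_class"], SECURITY_EVENTS_RENAMES))
print(to_actual_columns(["alert_severity", "detection_rule"], ALERT_RECORDS_RENAMES))
```

Names the README documents but the data does not contain (for example `command_line` or `ioc_matched`) pass through unchanged, so check the result against the real header before selecting columns.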
395
+
396
+ ## Intended use
397
+
398
+ - **Evaluating fit** of the CYB010 dataset for your SIEM ML research
399
+ - **Baseline reference** for new model architectures on the
400
+ attack-phase classification task
401
+ - **Reference example of structural-leakage diagnostics** for
402
+ synthetic SIEM datasets — the methodology is reusable
403
+ - **Feature engineering reference** for per-event SIEM telemetry
404
+
405
+ ## Out-of-scope use
406
+
407
+ - Production SIEM phase detection on real telemetry
408
+ - Threat actor attribution (4-class malicious-only is unlearnable
409
+ on the sample)
410
+ - Event-class prediction (this is a structural property, not a
411
+ learnable target)
412
+ - Any operational decision affecting actual security operations
413
+ without further validation on your own data
414
+
415
+ ## Reproducibility
416
+
417
+ Outputs above were produced with `seed = 42` (published artifact),
418
+ nested `GroupShuffleSplit` on `incident_id` (70/15/15), on the published
419
+ sample (`xpertsystems/cyb010-sample`, version 1.0.0, generated
420
+ 2026-05-16). The feature pipeline in `feature_engineering.py` is
421
+ deterministic and the trained weights in this repo correspond exactly
422
+ to the metrics above.
423
+
424
+ Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200)
425
+ in `multi_seed_results.json` confirm robust performance across splits
426
+ (std 0.007 on accuracy, 0.001 on ROC-AUC — the tightest ROC-AUC std
427
+ in the XpertSystems catalog).
428
+
429
+ The training script itself is private to XpertSystems.
430
+
431
+ ## Files in this repo
432
+
433
+ | File | Purpose |
434
+ |---|---|
435
+ | `model_xgb.json` | XGBoost weights (seed 42) |
436
+ | `model_mlp.safetensors` | PyTorch MLP weights (seed 42) |
437
+ | `feature_engineering.py` | Feature pipeline |
438
+ | `feature_meta.json` | Feature column order + categorical levels |
439
+ | `feature_scaler.json` | MLP input mean/std (XGBoost ignores) |
440
+ | `validation_results.json` | Per-class metrics, confusion matrix, architecture |
441
+ | `ablation_results.json` | Per-feature-group ablation |
442
+ | `multi_seed_results.json` | XGBoost metrics across 10 seeds |
443
+ | **`leakage_diagnostic.json`** | **11-oracle-path audit + 2 unlearnable targets** |
444
+ | `inference_example.ipynb` | End-to-end inference demo notebook |
445
+ | `README.md` | This file |
446
+
447
+ ## Contact and full product
448
+
449
+ The full **CYB010** dataset contains **~550,000 rows** across four files,
450
+ with calibrated benchmark validation against 6 metrics drawn from
451
+ authoritative SOC operations and threat intelligence sources (SANS SOC
452
+ Survey, IBM Cost of Data Breach, Mandiant M-Trends, Verizon DBIR, CISA
453
+ Joint Advisories, MITRE ATT&CK Evaluations, Splunk State of Security).
454
+
455
+ The full XpertSystems.ai synthetic data catalogue spans 41 SKUs across
456
+ Cybersecurity, Healthcare, Insurance & Risk, Oil & Gas, and Materials
457
+ & Energy.
458
+
459
+ - 📧 **pradeep@xpertsystems.ai**
460
+ - 🌐 **https://xpertsystems.ai**
461
+ - 🗂 Dataset: https://huggingface.co/datasets/xpertsystems/cyb010-sample
462
+ - 🤖 Companion models:
463
+ - https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic)
464
+ - https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain)
465
+ - https://huggingface.co/xpertsystems/cyb003-baseline-classifier (malware execution phase)
466
+ - https://huggingface.co/xpertsystems/cyb004-baseline-classifier (phishing campaign phase)
467
+ - https://huggingface.co/xpertsystems/cyb005-baseline-classifier (ransomware actor-tier attribution)
468
+ - https://huggingface.co/xpertsystems/cyb006-baseline-classifier (user risk tier + leakage diagnostic)
469
+ - https://huggingface.co/xpertsystems/cyb007-baseline-classifier (insider threat type)
470
+ - https://huggingface.co/xpertsystems/cyb008-baseline-classifier (SOC alert triage + leakage diagnostic)
471
+ - https://huggingface.co/xpertsystems/cyb009-baseline-classifier (vulnerability classification + leakage diagnostic)
472
+
473
+ ## Citation
474
+
475
+ ```bibtex
476
+ @misc{xpertsystems_cyb010_baseline_2026,
477
+ title = {CYB010 Baseline Classifier: XGBoost and MLP for Attack Lifecycle Phase Classification, with 11-Oracle-Path Leakage Diagnostic},
478
+ author = {XpertSystems.ai},
479
+ year = {2026},
480
+ url = {https://huggingface.co/xpertsystems/cyb010-baseline-classifier},
481
+ note = {Baseline reference model + comprehensive leakage audit trained on xpertsystems/cyb010-sample}
482
+ }
483
+ ```
ablation_results.json ADDED
@@ -0,0 +1,659 @@
1
+ {
2
+ "purpose": "Quantify how much each feature group contributes to the headline XGBoost score. Identical architecture, same group-aware split, with one feature group dropped at a time.",
3
+ "full_model_metrics": {
4
+ "model": "xgboost",
5
+ "accuracy": 0.9492753623188406,
6
+ "macro_f1": 0.7780594102481514,
7
+ "weighted_f1": 0.9522470071864876,
8
+ "per_class_f1": {
9
+ "benign_background": 0.9975996159385502,
10
+ "initial_access": 0.7196652719665272,
11
+ "lateral_movement": 0.48322147651006714,
12
+ "persistence_establishment": 0.703030303030303,
13
+ "exfiltration_or_impact": 0.9867803837953092
14
+ },
15
+ "confusion_matrix": {
16
+ "labels": [
17
+ "benign_background",
18
+ "initial_access",
19
+ "lateral_movement",
20
+ "persistence_establishment",
21
+ "exfiltration_or_impact"
22
+ ],
23
+ "matrix": [
24
+ [
25
+ 2078,
26
+ 6,
27
+ 0,
28
+ 0,
29
+ 0
30
+ ],
31
+ [
32
+ 4,
33
+ 172,
34
+ 65,
35
+ 6,
36
+ 0
37
+ ],
38
+ [
39
+ 0,
40
+ 38,
41
+ 72,
42
+ 6,
43
+ 2
44
+ ],
45
+ [
46
+ 0,
47
+ 11,
48
+ 22,
49
+ 58,
50
+ 0
51
+ ],
52
+ [
53
+ 0,
54
+ 4,
55
+ 21,
56
+ 4,
57
+ 1157
58
+ ]
59
+ ]
60
+ },
61
+ "macro_roc_auc_ovr": 0.9904125505537232
62
+ },
63
+ "ablations": {
64
+ "no_event_class": {
65
+ "n_features": 75,
66
+ "dropped_count": 12,
67
+ "metrics": {
68
+ "model": "xgboost_no_event_class",
69
+ "accuracy": 0.9205582393988191,
70
+ "macro_f1": 0.5968926085832369,
71
+ "weighted_f1": 0.9214122465392139,
72
+ "per_class_f1": {
73
+ "benign_background": 0.9978412089230031,
74
+ "initial_access": 0.5674044265593562,
75
+ "lateral_movement": 0.3170731707317073,
76
+ "persistence_establishment": 0.11965811965811966,
77
+ "exfiltration_or_impact": 0.9824861170439982
78
+ },
79
+ "confusion_matrix": {
80
+ "labels": [
81
+ "benign_background",
82
+ "initial_access",
83
+ "lateral_movement",
84
+ "persistence_establishment",
85
+ "exfiltration_or_impact"
86
+ ],
87
+ "matrix": [
88
+ [
89
+ 2080,
90
+ 4,
91
+ 0,
92
+ 0,
93
+ 0
94
+ ],
95
+ [
96
+ 4,
97
+ 141,
98
+ 94,
99
+ 6,
100
+ 2
101
+ ],
102
+ [
103
+ 0,
104
+ 54,
105
+ 52,
106
+ 9,
107
+ 3
108
+ ],
109
+ [
110
+ 1,
111
+ 40,
112
+ 43,
113
+ 7,
114
+ 0
115
+ ],
116
+ [
117
+ 0,
118
+ 11,
119
+ 21,
120
+ 4,
121
+ 1150
122
+ ]
123
+ ]
124
+ },
125
+ "macro_roc_auc_ovr": 0.9722802673741894
126
+ },
127
+ "delta_accuracy": 0.028717122920021487,
128
+ "delta_macro_f1": 0.1811668016649145
129
+ },
130
+ "no_log_source": {
131
+ "n_features": 79,
132
+ "dropped_count": 8,
133
+ "metrics": {
134
+ "model": "xgboost_no_log_source",
135
+ "accuracy": 0.9468599033816425,
136
+ "macro_f1": 0.7655457635864822,
137
+ "weighted_f1": 0.9496485129647918,
138
+ "per_class_f1": {
139
+ "benign_background": 0.9975996159385502,
140
+ "initial_access": 0.7080745341614907,
141
+ "lateral_movement": 0.4536082474226804,
142
+ "persistence_establishment": 0.6829268292682927,
143
+ "exfiltration_or_impact": 0.985519591141397
144
+ },
145
+ "confusion_matrix": {
146
+ "labels": [
147
+ "benign_background",
148
+ "initial_access",
149
+ "lateral_movement",
150
+ "persistence_establishment",
151
+ "exfiltration_or_impact"
152
+ ],
153
+ "matrix": [
154
+ [
155
+ 2078,
156
+ 6,
157
+ 0,
158
+ 0,
159
+ 0
160
+ ],
161
+ [
162
+ 4,
163
+ 171,
164
+ 65,
165
+ 6,
166
+ 1
167
+ ],
168
+ [
169
+ 0,
170
+ 43,
171
+ 66,
172
+ 7,
173
+ 2
174
+ ],
175
+ [
176
+ 0,
177
+ 12,
178
+ 21,
179
+ 56,
180
+ 2
181
+ ],
182
+ [
183
+ 0,
184
+ 4,
185
+ 21,
186
+ 4,
187
+ 1157
188
+ ]
189
+ ]
190
+ },
191
+ "macro_roc_auc_ovr": 0.9902223408149018
192
+ },
193
+ "delta_accuracy": 0.0024154589371980784,
194
+ "delta_macro_f1": 0.012513646661669209
195
+ },
196
+ "no_severity": {
197
+ "n_features": 82,
198
+ "dropped_count": 5,
199
+ "metrics": {
200
+ "model": "xgboost_no_severity",
201
+ "accuracy": 0.9479334406870639,
202
+ "macro_f1": 0.7688286964848263,
203
+ "weighted_f1": 0.9505815101921871,
204
+ "per_class_f1": {
205
+ "benign_background": 0.9971195391262602,
206
+ "initial_access": 0.7213114754098361,
207
+ "lateral_movement": 0.4689655172413793,
208
+ "persistence_establishment": 0.6708074534161491,
209
+ "exfiltration_or_impact": 0.985939497230507
210
+ },
211
+ "confusion_matrix": {
212
+ "labels": [
213
+ "benign_background",
214
+ "initial_access",
215
+ "lateral_movement",
216
+ "persistence_establishment",
217
+ "exfiltration_or_impact"
218
+ ],
219
+ "matrix": [
220
+ [
221
+ 2077,
222
+ 7,
223
+ 0,
224
+ 0,
225
+ 0
226
+ ],
227
+ [
228
+ 4,
229
+ 176,
230
+ 60,
231
+ 7,
232
+ 0
233
+ ],
234
+ [
235
+ 0,
236
+ 42,
237
+ 68,
238
+ 5,
239
+ 3
240
+ ],
241
+ [
242
+ 1,
243
+ 12,
244
+ 23,
245
+ 54,
246
+ 1
247
+ ],
248
+ [
249
+ 0,
250
+ 4,
251
+ 21,
252
+ 4,
253
+ 1157
254
+ ]
255
+ ]
256
+ },
257
+ "macro_roc_auc_ovr": 0.9901923411691304
258
+ },
259
+ "delta_accuracy": 0.0013419216317767102,
260
+ "delta_macro_f1": 0.009230713763325071
261
+ },
262
+ "no_cvss": {
263
+ "n_features": 84,
264
+ "dropped_count": 3,
265
+ "metrics": {
266
+ "model": "xgboost_no_cvss",
267
+ "accuracy": 0.9382716049382716,
268
+ "macro_f1": 0.7475120671323378,
269
+ "weighted_f1": 0.940926432572893,
270
+ "per_class_f1": {
271
+ "benign_background": 0.9930737998566993,
272
+ "initial_access": 0.6948775055679287,
273
+ "lateral_movement": 0.43278688524590164,
274
+ "persistence_establishment": 0.6428571428571429,
275
+ "exfiltration_or_impact": 0.9739650021340163
276
+ },
277
+ "confusion_matrix": {
278
+ "labels": [
279
+ "benign_background",
280
+ "initial_access",
281
+ "lateral_movement",
282
+ "persistence_establishment",
283
+ "exfiltration_or_impact"
284
+ ],
285
+ "matrix": [
286
+ [
287
+ 2079,
288
+ 4,
289
+ 0,
290
+ 0,
291
+ 1
292
+ ],
293
+ [
294
+ 12,
295
+ 156,
296
+ 60,
297
+ 14,
298
+ 5
299
+ ],
300
+ [
301
+ 6,
302
+ 31,
303
+ 66,
304
+ 5,
305
+ 10
306
+ ],
307
+ [
308
+ 6,
309
+ 8,
310
+ 23,
311
+ 54,
312
+ 0
313
+ ],
314
+ [
315
+ 0,
316
+ 3,
317
+ 38,
318
+ 4,
319
+ 1141
320
+ ]
321
+ ]
322
+ },
323
+ "macro_roc_auc_ovr": 0.9812083795500166
324
+ },
325
+ "delta_accuracy": 0.011003757380569024,
326
+ "delta_macro_f1": 0.03054734311581364
327
+ },
328
+ "no_host": {
329
+ "n_features": 39,
330
+ "dropped_count": 48,
331
+ "metrics": {
332
+ "model": "xgboost_no_host",
333
+ "accuracy": 0.9522275899087493,
334
+ "macro_f1": 0.7828011365615016,
335
+ "weighted_f1": 0.9541737562003638,
336
+ "per_class_f1": {
337
+ "benign_background": 0.9983217453847998,
338
+ "initial_access": 0.746268656716418,
339
+ "lateral_movement": 0.4962962962962963,
340
+ "persistence_establishment": 0.6871794871794872,
341
+ "exfiltration_or_impact": 0.985939497230507
342
+ },
343
+ "confusion_matrix": {
344
+ "labels": [
345
+ "benign_background",
346
+ "initial_access",
347
+ "lateral_movement",
348
+ "persistence_establishment",
349
+ "exfiltration_or_impact"
350
+ ],
351
+ "matrix": [
352
+ [
353
+ 2082,
354
+ 1,
355
+ 0,
356
+ 1,
357
+ 0
358
+ ],
359
+ [
360
+ 4,
361
+ 175,
362
+ 49,
363
+ 18,
364
+ 1
365
+ ],
366
+ [
367
+ 0,
368
+ 36,
369
+ 67,
370
+ 13,
371
+ 2
372
+ ],
373
+ [
374
+ 1,
375
+ 6,
376
+ 16,
377
+ 67,
378
+ 1
379
+ ],
380
+ [
381
+ 0,
382
+ 4,
383
+ 20,
384
+ 5,
385
+ 1157
386
+ ]
387
+ ]
388
+ },
389
+ "macro_roc_auc_ovr": 0.9917448228530954
390
+ },
391
+ "delta_accuracy": -0.0029522275899087624,
392
+ "delta_macro_f1": -0.004741726313350236
393
+ },
394
+ "no_timing": {
395
+ "n_features": 84,
396
+ "dropped_count": 3,
397
+ "metrics": {
398
+ "model": "xgboost_no_timing",
399
+ "accuracy": 0.9500805152979066,
400
+ "macro_f1": 0.7730074031058032,
401
+ "weighted_f1": 0.9527084816660557,
402
+ "per_class_f1": {
403
+ "benign_background": 0.9990407673860912,
404
+ "initial_access": 0.7326315789473684,
405
+ "lateral_movement": 0.48484848484848486,
406
+ "persistence_establishment": 0.6625766871165644,
407
+ "exfiltration_or_impact": 0.985939497230507
408
+ },
409
+ "confusion_matrix": {
410
+ "labels": [
411
+ "benign_background",
412
+ "initial_access",
413
+ "lateral_movement",
414
+ "persistence_establishment",
415
+ "exfiltration_or_impact"
416
+ ],
417
+ "matrix": [
418
+ [
419
+ 2083,
420
+ 1,
421
+ 0,
422
+ 0,
423
+ 0
424
+ ],
425
+ [
426
+ 3,
427
+ 174,
428
+ 60,
429
+ 8,
430
+ 2
431
+ ],
432
+ [
433
+ 0,
434
+ 39,
435
+ 72,
436
+ 5,
437
+ 2
438
+ ],
439
+ [
440
+ 0,
441
+ 9,
442
+ 28,
443
+ 54,
444
+ 0
445
+ ],
446
+ [
447
+ 0,
448
+ 5,
449
+ 19,
450
+ 5,
451
+ 1157
452
+ ]
453
+ ]
454
+ },
455
+ "macro_roc_auc_ovr": 0.9906863118522171
456
+ },
457
+ "delta_accuracy": -0.0008051529790660261,
458
+ "delta_macro_f1": 0.005052007142348214
459
+ },
460
+ "no_ports": {
461
+ "n_features": 82,
462
+ "dropped_count": 5,
463
+ "metrics": {
464
+ "model": "xgboost_no_ports",
465
+ "accuracy": 0.9463231347289318,
466
+ "macro_f1": 0.7620715002556177,
467
+ "weighted_f1": 0.949550457691939,
468
+ "per_class_f1": {
469
+ "benign_background": 0.9978401727861771,
470
+ "initial_access": 0.7036247334754797,
471
+ "lateral_movement": 0.45544554455445546,
472
+ "persistence_establishment": 0.6666666666666666,
473
+ "exfiltration_or_impact": 0.9867803837953092
474
+ },
475
+ "confusion_matrix": {
476
+ "labels": [
477
+ "benign_background",
478
+ "initial_access",
479
+ "lateral_movement",
480
+ "persistence_establishment",
481
+ "exfiltration_or_impact"
482
+ ],
483
+ "matrix": [
484
+ [
485
+ 2079,
486
+ 5,
487
+ 0,
488
+ 0,
489
+ 0
490
+ ],
491
+ [
492
+ 4,
493
+ 165,
494
+ 72,
495
+ 6,
496
+ 0
497
+ ],
498
+ [
499
+ 0,
500
+ 38,
501
+ 69,
502
+ 9,
503
+ 2
504
+ ],
505
+ [
506
+ 0,
507
+ 11,
508
+ 24,
509
+ 56,
510
+ 0
511
+ ],
512
+ [
513
+ 0,
514
+ 3,
515
+ 20,
516
+ 6,
517
+ 1157
518
+ ]
519
+ ]
520
+ },
521
+ "macro_roc_auc_ovr": 0.9902855327593585
522
+ },
523
+ "delta_accuracy": 0.0029522275899087624,
524
+ "delta_macro_f1": 0.015987909992533744
525
+ },
526
+ "no_engineered": {
527
+ "n_features": 79,
528
+ "dropped_count": 8,
529
+ "metrics": {
530
+ "model": "xgboost_no_engineered",
531
+ "accuracy": 0.9471282877079978,
532
+ "macro_f1": 0.7655097846280253,
533
+ "weighted_f1": 0.9499972622574527,
534
+ "per_class_f1": {
535
+ "benign_background": 0.9975984630163305,
536
+ "initial_access": 0.7166666666666667,
537
+ "lateral_movement": 0.4697986577181208,
538
+ "persistence_establishment": 0.6583850931677019,
539
+ "exfiltration_or_impact": 0.9851000425713069
540
+ },
541
+ "confusion_matrix": {
542
+ "labels": [
543
+ "benign_background",
544
+ "initial_access",
545
+ "lateral_movement",
546
+ "persistence_establishment",
547
+ "exfiltration_or_impact"
548
+ ],
549
+ "matrix": [
550
+ [
551
+ 2077,
552
+ 7,
553
+ 0,
554
+ 0,
555
+ 0
556
+ ],
557
+ [
558
+ 3,
559
+ 172,
560
+ 63,
561
+ 8,
562
+ 1
563
+ ],
564
+ [
565
+ 0,
566
+ 40,
567
+ 70,
568
+ 5,
569
+ 3
570
+ ],
571
+ [
572
+ 0,
573
+ 10,
574
+ 26,
575
+ 53,
576
+ 2
577
+ ],
578
+ [
579
+ 0,
580
+ 4,
581
+ 21,
582
+ 4,
583
+ 1157
584
+ ]
585
+ ]
586
+ },
587
+ "macro_roc_auc_ovr": 0.9903013631552575
588
+ },
589
+ "delta_accuracy": 0.0021470746108427363,
590
+ "delta_macro_f1": 0.01254962562012607
591
+ },
592
+ "no_tamper": {
593
+ "n_features": 85,
594
+ "dropped_count": 2,
595
+ "metrics": {
596
+ "model": "xgboost_no_tamper",
597
+ "accuracy": 0.9468599033816425,
598
+ "macro_f1": 0.7656884000157337,
599
+ "weighted_f1": 0.9499631319237402,
600
+ "per_class_f1": {
601
+ "benign_background": 0.9980806142034548,
602
+ "initial_access": 0.7048832271762208,
603
+ "lateral_movement": 0.4605263157894737,
604
+ "persistence_establishment": 0.6790123456790124,
605
+ "exfiltration_or_impact": 0.985939497230507
606
+ },
607
+ "confusion_matrix": {
608
+ "labels": [
609
+ "benign_background",
610
+ "initial_access",
611
+ "lateral_movement",
612
+ "persistence_establishment",
613
+ "exfiltration_or_impact"
614
+ ],
615
+ "matrix": [
616
+ [
617
+ 2080,
618
+ 4,
619
+ 0,
620
+ 0,
621
+ 0
622
+ ],
623
+ [
624
+ 4,
625
+ 166,
626
+ 70,
627
+ 6,
628
+ 1
629
+ ],
630
+ [
631
+ 0,
632
+ 39,
633
+ 70,
634
+ 7,
635
+ 2
636
+ ],
637
+ [
638
+ 0,
639
+ 11,
640
+ 24,
641
+ 55,
642
+ 1
643
+ ],
644
+ [
645
+ 0,
646
+ 4,
647
+ 22,
648
+ 3,
649
+ 1157
650
+ ]
651
+ ]
652
+ },
653
+ "macro_roc_auc_ovr": 0.9904534455006762
654
+ },
655
+ "delta_accuracy": 0.0024154589371980784,
656
+ "delta_macro_f1": 0.012371010232417712
657
+ }
658
+ }
659
+ }
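The per-class F1, macro-F1, and accuracy values in each ablation entry can be recomputed from that entry's confusion matrix. The sketch below does this for the `no_cvss` matrix copied from the JSON above. The helper names `per_class_f1` and `accuracy` are illustrative and are not part of the repository. The reported `delta_*` values also look like baseline minus ablated: adding `delta_accuracy` to each ablated accuracy gives the same baseline of about 0.9493. That reading is inferred from the numbers, not stated in the file.

```python
# Sketch: recompute metrics from a confusion matrix (pure Python).
# per_class_f1 and accuracy are hypothetical helper names.

def per_class_f1(matrix):
    """F1 for class k is 2*TP / (row_sum_k + col_sum_k)."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        row_sum = sum(matrix[k])
        col_sum = sum(matrix[r][k] for r in range(n))
        denom = row_sum + col_sum
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def accuracy(matrix):
    """Fraction of events on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total


# Confusion matrix of the "no_cvss" ablation, label order as in the JSON.
NO_CVSS_MATRIX = [
    [2079, 4, 0, 0, 1],
    [12, 156, 60, 14, 5],
    [6, 31, 66, 5, 10],
    [6, 8, 23, 54, 0],
    [0, 3, 38, 4, 1141],
]

f1 = per_class_f1(NO_CVSS_MATRIX)
macro_f1 = sum(f1) / len(f1)
acc = accuracy(NO_CVSS_MATRIX)
print(f"accuracy={acc:.4f} macro_f1={macro_f1:.4f} lateral_movement_f1={f1[2]:.4f}")
```

Because F1 is symmetric in the row and column sums, this check gives the same result whichever axis holds the true labels.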
feature_engineering.py ADDED
@@ -0,0 +1,413 @@
+ """
+ feature_engineering.py
+ ======================
+
+ Feature pipeline for the CYB010 baseline classifier.
+
+ Predicts `attack_lifecycle_phase` (5-class attack phase) from per-event
+ features on the CYB010 sample dataset.
+
+ CSV inputs:
+     security_events.csv   (primary, one row per event, 21,896 events)
+     host_inventory.csv    (per-host registry, joined for host context)
+     alert_records.csv     (per-alert records; reserved)
+     incident_summary.csv  (per-incident summaries; reserved)
+
+ Target classes (5):
+     benign_background, initial_access, lateral_movement,
+     persistence_establishment, exfiltration_or_impact
+
+ Why this task
+ -------------
+ The CYB010 README's central concept is the "5-phase attack lifecycle
+ state machine", and `attack_lifecycle_phase` is the data's headline
+ target. We piloted six candidate targets and found it gives the
+ strongest honest result on the sample (acc 0.95, macro-F1 0.78,
+ ROC-AUC 0.99 with group-aware split on incident_id).
+
+ The other README-suggested targets either have unrecoverable structural
+ leakage or are weaker after honest leak removal:
+
+ - `threat_actor_profile` 5-class works (acc 0.84) but is benign-driven;
+   the 4-class malicious-only variant collapses to acc 0.57 vs
+   majority 0.61.
+ - `label_true_positive` on alerts has 9 oracle features; after dropping
+   all of them, honest acc 0.80, AUC 0.89 (documented as a secondary
+   finding in leakage_diagnostic.json).
+ - `mitre_tactic` 14-class hits 0.90 acc but macro-F1 0.37 - imbalance
+   gaming (benign class dominates at 57%).
+ - `event_class` 12-class is unlearnable (acc 0.35 vs majority 0.42).
+
+ Group structure
+ ---------------
+ 500 incidents x ~44 events each. The per-event task has clear group
+ structure: events from the same incident share host, threat actor, and
+ phase trajectory. Group-aware split by `incident_id` is required to
+ prevent train/test contamination. With 500 incidents, ~75 test
+ incidents per fold gives reasonable estimation precision.
+
+ Leakage audit
+ -------------
+ Six columns are dropped from features because they're structural
+ oracles for the target:
+
+ 1. `mitre_tactic`: when == "benign", deterministically pins
+    attack_lifecycle_phase == "benign_background" (12,448 cases - all
+    benign events).
+
+ 2. `mitre_technique_id`: perfect oracle for `mitre_tactic` by ATT&CK
+    design (54 techniques, each maps to exactly one tactic). Dropped
+    because it indirectly encodes the benign vs malicious distinction.
+
+ 3. `label_malicious`: when False, perfect oracle for
+    benign_background phase.
+
+ 4. `threat_actor_id`: when == "NONE", perfect oracle for benign
+    profile/phase. The non-"NONE" actor IDs are 10 distinct labels
+    that would also leak actor profile information indirectly.
+
+ 5. `threat_actor_profile`: contains "benign_user" which trivially
+    identifies benign_background phase.
+
+ 6. `event_type`: many event types are phase-specific
+    (`c2_beacon_outbound` -> 99% exfiltration_or_impact). Dropped to
+    avoid this near-oracle path.
+
+ KEPT features that are informative but NOT oracles:
+
+ - `event_class` (12 values): max purity 0.87, mean 0.72 - real signal
+   with substantial overlap. C2 beacons (network_flow class) hit 65%
+   exfil phase but also 29% benign. Strong feature, kept.
+
+ - `severity_level`, `cvss_score_analogue`: per-event severity is a
+   real observable, correlates with phase, has overlap.
+
+ - `label_log_tampered`: real observable (APTs tamper more), correlates
+   with malicious phases but not deterministic.
+
+ - `log_source_type`, `siem_platform`: not phase-deterministic.
+
+ - All host context features.
+
+ Public API
+ ----------
+     build_features(events_path, hosts_path) -> (X, y, ids, groups, meta)
+     transform_single(record, meta, host_lookup=None) -> np.ndarray
+     save_meta(meta, path) / load_meta(path)
+     build_host_lookup(hosts_path) -> dict
+
+ License
+ -------
+ Ships with the public model on Hugging Face under CC-BY-NC-4.0,
+ matching the dataset license. See README.md.
+ """
+
+ from __future__ import annotations
+
+ import json
+ from pathlib import Path
+ from typing import Any
+
+ import numpy as np
+ import pandas as pd
+
+ # ---------------------------------------------------------------------------
+ # Label space
+ # ---------------------------------------------------------------------------
+
+ # Ordered by attack progression.
+ LABEL_ORDER = [
+     "benign_background",
+     "initial_access",
+     "lateral_movement",
+     "persistence_establishment",
+     "exfiltration_or_impact",
+ ]
+ LABEL_TO_INT = {lbl: i for i, lbl in enumerate(LABEL_ORDER)}
+ INT_TO_LABEL = {i: lbl for lbl, i in LABEL_TO_INT.items()}
+
+ # ---------------------------------------------------------------------------
+ # Identifier and target columns
+ # ---------------------------------------------------------------------------
+
+ ID_COLUMNS = [
+     "event_id", "host_id", "incident_id", "timestamp", "user_id",
+     "source_ip", "dest_ip", "raw_log_payload",
+ ]
+ TARGET_COLUMN = "attack_lifecycle_phase"
+ GROUP_COLUMN = "incident_id"
+
+ # Oracle columns dropped from features.
+ ORACLE_COLUMNS = [
+     "mitre_tactic",          # benign value -> benign_background phase
+     "mitre_technique_id",    # ATT&CK technique -> tactic deterministic
+     "label_malicious",       # False -> benign_background
+     "threat_actor_id",       # NONE -> benign
+     "threat_actor_profile",  # benign_user -> benign_background
+     "event_type",            # many event types phase-specific (e.g. c2_beacon_outbound)
+ ]
+
+ # ---------------------------------------------------------------------------
+ # Per-event numeric features
+ # ---------------------------------------------------------------------------
+
+ EVENT_NUMERIC_FEATURES = [
+     "source_port",
+     "dest_port",
+     "cvss_score_analogue",
+     "label_log_tampered",    # bool kept as observable
+     "label_false_positive",  # bool kept as observable (all False on events)
+ ]
+
+ EVENT_CATEGORICAL_FEATURES = [
+     "event_class",      # 12 values
+     "log_source_type",  # 8 values
+     "severity_level",   # 5 values
+ ]
+
+ # ---------------------------------------------------------------------------
+ # Host features (joined on host_id from host_inventory.csv)
+ # ---------------------------------------------------------------------------
+
+ HOST_NUMERIC_FEATURES = [
+     "edr_agent_installed",
+     "patch_compliance_level",
+     "vulnerability_count_open",
+ ]
+
+ HOST_CATEGORICAL_FEATURES = [
+     "os_type",                # 7 values
+     "host_role",              # 10 values
+     "network_segment",        # 8 values
+     "defender_posture_tier",  # 4 values
+     "criticality_rating",     # 4 values
+     "cloud_provider",         # 4 values
+     "siem_platform",          # 8 values
+ ]
+
+
+ # ---------------------------------------------------------------------------
+ # Engineered features
+ # ---------------------------------------------------------------------------
+
+ def _add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
+     """
+     Six groups of engineered features (nine columns) encoding
+     phase-discriminative hypotheses. Each composite is something a SOC
+     analyst would compute by hand.
+     """
+     df = df.copy()
+
+     # Fall back to all-zero Series when a column is missing, so that the
+     # Series methods below (.clip, .fillna) still work at inference time.
+     cvss = df.get("cvss_score_analogue", pd.Series(0.0, index=df.index))
+     dest = df.get("dest_port", pd.Series(0, index=df.index)).fillna(0).astype(int)
+
+     # 1. Hour of day (0-23), off-hours and weekend flags from timestamp
+     if "timestamp" in df.columns:
+         ts = pd.to_datetime(df["timestamp"], errors="coerce")
+         df["hour_of_day"] = ts.dt.hour.fillna(12).astype(int)
+         df["is_off_hours"] = ((ts.dt.hour < 9) | (ts.dt.hour > 17)).fillna(False).astype(int)
+         df["is_weekend"] = (ts.dt.weekday >= 5).fillna(False).astype(int)
+     else:
+         df["hour_of_day"] = 12
+         df["is_off_hours"] = 0
+         df["is_weekend"] = 0
+
+     # 2. Log-scaled CVSS (heavy-tailed)
+     df["log_cvss"] = np.log1p(cvss.clip(lower=0)).astype(float)
+
+     # 3. High-CVSS indicator
+     df["is_high_cvss"] = (cvss >= 7.0).astype(int)
+
+     # 4. Port category: well-known (<1024) vs registered vs dynamic
+     df["is_well_known_port"] = (dest < 1024).astype(int)
+     df["is_dynamic_port"] = (dest >= 49152).astype(int)
+
+     # 5. Web-service indicator: dest_port is a common HTTP(S) port.
+     #    Rough proxy for outbound web traffic.
+     df["is_outbound_web"] = (dest.isin([80, 443, 8080, 8443])).astype(int)
+
+     # 6. Risk composite: CVSS x defender_weakness. Higher composite -> later phase.
+     if "patch_compliance_level" in df.columns:
+         df["risk_composite"] = (
+             cvss.fillna(0) *
+             (1 - df["patch_compliance_level"].fillna(1))
+         ).astype(float)
+     else:
+         df["risk_composite"] = 0.0
+
+     return df
+
+
+ # ---------------------------------------------------------------------------
+ # Public API
+ # ---------------------------------------------------------------------------
+
+ def build_features(
+     events_path: str | Path,
+     hosts_path: str | Path,
+ ) -> tuple[pd.DataFrame, pd.Series, pd.Series, pd.Series, dict[str, Any]]:
+     """
+     Load security_events.csv, join host_inventory.csv, drop target +
+     identifiers + oracle columns, engineer features, one-hot encode,
+     return (X, y, ids, groups, meta).
+     """
+     events = pd.read_csv(events_path)
+     hosts = pd.read_csv(hosts_path)
+
+     y = events[TARGET_COLUMN].map(LABEL_TO_INT)
+     if y.isna().any():
+         bad = events.loc[y.isna(), TARGET_COLUMN].unique()
+         raise ValueError(f"Unknown attack_lifecycle_phase values: {bad}")
+     y = y.astype(int)
+     ids = events["event_id"].copy()
+     groups = events[GROUP_COLUMN].copy()
+
+     host_cols_needed = (
+         ["host_id"] + HOST_NUMERIC_FEATURES + HOST_CATEGORICAL_FEATURES
+     )
+     events = events.merge(
+         hosts[host_cols_needed], on="host_id", how="left",
+     )
+
+     # Apply engineered features BEFORE dropping timestamp
+     events = _add_engineered_features(events)
+
+     events = events.drop(
+         columns=ID_COLUMNS + [TARGET_COLUMN] + ORACLE_COLUMNS,
+         errors="ignore",
+     )
+
+     numeric_features = (
+         EVENT_NUMERIC_FEATURES
+         + HOST_NUMERIC_FEATURES
+         + [
+             "hour_of_day", "is_off_hours", "is_weekend",
+             "log_cvss", "is_high_cvss",
+             "is_well_known_port", "is_dynamic_port", "is_outbound_web",
+             "risk_composite",
+         ]
+     )
+     numeric_features = [c for c in numeric_features if c in events.columns]
+     X_numeric = events[numeric_features].apply(
+         lambda s: s.astype(float) if s.dtype != bool else s.astype(int).astype(float)
+     )
+
+     all_categorical = EVENT_CATEGORICAL_FEATURES + HOST_CATEGORICAL_FEATURES
+     categorical_levels: dict[str, list[str]] = {}
+     blocks: list[pd.DataFrame] = []
+     for col in all_categorical:
+         if col not in events.columns:
+             continue
+         levels = sorted(events[col].dropna().astype(str).unique().tolist())
+         categorical_levels[col] = levels
+         block = pd.get_dummies(
+             events[col].astype(str).astype("category").cat.set_categories(levels),
+             prefix=col, dummy_na=False,
+         ).astype(int)
+         blocks.append(block)
+
+     X = pd.concat(
+         [X_numeric.reset_index(drop=True)]
+         + [b.reset_index(drop=True) for b in blocks],
+         axis=1,
+     ).fillna(0.0)
+
+     meta = {
+         "feature_names": X.columns.tolist(),
+         "numeric_features": numeric_features,
+         "categorical_levels": categorical_levels,
+         "label_to_int": LABEL_TO_INT,
+         "int_to_label": INT_TO_LABEL,
+         "oracle_excluded": ORACLE_COLUMNS,
+     }
+     return X, y, ids, groups, meta
+
+
+ def transform_single(
+     record: dict | pd.DataFrame,
+     meta: dict[str, Any],
+     host_lookup: dict | None = None,
+ ) -> np.ndarray:
+     """Encode a single event record for inference."""
+     if isinstance(record, dict):
+         df = pd.DataFrame([record.copy()])
+     else:
+         df = record.copy()
+
+     if host_lookup is not None and "host_id" in df.columns:
+         host_id = df["host_id"].iloc[0]
+         host_feats = host_lookup.get(host_id, {})
+         for k, v in host_feats.items():
+             if k not in df.columns:
+                 df[k] = v
+
+     df = _add_engineered_features(df)
+
+     numeric = pd.DataFrame()
+     for col in meta["numeric_features"]:
+         s = df.get(col, pd.Series([0.0] * len(df)))
+         if s.dtype == bool:
+             s = s.astype(int)
+         numeric[col] = s.astype(float).values
+     blocks: list[pd.DataFrame] = [numeric]
+     for col, levels in meta["categorical_levels"].items():
+         val = df.get(col, pd.Series([None] * len(df))).astype(str)
+         block = pd.get_dummies(
+             val.astype("category").cat.set_categories(levels),
+             prefix=col, dummy_na=False,
+         ).astype(int)
+         for lvl in levels:
+             cname = f"{col}_{lvl}"
+             if cname not in block.columns:
+                 block[cname] = 0
+         block = block[[f"{col}_{lvl}" for lvl in levels]]
+         blocks.append(block)
+
+     X = pd.concat(blocks, axis=1).fillna(0.0)
+     X = X.reindex(columns=meta["feature_names"], fill_value=0.0)
+     return X.values.astype(np.float32)
+
+
+ def save_meta(meta: dict[str, Any], path: str | Path) -> None:
+     serializable = {
+         "feature_names": meta["feature_names"],
+         "numeric_features": meta["numeric_features"],
+         "categorical_levels": meta["categorical_levels"],
+         "label_to_int": meta["label_to_int"],
+         "int_to_label": {str(k): v for k, v in meta["int_to_label"].items()},
+         "oracle_excluded": meta.get("oracle_excluded", []),
+     }
+     with open(path, "w") as f:
+         json.dump(serializable, f, indent=2)
+
+
+ def load_meta(path: str | Path) -> dict[str, Any]:
+     with open(path) as f:
+         meta = json.load(f)
+     meta["int_to_label"] = {int(k): v for k, v in meta["int_to_label"].items()}
+     return meta
+
+
+ def build_host_lookup(hosts_path: str | Path) -> dict[str, dict]:
+     """Build {host_id: {host feature values}} for inference-time lookup."""
+     hosts = pd.read_csv(hosts_path)
+     cols = HOST_NUMERIC_FEATURES + HOST_CATEGORICAL_FEATURES
+     out = {}
+     for _, row in hosts.iterrows():
+         out[row["host_id"]] = {c: row[c] for c in cols if c in hosts.columns}
+     return out
+
+
+ if __name__ == "__main__":
+     import sys
+     base = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/mnt/user-data/uploads")
+     X, y, ids, groups, meta = build_features(
+         base / "security_events.csv",
+         base / "host_inventory.csv",
+     )
+     print(f"X shape: {X.shape}")
+     print(f"y shape: {y.shape}")
+     print(f"groups: {groups.nunique()} unique incidents")
+     print(f"n_features: {len(meta['feature_names'])}")
+     print(f"label distribution:\n{y.map(INT_TO_LABEL).value_counts()}")
+     print(f"X has NaN: {X.isnull().any().any()}")
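The module docstring calls for a group-aware split on `incident_id`, but the split itself is not part of this file. A minimal sketch of the idea follows, using only the standard library. The function name `group_train_test_split`, the 15% default (chosen to mirror roughly 75 test incidents out of 500), and the `INC-000` style identifiers are assumptions for illustration and may differ from what the training script actually does.

```python
# Sketch: hold out whole incidents so that no incident contributes
# events to both the training and the test side.
import random


def group_train_test_split(groups, test_fraction=0.15, seed=42):
    """Return (train_idx, test_idx); each group lands entirely on one side."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# Toy data: 10 hypothetical incidents with 4 events each.
groups = [f"INC-{i // 4:03d}" for i in range(40)]
train_idx, test_idx = group_train_test_split(groups, test_fraction=0.2)

train_groups = {groups[i] for i in train_idx}
test_groups = {groups[i] for i in test_idx}
print(f"train incidents: {len(train_groups)}, test incidents: {len(test_groups)}")
```

With the real data, the `groups` Series returned by `build_features` would take the place of the toy list. A random per-event split would instead let events from one incident appear on both sides, which is the contamination the docstring warns about.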
feature_meta.json ADDED
@@ -0,0 +1,224 @@
1
+ {
2
+ "feature_names": [
3
+ "source_port",
4
+ "dest_port",
5
+ "cvss_score_analogue",
6
+ "label_log_tampered",
7
+ "label_false_positive",
8
+ "edr_agent_installed",
9
+ "patch_compliance_level",
10
+ "vulnerability_count_open",
11
+ "hour_of_day",
12
+ "is_off_hours",
13
+ "is_weekend",
14
+ "log_cvss",
15
+ "is_high_cvss",
16
+ "is_well_known_port",
17
+ "is_dynamic_port",
18
+ "is_outbound_web",
19
+ "risk_composite",
20
+ "event_class_application_api",
21
+ "event_class_application_waf",
22
+ "event_class_authentication",
23
+ "event_class_cloud_compute",
24
+ "event_class_cloud_iam",
25
+ "event_class_cloud_storage",
26
+ "event_class_dns_resolution",
27
+ "event_class_endpoint_filesystem",
28
+ "event_class_endpoint_process",
29
+ "event_class_endpoint_registry",
30
+ "event_class_network_flow",
31
+ "event_class_threat_intelligence_match",
32
+ "log_source_type_arcsight_esm",
33
+ "log_source_type_aws_security_hub",
34
+ "log_source_type_elastic_siem",
35
+ "log_source_type_google_chronicle",
36
+ "log_source_type_ibm_qradar",
37
+ "log_source_type_microsoft_sentinel",
38
+ "log_source_type_palo_alto_xsiam",
39
+ "log_source_type_splunk",
40
+ "severity_level_critical",
41
+ "severity_level_high",
42
+ "severity_level_informational",
43
+ "severity_level_low",
44
+ "severity_level_medium",
45
+ "os_type_cloud_managed",
46
+ "os_type_linux_debian",
47
+ "os_type_linux_rhel",
48
+ "os_type_linux_ubuntu",
49
+ "os_type_macos",
50
+ "os_type_windows_server",
51
+ "os_type_windows_workstation",
52
+ "host_role_cloud_compute_instance",
53
+ "host_role_database_server",
54
+ "host_role_domain_controller",
55
+ "host_role_file_server",
56
+ "host_role_ot_ics_controller",
57
+ "host_role_siem_collector",
58
+ "host_role_vpn_gateway",
59
+ "host_role_web_server",
60
+ "host_role_workstation_privileged",
61
+ "host_role_workstation_standard",
62
+ "network_segment_cloud_workload",
63
+ "network_segment_corporate_lan",
64
+ "network_segment_data_exfiltration_target",
65
+ "network_segment_dmz_perimeter",
66
+ "network_segment_endpoint_fleet",
67
+ "network_segment_ot_ics_control_network",
68
+ "network_segment_soc_management_plane",
69
+ "network_segment_zero_trust_segment",
70
+ "defender_posture_tier_hardened",
71
+ "defender_posture_tier_minimal",
72
+ "defender_posture_tier_standard",
73
+ "defender_posture_tier_zero_trust",
74
+ "criticality_rating_critical",
75
+ "criticality_rating_high",
76
+ "criticality_rating_low",
77
+ "criticality_rating_medium",
78
+ "cloud_provider_aws",
79
+ "cloud_provider_azure",
80
+ "cloud_provider_gcp",
81
+ "cloud_provider_on_premises",
82
+ "siem_platform_arcsight_esm",
83
+ "siem_platform_aws_security_hub",
84
+ "siem_platform_elastic_siem",
85
+ "siem_platform_google_chronicle",
86
+ "siem_platform_ibm_qradar",
87
+ "siem_platform_microsoft_sentinel",
88
+ "siem_platform_palo_alto_xsiam",
89
+ "siem_platform_splunk"
90
+ ],
91
+ "numeric_features": [
92
+ "source_port",
93
+ "dest_port",
94
+ "cvss_score_analogue",
95
+ "label_log_tampered",
96
+ "label_false_positive",
97
+ "edr_agent_installed",
98
+ "patch_compliance_level",
99
+ "vulnerability_count_open",
100
+ "hour_of_day",
101
+ "is_off_hours",
102
+ "is_weekend",
103
+ "log_cvss",
104
+ "is_high_cvss",
105
+ "is_well_known_port",
106
+ "is_dynamic_port",
107
+ "is_outbound_web",
108
+ "risk_composite"
109
+ ],
110
+ "categorical_levels": {
111
+ "event_class": [
112
+ "application_api",
113
+ "application_waf",
114
+ "authentication",
115
+ "cloud_compute",
116
+ "cloud_iam",
117
+ "cloud_storage",
118
+ "dns_resolution",
119
+ "endpoint_filesystem",
120
+ "endpoint_process",
121
+ "endpoint_registry",
122
+ "network_flow",
123
+ "threat_intelligence_match"
124
+ ],
125
+ "log_source_type": [
126
+ "arcsight_esm",
127
+ "aws_security_hub",
128
+ "elastic_siem",
129
+ "google_chronicle",
130
+ "ibm_qradar",
131
+ "microsoft_sentinel",
132
+ "palo_alto_xsiam",
133
+ "splunk"
134
+ ],
135
+ "severity_level": [
136
+ "critical",
137
+ "high",
138
+ "informational",
139
+ "low",
140
+ "medium"
141
+ ],
142
+ "os_type": [
143
+ "cloud_managed",
144
+ "linux_debian",
145
+ "linux_rhel",
146
+ "linux_ubuntu",
147
+ "macos",
148
+ "windows_server",
149
+ "windows_workstation"
150
+ ],
151
+ "host_role": [
152
+ "cloud_compute_instance",
153
+ "database_server",
154
+ "domain_controller",
155
+ "file_server",
156
+ "ot_ics_controller",
157
+ "siem_collector",
158
+ "vpn_gateway",
159
+ "web_server",
160
+ "workstation_privileged",
161
+ "workstation_standard"
162
+ ],
163
+ "network_segment": [
164
+ "cloud_workload",
165
+ "corporate_lan",
166
+ "data_exfiltration_target",
167
+ "dmz_perimeter",
168
+ "endpoint_fleet",
169
+ "ot_ics_control_network",
170
+ "soc_management_plane",
171
+ "zero_trust_segment"
172
+ ],
173
+ "defender_posture_tier": [
174
+ "hardened",
175
+ "minimal",
176
+ "standard",
177
+ "zero_trust"
178
+ ],
179
+ "criticality_rating": [
180
+ "critical",
181
+ "high",
182
+ "low",
183
+ "medium"
184
+ ],
185
+ "cloud_provider": [
186
+ "aws",
187
+ "azure",
188
+ "gcp",
189
+ "on_premises"
190
+ ],
191
+ "siem_platform": [
192
+ "arcsight_esm",
193
+ "aws_security_hub",
194
+ "elastic_siem",
195
+ "google_chronicle",
196
+ "ibm_qradar",
197
+ "microsoft_sentinel",
198
+ "palo_alto_xsiam",
199
+ "splunk"
200
+ ]
201
+ },
202
+ "label_to_int": {
203
+ "benign_background": 0,
204
+ "initial_access": 1,
205
+ "lateral_movement": 2,
206
+ "persistence_establishment": 3,
207
+ "exfiltration_or_impact": 4
208
+ },
209
+ "int_to_label": {
210
+ "0": "benign_background",
211
+ "1": "initial_access",
212
+ "2": "lateral_movement",
213
+ "3": "persistence_establishment",
214
+ "4": "exfiltration_or_impact"
215
+ },
216
+ "oracle_excluded": [
217
+ "mitre_tactic",
218
+ "mitre_technique_id",
219
+ "label_malicious",
220
+ "threat_actor_id",
221
+ "threat_actor_profile",
222
+ "event_type"
223
+ ]
224
+ }
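The `categorical_levels` lists above fix both the names and the order of the one-hot columns: each level becomes a column named `<column>_<level>`. The sketch below reproduces that naming rule in plain Python for one column. The helper `one_hot` is hypothetical and only illustrates the rule; the repository's own encoder is `transform_single`, which uses `pd.get_dummies`.

```python
# Sketch: one-hot encode a single categorical value against fixed levels.

def one_hot(column, value, levels):
    """Map each level to 0/1 in the stored level order.

    A value that is not in `levels` yields all zeros, which matches how
    unseen categories end up encoded when levels are fixed in advance.
    """
    return {f"{column}_{lvl}": int(str(value) == lvl) for lvl in levels}


# Levels copied from "severity_level" in feature_meta.json.
SEVERITY_LEVELS = ["critical", "high", "informational", "low", "medium"]

encoded = one_hot("severity_level", "high", SEVERITY_LEVELS)
unseen = one_hot("severity_level", "unknown", SEVERITY_LEVELS)
print(encoded)
```

Note that the levels are sorted alphabetically, not by severity, so the column order carries no ordinal meaning.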
feature_scaler.json ADDED
@@ -0,0 +1 @@
1
+ {"mean": [33252.34347145676, 2996.478260869565, 2.8174688711982037, 0.05245968565013268, 0.0, 0.7555283391168266, 0.7200855956998027, 4.552833911682656, 12.7339593114241, 0.5532421582635912, 0.2685582091583316, 0.8327589119295066, 0.30815812750901544, 0.596652378036334, 0.0, 0.6238007756685038, 0.8264450180308907, 0.015989657753282982, 0.021364904402258963, 0.2782200449071239, 0.014220589235898484, 0.029189630536844254, 0.01483295910730081, 0.009253589167857386, 0.055317411716676874, 0.0923998094849289, 0.03075457576376131, 0.4073620466761924, 0.031094781247873717, 0.10784513846363203, 0.12022861808532354, 0.1143770837585902, 0.10566782336531265, 0.13771517996870108, 0.15207185139824453, 0.1263523167993468, 0.13574198816084915, 0.023065931822820983, 0.3070014288630333, 0.33496631965707285, 0.23208818126148192, 0.10287813839559094, 0.06021637068789549, 0.1196842893107437, 0.17017078315302442, 0.14302238552085458, 0.20902224943866096, 0.08226168605837926, 0.2156222358304416, 0.20466761924202218, 0.056406069265836564, 0.04790093216302647, 0.06736068585425597, 0.01986800027216439, 0.01850717833571477, 0.02878138395590937, 0.13186364564196776, 0.07191943934136218, 0.35272504592774034, 0.11424100156494522, 0.11070286453017622, 0.13553786487038172, 0.13887187861468328, 0.14145744029393753, 0.13771517996870108, 0.10553174117166769, 0.11594202898550725, 0.2548139076001905, 0.20806967408314622, 0.48683404776484995, 0.050282370551813296, 0.048241137647138874, 0.18922229026331905, 0.3440838266312853, 0.4184527454582568, 0.019731918078519425, 0.02605974008301014, 0.014424712526365926, 0.9397836293121045, 0.10784513846363203, 0.12022861808532354, 0.1143770837585902, 0.10566782336531265, 0.13771517996870108, 0.15207185139824453, 0.1263523167993468, 0.13574198816084915], "std": [18715.207254926845, 3628.380310921406, 3.54459303672502, 0.2229597484434109, 1.0, 0.42978812956197554, 0.17539856245260071, 3.3596279846325925, 6.617606652566204, 0.497174105443988, 0.4432246202477297, 
1.0144933397707419, 0.4617479865503102, 0.49058607154102, 1.0, 0.48444745480159396, 1.3184493675097533, 0.12543946439978274, 0.14460244808610237, 0.4481376083636101, 0.11840320083280491, 0.16834347108896366, 0.1208881167845266, 0.09575272369957992, 0.2286065431480699, 0.28959936317018303, 0.17265792825750237, 0.4913599872613703, 0.17357979692806114, 0.3101952797244992, 0.3252397499180114, 0.3182795299089748, 0.3074224535341671, 0.34461252093496103, 0.35910273966521833, 0.3322573102735896, 0.3425260335658089, 0.1501180467072525, 0.4612656808966241, 0.47199474836365296, 0.42217932766928296, 0.303809985457358, 0.23789537641829842, 0.3246030337165473, 0.3757955516422055, 0.35010758763584876, 0.4066241493207977, 0.2747723387770502, 0.41126730452490956, 0.4034722558944266, 0.23071204197108325, 0.21356389250949495, 0.2506541416123477, 0.1395513808948919, 0.1347809285968829, 0.16719724273077177, 0.33835397762852726, 0.25836326255215475, 0.4778343054123098, 0.31811457161396967, 0.3137745038447582, 0.3423088149527841, 0.3458245469796016, 0.34850465828256166, 0.34461252093496103, 0.30724780868090457, 0.32016628422038745, 0.435771386019691, 0.4059407557259376, 0.49984363288754735, 0.21853444401923464, 0.21428265102990002, 0.39169842291834606, 0.47508473363402964, 0.49332200863876363, 0.13908229817866685, 0.15931841410593847, 0.11923760974001169, 0.23789537641829842, 0.3101952797244992, 0.3252397499180114, 0.3182795299089748, 0.3074224535341671, 0.34461252093496103, 0.35910273966521833, 0.3322573102735896, 0.3425260335658089]}
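The scaler file holds one `mean` and one `std` entry per feature, 87 of each, which matches the length of `feature_names`. A plausible use, sketched below, is per-feature z-score standardization before a model that is sensitive to feature scale. This section does not show which model consumes the file or how, so treat the function `standardize` and the example row as assumptions. Two entries (`label_false_positive` and `is_dynamic_port`) have mean 0.0 and std 1.0, which is consistent with constant columns being given a unit std so that the division stays defined.

```python
# Sketch: apply per-feature z-score standardization from a scaler dict.
import json


def standardize(row, mean, std):
    """Return (x - mean) / std for each feature, element by element."""
    return [(x - m) / s for x, m, s in zip(row, mean, std)]


# First three entries of feature_scaler.json
# (source_port, dest_port, cvss_score_analogue).
scaler = json.loads(
    '{"mean": [33252.34347145676, 2996.478260869565, 2.8174688711982037],'
    ' "std": [18715.207254926845, 3628.380310921406, 3.54459303672502]}'
)

# Hypothetical event: average source port, HTTPS destination, high CVSS.
row = [33252.34347145676, 443.0, 9.8]
z = standardize(row, scaler["mean"], scaler["std"])
print([round(v, 3) for v in z])
```

In practice the full 87-element lists would be loaded from the file and applied to the vector returned by `transform_single`, in `feature_names` order.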
inference_example.ipynb ADDED
@@ -0,0 +1,350 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# CYB010 Baseline Classifier — Inference Example\n",
8
+ "\n",
9
+ "End-to-end demo: load the trained XGBoost and PyTorch MLP models from the Hugging Face repo and predict the **attack lifecycle phase** for a security event.\n",
10
+ "\n",
11
+ "**Models predict one of 5 phases:** `benign_background`, `initial_access`, `lateral_movement`, `persistence_establishment`, `exfiltration_or_impact`.\n",
12
+ "\n",
13
+ "**This is a baseline reference model**, not a production phase classifier. See the model card and **`leakage_diagnostic.json`** for the structural-leakage findings (11 oracle paths documented across the dataset)."
14
+ ]
15
+ },
16
+ {
17
+ "cell_type": "markdown",
18
+ "metadata": {},
19
+ "source": [
20
+ "## 1. Install dependencies"
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "code",
25
+ "execution_count": null,
26
+ "metadata": {},
27
+ "outputs": [],
28
+ "source": [
29
+ "%pip install --quiet xgboost torch safetensors pandas numpy huggingface_hub"
30
+ ]
31
+ },
32
+ {
33
+ "cell_type": "markdown",
34
+ "metadata": {},
35
+ "source": [
36
+ "## 2. Download model artifacts from Hugging Face"
37
+ ]
38
+ },
39
+ {
40
+ "cell_type": "code",
41
+ "execution_count": null,
42
+ "metadata": {},
43
+ "outputs": [],
44
+ "source": [
45
+ "from huggingface_hub import hf_hub_download\n",
46
+ "\n",
47
+ "REPO_ID = \"xpertsystems/cyb010-baseline-classifier\"\n",
48
+ "\n",
49
+ "files = {}\n",
50
+ "for name in [\"model_xgb.json\", \"model_mlp.safetensors\",\n",
51
+ " \"feature_engineering.py\", \"feature_meta.json\",\n",
52
+ " \"feature_scaler.json\"]:\n",
53
+ " files[name] = hf_hub_download(repo_id=REPO_ID, filename=name)\n",
54
+ " print(f\" downloaded: {name}\")"
55
+ ]
56
+ },
57
+ {
58
+ "cell_type": "code",
59
+ "execution_count": null,
60
+ "metadata": {},
61
+ "outputs": [],
62
+ "source": [
63
+ "import sys, os\n",
64
+ "fe_dir = os.path.dirname(files[\"feature_engineering.py\"])\n",
65
+ "if fe_dir not in sys.path:\n",
66
+ " sys.path.insert(0, fe_dir)\n",
67
+ "\n",
68
+ "from feature_engineering import (\n",
69
+ " transform_single, load_meta, build_host_lookup, INT_TO_LABEL,\n",
70
+ ")"
71
+ ]
72
+ },
73
+ {
74
+ "cell_type": "markdown",
75
+ "metadata": {},
76
+ "source": [
77
+ "## 3. Load models and metadata"
78
+ ]
79
+ },
80
+ {
81
+ "cell_type": "code",
82
+ "execution_count": null,
83
+ "metadata": {},
84
+ "outputs": [],
85
+ "source": [
86
+ "import json\n",
87
+ "import numpy as np\n",
88
+ "import torch\n",
89
+ "import torch.nn as nn\n",
90
+ "import xgboost as xgb\n",
91
+ "from safetensors.torch import load_file\n",
92
+ "\n",
93
+ "meta = load_meta(files[\"feature_meta.json\"])\n",
94
+ "with open(files[\"feature_scaler.json\"]) as f:\n",
95
+ " scaler = json.load(f)\n",
96
+ "\n",
97
+ "N_FEATURES = len(meta[\"feature_names\"])\n",
98
+ "N_CLASSES = len(meta[\"int_to_label\"])\n",
99
+ "print(f\"feature count: {N_FEATURES}\")\n",
100
+ "print(f\"class count: {N_CLASSES}\")\n",
101
+ "print(f\"label classes: {list(meta['int_to_label'].values())}\")\n",
102
+ "print(f\"\\noracle columns excluded (do not pass these to the model):\")\n",
103
+ "for c in meta.get(\"oracle_excluded\", []):\n",
104
+ " print(f\" - {c}\")"
105
+ ]
106
+ },
107
+ {
108
+ "cell_type": "code",
109
+ "execution_count": null,
110
+ "metadata": {},
111
+ "outputs": [],
112
+ "source": [
113
+ "xgb_model = xgb.XGBClassifier()\n",
114
+ "xgb_model.load_model(files[\"model_xgb.json\"])\n",
115
+ "\n",
116
+ "# MLP architecture (must match training)\n",
117
+ "class PhaseMLP(nn.Module):\n",
118
+ " def __init__(self, n_features, n_classes=5, hidden1=128, hidden2=64, dropout=0.3):\n",
119
+ " super().__init__()\n",
120
+ " self.net = nn.Sequential(\n",
121
+ " nn.Linear(n_features, hidden1),\n",
122
+ " nn.BatchNorm1d(hidden1),\n",
123
+ " nn.ReLU(),\n",
124
+ " nn.Dropout(dropout),\n",
125
+ " nn.Linear(hidden1, hidden2),\n",
126
+ " nn.BatchNorm1d(hidden2),\n",
127
+ " nn.ReLU(),\n",
128
+ " nn.Dropout(dropout),\n",
129
+ " nn.Linear(hidden2, n_classes),\n",
130
+ " )\n",
131
+ " def forward(self, x):\n",
132
+ " return self.net(x)\n",
133
+ "\n",
134
+ "mlp_model = PhaseMLP(N_FEATURES, n_classes=N_CLASSES)\n",
135
+ "mlp_model.load_state_dict(load_file(files[\"model_mlp.safetensors\"]))\n",
136
+ "mlp_model.eval()\n",
137
+ "print(\"models loaded\")"
138
+ ]
139
+ },
140
+ {
141
+ "cell_type": "markdown",
142
+ "metadata": {},
143
+ "source": [
144
+ "## 4. Load host inventory for host-feature lookup\n",
145
+ "\n",
146
+ "The model uses host context (os_type, host_role, defender_posture, etc.) as features. To predict on a new event, we look up its host features from the host_inventory."
147
+ ]
148
+ },
149
+ {
150
+ "cell_type": "code",
151
+ "execution_count": null,
152
+ "metadata": {},
153
+ "outputs": [],
154
+ "source": [
155
+ "from huggingface_hub import snapshot_download\n",
156
+ "\n",
157
+ "ds_path = snapshot_download(repo_id=\"xpertsystems/cyb010-sample\", repo_type=\"dataset\")\n",
158
+ "host_lookup = build_host_lookup(f\"{ds_path}/host_inventory.csv\")\n",
159
+ "print(f\"loaded {len(host_lookup)} host records\")"
160
+ ]
161
+ },
162
+ {
163
+ "cell_type": "markdown",
164
+ "metadata": {},
165
+ "source": [
166
+ "## 5. Prediction helper"
167
+ ]
168
+ },
169
+ {
170
+ "cell_type": "code",
171
+ "execution_count": null,
172
+ "metadata": {},
173
+ "outputs": [],
174
+ "source": [
175
+ "MU = np.array(scaler[\"mean\"], dtype=np.float32)\n",
176
+ "SD = np.array(scaler[\"std\"], dtype=np.float32)\n",
177
+ "\n",
178
+ "def predict_attack_phase(event: dict) -> dict:\n",
179
+ " \"\"\"Predict the attack lifecycle phase for one security event.\n",
180
+ "\n",
181
+ " Note: do NOT include mitre_tactic, mitre_technique_id,\n",
182
+ " label_malicious, threat_actor_id, threat_actor_profile, or\n",
183
+ " event_type in the record. These were structural oracles in the\n",
184
+ " training data and are excluded from the feature set.\n",
185
+ "\n",
186
+ " Host features (os_type, host_role, etc.) are looked up from\n",
187
+ " host_inventory by host_id.\n",
188
+ " \"\"\"\n",
189
+ " X = transform_single(event, meta, host_lookup=host_lookup)\n",
190
+ "\n",
191
+ " xgb_proba = xgb_model.predict_proba(X)[0]\n",
192
+ " xgb_label = INT_TO_LABEL[int(np.argmax(xgb_proba))]\n",
193
+ "\n",
194
+ " Xs = ((X - MU) / SD).astype(np.float32)\n",
195
+ " with torch.no_grad():\n",
196
+ " logits = mlp_model(torch.tensor(Xs))\n",
197
+ " mlp_proba = torch.softmax(logits, dim=1).numpy()[0]\n",
198
+ " mlp_label = INT_TO_LABEL[int(np.argmax(mlp_proba))]\n",
199
+ "\n",
200
+ " return {\n",
201
+ " \"xgboost\": {\n",
202
+ " \"label\": xgb_label,\n",
203
+ " \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(xgb_proba)},\n",
204
+ " },\n",
205
+ " \"mlp\": {\n",
206
+ " \"label\": mlp_label,\n",
207
+ " \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(mlp_proba)},\n",
208
+ " },\n",
209
+ " }"
210
+ ]
211
+ },
212
+ {
213
+ "cell_type": "markdown",
214
+ "metadata": {},
215
+ "source": [
216
+ "## 6. Run on an example event\n",
217
+ "\n",
218
+ "Real high-severity authentication event from the CYB010 sample. True phase is `initial_access` — an APT session anomaly with CVSS 7.56 against a workstation."
219
+ ]
220
+ },
221
+ {
222
+ "cell_type": "code",
223
+ "execution_count": null,
224
+ "metadata": {},
225
+ "outputs": [],
226
+ "source": [
227
+ "# Real event from the sample dataset (true phase: initial_access)\n",
228
+ "example_event = {\n",
229
+ " \"host_id\": \"HOST-00352\",\n",
230
+ " \"timestamp\": \"2024-07-22T21:55:40.046569+00:00\",\n",
231
+ " \"source_port\": 27110,\n",
232
+ " \"dest_port\": 8443,\n",
233
+ " \"event_class\": \"authentication\",\n",
234
+ " \"log_source_type\": \"splunk\",\n",
235
+ " \"severity_level\": \"high\",\n",
236
+ " \"label_false_positive\": False,\n",
237
+ " \"label_log_tampered\": False,\n",
238
+ " \"cvss_score_analogue\": 7.56,\n",
239
+ "}\n",
240
+ "\n",
241
+ "result = predict_attack_phase(example_event)\n",
242
+ "\n",
243
+ "print(f\"XGBoost -> {result['xgboost']['label']}\")\n",
244
+ "for lbl, p in sorted(result['xgboost']['probabilities'].items(), key=lambda x: -x[1]):\n",
245
+ " print(f\" P({lbl:30s}) = {p:.4f}\")\n",
246
+ "\n",
247
+ "print(f\"\\nMLP -> {result['mlp']['label']}\")\n",
248
+ "for lbl, p in sorted(result['mlp']['probabilities'].items(), key=lambda x: -x[1]):\n",
249
+ " print(f\" P({lbl:30s}) = {p:.4f}\")"
250
+ ]
251
+ },
252
+ {
253
+ "cell_type": "markdown",
254
+ "metadata": {},
255
+ "source": [
256
+ "### Per-class confidence patterns\n",
257
+ "\n",
258
+ "The model has strong confidence on `benign_background` and `exfiltration_or_impact` (XGBoost per-class F1 of roughly 1.00 and 0.99 at seed 42). The middle phases (`initial_access`, `lateral_movement`, `persistence_establishment`) overlap more in feature space, so expect modest confidence (0.4-0.7) on those predictions.\n",
259
+ "\n",
260
+ "`lateral_movement` is the hardest class (F1 0.48 at seed 42). Real SOC data would have stronger sequential signal (event-sequence features within an incident) that the per-event baseline does not capture."
261
+ ]
262
+ },
263
+ {
264
+ "cell_type": "markdown",
265
+ "metadata": {},
266
+ "source": [
267
+ "## 7. Batch prediction on the sample dataset"
268
+ ]
269
+ },
270
+ {
271
+ "cell_type": "code",
272
+ "execution_count": null,
273
+ "metadata": {},
274
+ "outputs": [],
275
+ "source": [
276
+ "import pandas as pd\n",
277
+ "\n",
278
+ "events = pd.read_csv(f\"{ds_path}/security_events.csv\")\n",
279
+ "\n",
280
+ "# Score the first 500 events\n",
281
+ "sample = events.head(500).copy()\n",
282
+ "preds = [predict_attack_phase(row.to_dict())[\"xgboost\"][\"label\"] for _, row in sample.iterrows()]\n",
283
+ "sample[\"xgb_pred\"] = preds\n",
284
+ "\n",
285
+ "ct = pd.crosstab(sample[\"attack_lifecycle_phase\"], sample[\"xgb_pred\"],\n",
286
+ " rownames=[\"true\"], colnames=[\"pred\"])\n",
287
+ "print(\"Confusion on first 500 sample events (XGBoost):\")\n",
288
+ "print(ct)\n",
289
+ "acc = (sample[\"attack_lifecycle_phase\"] == sample[\"xgb_pred\"]).mean()\n",
290
+ "print(f\"\\nbatch accuracy on first 500 events (in-distribution): {acc:.4f}\")\n",
291
+ "print(\"\\nNote: this includes training-set events. See validation_results.json\\n\"\n",
292
+ " \"for proper held-out test metrics (group-aware split by incident_id).\")"
293
+ ]
294
+ },
295
+ {
296
+ "cell_type": "markdown",
297
+ "metadata": {},
298
+ "source": [
299
+ "## 8. Important reading: the leakage diagnostic\n",
300
+ "\n",
301
+ "Before using CYB010 sample data to train your own models, read **`leakage_diagnostic.json`** in this repo. It documents **11 oracle paths** across the sample's targets:\n",
302
+ "\n",
303
+ "**Phase target oracles (6 paths):**\n",
304
+ "1. `mitre_tactic == \"benign\"` → 100% `benign_background` phase\n",
305
+ "2. `mitre_technique_id` → `mitre_tactic` (perfect ATT&CK-by-design oracle)\n",
306
+ "3. `label_malicious == False` → 100% `benign_background`\n",
307
+ "4. `threat_actor_id == \"NONE\"` → 100% benign\n",
308
+ "5. `threat_actor_profile == \"benign_user\"` → 100% benign\n",
309
+ "6. `event_type` (e.g. `c2_beacon_outbound`) → ~95% one specific phase\n",
310
+ "\n",
311
+ "**Alert TP target oracles (7 paths)** — for the secondary `label_true_positive` task on `alert_records.csv`:\n",
312
+ "1. `alert_category == \"false_positive_noise\"` → 100% FP\n",
313
+ "2. `label_false_positive` (mirror of target)\n",
314
+ "3. `time_to_detect_seconds == 0` → 100% FP\n",
315
+ "4. `correlated_chain_length > 1` → 100% TP (FP alerts always have length 1)\n",
316
+ "5. `analyst_triage_priority ∈ {P1,P2,P3}` → 100% TP\n",
317
+ "6. `suppression_reason` is NaN → 100% TP\n",
318
+ "7. `alert_rule_name` (rule names encode the answer)\n",
319
+ "\n",
320
+ "It also documents **2 README-suggested targets that are unlearnable on the sample** after honest leak removal: `threat_actor_profile` 4-class (malicious-only) and `event_class` 12-class."
321
+ ]
322
+ },
323
+ {
324
+ "cell_type": "markdown",
325
+ "metadata": {},
326
+ "source": [
327
+ "## 9. Next steps\n",
328
+ "\n",
329
+ "- See `validation_results.json` for held-out test metrics (3,726 events from ~75 test incidents).\n",
330
+ "- See `multi_seed_results.json` for the across-10-seeds picture (accuracy 0.936 ± 0.007, ROC-AUC 0.988 ± 0.001).\n",
331
+ "- See `ablation_results.json` for per-feature-group contribution. `event_class` carries the dominant signal (−18pp macro-F1 when removed); CVSS features are second.\n",
332
+ "- See **`leakage_diagnostic.json`** for the full 11-oracle-path audit.\n",
333
+ "- For the full ~550k-row CYB010 dataset and commercial licensing, contact **pradeep@xpertsystems.ai**."
334
+ ]
335
+ }
336
+ ],
337
+ "metadata": {
338
+ "kernelspec": {
339
+ "display_name": "Python 3",
340
+ "language": "python",
341
+ "name": "python3"
342
+ },
343
+ "language_info": {
344
+ "name": "python",
345
+ "version": "3.10"
346
+ }
347
+ },
348
+ "nbformat": 4,
349
+ "nbformat_minor": 5
350
+ }
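The notebook's docstring asks callers not to pass the six phase-oracle columns, yet its batch cell builds each event from `row.to_dict()`, which carries every CSV column. Whether `transform_single` already ignores those keys is not shown in this diff, so the sketch below filters them explicitly. The `strip_oracle_columns` helper is hypothetical, and the `mitre_tactic` value in the toy event is made up for illustration; only the column names come from the notebook.

```python
# Column names listed in the notebook docstring as structural oracles.
ORACLE_COLUMNS = (
    "mitre_tactic",
    "mitre_technique_id",
    "label_malicious",
    "threat_actor_id",
    "threat_actor_profile",
    "event_type",
)


def strip_oracle_columns(event: dict) -> dict:
    """Return a copy of the event without oracle columns. Hypothetical helper."""
    return {key: value for key, value in event.items()
            if key not in ORACLE_COLUMNS}


# Toy event: the first three keys mirror the notebook example, the rest are
# illustrative oracle values that should never reach the model.
raw_event = {
    "host_id": "HOST-00352",
    "event_class": "authentication",
    "severity_level": "high",
    "mitre_tactic": "initial-access",
    "label_malicious": True,
}

clean_event = strip_oracle_columns(raw_event)
print(sorted(clean_event))
```

Under these assumptions, the batch cell could call `predict_attack_phase(strip_oracle_columns(row.to_dict()))` so that the exclusion does not depend on the feature-engineering code.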
leakage_diagnostic.json ADDED
@@ -0,0 +1,186 @@
1
+ {
2
+ "purpose": "CYB010 sample has extensive structural leakage in two places. First, the per-event phase/profile labels are oracled by four perfect benign indicators (mitre_tactic == 'benign', label_malicious == False, threat_actor_id == 'NONE', threat_actor_profile == 'benign_user'), indirectly by mitre_technique_id, and nearly by event_type. Second, the per-alert label_true_positive target is oracled by seven separate columns: alert_category, label_false_positive, alert_rule_name, the time_to_detect_seconds sentinel, the correlated_chain_length sentinel, analyst_triage_priority, and suppression_reason. The published baseline (attack_lifecycle_phase 5-class) trains with all six phase-related columns excluded.",
3
+ "primary_target": "attack_lifecycle_phase (5-class, per-event)",
4
+ "split": "GroupShuffleSplit on incident_id, 70/15/15 nested",
5
+ "oracle_paths_documented": {
6
+ "P1_mitre_tactic_benign": {
7
+ "target": "attack_lifecycle_phase == 'benign_background'",
8
+ "leak_column": "mitre_tactic",
9
+ "mechanism": "All events with mitre_tactic == 'benign' are in benign_background phase; all events in benign_background have mitre_tactic == 'benign'. Perfect bidirectional oracle (12,448 of 12,448 cases).",
10
+ "evidence_counts": {
11
+ "tactic_benign_AND_phase_benign": 12448,
12
+ "tactic_benign_AND_phase_other": 0,
13
+ "tactic_attack_AND_phase_benign": 0
14
+ },
15
+ "verdict": "Perfect oracle for benign_background phase."
16
+ },
17
+ "P2_mitre_technique_id": {
18
+ "target": "mitre_tactic",
19
+ "leak_column": "mitre_technique_id",
20
+ "mechanism": "By ATT&CK design, each MITRE technique (T-number) belongs to exactly one tactic. 100% of techniques in the sample (54 of 54) map deterministically to a single tactic. Indirect oracle for phase via the mitre_tactic chain.",
21
+ "evidence": {
22
+ "n_unique_techniques": 54,
23
+ "techniques_mapping_to_single_tactic": 54,
24
+ "percent_oracle": 100.0
25
+ },
26
+ "verdict": "Perfect oracle for mitre_tactic; indirect for phase."
27
+ },
28
+ "P3_label_malicious": {
29
+ "target": "attack_lifecycle_phase == 'benign_background'",
30
+ "leak_column": "label_malicious",
31
+ "mechanism": "label_malicious is False if and only if the event is in benign_background phase. Perfect bidirectional encoding.",
32
+ "evidence_counts": {
33
+ "label_malicious_False_AND_phase_benign": 12448,
34
+ "label_malicious_False_AND_phase_other": 0
35
+ },
36
+ "verdict": "Perfect oracle for benign_background phase."
37
+ },
38
+ "P4_threat_actor_id_NONE": {
39
+ "target": "attack_lifecycle_phase == 'benign_background'",
40
+ "leak_column": "threat_actor_id",
41
+ "mechanism": "threat_actor_id has 11 values: 10 ACTOR-XXXX labels (one per malicious actor) plus 'NONE' for benign events. threat_actor_id == 'NONE' is a perfect oracle for benign phase; the 10 ACTOR-XXXX values are perfect oracles for non-benign phase.",
42
+ "evidence_counts": {
43
+ "actor_NONE_AND_phase_benign": 12448,
44
+ "actor_NONE_AND_phase_other": 0
45
+ },
46
+ "verdict": "Perfect oracle for benign_background phase."
47
+ },
48
+ "P5_threat_actor_profile_benign": {
49
+ "target": "attack_lifecycle_phase == 'benign_background'",
50
+ "leak_column": "threat_actor_profile",
51
+ "mechanism": "threat_actor_profile == 'benign_user' is a perfect oracle for benign_background phase. The 4 non-benign profile values (apt, nation_state, insider, script_kiddie) all indicate non-benign phase.",
52
+ "evidence_counts": {
53
+ "profile_benign_user_AND_phase_benign": 12448
54
+ },
55
+ "verdict": "Perfect oracle for benign_background phase."
56
+ },
57
+ "P6_event_type_phase": {
58
+ "target": "attack_lifecycle_phase (multiple phases)",
59
+ "leak_column": "event_type",
60
+ "mechanism": "Many event_type values are phase-specific. For example, 'c2_beacon_outbound' (6,158 events) maps to exfiltration_or_impact with about 95% purity (0.9514). Other event types similarly map to specific phases.",
61
+ "near_oracle_event_types": {
62
+ "c2_beacon_outbound": {
63
+ "maps_to": "exfiltration_or_impact",
64
+ "purity": 0.9514,
65
+ "n_events": 6158
66
+ },
67
+ "credential_dumping_attempt": {
68
+ "maps_to": "benign_background",
69
+ "purity": 0.9518,
70
+ "n_events": 166
71
+ },
72
+ "process_hollowing_detected": {
73
+ "maps_to": "benign_background",
74
+ "purity": 0.9527,
75
+ "n_events": 169
76
+ }
77
+ },
78
+ "n_event_types_with_purity_above_95pct": 3,
79
+ "verdict": "Strong near-oracle for multiple phases. Dropped."
80
+ },
81
+ "A1_alert_category_FP_noise": {
82
+ "target": "label_true_positive (alerts)",
83
+ "leak_column": "alert_category",
84
+ "mechanism": "alert_category == 'false_positive_noise' is a perfect oracle for label_true_positive == False (2,721 of 2,721 noise alerts are FP; all 14 other categories are 100% TP).",
85
+ "verdict": "Perfect oracle."
86
+ },
87
+ "A2_label_false_positive_mirror": {
88
+ "target": "label_true_positive (alerts)",
89
+ "leak_column": "label_false_positive",
90
+ "mechanism": "label_false_positive is exactly NOT label_true_positive (verified across all 5,162 alerts). Same target.",
91
+ "verdict": "Perfect oracle (mirror target)."
92
+ },
93
+ "A3_time_to_detect_sentinel": {
94
+ "target": "label_true_positive (alerts)",
95
+ "leak_column": "time_to_detect_seconds",
96
+ "mechanism": "FP alerts have time_to_detect_seconds == 0 (sentinel for 'no detection time because it's a false positive'). TP alerts have detection times ranging 240 to 2,592,000 seconds. Perfect oracle.",
97
+ "evidence": {
98
+ "FP_alerts_time_zero": 2721,
99
+ "TP_alerts_time_zero": 0
100
+ },
101
+ "verdict": "Perfect oracle."
102
+ },
103
+ "A4_correlated_chain_sentinel": {
104
+ "target": "label_true_positive (alerts)",
105
+ "leak_column": "correlated_chain_length",
106
+ "mechanism": "FP alerts always have correlated_chain_length == 1 (no correlation possible because false positives don't chain). TP alerts have chain length 1-20 with mean 3.14. Perfect oracle when chain_length > 1; chain_length == 1 still allows some TPs.",
107
+ "verdict": "Strong oracle - chain_length > 1 perfectly identifies TP."
108
+ },
109
+ "A5_analyst_triage_priority": {
110
+ "target": "label_true_positive (alerts)",
111
+ "leak_column": "analyst_triage_priority",
112
+ "mechanism": "P1, P2, P3 priorities are 100% TP (1,609 alerts total). P4 splits about 77% FP / 23% TP (2,721 FP vs 832 TP). The P1/P2/P3 indicator alone is a perfect oracle for TP within those alerts.",
113
+ "evidence_counts": {
114
+ "P1": {
115
+ "false": 0,
116
+ "true": 131
117
+ },
118
+ "P2": {
119
+ "false": 0,
120
+ "true": 432
121
+ },
122
+ "P3": {
123
+ "false": 0,
124
+ "true": 1046
125
+ },
126
+ "P4": {
127
+ "false": 2721,
128
+ "true": 832
129
+ }
130
+ },
131
+ "verdict": "Strong oracle (perfect for P1/P2/P3)."
132
+ },
133
+ "A6_suppression_reason": {
134
+ "target": "label_true_positive (alerts)",
135
+ "leak_column": "suppression_reason",
136
+ "mechanism": "suppression_reason is NaN only for TP alerts (1,744 of 1,744 NaN values are TP); the remaining TP alerts carry a non-NaN reason, so the implication is one-directional. Any non-NaN suppression reason is 79-82% FP. Strong oracle.",
137
+ "verdict": "Strong oracle."
138
+ },
139
+ "A7_alert_rule_name": {
140
+ "target": "label_true_positive (alerts)",
141
+ "leak_column": "alert_rule_name",
142
+ "mechanism": "alert_rule_name often encodes the answer (rules with 'false_positive' or 'noise' in name map deterministically to FP; rules with attack-specific names map to TP).",
143
+ "verdict": "Strong oracle by rule naming convention."
144
+ }
145
+ },
146
+ "unlearnable_targets": [
147
+ {
148
+ "target": "threat_actor_profile 4-class (malicious events only)",
149
+ "n_classes": 4,
150
+ "n_events": 9448,
151
+ "majority_baseline": 0.6110287891617273,
152
+ "honest_accuracy": 0.5543902985277928,
153
+ "honest_roc_auc": 0.7473176763614474,
154
+ "verdict": "below_majority",
155
+ "note": "After filtering to malicious events only and dropping all phase/tactic oracles, threat actor attribution is below majority baseline. The 5-class formulation works only because benign_user separation is trivial (which is a structural oracle finding)."
156
+ },
157
+ {
158
+ "target": "event_class 12-class (per-event)",
159
+ "n_classes": 12,
160
+ "majority_baseline": 0.4211728169528681,
161
+ "honest_accuracy": 0.3508069868328931,
162
+ "verdict": "below_majority",
163
+ "note": "event_class is a structural property of the event itself (e.g. network_flow, authentication, endpoint_process) and is not learnable from other features without leaking event_type."
164
+ }
165
+ ],
166
+ "alert_task_findings": {
167
+ "task": "label_true_positive binary on alert_records (5,162 alerts)",
168
+ "with_oracles_intact_accuracy": 1.0,
169
+ "with_oracles_intact_note": "100% test accuracy with any single oracle column present",
170
+ "honest_accuracy_mean_3seeds": 0.7636892643739505,
171
+ "honest_roc_auc_mean_3seeds": 0.8541442200259074,
172
+ "majority_baseline": 0.5271212708252615,
173
+ "interpretation": "After dropping all 7 oracle columns, honest XGBoost achieves acc 0.764 and AUC 0.854 on the alert TP task - real signal from severity_level, siem_platform_type, suppressed_flag, and host context features. This is a viable secondary task but is NOT the published baseline (the per-event attack_lifecycle_phase task is)."
174
+ },
175
+ "unlearnable_summary": "Two README-suggested targets are unlearnable on the sample after honest oracle removal: threat_actor_profile 4-class (malicious-only) and event_class 12-class. The 5-class threat_actor_profile WITH benign included is technically viable (acc 0.84) but per-class F1 reveals it's almost entirely driven by benign_user separation (F1 1.00 vs F1 0.17-0.69 for the 4 malicious classes). Hence the published primary target is attack_lifecycle_phase 5-class.",
176
+ "recommendations_to_dataset_author": [
177
+ "Remove the threat_actor_id == 'NONE' sentinel for benign events. Use a per-event mask or a separate benign-actor pool with realistic actor IDs.",
178
+ "Replace the mitre_tactic == 'benign' marker with phase-specific tactic distributions (e.g. benign events should sample from realistic non-malicious tactic-free patterns, not all share a 'benign' value).",
179
+ "Make event_type less deterministic per phase. 'c2_beacon_outbound' should appear in a few different phases with phase-specific frequencies, not almost exclusively (about 95%) in exfiltration.",
180
+ "Replace time_to_detect_seconds == 0 sentinel for FP alerts with realistic detection-time distributions; FP alerts can still have a 'time to detection' value (the time to dismiss).",
181
+ "Replace correlated_chain_length == 1 sentinel for FP with occasional 2-3 chains (real noise sometimes correlates).",
182
+ "Replace analyst_triage_priority P1/P2/P3 -> 100% TP with realistic uncertainty; some P1 alerts are FPs in real data.",
183
+ "Make alert_category values and alert_rule_name strings less revealing - a category like 'false_positive_noise', or a rule name containing 'false_positive' or 'noise', deterministically encodes the label. Use abstract rule IDs and have the FP label come from outcome statistics, not the rule name.",
184
+ "To enable threat_actor_profile 4-class learning, add stronger per-actor feature signatures - APT vs nation_state should have distinct host targeting, dwell time per host, and log_source affinity. Current overlap is too tight."
185
+ ]
186
+ }
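The `purity` figures in the diagnostic above can be read as: for each value of a candidate leak column, the share of rows that fall into that value's most common target class. The sketch below shows one way to compute such a report with the standard library. It is not the author's audit code; the `value_purity` function, the `phase` key, and the toy rows (including the `dns_query` event type) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def value_purity(rows, column, target):
    """Map each value of `column` to (majority_label, purity, n_rows)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[column]][row[target]] += 1

    report = {}
    for value, label_counts in counts.items():
        label, hits = label_counts.most_common(1)[0]
        total = sum(label_counts.values())
        report[value] = (label, hits / total, total)
    return report


# Toy rows: one event type is a 75% near-oracle, the other a perfect oracle.
toy_rows = [
    {"event_type": "c2_beacon_outbound", "phase": "exfiltration_or_impact"},
    {"event_type": "c2_beacon_outbound", "phase": "exfiltration_or_impact"},
    {"event_type": "c2_beacon_outbound", "phase": "exfiltration_or_impact"},
    {"event_type": "c2_beacon_outbound", "phase": "lateral_movement"},
    {"event_type": "dns_query", "phase": "benign_background"},
    {"event_type": "dns_query", "phase": "benign_background"},
]

purity_report = value_purity(toy_rows, "event_type", "phase")
for value, (label, purity, n) in purity_report.items():
    print(f"{value}: {label} purity={purity:.2f} n={n}")
```

A value with purity 1.0 and a large row count is a candidate perfect oracle; values with high purity but few rows deserve less weight, which is why the report keeps `n_rows` alongside the ratio.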
model_mlp.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d4be794e569b948bf7f16454742ed84e865710a260635494c641b7216fffd10a
3
+ size 83676
model_xgb.json ADDED
The diff for this file is too large to render. See raw diff
 
multi_seed_results.json ADDED
@@ -0,0 +1,98 @@
1
+ {
2
+ "purpose": "Multi-seed evaluation across 10 group-aware splits of the 21,896-event sample (500 incidents).",
3
+ "seeds_evaluated": [
4
+ 42,
5
+ 7,
6
+ 13,
7
+ 17,
8
+ 23,
9
+ 31,
10
+ 45,
11
+ 99,
12
+ 123,
13
+ 200
14
+ ],
15
+ "per_seed": [
16
+ {
17
+ "seed": 42,
18
+ "test_n_classes": 5,
19
+ "accuracy": 0.9492753623188406,
20
+ "macro_f1": 0.7780594102481514,
21
+ "macro_roc_auc_ovr": 0.9904125505537232
22
+ },
23
+ {
24
+ "seed": 7,
25
+ "test_n_classes": 5,
26
+ "accuracy": 0.9371447676362421,
27
+ "macro_f1": 0.7470429505084855,
28
+ "macro_roc_auc_ovr": 0.9883780833142183
29
+ },
30
+ {
31
+ "seed": 13,
32
+ "test_n_classes": 5,
33
+ "accuracy": 0.9440175631174533,
34
+ "macro_f1": 0.7786431389219104,
35
+ "macro_roc_auc_ovr": 0.9893348598508764
36
+ },
37
+ {
38
+ "seed": 17,
39
+ "test_n_classes": 5,
40
+ "accuracy": 0.9301659988551803,
41
+ "macro_f1": 0.7496550235562918,
42
+ "macro_roc_auc_ovr": 0.9862828960991046
43
+ },
44
+ {
45
+ "seed": 23,
46
+ "test_n_classes": 5,
47
+ "accuracy": 0.9409375,
48
+ "macro_f1": 0.7808189932344203,
49
+ "macro_roc_auc_ovr": 0.9899045909034948
50
+ },
51
+ {
52
+ "seed": 31,
53
+ "test_n_classes": 5,
54
+ "accuracy": 0.930905695611578,
55
+ "macro_f1": 0.7613555094687323,
56
+ "macro_roc_auc_ovr": 0.9868934259288492
57
+ },
58
+ {
59
+ "seed": 45,
60
+ "test_n_classes": 5,
61
+ "accuracy": 0.9233565586186004,
62
+ "macro_f1": 0.7409385948742784,
63
+ "macro_roc_auc_ovr": 0.9864613394709789
64
+ },
65
+ {
66
+ "seed": 99,
67
+ "test_n_classes": 5,
68
+ "accuracy": 0.9290322580645162,
69
+ "macro_f1": 0.7409062534499034,
70
+ "macro_roc_auc_ovr": 0.9861301771811058
71
+ },
72
+ {
73
+ "seed": 123,
74
+ "test_n_classes": 5,
75
+ "accuracy": 0.937037037037037,
76
+ "macro_f1": 0.7622080835728512,
77
+ "macro_roc_auc_ovr": 0.9882332249503822
78
+ },
79
+ {
80
+ "seed": 200,
81
+ "test_n_classes": 5,
82
+ "accuracy": 0.9404943545926152,
83
+ "macro_f1": 0.7495112167344459,
84
+ "macro_roc_auc_ovr": 0.988891453888266
85
+ }
86
+ ],
87
+ "aggregate": {
88
+ "accuracy_mean": 0.9362367095852064,
89
+ "accuracy_std": 0.007451938413439355,
90
+ "accuracy_min": 0.9233565586186004,
91
+ "accuracy_max": 0.9492753623188406,
92
+ "macro_f1_mean": 0.758913917456947,
93
+ "macro_f1_std": 0.014882483861819625,
94
+ "roc_auc_mean": 0.9880922602141,
95
+ "roc_auc_std": 0.001489069995610803
96
+ },
97
+ "published_artifact_seed": 42
98
+ }
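The `aggregate` block above can be reproduced from the `per_seed` accuracies. The sketch below recomputes the mean and standard deviation with the standard library; the reported `accuracy_std` appears to be a population standard deviation (divide by N, as NumPy's default `std` does) rather than a sample one, which is an inference from the numbers and not something the file states.

```python
import statistics

# Per-seed test accuracies copied from multi_seed_results.json (seeds 42..200).
accuracies = [
    0.9492753623188406, 0.9371447676362421, 0.9440175631174533,
    0.9301659988551803, 0.9409375, 0.930905695611578,
    0.9233565586186004, 0.9290322580645162, 0.937037037037037,
    0.9404943545926152,
]

acc_mean = statistics.fmean(accuracies)
acc_std = statistics.pstdev(accuracies)    # population std (ddof = 0)
acc_sample_std = statistics.stdev(accuracies)  # sample std (ddof = 1), for comparison

print(f"mean={acc_mean:.4f} population_std={acc_std:.4f} "
      f"sample_std={acc_sample_std:.4f}")
```

With only ten seeds the two conventions differ visibly in the fourth decimal place, so it helps to name the one used when quoting "± 0.007".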
validation_results.json ADDED
@@ -0,0 +1,180 @@
1
+ {
2
+ "version": "1.0.0",
3
+ "dataset": "xpertsystems/cyb010-sample",
4
+ "task": "5-class attack_lifecycle_phase classification",
5
+ "baselines": {
6
+ "always_predict_majority_accuracy": 0.5593129361245304,
7
+ "majority_class": "benign_background",
8
+ "random_guess_accuracy": 0.2
9
+ },
10
+ "split": {
11
+ "strategy": "group-aware (GroupShuffleSplit on incident_id, nested 70/15/15)",
12
+ "rationale": "500 incidents x ~44 events each. Events from the same incident share host, threat actor, and phase trajectory. Group-aware splitting prevents train/test leakage. ~75 test incidents per fold.",
13
+ "events_train": 14697,
14
+ "events_val": 3473,
15
+ "events_test": 3726,
16
+ "n_incidents_train": 350,
17
+ "seed": 42
18
+ },
19
+ "n_features": 87,
20
+ "label_classes": [
21
+ "benign_background",
22
+ "initial_access",
23
+ "lateral_movement",
24
+ "persistence_establishment",
25
+ "exfiltration_or_impact"
26
+ ],
27
+ "class_distribution_train": {
28
+ "benign_background": 8547,
29
+ "exfiltration_or_impact": 3898,
30
+ "initial_access": 1187,
31
+ "lateral_movement": 670,
32
+ "persistence_establishment": 395
33
+ },
34
+ "class_distribution_test": {
35
+ "benign_background": 2084,
36
+ "exfiltration_or_impact": 1186,
37
+ "initial_access": 247,
38
+ "lateral_movement": 118,
39
+ "persistence_establishment": 91
40
+ },
41
+ "oracle_excluded_features": [
42
+ "mitre_tactic (benign value -> benign_background phase, perfect oracle)",
43
+ "mitre_technique_id (ATT&CK-by-design perfect oracle for mitre_tactic)",
44
+ "label_malicious (False -> benign_background, perfect oracle)",
45
+ "threat_actor_id (NONE -> benign, perfect oracle)",
46
+ "threat_actor_profile (benign_user -> benign_background, perfect oracle)",
47
+ "event_type (many values phase-specific; e.g. c2_beacon_outbound -> ~95% exfil)"
48
+ ],
49
+ "leakage_audit_note": "See leakage_diagnostic.json for the full audit. 11 oracle paths documented (4 phase oracles, 1 ATT&CK indirect, 6 event_type near-oracles, 7 alert-task oracles), and 2 unlearnable README-suggested targets after honest leakage removal.",
50
+ "models": {
51
+ "xgboost": {
52
+ "architecture": "Gradient-boosted decision trees, multi:softprob, 5 classes",
53
+ "framework": "xgboost",
54
+ "test_metrics": {
55
+ "model": "xgboost",
56
+ "accuracy": 0.9492753623188406,
57
+ "macro_f1": 0.7780594102481514,
58
+ "weighted_f1": 0.9522470071864876,
59
+ "per_class_f1": {
60
+ "benign_background": 0.9975996159385502,
61
+ "initial_access": 0.7196652719665272,
62
+ "lateral_movement": 0.48322147651006714,
63
+ "persistence_establishment": 0.703030303030303,
64
+ "exfiltration_or_impact": 0.9867803837953092
65
+ },
66
+ "confusion_matrix": {
67
+ "labels": [
68
+ "benign_background",
69
+ "initial_access",
70
+ "lateral_movement",
71
+ "persistence_establishment",
72
+ "exfiltration_or_impact"
73
+ ],
74
+ "matrix": [
75
+ [
76
+ 2078,
77
+ 6,
78
+ 0,
79
+ 0,
80
+ 0
81
+ ],
82
+ [
83
+ 4,
84
+ 172,
85
+ 65,
86
+ 6,
87
+ 0
88
+ ],
89
+ [
90
+ 0,
91
+ 38,
92
+ 72,
93
+ 6,
94
+ 2
95
+ ],
96
+ [
97
+ 0,
98
+ 11,
99
+ 22,
100
+ 58,
101
+ 0
102
+ ],
103
+ [
104
+ 0,
105
+ 4,
106
+ 21,
107
+ 4,
108
+ 1157
109
+ ]
110
+ ]
111
+ },
112
+ "macro_roc_auc_ovr": 0.9904125505537232
113
+ }
114
+ },
115
+ "mlp": {
116
+ "architecture": "PyTorch MLP, 87 -> 128 -> 64 -> 5, BatchNorm1d + ReLU + Dropout, weighted cross-entropy loss",
117
+ "framework": "pytorch",
118
+ "test_metrics": {
119
+ "model": "mlp",
120
+ "accuracy": 0.9412238325281803,
121
+ "macro_f1": 0.7533989932595785,
122
+ "weighted_f1": 0.9423850278932477,
123
+ "per_class_f1": {
124
+ "benign_background": 0.9937679769894535,
125
+ "initial_access": 0.6511627906976745,
126
+ "lateral_movement": 0.4507042253521127,
127
+ "persistence_establishment": 0.6903553299492385,
128
+ "exfiltration_or_impact": 0.9810046433094133
129
+ },
130
+ "confusion_matrix": {
131
+ "labels": [
132
+ "benign_background",
133
+ "initial_access",
134
+ "lateral_movement",
135
+ "persistence_establishment",
136
+ "exfiltration_or_impact"
137
+ ],
138
+ "matrix": [
139
+ [
140
+ 2073,
141
+ 11,
142
+ 0,
143
+ 0,
144
+ 0
145
+ ],
146
+ [
147
+ 10,
148
+ 140,
149
+ 72,
150
+ 17,
151
+ 8
152
+ ],
153
+ [
154
+ 2,
155
+ 27,
156
+ 64,
157
+ 12,
158
+ 13
159
+ ],
160
+ [
161
+ 2,
162
+ 4,
163
+ 17,
164
+ 68,
165
+ 0
166
+ ],
167
+ [
168
+ 1,
169
+ 1,
170
+ 13,
171
+ 9,
172
+ 1162
173
+ ]
174
+ ]
175
+ },
176
+ "macro_roc_auc_ovr": 0.986126094475466
177
+ }
178
+ }
179
+ }
180
+ }
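The accuracy and per-class F1 values above follow directly from the confusion matrices, with rows taken as true labels and columns as predictions. That orientation is an assumption here, though it is the one that reproduces the reported numbers. The sketch below rederives the XGBoost metrics; `f1_per_class` is a hypothetical helper, not part of the repo.

```python
LABELS = [
    "benign_background", "initial_access", "lateral_movement",
    "persistence_establishment", "exfiltration_or_impact",
]

# XGBoost test confusion matrix from validation_results.json
# (assumed layout: rows = true class, columns = predicted class).
XGB_MATRIX = [
    [2078, 6, 0, 0, 0],
    [4, 172, 65, 6, 0],
    [0, 38, 72, 6, 2],
    [0, 11, 22, 58, 0],
    [0, 4, 21, 4, 1157],
]


def f1_per_class(matrix):
    """F1_k = 2*TP / (true_count + predicted_count) for each class k."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        true_count = sum(matrix[k])
        predicted_count = sum(matrix[i][k] for i in range(n))
        denom = true_count + predicted_count
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


total = sum(sum(row) for row in XGB_MATRIX)
accuracy = sum(XGB_MATRIX[k][k] for k in range(len(XGB_MATRIX))) / total
f1_scores = f1_per_class(XGB_MATRIX)
macro_f1 = sum(f1_scores) / len(f1_scores)

print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")
for label, score in zip(LABELS, f1_scores):
    print(f"  {label:28s} F1={score:.4f}")
```

The gap between accuracy (about 0.95) and macro-F1 (about 0.78) comes from the three small middle classes, which together make up roughly 12% of the test events but count for three fifths of the macro average.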