pradeep-xpert committed on
Commit
c6a80e7
·
verified ·
1 Parent(s): dbfd183

Initial release: XGBoost + MLP for malware execution phase classification

README.md ADDED
@@ -0,0 +1,438 @@
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ library_name: pytorch
4
+ tags:
5
+ - cybersecurity
6
+ - malware
7
+ - malware-behaviour
8
+ - sandbox-analysis
9
+ - edr
10
+ - tabular-classification
11
+ - synthetic-data
12
+ - xgboost
13
+ - baseline
14
+ pipeline_tag: tabular-classification
15
+ base_model: []
16
+ datasets:
17
+ - xpertsystems/cyb003-sample
18
+ metrics:
19
+ - accuracy
20
+ - f1
21
+ - roc_auc
22
+ model-index:
23
+ - name: cyb003-baseline-classifier
24
+ results:
25
+ - task:
26
+ type: tabular-classification
27
+ name: 10-class malware execution phase classification
28
+ dataset:
29
+ type: xpertsystems/cyb003-sample
30
+ name: CYB003 Synthetic Malware Behaviour & Classification Dataset (Sample)
31
+ metrics:
32
+ - type: roc_auc
33
+ value: 0.9792
34
+ name: Test macro ROC-AUC OvR (XGBoost, seed 42)
35
+ - type: accuracy
36
+ value: 0.9178
37
+ name: Test accuracy (XGBoost, seed 42)
38
+ - type: f1
39
+ value: 0.7781
40
+ name: Test macro-F1 (XGBoost, seed 42)
41
+ - type: accuracy
42
+ value: 0.905
43
+ name: Multi-seed accuracy mean ± 0.010 (XGBoost, 10 seeds)
44
+ - type: roc_auc
45
+ value: 0.975
46
+ name: Multi-seed ROC-AUC mean ± 0.002 (XGBoost, 10 seeds)
47
+ - type: roc_auc
48
+ value: 0.9681
49
+ name: Test macro ROC-AUC OvR (MLP, seed 42)
50
+ - type: accuracy
51
+ value: 0.8222
52
+ name: Test accuracy (MLP, seed 42)
53
+ - type: f1
54
+ value: 0.7072
55
+ name: Test macro-F1 (MLP, seed 42)
56
+ ---
57
+
58
+ # CYB003 Baseline Classifier
59
+
60
+ **Malware execution-phase classifier trained on the CYB003 synthetic
61
+ malware behaviour sample. Predicts which of 10 execution phases a
62
+ per-timestep telemetry record belongs to, from observable behavioural
63
+ and PE-static features.**
64
+
65
+ > **Baseline reference, not for production use.** This model demonstrates
66
+ > that the [CYB003 sample dataset](https://huggingface.co/datasets/xpertsystems/cyb003-sample)
67
+ > is learnable end-to-end and gives prospective buyers a working starting
68
+ > point. It is not a production sandbox, EDR, or threat-detection system.
69
+ > See [Limitations](#limitations).
70
+
71
+ ## Model overview
72
+
73
+ | Property | Value |
74
+ |---|---|
75
+ | Task | 10-class execution_phase classification |
76
+ | Training data | `xpertsystems/cyb003-sample` (6,000 timesteps across 100 malware samples) |
77
+ | Models | XGBoost + PyTorch MLP |
78
+ | Input features | 69 (after one-hot encoding) |
79
+ | Split | **Group-aware by sample_id** (disjoint train/val/test samples) |
80
+ | Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds |
81
+ | License | CC-BY-NC-4.0 (matches dataset) |
82
+ | Status | Reference baseline |
83
+
84
+ ## Why this task instead of malware family classification?
85
+
86
+ The CYB003 dataset README leads with "training malware family classifiers"
87
+ as a suggested use case. We piloted that target first and found it is
88
+ **not learnable from the sample dataset** under proper group-aware
89
+ evaluation: with only 100 unique samples spread across 10 families,
90
+ XGBoost on per-timestep features lands at ~15% accuracy and ROC-AUC ~0.58
91
+ — at majority baseline. Per-sample aggregation gives the same result.
92
+
93
+ This is a **sample-size constraint**, not a feature-engineering failure.
94
+ With ~7 training samples per family on average (69 training samples across 10 families), a held-out test set of 15 samples
95
+ covers at most ~8 families and yields a model that cannot generalize.
96
+ The full 280k-row CYB003 product, with ~28 samples per family at the
97
+ sample's distribution, will not have this constraint.
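The coverage figure can be sanity-checked with a short calculation. The sketch below assumes, purely for illustration, that the 100 samples are spread evenly at 10 per family; the real sample need not be uniform, so the result is a rough guide only. The function name is hypothetical.

```python
from math import comb


def expected_families_covered(n_samples: int, n_families: int, n_test: int) -> float:
    """Expected number of distinct families in a random test set.

    Assumes every family has the same number of samples (an illustrative
    assumption, not a confirmed property of the CYB003 sample).
    """
    per_family = n_samples // n_families
    # Probability that one given family is entirely absent from the test set.
    p_absent = comb(n_samples - per_family, n_test) / comb(n_samples, n_test)
    return n_families * (1.0 - p_absent)


expected = expected_families_covered(n_samples=100, n_families=10, n_test=15)
print(f"expected families in a 15-sample test set: {expected:.2f}")
```

Under the even-spread assumption the expectation comes out at about 8 of the 10 families, so a typical test fold is missing roughly two families altogether.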
98
+
99
+ We pivoted to **execution_phase prediction**, which has 6,000 rows of
100
+ per-timestep data and learns cleanly: 91% accuracy, ROC-AUC 0.98, stable
101
+ across seeds. This is a legitimate SOC use case — dynamic-analysis tools
102
+ and EDR systems regularly need to tag what phase of execution observed
103
+ malware activity belongs to — and it shows the dataset is well-calibrated
104
+ even when the headline product use case needs more data.
105
+
106
+ Two model artifacts are published. They are designed to be used together — disagreement is a useful triage signal:
107
+
108
+ - `model_xgb.json` — gradient-boosted trees, primary recommendation
109
+ - `model_mlp.safetensors` — PyTorch MLP in SafeTensors format
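One way to act on the disagreement signal is to compare the top prediction of each model and flag records where they differ. This is a minimal sketch, not part of the released code: the `triage` helper and the three-class probabilities are hypothetical.

```python
from typing import Sequence


def triage(proba_xgb: Sequence[float], proba_mlp: Sequence[float],
           labels: Sequence[str]) -> dict:
    """Compare the two models' top predictions for one record."""
    top_xgb = max(range(len(labels)), key=lambda i: proba_xgb[i])
    top_mlp = max(range(len(labels)), key=lambda i: proba_mlp[i])
    return {
        "xgb_label": labels[top_xgb],
        "mlp_label": labels[top_mlp],
        "needs_review": top_xgb != top_mlp,  # disagreement -> human review
    }


# Made-up probabilities over three of the ten phases, for illustration only.
labels = ["initial_drop", "dormancy_dwell", "c2_communication"]
result = triage([0.7, 0.2, 0.1], [0.3, 0.6, 0.1], labels)
print(result)
```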
110
+
111
+ ## Quick start
112
+
113
+ ```bash
114
+ pip install xgboost torch safetensors pandas huggingface_hub
115
+ ```
116
+
117
+ ```python
118
+ from huggingface_hub import hf_hub_download
119
+ import json, numpy as np, torch, xgboost as xgb
120
+ from safetensors.torch import load_file
121
+
122
+ REPO = "xpertsystems/cyb003-baseline-classifier"
123
+
124
+ paths = {n: hf_hub_download(REPO, n) for n in [
125
+ "model_xgb.json", "model_mlp.safetensors",
126
+ "feature_engineering.py", "feature_meta.json", "feature_scaler.json",
127
+ ]}
128
+
129
+ import sys, os
130
+ sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"]))
131
+ from feature_engineering import transform_single, load_meta, INT_TO_LABEL
132
+
133
+ meta = load_meta(paths["feature_meta.json"])
134
+ xgb_model = xgb.XGBClassifier(); xgb_model.load_model(paths["model_xgb.json"])
135
+
136
+ # Predict. `my_timestep_record` is a placeholder for a record you supply (see inference_example.ipynb for the full pattern)
137
+ X = transform_single(my_timestep_record, meta)
138
+ proba = xgb_model.predict_proba(X)[0]
139
+ print(INT_TO_LABEL[int(np.argmax(proba))])
140
+ ```
141
+
142
+ See [`inference_example.ipynb`](./inference_example.ipynb) for the full
143
+ copy-paste demo.
144
+
145
+ ## Training data
146
+
147
+ Trained on the public sample of CYB003, 6,000 per-timestep telemetry
148
+ rows from 100 malware samples (60 timesteps per sample):
149
+
150
+ | Phase | Total rows | Share of total rows | Test rows (seed 42) |
151
+ |---|---:|---:|---:|
152
+ | `initial_drop` | 801 | 13.4% | 120 |
153
+ | `lateral_movement` | 799 | 13.3% | 120 |
154
+ | `persistence_establishment` | 787 | 13.1% | 119 |
155
+ | `data_exfiltration` | 783 | 13.1% | 100 |
156
+ | `c2_communication` | 709 | 11.8% | 87 |
157
+ | `privilege_escalation` | 705 | 11.8% | 107 |
158
+ | `payload_execution` | 705 | 11.8% | 109 |
159
+ | `dormancy_dwell` | 250 | 4.2% | 83 |
160
+ | `sandbox_evasion_stall` | 234 | 3.9% | 32 |
161
+ | `self_destruct_cleanup` | 227 | 3.8% | 23 |
162
+
163
+ ### Group-aware split
164
+
165
+ A single malware sample generates 60 highly-correlated timesteps. Random
166
+ row-level splitting would put timesteps from the same sample in both
167
+ train and test, inflating metrics in a way that does not generalize to
168
+ new samples.
169
+
170
+ This release uses **GroupShuffleSplit by `sample_id`** (nested, 70/15/15):
171
+
172
+ | Fold | Samples | Timesteps |
173
+ |---|---:|---:|
174
+ | Train | 69 | 4,140 |
175
+ | Validation | 16 | 960 |
176
+ | Test | 15 | 900 |
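The idea behind a group-aware split can be sketched in plain Python: shuffle the sample IDs, not the rows, and assign every row to the fold of its sample. This is a stand-in for illustration, not the release's `GroupShuffleSplit` call; the `group_split` helper and the synthetic IDs are hypothetical.

```python
import random


def group_split(sample_ids, test_frac=0.15, val_frac=0.15, seed=42):
    """Split row indices so that each sample_id lands in exactly one fold."""
    groups = sorted(set(sample_ids))
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = round(test_frac * len(groups))
    n_val = round(val_frac * len(groups))
    test_g = set(groups[:n_test])
    val_g = set(groups[n_test:n_test + n_val])
    folds = {"train": [], "val": [], "test": []}
    for row, sid in enumerate(sample_ids):
        fold = "test" if sid in test_g else "val" if sid in val_g else "train"
        folds[fold].append(row)
    return folds


# Synthetic stand-in: 100 samples x 60 timesteps, like the public sample.
sample_ids = [f"s{i:03d}" for i in range(100) for _ in range(60)]
folds = group_split(sample_ids)
train_groups = {sample_ids[r] for r in folds["train"]}
test_groups = {sample_ids[r] for r in folds["test"]}
print({name: len(rows) for name, rows in folds.items()})
```

With this rounding the folds hold 70/15/15 samples; the release reports 69/16/15, so its nested split presumably rounds differently.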
177
+
178
+ All test samples are completely unseen during training. Class imbalance
179
+ is addressed with balanced class weights (passed to XGBoost as `sample_weight`) and
180
+ weighted cross-entropy (MLP).
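For reference, the balanced weighting rule that scikit-learn uses for `class_weight='balanced'` is `n_rows / (n_classes * class_count)`. The sketch below applies it to the total row counts from the phase table above. The release would have used train-fold counts, which are not published, so the numbers are illustrative.

```python
# Total rows per phase, copied from the phase table (not train-fold counts).
phase_counts = {
    "initial_drop": 801,
    "lateral_movement": 799,
    "persistence_establishment": 787,
    "data_exfiltration": 783,
    "c2_communication": 709,
    "privilege_escalation": 705,
    "payload_execution": 705,
    "dormancy_dwell": 250,
    "sandbox_evasion_stall": 234,
    "self_destruct_cleanup": 227,
}


def balanced_class_weights(counts):
    """weight_c = n_rows / (n_classes * count_c)."""
    n_rows = sum(counts.values())
    n_classes = len(counts)
    return {c: n_rows / (n_classes * n) for c, n in counts.items()}


weights = balanced_class_weights(phase_counts)
for phase, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{phase:28s} {w:.2f}")
```

The three rare phases receive weights above 2, while the seven common phases sit below 1.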
181
+
182
+ ## Feature pipeline
183
+
184
+ The bundled `feature_engineering.py` is the canonical feature recipe.
185
+ 69 features survive after encoding, drawn from:
186
+
187
+ - **Per-timestep numeric** (10): `timestep`, `api_call_rate`, `registry_write_count`, `network_connection_count`, `process_injection_flag`, `c2_beacon_interval_sec`, `av_signature_hit_flag`, `sandbox_evasion_flag`, `lateral_propagation_count`, `privilege_escalation_flag`
188
+ - **PE static features** (11): `pe_entropy_mean`, `pe_entropy_std`, `import_hash_cluster`, `section_count`, `packed_section_ratio`, `string_entropy_mean`, `byte_histogram_chi2`, `code_section_rx_ratio`, `resource_section_entropy`, `suspicious_import_count`, `packer_detected_flag`
189
+ - **Categorical** (6, one-hot encoded): `malware_family`, `threat_actor_tier`, `target_platform`, `obfuscation_technique`, `detection_outcome`, `ep_stack`
190
+ - **Engineered** (6): `api_burst_score`, `is_c2_active`, `is_high_net_volume`, `is_stealth_step`, `is_destructive_step`, `lateral_activity_score`
191
+
192
+ ### Leakage audit
193
+
194
+ No categorical feature has phase->phase purity above 0.17 (uniform
195
+ random baseline is 0.10), so nothing in the dataset is an oracle for
196
+ the target. The model relies on a mix of `timestep` (strong but not
197
+ deterministic) and behavioural features.
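A purity check of this kind could look like the sketch below. It assumes that "purity" means the largest share of any single phase among rows sharing one categorical value; that reading, the `max_purity` helper, and the platform values are assumptions for illustration.

```python
from collections import Counter, defaultdict


def max_purity(rows, feature, target="execution_phase"):
    """Largest share of any single target class within one feature value."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[target]] += 1
    return max(max(c.values()) / sum(c.values()) for c in by_value.values())


# Tiny made-up rows, for illustration only.
rows = [
    {"target_platform": "windows", "execution_phase": "initial_drop"},
    {"target_platform": "windows", "execution_phase": "c2_communication"},
    {"target_platform": "linux", "execution_phase": "initial_drop"},
    {"target_platform": "linux", "execution_phase": "initial_drop"},
]
purity = max_purity(rows, "target_platform")
print(purity)
```

A value near 1.0, as in this toy data, would mean one categorical value almost determines the phase; a value near the 0.10 uniform baseline means the feature carries little direct label information.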
198
+
199
+ ## Evaluation
200
+
201
+ ### Test-set metrics, seed 42 (n = 900 timesteps from 15 disjoint samples)
202
+
203
+ **XGBoost** (the published `model_xgb.json` artifact)
204
+
205
+ | Metric | Value |
206
+ |---|---:|
207
+ | Macro ROC-AUC (OvR) | **0.9792** |
208
+ | Accuracy | **0.9178** |
209
+ | Macro-F1 | 0.7781 |
210
+ | Weighted-F1 | 0.9065 |
211
+
212
+ **MLP** (the published `model_mlp.safetensors` artifact)
213
+
214
+ | Metric | Value |
215
+ |---|---:|
216
+ | Macro ROC-AUC (OvR) | 0.9681 |
217
+ | Accuracy | 0.8222 |
218
+ | Macro-F1 | 0.7072 |
219
+ | Weighted-F1 | 0.8278 |
220
+
221
+ ### Multi-seed robustness (XGBoost, 10 seeds)
222
+
223
+ Accuracy and ROC-AUC are tight across seeds — the task is genuinely
224
+ learnable, not seed-lucky:
225
+
226
+ | Metric | Mean | Std | Min | Max |
227
+ |---|---:|---:|---:|---:|
228
+ | Accuracy | 0.905 | 0.010 | 0.882 | 0.921 |
229
+ | Macro-F1 | 0.784 | 0.013 | 0.759 | 0.807 |
230
+ | Macro ROC-AUC OvR | 0.975 | 0.002 | 0.972 | 0.979 |
231
+
232
+ Full per-seed results in [`multi_seed_results.json`](./multi_seed_results.json).
233
+ All 10 seeds yielded all 10 classes in the test fold, supporting clean
234
+ multi-class ROC-AUC computation.
235
+
236
+ ### Per-class F1 (seed 42) — where the signal is and isn't
237
+
238
+ | Phase | XGBoost F1 | MLP F1 | Note |
239
+ |---|---:|---:|---|
240
+ | `c2_communication` | **1.000** | 1.000 | Trivial: tight timestep window 52-59 + c2_beacon signal |
241
+ | `persistence_establishment` | **0.992** | 0.870 | Tight timestep window 9-17 + registry writes |
242
+ | `lateral_movement` | **0.992** | 0.907 | Tight timestep window 26-34 + lateral_propagation |
243
+ | `privilege_escalation` | **0.991** | 0.915 | Tight timestep window 18-25 + privilege flag |
244
+ | `data_exfiltration` | **0.970** | 0.918 | Tight timestep window 43-51 + network volume |
245
+ | `payload_execution` | **0.963** | 0.698 | Tight timestep window 35-42 + API bursts |
246
+ | `initial_drop` | **0.945** | 0.886 | Tight timestep window 0-8 |
247
+ | `dormancy_dwell` | 0.530 | 0.520 | Hard: spans full 0-59 timestep range |
248
+ | `self_destruct_cleanup` | 0.273 | 0.282 | Hard: spans full 0-59, low row count (227) |
249
+ | `sandbox_evasion_stall` | 0.125 | 0.077 | Hard: spans full 0-59, low row count (234) |
250
+
251
+ Seven phases are near-trivially classified because they sit in tight
252
+ timestep windows with characteristic behavioural signatures. **Three
253
+ phases — `dormancy_dwell`, `sandbox_evasion_stall`, `self_destruct_cleanup`
254
+ — scatter across the full 0–59 timestep range** and lack distinctive
255
+ behavioural features (idle/evasion phases have low activity by design),
256
+ so a flat-tabular event-level model can't reliably disambiguate them.
257
+ Sequence models that consider neighbouring timesteps would help here.
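The headline metrics can be recomputed from the seed-42 confusion matrix stored in `ablation_results.json` (`full_model_metrics`). The sketch below treats rows as true labels and columns as predictions, which is an assumption about the file's orientation. Accuracy and macro-F1 do not depend on that choice; weighted-F1 does.

```python
labels = [
    "c2_communication", "data_exfiltration", "dormancy_dwell", "initial_drop",
    "lateral_movement", "payload_execution", "persistence_establishment",
    "privilege_escalation", "sandbox_evasion_stall", "self_destruct_cleanup",
]
# Copied from ablation_results.json -> full_model_metrics -> confusion_matrix.
matrix = [
    [108, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 113, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 4, 22, 7, 0, 1, 0, 0, 2, 4],
    [0, 0, 2, 121, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 120, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 105, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 121, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 107, 0, 0],
    [0, 0, 17, 3, 0, 1, 1, 2, 3, 5],
    [0, 3, 0, 2, 1, 5, 0, 0, 11, 6],
]


def f1_per_class(m):
    """F1 for each class: 2*TP / (true count + predicted count)."""
    n = len(m)
    scores = []
    for k in range(n):
        tp = m[k][k]
        true_k = sum(m[k])                       # row sum: true count
        pred_k = sum(m[r][k] for r in range(n))  # column sum: predicted count
        scores.append(2 * tp / (true_k + pred_k) if true_k + pred_k else 0.0)
    return scores


total = sum(map(sum, matrix))
accuracy = sum(matrix[k][k] for k in range(len(matrix))) / total
f1 = f1_per_class(matrix)
macro_f1 = sum(f1) / len(f1)
weighted_f1 = sum(s * sum(row) for s, row in zip(f1, matrix)) / total
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f} weighted_f1={weighted_f1:.4f}")
for name, score in zip(labels, f1):
    print(f"{name:28s} {score:.3f}")
```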
258
+
259
+ ### Ablation: which feature groups matter
260
+
261
+ | Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy |
262
+ |---|---:|---:|---:|---:|
263
+ | Full feature set (published) | 0.9178 | 0.7781 | 0.9792 | — |
264
+ | No `timestep` | 0.6933 | 0.5963 | 0.9264 | **−0.2244** |
265
+ | No behavioural features | 0.9089 | 0.7579 | 0.9705 | −0.0089 |
266
+ | No PE static features | 0.9167 | 0.7808 | 0.9786 | −0.0011 |
267
+ | No engineered features | 0.9200 | 0.7931 | 0.9797 | +0.0022 |
268
+
269
+ Three clear findings:
270
+
271
+ 1. **`timestep` is by far the dominant feature** (drops 22 pp when removed,
272
+ ROC-AUC still 0.93). Malware execution progresses in time, and where
273
+ you are in that timeline carries most of the phase signal.
274
+ 2. **PE static features are barely used for phase prediction.** This is
275
+ honest: PE features (entropy, packed sections, import hashes) inform
276
+ family classification, not phase classification. A buyer doing family
277
+ work should expect to use them; for phase work they can be dropped.
278
+ 3. **Behavioural features contribute ~1 pp; engineered features contribute nothing measurable.**
279
+ Accuracy rises by 0.2 pp when the engineered features are dropped, so trees recover them on their own.
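The Δ accuracy column can be reproduced from the unrounded accuracies in `ablation_results.json`. The sketch below uses the table's sign convention (ablated minus full, so a negative value means the group helped); the JSON file itself appears to store the opposite sign.

```python
# Accuracies from ablation_results.json (seed 42, XGBoost).
full_accuracy = 0.9177777777777778
ablated_accuracy = {
    "no_timestep": 0.6933333333333334,
    "no_behavioural": 0.9088888888888889,
    "no_pe_static": 0.9166666666666666,
    "no_engineered": 0.92,
}

# Ablated minus full: negative means removing the group hurt accuracy.
deltas = {name: acc - full_accuracy for name, acc in ablated_accuracy.items()}
for name, d in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{name:16s} {d:+.4f} ({d * 100:+.1f} pp)")
```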
280
+
281
+ ### Architecture
282
+
283
+ **XGBoost:** multi-class gradient boosting (`multi:softprob`, 10 classes),
284
+ `hist` tree method, class-balanced sample weights, early stopping on
285
+ validation mlogloss.
286
+
287
+ **MLP:** `69 → 128 → 64 → 10`, each hidden layer followed by `BatchNorm1d`
288
+ → `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss, AdamW optimizer,
289
+ early stopping on validation macro-F1.
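To get a feel for the MLP's size, its trainable parameters can be counted from the layer widths alone. The sketch assumes biased linear layers and affine batch norm (one scale and one shift per feature) after each hidden layer; the release does not confirm these details, and `mlp_param_count` is a hypothetical helper.

```python
def mlp_param_count(sizes, batchnorm_on_hidden=True):
    """Count trainable parameters of a fully connected network."""
    total = 0
    for i, (fan_in, fan_out) in enumerate(zip(sizes, sizes[1:])):
        total += fan_in * fan_out + fan_out      # linear weights + bias
        is_hidden = i < len(sizes) - 2           # the last layer is the output
        if batchnorm_on_hidden and is_hidden:
            total += 2 * fan_out                 # batch-norm scale + shift
    return total


n_params = mlp_param_count([69, 128, 64, 10])
print(n_params)
```

Under these assumptions the network has roughly 18k parameters, several times the 4,140 training timesteps, which fits the caution about MLP brittleness in the limitations.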
290
+
291
+ Training hyperparameters (learning rate, batch size, n_estimators,
292
+ early-stopping patience, weight decay, class-weighting strategy) are
293
+ held internally by XpertSystems and are not part of this release.
294
+
295
+ ## Limitations
296
+
297
+ **This is a baseline reference, not a production sandbox or threat detector.**
298
+
299
+ 1. **Three phases are genuinely hard at sample size.** `dormancy_dwell`,
300
+ `sandbox_evasion_stall`, and `self_destruct_cleanup` span the full
301
+ 0–59 timestep range and have low row counts. Per-class F1 = 0.13–0.53.
302
+ These are the phases by design lacking distinctive moment-to-moment
303
+ features (the malware is being quiet to evade detection). Sequence
304
+ models or per-sample aggregation would substantially improve these.
305
+
306
+ 2. **The pivot away from malware family classification is dataset-limited,
307
+ not method-limited.** Family classification on 100 samples with 10
308
+ classes is at majority baseline. The full 280k-row CYB003 product
309
+ provides ~5,600 samples and supports proper family classification.
310
+
311
+ 3. **Synthetic-vs-real transfer.** The dataset is synthetic and calibrated
312
+ to threat-intelligence and AV-testing benchmark targets (VirusTotal,
313
+ AV-TEST, MITRE ATT&CK Evaluations, Mandiant M-Trends, CrowdStrike GTR,
314
+ Verizon DBIR). Real malware telemetry has different noise
315
+ characteristics, adversary adaptation, and instrumentation gaps. Do
316
+ not assume metrics transfer.
317
+
318
+ 4. **Adversarial robustness not evaluated.** The dataset is not
319
+ adversarially generated; the model has not been red-teamed against
320
+ evasive samples.
321
+
322
+ 5. **MLP brittleness on OOD inputs.** With ~4k training timesteps, the
323
+ MLP can produce confidently-wrong predictions on hand-crafted records
324
+ far from the training manifold. XGBoost is more robust. Use both;
325
+ treat disagreement as a signal for human review.
326
+
327
+ 6. **`timestep` dominance is a property of the dataset.** Real malware
328
+ in production doesn't have a clean "timestep" feature on a per-sample
329
+ 60-step normalized timeline — that's a simulator artifact. A buyer
330
+ transferring this baseline to real sandbox traces would need to
331
+ recover an equivalent temporal-position feature from execution-trace
332
+ timestamps relative to detonation.
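One simple way to recover such a feature is to scale elapsed time since detonation onto a 0 to 59 grid. The linear mapping, the function name, and the example timestamps below are assumptions for illustration; the simulator's timeline need not be linear in wall-clock time.

```python
def normalized_timestep(event_ts, detonation_ts, end_ts, n_steps=60):
    """Map a timestamp to a 0..n_steps-1 position on the run's timeline."""
    if end_ts <= detonation_ts:
        raise ValueError("end_ts must be after detonation_ts")
    frac = (event_ts - detonation_ts) / (end_ts - detonation_ts)
    frac = min(max(frac, 0.0), 1.0)          # clamp events outside the window
    return min(int(frac * n_steps), n_steps - 1)


# Hypothetical trace: detonation at t = 0 s, observation ends at t = 600 s.
steps = [normalized_timestep(t, 0.0, 600.0) for t in (0.0, 95.0, 599.0, 600.0)]
print(steps)
```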
333
+
334
+ ## Notes on dataset schema
335
+
336
+ The CYB003 sample dataset README describes some fields differently from
337
+ the actual schema. The model was trained on the actual schema; this note
338
+ helps buyers reconcile what they read with what they receive.
339
+
340
+ | What the README says | What the data actually contains |
341
+ |---|---|
342
+ | `pe_entropy` (one column) | `pe_entropy_mean` + `pe_entropy_std` (two columns) |
343
+ | `process_injection_count` | `process_injection_flag` (binary, not a count) |
344
+ | `c2_beacon_active` | `c2_beacon_interval_sec` (seconds, 0 when inactive) |
345
+ | `av_detected`, `edr_detected`, `sandbox_evaded`, `dwell_time_hours`, `persistence_mechanism`, `lotl_technique_used` (per-timestep) | None of these exist in the per-timestep table; near-equivalents (`av_signature_hit_flag`, `sandbox_evasion_flag`) exist under different names |
346
+ | `ep_stack`: 3 values (`legacy_av`, `ngav_ml_based`, `edr_full`) | `ep_stack`: 8 values (`legacy_av_only`, `ngav_ml_based`, `edr_endpoint_detect`, `av_plus_firewall`, `xdr_extended_detect`, `managed_detection_response`, `deception_honeypot`, `no_protection`) |
347
+ | 9 malware families listed | 10 families in the data (`apt_implant` is the additional one) |
348
+ | `coordinated_campaign_flag` (described as a flag) | Constant = 1 for all rows in the sample (uninformative) |
349
+
350
+ The actual per-timestep table also contains rich PE-static features not
351
+ listed in the README: `import_hash_cluster`, `section_count`,
352
+ `packed_section_ratio`, `string_entropy_mean`, `byte_histogram_chi2`,
353
+ `code_section_rx_ratio`, `resource_section_entropy`,
354
+ `suspicious_import_count`. These are excellent features for family
355
+ classification work and are documented in the model's
356
+ `feature_engineering.py`.
357
+
358
+ None of these discrepancies affects model correctness — the feature
359
+ pipeline uses the actual column names. If you build your own pipeline
360
+ against the dataset, use the actual columns, not the README descriptions.
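A small guard can catch README-era column names before they reach a custom pipeline. The mapping below is taken from the table above; the last two pairings are inferred from its "equivalents" remark and are assumptions, as are the helper names.

```python
# README name -> actual column(s). The av_detected and sandbox_evaded
# pairings are inferred, not stated outright in the schema notes.
README_TO_ACTUAL = {
    "pe_entropy": ["pe_entropy_mean", "pe_entropy_std"],
    "process_injection_count": ["process_injection_flag"],
    "c2_beacon_active": ["c2_beacon_interval_sec"],
    "av_detected": ["av_signature_hit_flag"],
    "sandbox_evaded": ["sandbox_evasion_flag"],
}


def stale_columns(columns):
    """Return README-era names found in `columns`, with suggested replacements."""
    return {c: README_TO_ACTUAL[c] for c in columns if c in README_TO_ACTUAL}


def c2_beacon_active(record):
    """Derive a boolean from the interval column (0 means inactive)."""
    return record["c2_beacon_interval_sec"] > 0


wanted = ["timestep", "pe_entropy", "c2_beacon_active", "api_call_rate"]
report = stale_columns(wanted)
print(report)
print(c2_beacon_active({"c2_beacon_interval_sec": 30.0}))
```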
361
+
362
+ ## Intended use
363
+
364
+ - **Evaluating fit** of the CYB003 dataset for your malware-analysis
365
+ or sandbox-detection research
366
+ - **Baseline reference** for new model architectures (especially sequence
367
+ models, which should beat this baseline on the late/scattered phases)
368
+ - **Teaching and demo** for tabular classification on malware telemetry
369
+ - **Feature engineering reference** for per-timestep behavioural data
370
+
371
+ ## Out-of-scope use
372
+
373
+ - Production sandbox analysis on real malware
374
+ - EDR phase tagging on real systems
375
+ - Family attribution (this baseline does not address that task; see why above)
376
+ - Adversarial-evasion evaluation (dataset not adversarially generated)
377
+ - Any operational security decision
378
+
379
+ ## Reproducibility
380
+
381
+ Outputs above were produced with `seed = 42` (published artifact),
382
+ group-aware nested `GroupShuffleSplit` (70/15/15 by sample_id), on the
383
+ published sample (`xpertsystems/cyb003-sample`, version 1.0.0, generated
384
+ 2026-05-16). The feature pipeline in `feature_engineering.py` is
385
+ deterministic and the trained weights in this repo correspond exactly
386
+ to the metrics above.
387
+
388
+ Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200) in
389
+ `multi_seed_results.json` confirm robust performance across splits.
390
+
391
+ The training script itself is private to XpertSystems. The published
392
+ artifacts contain the feature pipeline, model weights, scaler, metadata,
393
+ and validation results — sufficient to reproduce inference but not
394
+ training.
395
+
396
+ ## Files in this repo
397
+
398
+ | File | Purpose |
399
+ |---|---|
400
+ | `model_xgb.json` | XGBoost weights (seed 42) |
401
+ | `model_mlp.safetensors` | PyTorch MLP weights (seed 42) |
402
+ | `feature_engineering.py` | Feature pipeline (load → engineer → encode) |
403
+ | `feature_meta.json` | Feature column order + categorical levels |
404
+ | `feature_scaler.json` | MLP input mean/std (XGBoost ignores) |
405
+ | `validation_results.json` | Per-class metrics, confusion matrix, architecture |
406
+ | `ablation_results.json` | Per-feature-group ablation (timestep, behavioural, PE static, engineered) |
407
+ | `multi_seed_results.json` | XGBoost metrics across 10 seeds with aggregate statistics |
408
+ | `inference_example.ipynb` | End-to-end inference demo notebook |
409
+ | `README.md` | This file |
410
+
411
+ ## Contact and full product
412
+
413
+ The full **CYB003** dataset contains ~349,000 rows across four files,
414
+ with calibrated benchmark validation against 12 metrics drawn from
415
+ authoritative threat intelligence and AV-testing sources (VirusTotal,
416
+ AV-TEST, MITRE ATT&CK Evaluations, Mandiant, CrowdStrike, Verizon).
417
+ The full XpertSystems.ai synthetic data catalogue spans 41 SKUs across
418
+ Cybersecurity, Healthcare, Insurance & Risk, Oil & Gas, and Materials
419
+ & Energy.
420
+
421
+ - 📧 **pradeep@xpertsystems.ai**
422
+ - 🌐 **https://xpertsystems.ai**
423
+ - 🗂 Dataset: https://huggingface.co/datasets/xpertsystems/cyb003-sample
424
+ - 🤖 Companion models:
425
+ - https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic)
426
+ - https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain)
427
+
428
+ ## Citation
429
+
430
+ ```bibtex
431
+ @misc{xpertsystems_cyb003_baseline_2026,
432
+ title = {CYB003 Baseline Classifier: XGBoost and MLP for Malware Execution Phase Classification},
433
+ author = {XpertSystems.ai},
434
+ year = {2026},
435
+ url = {https://huggingface.co/xpertsystems/cyb003-baseline-classifier},
436
+ note = {Baseline reference model trained on xpertsystems/cyb003-sample}
437
+ }
438
+ ```
ablation_results.json ADDED
@@ -0,0 +1,804 @@
1
+ {
2
+ "purpose": "Quantify how much each feature group contributes to the headline XGBoost score. Identical architecture, same group-aware split, with one feature group dropped at a time.",
3
+ "full_model_metrics": {
4
+ "model": "xgboost",
5
+ "accuracy": 0.9177777777777778,
6
+ "macro_f1": 0.7780699645112974,
7
+ "weighted_f1": 0.9064879129227142,
8
+ "per_class_f1": {
9
+ "c2_communication": 1.0,
10
+ "data_exfiltration": 0.9699570815450643,
11
+ "dormancy_dwell": 0.5301204819277109,
12
+ "initial_drop": 0.9453125,
13
+ "lateral_movement": 0.9917355371900827,
14
+ "payload_execution": 0.963302752293578,
15
+ "persistence_establishment": 0.9918032786885246,
16
+ "privilege_escalation": 0.9907407407407407,
17
+ "sandbox_evasion_stall": 0.125,
18
+ "self_destruct_cleanup": 0.2727272727272727
19
+ },
20
+ "confusion_matrix": {
21
+ "labels": [
22
+ "c2_communication",
23
+ "data_exfiltration",
24
+ "dormancy_dwell",
25
+ "initial_drop",
26
+ "lateral_movement",
27
+ "payload_execution",
28
+ "persistence_establishment",
29
+ "privilege_escalation",
30
+ "sandbox_evasion_stall",
31
+ "self_destruct_cleanup"
32
+ ],
33
+ "matrix": [
34
+ [
35
+ 108,
36
+ 0,
37
+ 0,
38
+ 0,
39
+ 0,
40
+ 0,
41
+ 0,
42
+ 0,
43
+ 0,
44
+ 0
45
+ ],
46
+ [
47
+ 0,
48
+ 113,
49
+ 0,
50
+ 0,
51
+ 0,
52
+ 0,
53
+ 0,
54
+ 0,
55
+ 0,
56
+ 0
57
+ ],
58
+ [
59
+ 0,
60
+ 4,
61
+ 22,
62
+ 7,
63
+ 0,
64
+ 1,
65
+ 0,
66
+ 0,
67
+ 2,
68
+ 4
69
+ ],
70
+ [
71
+ 0,
72
+ 0,
73
+ 2,
74
+ 121,
75
+ 0,
76
+ 0,
77
+ 0,
78
+ 0,
79
+ 0,
80
+ 0
81
+ ],
82
+ [
83
+ 0,
84
+ 0,
85
+ 0,
86
+ 0,
87
+ 120,
88
+ 0,
89
+ 0,
90
+ 0,
91
+ 0,
92
+ 1
93
+ ],
94
+ [
95
+ 0,
96
+ 0,
97
+ 1,
98
+ 0,
99
+ 0,
100
+ 105,
101
+ 0,
102
+ 0,
103
+ 0,
104
+ 0
105
+ ],
106
+ [
107
+ 0,
108
+ 0,
109
+ 1,
110
+ 0,
111
+ 0,
112
+ 0,
113
+ 121,
114
+ 0,
115
+ 0,
116
+ 0
117
+ ],
118
+ [
119
+ 0,
120
+ 0,
121
+ 0,
122
+ 0,
123
+ 0,
124
+ 0,
125
+ 0,
126
+ 107,
127
+ 0,
128
+ 0
129
+ ],
130
+ [
131
+ 0,
132
+ 0,
133
+ 17,
134
+ 3,
135
+ 0,
136
+ 1,
137
+ 1,
138
+ 2,
139
+ 3,
140
+ 5
141
+ ],
142
+ [
143
+ 0,
144
+ 3,
145
+ 0,
146
+ 2,
147
+ 1,
148
+ 5,
149
+ 0,
150
+ 0,
151
+ 11,
152
+ 6
153
+ ]
154
+ ]
155
+ },
156
+ "macro_roc_auc_ovr": 0.979171667321058
157
+ },
158
+ "ablations": {
159
+ "no_pe_static": {
160
+ "n_features": 58,
161
+ "dropped_count": 11,
162
+ "metrics": {
163
+ "model": "xgboost_no_pe_static",
164
+ "accuracy": 0.9166666666666666,
165
+ "macro_f1": 0.7808429949060417,
166
+ "weighted_f1": 0.9063054516980296,
167
+ "per_class_f1": {
168
+ "c2_communication": 1.0,
169
+ "data_exfiltration": 0.9783549783549783,
170
+ "dormancy_dwell": 0.4675324675324675,
171
+ "initial_drop": 0.9494163424124513,
172
+ "lateral_movement": 0.995850622406639,
173
+ "payload_execution": 0.963302752293578,
174
+ "persistence_establishment": 0.9836065573770492,
175
+ "privilege_escalation": 0.9771689497716894,
176
+ "sandbox_evasion_stall": 0.16666666666666666,
177
+ "self_destruct_cleanup": 0.32653061224489793
178
+ },
179
+ "confusion_matrix": {
180
+ "labels": [
181
+ "c2_communication",
182
+ "data_exfiltration",
183
+ "dormancy_dwell",
184
+ "initial_drop",
185
+ "lateral_movement",
186
+ "payload_execution",
187
+ "persistence_establishment",
188
+ "privilege_escalation",
189
+ "sandbox_evasion_stall",
190
+ "self_destruct_cleanup"
191
+ ],
192
+ "matrix": [
193
+ [
194
+ 108,
195
+ 0,
196
+ 0,
197
+ 0,
198
+ 0,
199
+ 0,
200
+ 0,
201
+ 0,
202
+ 0,
203
+ 0
204
+ ],
205
+ [
206
+ 0,
207
+ 113,
208
+ 0,
209
+ 0,
210
+ 0,
211
+ 0,
212
+ 0,
213
+ 0,
214
+ 0,
215
+ 0
216
+ ],
217
+ [
218
+ 0,
219
+ 3,
220
+ 18,
221
+ 7,
222
+ 0,
223
+ 1,
224
+ 0,
225
+ 0,
226
+ 6,
227
+ 5
228
+ ],
229
+ [
230
+ 0,
231
+ 0,
232
+ 1,
233
+ 122,
234
+ 0,
235
+ 0,
236
+ 0,
237
+ 0,
238
+ 0,
239
+ 0
240
+ ],
241
+ [
242
+ 0,
243
+ 0,
244
+ 0,
245
+ 0,
246
+ 120,
247
+ 0,
248
+ 0,
249
+ 0,
250
+ 0,
251
+ 1
252
+ ],
253
+ [
254
+ 0,
255
+ 0,
256
+ 1,
257
+ 0,
258
+ 0,
259
+ 105,
260
+ 0,
261
+ 0,
262
+ 0,
263
+ 0
264
+ ],
265
+ [
266
+ 0,
267
+ 0,
268
+ 1,
269
+ 0,
270
+ 0,
271
+ 0,
272
+ 120,
273
+ 0,
274
+ 0,
275
+ 1
276
+ ],
277
+ [
278
+ 0,
279
+ 0,
280
+ 0,
281
+ 0,
282
+ 0,
283
+ 0,
284
+ 0,
285
+ 107,
286
+ 0,
287
+ 0
288
+ ],
289
+ [
290
+ 0,
291
+ 0,
292
+ 15,
293
+ 3,
294
+ 0,
295
+ 1,
296
+ 1,
297
+ 2,
298
+ 4,
299
+ 6
300
+ ],
301
+ [
302
+ 0,
303
+ 2,
304
+ 1,
305
+ 2,
306
+ 0,
307
+ 5,
308
+ 1,
309
+ 3,
310
+ 6,
311
+ 8
312
+ ]
313
+ ]
314
+ },
315
+ "macro_roc_auc_ovr": 0.9785892106991877
316
+ },
317
+ "delta_accuracy": 0.0011111111111111738,
318
+ "delta_macro_f1": -0.0027730303947443025
319
+ },
320
+ "no_behavioural": {
321
+ "n_features": 60,
322
+ "dropped_count": 9,
323
+ "metrics": {
324
+ "model": "xgboost_no_behavioural",
325
+ "accuracy": 0.9088888888888889,
326
+ "macro_f1": 0.7578825763491894,
327
+ "weighted_f1": 0.8916039125438652,
328
+ "per_class_f1": {
329
+ "c2_communication": 1.0,
330
+ "data_exfiltration": 0.9372384937238494,
331
+ "dormancy_dwell": 0.463768115942029,
332
+ "initial_drop": 0.9494163424124513,
333
+ "lateral_movement": 0.9596774193548387,
334
+ "payload_execution": 0.9422222222222222,
335
+ "persistence_establishment": 0.9876543209876543,
336
+ "privilege_escalation": 0.9907407407407407,
337
+ "sandbox_evasion_stall": 0.24,
338
+ "self_destruct_cleanup": 0.10810810810810811
339
+ },
340
+ "confusion_matrix": {
341
+ "labels": [
342
+ "c2_communication",
343
+ "data_exfiltration",
344
+ "dormancy_dwell",
345
+ "initial_drop",
346
+ "lateral_movement",
347
+ "payload_execution",
348
+ "persistence_establishment",
349
+ "privilege_escalation",
350
+ "sandbox_evasion_stall",
351
+ "self_destruct_cleanup"
352
+ ],
353
+ "matrix": [
354
+ [
355
+ 108,
356
+ 0,
357
+ 0,
358
+ 0,
359
+ 0,
360
+ 0,
361
+ 0,
362
+ 0,
363
+ 0,
364
+ 0
365
+ ],
366
+ [
367
+ 0,
368
+ 112,
369
+ 1,
370
+ 0,
371
+ 0,
372
+ 0,
373
+ 0,
374
+ 0,
375
+ 0,
376
+ 0
377
+ ],
378
+ [
379
+ 0,
380
+ 6,
381
+ 16,
382
+ 7,
383
+ 2,
384
+ 5,
385
+ 0,
386
+ 0,
387
+ 3,
388
+ 1
389
+ ],
390
+ [
391
+ 0,
392
+ 0,
393
+ 0,
394
+ 122,
395
+ 0,
396
+ 0,
397
+ 0,
398
+ 0,
399
+ 1,
400
+ 0
401
+ ],
402
+ [
403
+ 0,
404
+ 0,
405
+ 0,
406
+ 0,
407
+ 119,
408
+ 0,
409
+ 0,
410
+ 0,
411
+ 1,
412
+ 1
413
+ ],
414
+ [
415
+ 0,
416
+ 0,
417
+ 0,
418
+ 0,
419
+ 0,
420
+ 106,
421
+ 0,
422
+ 0,
423
+ 0,
424
+ 0
425
+ ],
426
+ [
427
+ 0,
428
+ 0,
429
+ 2,
430
+ 0,
431
+ 0,
432
+ 0,
433
+ 120,
434
+ 0,
435
+ 0,
436
+ 0
437
+ ],
438
+ [
439
+ 0,
440
+ 0,
441
+ 0,
442
+ 0,
443
+ 0,
444
+ 0,
445
+ 0,
446
+ 107,
447
+ 0,
448
+ 0
449
+ ],
450
+ [
451
+ 0,
452
+ 2,
453
+ 8,
454
+ 3,
455
+ 2,
456
+ 3,
457
+ 1,
458
+ 2,
459
+ 6,
460
+ 5
461
+ ],
462
+ [
463
+ 0,
464
+ 6,
465
+ 2,
466
+ 2,
467
+ 4,
468
+ 5,
469
+ 0,
470
+ 0,
471
+ 7,
472
+ 2
473
+ ]
474
+ ]
475
+ },
476
+ "macro_roc_auc_ovr": 0.9704768382021074
477
+ },
478
+ "delta_accuracy": 0.008888888888888946,
479
+ "delta_macro_f1": 0.020187388162107966
480
+ },
481
+ "no_timestep": {
482
+ "n_features": 68,
483
+ "dropped_count": 1,
484
+ "metrics": {
485
+ "model": "xgboost_no_timestep",
486
+ "accuracy": 0.6933333333333334,
487
+ "macro_f1": 0.5963303534115096,
488
+ "weighted_f1": 0.6919482762076271,
489
+ "per_class_f1": {
490
+ "c2_communication": 1.0,
491
+ "data_exfiltration": 0.7619047619047619,
492
+ "dormancy_dwell": 0.5882352941176471,
493
+ "initial_drop": 0.5072463768115942,
494
+ "lateral_movement": 0.6985645933014354,
495
+ "payload_execution": 0.5106382978723404,
496
+ "persistence_establishment": 0.8433734939759037,
497
+ "privilege_escalation": 0.9047619047619048,
498
+ "sandbox_evasion_stall": 0.05555555555555555,
499
+ "self_destruct_cleanup": 0.09302325581395349
500
+ },
501
+ "confusion_matrix": {
502
+ "labels": [
503
+ "c2_communication",
504
+ "data_exfiltration",
505
+ "dormancy_dwell",
506
+ "initial_drop",
507
+ "lateral_movement",
508
+ "payload_execution",
509
+ "persistence_establishment",
510
+ "privilege_escalation",
511
+ "sandbox_evasion_stall",
512
+ "self_destruct_cleanup"
513
+ ],
514
+ "matrix": [
515
+ [
516
+ 108,
517
+ 0,
518
+ 0,
519
+ 0,
520
+ 0,
521
+ 0,
522
+ 0,
523
+ 0,
524
+ 0,
525
+ 0
526
+ ],
527
+ [
528
+ 0,
529
+ 96,
530
+ 0,
531
+ 4,
532
+ 9,
533
+ 2,
534
+ 1,
535
+ 0,
536
+ 0,
537
+ 1
538
+ ],
539
+ [
540
+ 0,
541
+ 0,
542
+ 25,
543
+ 10,
544
+ 0,
545
+ 1,
546
+ 0,
547
+ 0,
548
+ 4,
549
+ 0
550
+ ],
551
+ [
552
+ 0,
553
+ 2,
554
+ 6,
555
+ 70,
556
+ 1,
557
+ 12,
558
+ 7,
559
+ 0,
560
+ 22,
561
+ 3
562
+ ],
563
+ [
564
+ 0,
565
+ 39,
566
+ 0,
567
+ 1,
568
+ 73,
569
+ 7,
570
+ 0,
571
+ 1,
572
+ 0,
573
+ 0
574
+ ],
575
+ [
576
+ 0,
577
+ 1,
578
+ 0,
579
+ 37,
580
+ 5,
581
+ 48,
582
+ 2,
583
+ 1,
584
+ 5,
585
+ 7
586
+ ],
587
+ [
588
+ 0,
589
+ 0,
590
+ 1,
591
+ 7,
592
+ 0,
593
+ 2,
594
+ 105,
595
+ 6,
596
+ 1,
597
+ 0
598
+ ],
599
+ [
600
+ 0,
601
+ 0,
602
+ 0,
603
+ 0,
604
+ 0,
605
+ 2,
606
+ 9,
607
+ 95,
608
+ 1,
609
+ 0
610
+ ],
611
+ [
612
+ 0,
613
+ 0,
614
+ 13,
615
+ 12,
616
+ 0,
617
+ 2,
618
+ 1,
619
+ 0,
620
+ 2,
621
+ 2
622
+ ],
623
+ [
624
+ 0,
625
+ 1,
626
+ 0,
627
+ 12,
628
+ 0,
629
+ 6,
630
+ 2,
631
+ 0,
632
+ 5,
633
+ 2
634
+ ]
635
+ ]
636
+ },
637
+ "macro_roc_auc_ovr": 0.9263760295591874
638
+ },
639
+ "delta_accuracy": 0.22444444444444445,
640
+ "delta_macro_f1": 0.18173961109978776
641
+ },
642
+ "no_engineered": {
643
+ "n_features": 63,
644
+ "dropped_count": 6,
645
+ "metrics": {
646
+ "model": "xgboost_no_engineered",
647
+ "accuracy": 0.92,
648
+ "macro_f1": 0.7931081498668057,
649
+ "weighted_f1": 0.9099535506095557,
650
+ "per_class_f1": {
651
+ "c2_communication": 0.9906542056074766,
652
+ "data_exfiltration": 0.9617021276595744,
653
+ "dormancy_dwell": 0.5205479452054794,
654
+ "initial_drop": 0.9534883720930233,
655
+ "lateral_movement": 0.9958847736625515,
656
+ "payload_execution": 0.963302752293578,
657
+ "persistence_establishment": 0.9836065573770492,
658
+ "privilege_escalation": 0.9861751152073732,
659
+ "sandbox_evasion_stall": 0.23529411764705882,
660
+ "self_destruct_cleanup": 0.3404255319148936
661
+ },
662
+ "confusion_matrix": {
663
+ "labels": [
664
+ "c2_communication",
665
+ "data_exfiltration",
666
+ "dormancy_dwell",
667
+ "initial_drop",
668
+ "lateral_movement",
669
+ "payload_execution",
670
+ "persistence_establishment",
671
+ "privilege_escalation",
672
+ "sandbox_evasion_stall",
673
+ "self_destruct_cleanup"
674
+ ],
675
+ "matrix": [
676
+ [
677
+ 106,
678
+ 2,
679
+ 0,
680
+ 0,
681
+ 0,
682
+ 0,
683
+ 0,
684
+ 0,
685
+ 0,
686
+ 0
687
+ ],
688
+ [
689
+ 0,
690
+ 113,
691
+ 0,
692
+ 0,
693
+ 0,
694
+ 0,
695
+ 0,
696
+ 0,
697
+ 0,
698
+ 0
699
+ ],
700
+ [
701
+ 0,
702
+ 4,
703
+ 19,
704
+ 7,
705
+ 0,
706
+ 1,
707
+ 0,
708
+ 0,
709
+ 4,
710
+ 5
711
+ ],
712
+ [
713
+ 0,
714
+ 0,
715
+ 0,
716
+ 123,
717
+ 0,
718
+ 0,
719
+ 0,
720
+ 0,
721
+ 0,
722
+ 0
723
+ ],
724
+ [
725
+ 0,
726
+ 0,
727
+ 0,
728
+ 0,
729
+ 121,
730
+ 0,
731
+ 0,
732
+ 0,
733
+ 0,
734
+ 0
735
+ ],
736
+ [
737
+ 0,
738
+ 0,
739
+ 1,
740
+ 0,
741
+ 0,
742
+ 105,
743
+ 0,
744
+ 0,
745
+ 0,
746
+ 0
747
+ ],
748
+ [
749
+ 0,
750
+ 0,
751
+ 0,
752
+ 0,
753
+ 0,
754
+ 0,
755
+ 120,
756
+ 0,
757
+ 1,
758
+ 1
759
+ ],
760
+ [
761
+ 0,
762
+ 0,
763
+ 0,
764
+ 0,
765
+ 0,
766
+ 0,
767
+ 0,
768
+ 107,
769
+ 0,
770
+ 0
771
+ ],
772
+ [
773
+ 0,
774
+ 0,
775
+ 13,
776
+ 3,
777
+ 0,
778
+ 1,
779
+ 1,
780
+ 3,
781
+ 6,
782
+ 5
783
+ ],
784
+ [
785
+ 0,
786
+ 3,
787
+ 0,
788
+ 2,
789
+ 1,
790
+ 5,
791
+ 1,
792
+ 0,
793
+ 8,
794
+ 8
795
+ ]
796
+ ]
797
+ },
798
+ "macro_roc_auc_ovr": 0.9796965243561164
799
+ },
800
+ "delta_accuracy": -0.0022222222222222365,
801
+ "delta_macro_f1": -0.015038185355508271
802
+ }
803
+ }
804
+ }
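The per-class F1 and accuracy figures in the ablation report above can be re-derived from the confusion matrix alone. The following is a minimal sketch, not part of the commit: the helper names `per_class_f1` and `accuracy` are hypothetical, the matrix is copied from the `no_timestep` entry, and rows are assumed to be true labels with columns as predictions (the report does not state the orientation; F1 and accuracy come out the same either way).

```python
# Sanity-check sketch: recompute F1 / accuracy from the "no_timestep" matrix.
# Helper names are hypothetical; only the numbers come from the report.
LABELS = [
    "c2_communication", "data_exfiltration", "dormancy_dwell", "initial_drop",
    "lateral_movement", "payload_execution", "persistence_establishment",
    "privilege_escalation", "sandbox_evasion_stall", "self_destruct_cleanup",
]
MATRIX = [
    [108, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 96, 0, 4, 9, 2, 1, 0, 0, 1],
    [0, 0, 25, 10, 0, 1, 0, 0, 4, 0],
    [0, 2, 6, 70, 1, 12, 7, 0, 22, 3],
    [0, 39, 0, 1, 73, 7, 0, 1, 0, 0],
    [0, 1, 0, 37, 5, 48, 2, 1, 5, 7],
    [0, 0, 1, 7, 0, 2, 105, 6, 1, 0],
    [0, 0, 0, 0, 0, 2, 9, 95, 1, 0],
    [0, 0, 13, 12, 0, 2, 1, 0, 2, 2],
    [0, 1, 0, 12, 0, 6, 2, 0, 5, 2],
]


def per_class_f1(matrix):
    """F1 per class: 2*TP / (row_sum + col_sum), i.e. 2TP / (2TP + FN + FP)."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        row_sum = sum(matrix[k])                        # TP + FN
        col_sum = sum(matrix[i][k] for i in range(n))   # TP + FP
        denom = row_sum + col_sum
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def accuracy(matrix):
    """Fraction of rows on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total


f1 = per_class_f1(MATRIX)
macro_f1 = sum(f1) / len(f1)
acc = accuracy(MATRIX)

for label, score in zip(LABELS, f1):
    print(f"{label:28s} {score:.4f}")
print(f"macro_f1={macro_f1:.4f} accuracy={acc:.4f}")
```

Recomputing from the matrix is a cheap way to catch a report whose summary metrics and matrix have drifted apart, for example after a partial re-run.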
feature_engineering.py ADDED
@@ -0,0 +1,325 @@
1
+ """
2
+ feature_engineering.py
3
+ ======================
4
+
5
+ Feature pipeline for the CYB003 baseline classifier.
6
+
7
+ Predicts `execution_phase` (10-class) from per-timestep malware execution
8
+ telemetry on the CYB003 sample dataset.
9
+
10
+ CSV inputs:
11
+ malware_samples.csv (primary, one row per timestep, 60 timesteps
12
+ per sample, 100 samples = 6000 rows)
13
+ sample_summary.csv (per-sample aggregates; reserved for future
14
+ work — joining inflates per-sample features
15
+ across 60 identical replications, which hurt
16
+ the model in pilot experiments)
17
+ environment_profiles.csv (reserved for future work)
18
+ execution_events.csv (reserved for future work)
19
+
20
+ Target classes (10 execution phases observed in the sample):
21
+ initial_drop, persistence_establishment, privilege_escalation,
22
+ lateral_movement, payload_execution, data_exfiltration,
23
+ c2_communication, dormancy_dwell, sandbox_evasion_stall,
24
+ self_destruct_cleanup
25
+
26
+ This corresponds to the SOC / sandbox-analyst use case: given the malware's
27
+ current behavioural state, what phase of execution is it in? Useful for
28
+ dynamic-analysis tools, EDR phase tagging, and behavioural classifiers.
29
+
30
+ The pivot to execution_phase (away from malware_family) happened because
31
+ malware family classification on n=100 samples with group-aware splitting
32
+ landed at majority-baseline accuracy (~15%, ROC-AUC ~0.58). execution_phase
33
+ sits on 6,000 rows of per-timestep data with strong, stable signal across
34
+ seeds (~91% accuracy, ROC-AUC ~0.98). See the model card for details.
35
+
36
+ Leakage analysis
37
+ ----------------
38
+ No categorical feature has feature->phase purity above 0.17 (uniform random
39
+ baseline is 0.10), so nothing in the data is an oracle for the target.
40
+ The model relies on a mix of `timestep` (strong but not deterministic —
41
+ most phases have tight timestep windows, but `dormancy_dwell`,
42
+ `sandbox_evasion_stall`, and `self_destruct_cleanup` span the full
43
+ 0-59 range) and behavioural features.
44
+
45
+ Public API
46
+ ----------
47
+ build_features(samples_path) -> (X, y, groups, meta)
48
+ transform_single(record, meta) -> np.ndarray
49
+ save_meta(meta, path) / load_meta(path)
50
+
51
+ License
52
+ -------
53
+ Ships with the public model on Hugging Face under CC-BY-NC-4.0, matching
54
+ the dataset license. See README.md.
55
+ """
56
+
57
+ from __future__ import annotations
58
+
59
+ import json
60
+ from pathlib import Path
61
+ from typing import Any
62
+
63
+ import numpy as np
64
+ import pandas as pd
65
+
66
+ # ---------------------------------------------------------------------------
67
+ # Label space
68
+ # ---------------------------------------------------------------------------
69
+
70
+ # Alphabetical for stable indexing.
71
+ LABEL_ORDER = [
72
+ "c2_communication",
73
+ "data_exfiltration",
74
+ "dormancy_dwell",
75
+ "initial_drop",
76
+ "lateral_movement",
77
+ "payload_execution",
78
+ "persistence_establishment",
79
+ "privilege_escalation",
80
+ "sandbox_evasion_stall",
81
+ "self_destruct_cleanup",
82
+ ]
83
+ LABEL_TO_INT = {lbl: i for i, lbl in enumerate(LABEL_ORDER)}
84
+ INT_TO_LABEL = {i: lbl for lbl, i in LABEL_TO_INT.items()}
85
+
86
+ # ---------------------------------------------------------------------------
87
+ # Identifier and target columns - not features
88
+ # ---------------------------------------------------------------------------
89
+
90
+ ID_COLUMNS = ["sample_id", "family_id", "threat_actor_id"]
91
+ TARGET_COLUMN = "execution_phase"
92
+
93
+ # Note: malware_family is kept as a FEATURE for phase prediction (family
94
+ # is a useful observable - a SOC analyst knows what family they're looking
95
+ # at). It's not a leakage source for phase since phase->family purity is
96
+ # only 0.16. Same logic for threat_actor_tier, ep_stack, target_platform -
97
+ # these are environmental context, not oracles for phase.
98
+
99
+ # ---------------------------------------------------------------------------
100
+ # Per-timestep numeric features
101
+ # ---------------------------------------------------------------------------
102
+
103
+ DIRECT_NUMERIC_TIMESTEP_FEATURES = [
104
+ "timestep", # strong but non-deterministic phase signal
105
+ "api_call_rate",
106
+ "registry_write_count",
107
+ "network_connection_count",
108
+ "process_injection_flag",
109
+ "c2_beacon_interval_sec",
110
+ "av_signature_hit_flag",
111
+ "sandbox_evasion_flag",
112
+ "lateral_propagation_count",
113
+ "privilege_escalation_flag",
114
+ # PE static features (constant per sample but informative for phase
115
+ # given that the model sees these alongside per-step behaviour)
116
+ "pe_entropy_mean",
117
+ "pe_entropy_std",
118
+ "import_hash_cluster",
119
+ "section_count",
120
+ "packed_section_ratio",
121
+ "string_entropy_mean",
122
+ "byte_histogram_chi2",
123
+ "code_section_rx_ratio",
124
+ "resource_section_entropy",
125
+ "suspicious_import_count",
126
+ "packer_detected_flag",
127
+ ]
128
+
129
+ CATEGORICAL_TIMESTEP_FEATURES = [
130
+ "malware_family", # kept as feature: phase prediction conditions
131
+ # on family (a known observable in SOC workflows)
132
+ "threat_actor_tier",
133
+ "target_platform",
134
+ "obfuscation_technique",
135
+ "detection_outcome",
136
+ "ep_stack",
137
+ ]
138
+
139
+ # ---------------------------------------------------------------------------
140
+ # Engineered features (none derived from phase or timestep alone)
141
+ # ---------------------------------------------------------------------------
142
+
143
+ def _add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
144
+     """
145
+     Six engineered features. None directly encode phase (that would be
146
+     a tautology); each is a behavioural composite that disambiguates
147
+     phases sharing similar timestep ranges.
148
+     """
149
+     df = df.copy()
150
+
151
+     # 1. API burst score: high for execution-heavy phases (payload_execution,
152
+     #    privilege_escalation), low for stealth phases (dormancy, evasion).
153
+     df["api_burst_score"] = (
154
+         df["api_call_rate"] * df["registry_write_count"].clip(upper=50)
155
+     ).astype(float)
156
+
157
+     # 2. C2 active flag: positive c2_beacon_interval_sec indicates active
158
+     #    beaconing. Strongly correlates with c2_communication phase.
159
+     df["is_c2_active"] = (df["c2_beacon_interval_sec"] > 0).astype(int)
160
+
161
+     # 3. High network volume step: above-threshold connection count, common
162
+     #    in lateral_movement, data_exfiltration, c2_communication.
163
+     df["is_high_net_volume"] = (df["network_connection_count"] > 5).astype(int)
164
+
165
+     # 4. Stealth indicator: low api_call_rate AND no AV/sandbox hit. Used
166
+     #    to disambiguate dormancy_dwell / sandbox_evasion_stall from active
167
+     #    phases that happen to land in similar timestep windows.
168
+     df["is_stealth_step"] = (
169
+         (df["api_call_rate"] < 5)
170
+         & (df["av_signature_hit_flag"] == 0)
171
+         & (df["sandbox_evasion_flag"] == 0)
172
+     ).astype(int)
173
+
174
+     # 5. Destructive action indicator: combines privilege escalation flag
175
+     #    and registry-write count. High in persistence_establishment and
176
+     #    self_destruct_cleanup.
177
+     df["is_destructive_step"] = (
178
+         (df["privilege_escalation_flag"] == 1)
179
+         | (df["registry_write_count"] > 10)
180
+     ).astype(int)
181
+
182
+     # 6. Lateral activity: product of lateral_propagation_count and
183
+     #    network_connection_count. Distinguishes lateral_movement from other network phases.
184
+     df["lateral_activity_score"] = (
185
+         df["lateral_propagation_count"] * df["network_connection_count"]
186
+     ).astype(float)
187
+
188
+     return df
189
+
190
+
191
+ # ---------------------------------------------------------------------------
192
+ # Public API
193
+ # ---------------------------------------------------------------------------
194
+
195
+ def build_features(
196
+     samples_path: str | Path,
197
+ ) -> tuple[pd.DataFrame, pd.Series, pd.Series, dict[str, Any]]:
198
+     """
199
+     Load CSV, drop identifier columns and target, engineer features,
200
+     one-hot encode, return (X, y, groups, meta).
201
+
202
+     `groups` is a Series of sample_id values aligned with X. Use it
203
+     with GroupShuffleSplit / GroupKFold: a single sample contains 60
204
+     correlated timesteps, and row-level random splitting inflates metrics.
205
+     """
206
+     samples = pd.read_csv(samples_path)
207
+
208
+     # Extract target + groups
209
+     y = samples[TARGET_COLUMN].map(LABEL_TO_INT)
210
+     if y.isna().any():
211
+         bad = samples.loc[y.isna(), TARGET_COLUMN].unique()
212
+         raise ValueError(f"Unknown execution_phase values: {bad}")
213
+     y = y.astype(int)
214
+     groups = samples["sample_id"].copy()
215
+
216
+     # Drop target + identifiers from feature pool
217
+     samples = samples.drop(columns=ID_COLUMNS + [TARGET_COLUMN], errors="ignore")
218
+
219
+     # Engineered features
220
+     samples = _add_engineered_features(samples)
221
+
222
+     # Numeric features
223
+     numeric_features = (
224
+         DIRECT_NUMERIC_TIMESTEP_FEATURES
225
+         + [
226
+             "api_burst_score", "is_c2_active", "is_high_net_volume",
227
+             "is_stealth_step", "is_destructive_step", "lateral_activity_score",
228
+         ]
229
+     )
230
+     X_numeric = samples[numeric_features].astype(float)
231
+
232
+     # One-hot categoricals
233
+     categorical_levels: dict[str, list[str]] = {}
234
+     blocks: list[pd.DataFrame] = []
235
+     for col in CATEGORICAL_TIMESTEP_FEATURES:
236
+         if col not in samples.columns:
237
+             continue
238
+         levels = sorted(samples[col].dropna().unique().tolist())
239
+         categorical_levels[col] = levels
240
+         block = pd.get_dummies(
241
+             samples[col].astype("category").cat.set_categories(levels),
242
+             prefix=col, dummy_na=False,
243
+         ).astype(int)
244
+         blocks.append(block)
245
+
246
+     X = pd.concat(
247
+         [X_numeric.reset_index(drop=True)]
248
+         + [b.reset_index(drop=True) for b in blocks],
249
+         axis=1,
250
+     ).fillna(0.0)
251
+
252
+     meta = {
253
+         "feature_names": X.columns.tolist(),
254
+         "numeric_features": numeric_features,
255
+         "categorical_levels": categorical_levels,
256
+         "label_to_int": LABEL_TO_INT,
257
+         "int_to_label": INT_TO_LABEL,
258
+     }
259
+     return X, y, groups, meta
260
+
261
+
262
+ def transform_single(
263
+     record: dict | pd.DataFrame,
264
+     meta: dict[str, Any],
265
+ ) -> np.ndarray:
266
+     """Encode a single timestep record for inference."""
267
+     if isinstance(record, dict):
268
+         df = pd.DataFrame([record.copy()])
269
+     else:
270
+         df = record.copy().reset_index(drop=True)  # keep rows aligned with the positional blocks below
271
+
272
+     df = _add_engineered_features(df)
273
+
274
+     numeric = pd.DataFrame({
275
+         col: df.get(col, pd.Series([0.0] * len(df))).astype(float).values
276
+         for col in meta["numeric_features"]
277
+     })
278
+     blocks: list[pd.DataFrame] = [numeric]
279
+     for col, levels in meta["categorical_levels"].items():
280
+         val = df.get(col, pd.Series([None] * len(df)))
281
+         block = pd.get_dummies(
282
+             val.astype("category").cat.set_categories(levels),
283
+             prefix=col, dummy_na=False,
284
+         ).astype(int)
285
+         for lvl in levels:
286
+             cname = f"{col}_{lvl}"
287
+             if cname not in block.columns:
288
+                 block[cname] = 0
289
+         block = block[[f"{col}_{lvl}" for lvl in levels]]
290
+         blocks.append(block)
291
+
292
+     X = pd.concat(blocks, axis=1).fillna(0.0)
293
+     X = X.reindex(columns=meta["feature_names"], fill_value=0.0)
294
+     return X.values.astype(np.float32)
295
+
296
+
297
+ def save_meta(meta: dict[str, Any], path: str | Path) -> None:
298
+     serializable = {
299
+         "feature_names": meta["feature_names"],
300
+         "numeric_features": meta["numeric_features"],
301
+         "categorical_levels": meta["categorical_levels"],
302
+         "label_to_int": meta["label_to_int"],
303
+         "int_to_label": {str(k): v for k, v in meta["int_to_label"].items()},
304
+     }
305
+     with open(path, "w") as f:
306
+         json.dump(serializable, f, indent=2)
307
+
308
+
309
+ def load_meta(path: str | Path) -> dict[str, Any]:
310
+     with open(path) as f:
311
+         meta = json.load(f)
312
+     meta["int_to_label"] = {int(k): v for k, v in meta["int_to_label"].items()}
313
+     return meta
314
+
315
+
316
+ if __name__ == "__main__":
317
+     import sys
318
+     base = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/mnt/user-data/uploads")
319
+     X, y, groups, meta = build_features(base / "malware_samples.csv")
320
+     print(f"X shape: {X.shape}")
321
+     print(f"y shape: {y.shape}")
322
+     print(f"groups: {groups.nunique()} samples")
323
+     print(f"n features: {len(meta['feature_names'])}")
324
+     print(f"label distribution:\n{y.map(INT_TO_LABEL).value_counts()}")
325
+     print(f"X has NaN: {X.isnull().any().any()}")
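The `build_features` docstring above recommends splitting by `sample_id` rather than by row, because the 60 timesteps of one sample are correlated. The following is a rough standard-library sketch of that idea, not the project's training code: `group_shuffle_split`, the 15% test fraction, the seed, and the `S000`-style ids are all assumptions for illustration. In practice scikit-learn's `GroupShuffleSplit`, which the docstring names, would be the usual choice.

```python
import random


def group_shuffle_split(groups, test_fraction=0.15, seed=42):
    """Split row indices so every group lands wholly in train or in test.

    Hypothetical helper; test_fraction and seed are illustrative defaults.
    """
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    held_out = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in held_out]
    test_idx = [i for i, g in enumerate(groups) if g in held_out]
    return train_idx, test_idx


# Toy group column shaped like the sample dataset: 100 samples x 60 timesteps.
# The id format is made up for this sketch.
groups = [f"S{sid:03d}" for sid in range(100) for _ in range(60)]

train_idx, test_idx = group_shuffle_split(groups)
train_groups = {groups[i] for i in train_idx}
test_groups = {groups[i] for i in test_idx}

# No sample contributes rows to both sides of the split.
print(len(train_idx), len(test_idx), train_groups.isdisjoint(test_groups))
```

A row-level shuffle would place timesteps from the same sample on both sides of the split, which is the metric inflation the docstring warns about.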
feature_meta.json ADDED
@@ -0,0 +1,182 @@
1
+ {
2
+ "feature_names": [
3
+ "timestep",
4
+ "api_call_rate",
5
+ "registry_write_count",
6
+ "network_connection_count",
7
+ "process_injection_flag",
8
+ "c2_beacon_interval_sec",
9
+ "av_signature_hit_flag",
10
+ "sandbox_evasion_flag",
11
+ "lateral_propagation_count",
12
+ "privilege_escalation_flag",
13
+ "pe_entropy_mean",
14
+ "pe_entropy_std",
15
+ "import_hash_cluster",
16
+ "section_count",
17
+ "packed_section_ratio",
18
+ "string_entropy_mean",
19
+ "byte_histogram_chi2",
20
+ "code_section_rx_ratio",
21
+ "resource_section_entropy",
22
+ "suspicious_import_count",
23
+ "packer_detected_flag",
24
+ "api_burst_score",
25
+ "is_c2_active",
26
+ "is_high_net_volume",
27
+ "is_stealth_step",
28
+ "is_destructive_step",
29
+ "lateral_activity_score",
30
+ "malware_family_apt_implant",
31
+ "malware_family_botnet_agent",
32
+ "malware_family_cryptominer",
33
+ "malware_family_dropper",
34
+ "malware_family_fileless_malware",
35
+ "malware_family_ransomware",
36
+ "malware_family_rootkit",
37
+ "malware_family_spyware",
38
+ "malware_family_trojan",
39
+ "malware_family_worm",
40
+ "threat_actor_tier_apt",
41
+ "threat_actor_tier_commodity",
42
+ "threat_actor_tier_crimeware",
43
+ "threat_actor_tier_nation_state",
44
+ "target_platform_android_13",
45
+ "target_platform_embedded_ot_firmware",
46
+ "target_platform_linux_rhel_9",
47
+ "target_platform_linux_ubuntu_22",
48
+ "target_platform_macos_ventura",
49
+ "target_platform_windows_10_enterprise",
50
+ "target_platform_windows_11_pro",
51
+ "target_platform_windows_server_2022",
52
+ "obfuscation_technique_anti_analysis_stall",
53
+ "obfuscation_technique_code_signing_abuse",
54
+ "obfuscation_technique_lotl_binary",
55
+ "obfuscation_technique_packing",
56
+ "obfuscation_technique_polymorphic_mutation",
57
+ "obfuscation_technique_sandbox_evasion",
58
+ "obfuscation_technique_string_encryption",
59
+ "detection_outcome_behavioural_flag",
60
+ "detection_outcome_definitive_detection",
61
+ "detection_outcome_heuristic_alert",
62
+ "detection_outcome_sandbox_evasion_confirmed",
63
+ "detection_outcome_signature_miss",
64
+ "ep_stack_av_plus_firewall",
65
+ "ep_stack_deception_honeypot",
66
+ "ep_stack_edr_endpoint_detect",
67
+ "ep_stack_legacy_av_only",
68
+ "ep_stack_managed_detection_response",
69
+ "ep_stack_ngav_ml_based",
70
+ "ep_stack_no_protection",
71
+ "ep_stack_xdr_extended_detect"
72
+ ],
73
+ "numeric_features": [
74
+ "timestep",
75
+ "api_call_rate",
76
+ "registry_write_count",
77
+ "network_connection_count",
78
+ "process_injection_flag",
79
+ "c2_beacon_interval_sec",
80
+ "av_signature_hit_flag",
81
+ "sandbox_evasion_flag",
82
+ "lateral_propagation_count",
83
+ "privilege_escalation_flag",
84
+ "pe_entropy_mean",
85
+ "pe_entropy_std",
86
+ "import_hash_cluster",
87
+ "section_count",
88
+ "packed_section_ratio",
89
+ "string_entropy_mean",
90
+ "byte_histogram_chi2",
91
+ "code_section_rx_ratio",
92
+ "resource_section_entropy",
93
+ "suspicious_import_count",
94
+ "packer_detected_flag",
95
+ "api_burst_score",
96
+ "is_c2_active",
97
+ "is_high_net_volume",
98
+ "is_stealth_step",
99
+ "is_destructive_step",
100
+ "lateral_activity_score"
101
+ ],
102
+ "categorical_levels": {
103
+ "malware_family": [
104
+ "apt_implant",
105
+ "botnet_agent",
106
+ "cryptominer",
107
+ "dropper",
108
+ "fileless_malware",
109
+ "ransomware",
110
+ "rootkit",
111
+ "spyware",
112
+ "trojan",
113
+ "worm"
114
+ ],
115
+ "threat_actor_tier": [
116
+ "apt",
117
+ "commodity",
118
+ "crimeware",
119
+ "nation_state"
120
+ ],
121
+ "target_platform": [
122
+ "android_13",
123
+ "embedded_ot_firmware",
124
+ "linux_rhel_9",
125
+ "linux_ubuntu_22",
126
+ "macos_ventura",
127
+ "windows_10_enterprise",
128
+ "windows_11_pro",
129
+ "windows_server_2022"
130
+ ],
131
+ "obfuscation_technique": [
132
+ "anti_analysis_stall",
133
+ "code_signing_abuse",
134
+ "lotl_binary",
135
+ "packing",
136
+ "polymorphic_mutation",
137
+ "sandbox_evasion",
138
+ "string_encryption"
139
+ ],
140
+ "detection_outcome": [
141
+ "behavioural_flag",
142
+ "definitive_detection",
143
+ "heuristic_alert",
144
+ "sandbox_evasion_confirmed",
145
+ "signature_miss"
146
+ ],
147
+ "ep_stack": [
148
+ "av_plus_firewall",
149
+ "deception_honeypot",
150
+ "edr_endpoint_detect",
151
+ "legacy_av_only",
152
+ "managed_detection_response",
153
+ "ngav_ml_based",
154
+ "no_protection",
155
+ "xdr_extended_detect"
156
+ ]
157
+ },
158
+ "label_to_int": {
159
+ "c2_communication": 0,
160
+ "data_exfiltration": 1,
161
+ "dormancy_dwell": 2,
162
+ "initial_drop": 3,
163
+ "lateral_movement": 4,
164
+ "payload_execution": 5,
165
+ "persistence_establishment": 6,
166
+ "privilege_escalation": 7,
167
+ "sandbox_evasion_stall": 8,
168
+ "self_destruct_cleanup": 9
169
+ },
170
+ "int_to_label": {
171
+ "0": "c2_communication",
172
+ "1": "data_exfiltration",
173
+ "2": "dormancy_dwell",
174
+ "3": "initial_drop",
175
+ "4": "lateral_movement",
176
+ "5": "payload_execution",
177
+ "6": "persistence_establishment",
178
+ "7": "privilege_escalation",
179
+ "8": "sandbox_evasion_stall",
180
+ "9": "self_destruct_cleanup"
181
+ }
182
+ }
feature_scaler.json ADDED
@@ -0,0 +1 @@
1
+ {"mean": [29.5, 1.387591811594203, 2.5253623188405796, 4.403140096618357, 0.2543478260869565, 4.994391304347825, 0.29347826086956524, 0.34299516908212563, 0.03768115942028986, 0.08140096618357488, 0.8287420289855073, 0.18634782608695652, 274.6231884057971, 5.681159420289855, 0.42982463768115947, 0.5421188405797103, 41.10072463768116, 0.6250057971014492, 0.4523652173913043, 15.695652173913043, 0.463768115942029, 3.524582415458937, 0.11884057971014493, 0.33357487922705314, 0.45193236714975843, 0.0929951690821256, 0.3280193236714976, 0.13043478260869565, 0.13043478260869565, 0.13043478260869565, 0.07246376811594203, 0.057971014492753624, 0.08695652173913043, 0.08695652173913043, 0.13043478260869565, 0.08695652173913043, 0.08695652173913043, 0.21739130434782608, 0.3188405797101449, 0.42028985507246375, 0.043478260869565216, 0.08695652173913043, 0.057971014492753624, 0.07246376811594203, 0.11594202898550725, 0.057971014492753624, 0.3333333333333333, 0.13043478260869565, 0.14492753623188406, 0.14033816425120774, 0.14347826086956522, 0.1427536231884058, 0.14009661835748793, 0.15144927536231884, 0.14299516908212562, 0.1388888888888889, 0.0678743961352657, 0.17922705314009663, 0.08888888888888889, 0.10458937198067633, 0.5594202898550724, 0.11594202898550725, 0.11594202898550725, 0.08695652173913043, 0.15942028985507245, 0.14492753623188406, 0.15942028985507245, 0.13043478260869565, 0.08695652173913043], "std": [17.320194219715013, 0.13486579618110528, 2.8224558127303947, 3.855826464428149, 0.43554658867741924, 16.522749180589745, 0.45541065821011956, 0.4747672360871146, 0.22207359173815253, 0.2734829333055482, 0.13349684203848783, 0.0690646442535872, 164.83751594213814, 2.0467553940561625, 0.29063174139334635, 0.14071160667415852, 19.031317203687976, 0.16348965303394314, 0.17541357294450965, 5.309382618360122, 0.4987457613602604, 3.9756334300787786, 0.32363991799019004, 0.4715468040908369, 0.4977442571333736, 0.2904607481566321, 2.0197472660492055, 0.33682184196295206, 
0.3368218419629521, 0.33682184196295206, 0.25928557483500797, 0.23371685876394413, 0.2818053712339797, 0.2818053712339797, 0.3368218419629521, 0.2818053712339797, 0.2818053712339797, 0.41252082351679387, 0.4660834006454619, 0.49366502689172936, 0.20395575381738024, 0.2818053712339797, 0.23371685876394416, 0.2592855748350079, 0.3201940649187907, 0.2337168587639441, 0.4714614640201808, 0.33682184196295206, 0.3520702854959198, 0.34737949256617373, 0.35060225443864834, 0.3498636771396811, 0.34712917169153373, 0.35852955456280744, 0.350110209377919, 0.34587231893054005, 0.25156062524270983, 0.3835886568811166, 0.2846176756328569, 0.30606055216695915, 0.4965166433941038, 0.3201940649187907, 0.32019406491879077, 0.2818053712339797, 0.3661117825566483, 0.3520702854959198, 0.36611178255664834, 0.3368218419629521, 0.2818053712339797]}
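The `mean` and `std` arrays above are positional: entry *i* belongs to entry *i* of `feature_names` in `feature_meta.json`. A minimal sketch of how they might be applied before the MLP is shown here; the `standardize` helper and its zero-variance guard are assumptions for illustration and are not part of the repository. Only the first three scaler entries (`timestep`, `api_call_rate`, `registry_write_count`) are used to keep it short.

```python
def standardize(row, mean, std, eps=1e-12):
    """Z-score each feature; a (near-)constant feature maps to 0.0.

    Hypothetical helper. The eps guard is an added safety check so a
    zero std does not cause a division error.
    """
    if not (len(row) == len(mean) == len(std)):
        raise ValueError("row, mean and std must have the same length")
    return [
        (x - m) / s if abs(s) > eps else 0.0
        for x, m, s in zip(row, mean, std)
    ]


# First three entries of feature_scaler.json, in feature_names order.
mean = [29.5, 1.387591811594203, 2.5253623188405796]
std = [17.320194219715013, 0.13486579618110528, 2.8224558127303947]

# timestep, api_call_rate, registry_write_count from the notebook's example record.
row = [26.0, 1.4167, 0.0]

z = standardize(row, mean, std)
print([round(v, 4) for v in z])
```

The XGBoost model consumes the unscaled features, so this step applies only to the MLP input path.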
inference_example.ipynb ADDED
@@ -0,0 +1,314 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# CYB003 Baseline Classifier — Inference Example\n",
8
+ "\n",
9
+ "End-to-end demo: load the trained XGBoost and PyTorch MLP models from the Hugging Face repo and predict the **malware execution phase** of a new per-timestep telemetry record.\n",
10
+ "\n",
11
+ "**Models predict one of 10 phases:** `c2_communication`, `data_exfiltration`, `dormancy_dwell`, `initial_drop`, `lateral_movement`, `payload_execution`, `persistence_establishment`, `privilege_escalation`, `sandbox_evasion_stall`, `self_destruct_cleanup`.\n",
12
+ "\n",
13
+ "**This is a baseline reference model**, not a production sandbox or EDR. See the model card for full metrics and limitations."
14
+ ]
15
+ },
16
+ {
17
+ "cell_type": "markdown",
18
+ "metadata": {},
19
+ "source": [
20
+ "## 1. Install dependencies"
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "code",
25
+ "execution_count": null,
26
+ "metadata": {},
27
+ "outputs": [],
28
+ "source": [
29
+ "%pip install --quiet xgboost torch safetensors pandas numpy huggingface_hub"
30
+ ]
31
+ },
32
+ {
33
+ "cell_type": "markdown",
34
+ "metadata": {},
35
+ "source": [
36
+ "## 2. Download model artifacts from Hugging Face\n",
37
+ "\n",
38
+ "Five files are needed:\n",
39
+ "- `model_xgb.json` — XGBoost weights\n",
40
+ "- `model_mlp.safetensors` — PyTorch MLP weights\n",
41
+ "- `feature_engineering.py` — feature pipeline (must match training)\n",
42
+ "- `feature_meta.json` — feature column order + categorical levels\n",
43
+ "- `feature_scaler.json` — MLP input standardization"
44
+ ]
45
+ },
46
+ {
47
+ "cell_type": "code",
48
+ "execution_count": null,
49
+ "metadata": {},
50
+ "outputs": [],
51
+ "source": [
52
+ "from huggingface_hub import hf_hub_download\n",
53
+ "\n",
54
+ "REPO_ID = \"xpertsystems/cyb003-baseline-classifier\"\n",
55
+ "\n",
56
+ "files = {}\n",
57
+ "for name in [\"model_xgb.json\", \"model_mlp.safetensors\",\n",
58
+ " \"feature_engineering.py\", \"feature_meta.json\",\n",
59
+ " \"feature_scaler.json\"]:\n",
60
+ " files[name] = hf_hub_download(repo_id=REPO_ID, filename=name)\n",
61
+ " print(f\" downloaded: {name}\")"
62
+ ]
63
+ },
64
+ {
65
+ "cell_type": "code",
66
+ "execution_count": null,
67
+ "metadata": {},
68
+ "outputs": [],
69
+ "source": [
70
+ "import sys, os\n",
71
+ "fe_dir = os.path.dirname(files[\"feature_engineering.py\"])\n",
72
+ "if fe_dir not in sys.path:\n",
73
+ " sys.path.insert(0, fe_dir)\n",
74
+ "\n",
75
+ "from feature_engineering import transform_single, load_meta, INT_TO_LABEL"
76
+ ]
77
+ },
78
+ {
79
+ "cell_type": "markdown",
80
+ "metadata": {},
81
+ "source": [
82
+ "## 3. Load models and metadata"
83
+ ]
84
+ },
85
+ {
86
+ "cell_type": "code",
87
+ "execution_count": null,
88
+ "metadata": {},
89
+ "outputs": [],
90
+ "source": [
91
+ "import json\n",
92
+ "import numpy as np\n",
93
+ "import torch\n",
94
+ "import torch.nn as nn\n",
95
+ "import xgboost as xgb\n",
96
+ "from safetensors.torch import load_file\n",
97
+ "\n",
98
+ "meta = load_meta(files[\"feature_meta.json\"])\n",
99
+ "with open(files[\"feature_scaler.json\"]) as f:\n",
100
+ " scaler = json.load(f)\n",
101
+ "\n",
102
+ "N_FEATURES = len(meta[\"feature_names\"])\n",
103
+ "N_CLASSES = len(meta[\"int_to_label\"])\n",
104
+ "print(f\"feature count: {N_FEATURES}\")\n",
105
+ "print(f\"class count: {N_CLASSES}\")\n",
106
+ "print(f\"label classes: {list(meta['int_to_label'].values())}\")"
107
+ ]
108
+ },
109
+ {
110
+ "cell_type": "code",
111
+ "execution_count": null,
112
+ "metadata": {},
113
+ "outputs": [],
114
+ "source": [
115
+ "# XGBoost\n",
116
+ "xgb_model = xgb.XGBClassifier()\n",
117
+ "xgb_model.load_model(files[\"model_xgb.json\"])\n",
118
+ "\n",
119
+ "# MLP architecture (must match training)\n",
120
+ "class PhaseMLP(nn.Module):\n",
121
+ " def __init__(self, n_features, n_classes=10, hidden1=128, hidden2=64, dropout=0.3):\n",
122
+ " super().__init__()\n",
123
+ " self.net = nn.Sequential(\n",
124
+ " nn.Linear(n_features, hidden1),\n",
125
+ " nn.BatchNorm1d(hidden1),\n",
126
+ " nn.ReLU(),\n",
127
+ " nn.Dropout(dropout),\n",
128
+ " nn.Linear(hidden1, hidden2),\n",
129
+ " nn.BatchNorm1d(hidden2),\n",
130
+ " nn.ReLU(),\n",
131
+ " nn.Dropout(dropout),\n",
132
+ " nn.Linear(hidden2, n_classes),\n",
133
+ " )\n",
134
+ " def forward(self, x):\n",
135
+ " return self.net(x)\n",
136
+ "\n",
137
+ "mlp_model = PhaseMLP(N_FEATURES, n_classes=N_CLASSES)\n",
138
+ "mlp_model.load_state_dict(load_file(files[\"model_mlp.safetensors\"]))\n",
139
+ "mlp_model.eval()\n",
140
+ "print(\"models loaded\")"
141
+ ]
142
+ },
143
+ {
144
+ "cell_type": "markdown",
145
+ "metadata": {},
146
+ "source": [
147
+ "## 4. Prediction helper"
148
+ ]
149
+ },
150
+ {
151
+ "cell_type": "code",
152
+ "execution_count": null,
153
+ "metadata": {},
154
+ "outputs": [],
155
+ "source": [
156
+ "MU = np.array(scaler[\"mean\"], dtype=np.float32)\n",
157
+ "SD = np.array(scaler[\"std\"], dtype=np.float32)\n",
158
+ "\n",
159
+ "def predict_phase(record: dict) -> dict:\n",
160
+ " \"\"\"Predict the execution phase for one per-timestep telemetry record.\n",
161
+ "\n",
162
+ " Returns a dict with both models' predictions and per-class probabilities.\n",
163
+ " \"\"\"\n",
164
+ " X = transform_single(record, meta)\n",
165
+ "\n",
166
+ " xgb_proba = xgb_model.predict_proba(X)[0]\n",
167
+ " xgb_label = INT_TO_LABEL[int(np.argmax(xgb_proba))]\n",
168
+ "\n",
169
+ " Xs = ((X - MU) / SD).astype(np.float32)\n",
170
+ " with torch.no_grad():\n",
171
+ " logits = mlp_model(torch.tensor(Xs))\n",
172
+ " mlp_proba = torch.softmax(logits, dim=1).numpy()[0]\n",
173
+ " mlp_label = INT_TO_LABEL[int(np.argmax(mlp_proba))]\n",
174
+ "\n",
175
+ " return {\n",
176
+ " \"xgboost\": {\n",
177
+ " \"label\": xgb_label,\n",
178
+ " \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(xgb_proba)},\n",
179
+ " },\n",
180
+ " \"mlp\": {\n",
181
+ " \"label\": mlp_label,\n",
182
+ " \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(mlp_proba)},\n",
183
+ " },\n",
184
+ " }"
185
+ ]
186
+ },
187
+ {
188
+ "cell_type": "markdown",
189
+ "metadata": {},
190
+ "source": [
191
+ "## 5. Run on an example record\n",
192
+ "\n",
193
+    "A real `lateral_movement` record taken from the sample dataset: an APT-tier cryptominer at timestep 26 with 2 lateral propagation events and 10 network connections. Both models are expected to predict `lateral_movement`."
194
+ ]
195
+ },
196
+ {
197
+ "cell_type": "code",
198
+ "execution_count": null,
199
+ "metadata": {},
200
+ "outputs": [],
201
+ "source": [
202
+ "# Real timestep record from the sample dataset (true phase: lateral_movement)\n",
203
+ "example_record = {\n",
204
+ " \"timestep\": 26,\n",
205
+ " \"malware_family\": \"cryptominer\",\n",
206
+ " \"threat_actor_tier\": \"apt\",\n",
207
+ " \"target_platform\": \"windows_10_enterprise\",\n",
208
+ " \"obfuscation_technique\": \"code_signing_abuse\",\n",
209
+ " \"api_call_rate\": 1.4167,\n",
210
+ " \"registry_write_count\": 0,\n",
211
+ " \"network_connection_count\": 10,\n",
212
+ " \"process_injection_flag\": 1,\n",
213
+ " \"c2_beacon_interval_sec\": 0.0,\n",
214
+ " \"detection_outcome\": \"signature_miss\",\n",
215
+ " \"av_signature_hit_flag\": 0,\n",
216
+ " \"sandbox_evasion_flag\": 0,\n",
217
+ " \"lateral_propagation_count\": 2,\n",
218
+ " \"privilege_escalation_flag\": 0,\n",
219
+ " \"ep_stack\": \"deception_honeypot\",\n",
220
+ " \"pe_entropy_mean\": 0.8336,\n",
221
+ " \"pe_entropy_std\": 0.25,\n",
222
+ " \"import_hash_cluster\": 498,\n",
223
+ " \"section_count\": 2,\n",
224
+ " \"packed_section_ratio\": 0.7558,\n",
225
+ " \"string_entropy_mean\": 0.5727,\n",
226
+ " \"byte_histogram_chi2\": 45.52,\n",
227
+ " \"code_section_rx_ratio\": 0.3628,\n",
228
+ " \"resource_section_entropy\": 0.4418,\n",
229
+ " \"suspicious_import_count\": 11,\n",
230
+ " \"packer_detected_flag\": 1,\n",
231
+ "}\n",
232
+ "\n",
233
+ "result = predict_phase(example_record)\n",
234
+ "\n",
235
+ "print(f\"XGBoost -> {result['xgboost']['label']}\")\n",
236
+ "for lbl, p in sorted(result['xgboost']['probabilities'].items(), key=lambda x: -x[1])[:5]:\n",
237
+ " print(f\" P({lbl:30s}) = {p:.4f}\")\n",
238
+ "\n",
239
+ "print(f\"\\nMLP -> {result['mlp']['label']}\")\n",
240
+ "for lbl, p in sorted(result['mlp']['probabilities'].items(), key=lambda x: -x[1])[:5]:\n",
241
+ " print(f\" P({lbl:30s}) = {p:.4f}\")"
242
+ ]
243
+ },
244
+ {
245
+ "cell_type": "markdown",
246
+ "metadata": {},
247
+ "source": [
248
+ "### Note: when the two models disagree\n",
249
+ "\n",
250
+ "XGBoost and the MLP can disagree on records far from the training-data manifold or in the three phases the baseline finds genuinely hard (`dormancy_dwell`, `sandbox_evasion_stall`, `self_destruct_cleanup`, each spanning the full timestep range). Disagreement is a useful signal: hand those cases to a human analyst or to a more expensive sequence-based detector."
251
+ ]
252
+ },
253
+ {
254
+ "cell_type": "markdown",
255
+ "metadata": {},
256
+ "source": [
257
+ "## 6. Batch prediction on the sample dataset"
258
+ ]
259
+ },
260
+ {
261
+ "cell_type": "code",
262
+ "execution_count": null,
263
+ "metadata": {},
264
+ "outputs": [],
265
+ "source": [
266
+ "from huggingface_hub import snapshot_download\n",
267
+ "import pandas as pd\n",
268
+ "\n",
269
+ "ds_path = snapshot_download(repo_id=\"xpertsystems/cyb003-sample\", repo_type=\"dataset\")\n",
270
+ "samples = pd.read_csv(f\"{ds_path}/malware_samples.csv\")\n",
271
+ "\n",
272
+ "# Score the first 200 timesteps\n",
273
+ "sample = samples.head(200).copy()\n",
274
+ "preds = [predict_phase(row.to_dict())[\"xgboost\"][\"label\"] for _, row in sample.iterrows()]\n",
275
+ "sample[\"xgb_pred\"] = preds\n",
276
+ "\n",
277
+ "ct = pd.crosstab(sample[\"execution_phase\"], sample[\"xgb_pred\"],\n",
278
+ " rownames=[\"true\"], colnames=[\"pred\"])\n",
279
+ "print(\"Confusion on first 200 sample rows (XGBoost):\")\n",
280
+ "print(ct)\n",
281
+ "acc = (sample[\"execution_phase\"] == sample[\"xgb_pred\"]).mean()\n",
282
+ "print(f\"\\nbatch accuracy on first 200 rows (in-distribution): {acc:.4f}\")\n",
283
+ "print(\"\\nNote: these rows include training-set samples. See validation_results.json\\n\"\n",
284
+ " \"for proper held-out test metrics from disjoint samples.\")"
285
+ ]
286
+ },
287
+ {
288
+ "cell_type": "markdown",
289
+ "metadata": {},
290
+ "source": [
291
+ "## 7. Next steps\n",
292
+ "\n",
293
+ "- See `validation_results.json` for held-out test metrics (15 disjoint samples, 900 timesteps).\n",
294
+ "- See `multi_seed_results.json` for the across-10-seeds robustness picture (accuracy 0.905 ± 0.010).\n",
295
+    "- See `ablation_results.json` for per-feature-group contribution. `timestep` carries the dominant signal, because malware execution, like a kill chain, progresses through its phases over time.\n",
296
+ "- The model card's **Limitations** section explains why `dormancy_dwell`, `sandbox_evasion_stall`, and `self_destruct_cleanup` are hard.\n",
297
+ "- For the full 280k-row CYB003 dataset and commercial licensing, contact **pradeep@xpertsystems.ai**."
298
+ ]
299
+ }
300
+ ],
301
+ "metadata": {
302
+ "kernelspec": {
303
+ "display_name": "Python 3",
304
+ "language": "python",
305
+ "name": "python3"
306
+ },
307
+ "language_info": {
308
+ "name": "python",
309
+ "version": "3.10"
310
+ }
311
+ },
312
+ "nbformat": 4,
313
+ "nbformat_minor": 5
314
+ }
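The notebook's note on model disagreement suggests handing records where XGBoost and the MLP differ to an analyst. A minimal sketch of that triage step follows, assuming the result dict has the shape returned by `predict_phase` in the notebook. The helper name `needs_review` and the `min_confidence` threshold are hypothetical and are not part of the released code.

```python
def needs_review(result: dict, min_confidence: float = 0.6) -> bool:
    """Flag a prediction for manual review.

    `result` is assumed to look like the output of predict_phase():
    {"xgboost": {"label", "probabilities"}, "mlp": {"label", "probabilities"}}.
    The 0.6 default is an illustrative assumption, not a tuned value.
    """
    xgb, mlp = result["xgboost"], result["mlp"]
    # Disagreement between the two models is the primary review signal.
    if xgb["label"] != mlp["label"]:
        return True
    # Even when they agree, a weak top probability is worth a second look.
    top_xgb = max(xgb["probabilities"].values())
    top_mlp = max(mlp["probabilities"].values())
    return min(top_xgb, top_mlp) < min_confidence


# Hand-written toy results (not real model output) to exercise the helper.
agree = {
    "xgboost": {"label": "lateral_movement",
                "probabilities": {"lateral_movement": 0.97, "dormancy_dwell": 0.03}},
    "mlp": {"label": "lateral_movement",
            "probabilities": {"lateral_movement": 0.88, "dormancy_dwell": 0.12}},
}
disagree = {
    "xgboost": {"label": "dormancy_dwell",
                "probabilities": {"dormancy_dwell": 0.55, "sandbox_evasion_stall": 0.45}},
    "mlp": {"label": "sandbox_evasion_stall",
            "probabilities": {"dormancy_dwell": 0.40, "sandbox_evasion_stall": 0.60}},
}

print(needs_review(agree), needs_review(disagree))
```

In practice the threshold would need to be chosen on the validation split rather than fixed by hand.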
model_mlp.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:5137ad720cf14877439db2fe50e5df589c6e2cbcc7598cc332548922bd5f8369
3
+ size 75760
model_xgb.json ADDED
The diff for this file is too large to render. See raw diff
 
multi_seed_results.json ADDED
@@ -0,0 +1,98 @@
1
+ {
2
+ "purpose": "With n=100 samples and 10 classes, single-seed metrics carry test-fold variance. Multi-seed evaluation gives a more reliable performance picture.",
3
+ "seeds_evaluated": [
4
+ 42,
5
+ 7,
6
+ 13,
7
+ 17,
8
+ 23,
9
+ 31,
10
+ 45,
11
+ 99,
12
+ 123,
13
+ 200
14
+ ],
15
+ "per_seed": [
16
+ {
17
+ "seed": 42,
18
+ "test_n_classes": 10,
19
+ "accuracy": 0.9177777777777778,
20
+ "macro_f1": 0.7780699645112974,
21
+ "macro_roc_auc_ovr": 0.979171667321058
22
+ },
23
+ {
24
+ "seed": 7,
25
+ "test_n_classes": 10,
26
+ "accuracy": 0.8988888888888888,
27
+ "macro_f1": 0.7959031264581272,
28
+ "macro_roc_auc_ovr": 0.9762003477988086
29
+ },
30
+ {
31
+ "seed": 13,
32
+ "test_n_classes": 10,
33
+ "accuracy": 0.9077777777777778,
34
+ "macro_f1": 0.7844193419282306,
35
+ "macro_roc_auc_ovr": 0.9756039083537456
36
+ },
37
+ {
38
+ "seed": 17,
39
+ "test_n_classes": 10,
40
+ "accuracy": 0.9055555555555556,
41
+ "macro_f1": 0.7793567708150484,
42
+ "macro_roc_auc_ovr": 0.9725864270053698
43
+ },
44
+ {
45
+ "seed": 23,
46
+ "test_n_classes": 10,
47
+ "accuracy": 0.9011111111111111,
48
+ "macro_f1": 0.7669056364325609,
49
+ "macro_roc_auc_ovr": 0.9731577510354572
50
+ },
51
+ {
52
+ "seed": 31,
53
+ "test_n_classes": 10,
54
+ "accuracy": 0.9055555555555556,
55
+ "macro_f1": 0.7825811291140096,
56
+ "macro_roc_auc_ovr": 0.9757878099386051
57
+ },
58
+ {
59
+ "seed": 45,
60
+ "test_n_classes": 10,
61
+ "accuracy": 0.9211111111111111,
62
+ "macro_f1": 0.8065645535880511,
63
+ "macro_roc_auc_ovr": 0.9754272516460774
64
+ },
65
+ {
66
+ "seed": 99,
67
+ "test_n_classes": 10,
68
+ "accuracy": 0.8822222222222222,
69
+ "macro_f1": 0.7589855352578547,
70
+ "macro_roc_auc_ovr": 0.9722896806606615
71
+ },
72
+ {
73
+ "seed": 123,
74
+ "test_n_classes": 10,
75
+ "accuracy": 0.9088888888888889,
76
+ "macro_f1": 0.7938334664931561,
77
+ "macro_roc_auc_ovr": 0.9790976919379577
78
+ },
79
+ {
80
+ "seed": 200,
81
+ "test_n_classes": 10,
82
+ "accuracy": 0.8977777777777778,
83
+ "macro_f1": 0.7938099428748325,
84
+ "macro_roc_auc_ovr": 0.9734976569094487
85
+ }
86
+ ],
87
+ "aggregate": {
88
+ "accuracy_mean": 0.9046666666666667,
89
+ "accuracy_std": 0.010337514088544894,
90
+ "accuracy_min": 0.8822222222222222,
91
+ "accuracy_max": 0.9211111111111111,
92
+ "macro_f1_mean": 0.7840429467473169,
93
+ "macro_f1_std": 0.013493004664905476,
94
+ "roc_auc_mean": 0.9752820192607189,
95
+ "roc_auc_std": 0.0023415667609269276
96
+ },
97
+ "published_artifact_seed": 42
98
+ }
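The `aggregate` block in `multi_seed_results.json` can be re-derived from the `per_seed` entries. The sketch below recomputes the accuracy mean and spread from the ten accuracies listed in the file. It computes both the population and the sample standard deviation, because the file does not say which one was used. The reported `accuracy_std` appears to correspond to the population form (ddof=0).

```python
import statistics

# Per-seed test accuracies copied from multi_seed_results.json
# (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200, in that order).
accuracies = [
    0.9177777777777778,
    0.8988888888888888,
    0.9077777777777778,
    0.9055555555555556,
    0.9011111111111111,
    0.9055555555555556,
    0.9211111111111111,
    0.8822222222222222,
    0.9088888888888889,
    0.8977777777777778,
]

mean_acc = statistics.fmean(accuracies)
std_population = statistics.pstdev(accuracies)  # divides by n
std_sample = statistics.stdev(accuracies)       # divides by n - 1

print(f"mean accuracy      : {mean_acc:.4f}")
print(f"population std     : {std_population:.4f}")
print(f"sample std         : {std_sample:.4f}")
print(f"min / max accuracy : {min(accuracies):.4f} / {max(accuracies):.4f}")
```

With only ten seeds the sample standard deviation is slightly larger than the population one, so it is the more conservative figure to quote.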
validation_results.json ADDED
@@ -0,0 +1,378 @@
1
+ {
2
+ "version": "1.0.0",
3
+ "dataset": "xpertsystems/cyb003-sample",
4
+ "task": "10-class execution_phase classification",
5
+ "baselines": {
6
+ "always_predict_majority_accuracy": 0.13666666666666666,
7
+ "majority_class": "initial_drop",
8
+ "random_guess_accuracy": 0.1
9
+ },
10
+ "split": {
11
+ "strategy": "group_aware (GroupShuffleSplit by sample_id, nested)",
12
+ "rationale": "100 unique malware samples generate 6,000 timesteps (60 per sample). Random row-split would leak per-sample correlations into the test fold. Group-aware split keeps train/val/test samples disjoint.",
13
+ "samples_train": 69,
14
+ "samples_val": 16,
15
+ "samples_test": 15,
16
+ "timesteps_train": 4140,
17
+ "timesteps_val": 960,
18
+ "timesteps_test": 900,
19
+ "seed": 42
20
+ },
21
+ "n_features": 69,
22
+ "label_classes": [
23
+ "c2_communication",
24
+ "data_exfiltration",
25
+ "dormancy_dwell",
26
+ "initial_drop",
27
+ "lateral_movement",
28
+ "payload_execution",
29
+ "persistence_establishment",
30
+ "privilege_escalation",
31
+ "sandbox_evasion_stall",
32
+ "self_destruct_cleanup"
33
+ ],
34
+ "class_distribution_train": {
35
+ "lateral_movement": 550,
36
+ "initial_drop": 549,
37
+ "data_exfiltration": 543,
38
+ "persistence_establishment": 541,
39
+ "c2_communication": 492,
40
+ "privilege_escalation": 489,
41
+ "payload_execution": 487,
42
+ "dormancy_dwell": 168,
43
+ "sandbox_evasion_stall": 166,
44
+ "self_destruct_cleanup": 155
45
+ },
46
+ "class_distribution_test": {
47
+ "initial_drop": 123,
48
+ "persistence_establishment": 122,
49
+ "lateral_movement": 121,
50
+ "data_exfiltration": 113,
51
+ "c2_communication": 108,
52
+ "privilege_escalation": 107,
53
+ "payload_execution": 106,
54
+ "dormancy_dwell": 40,
55
+ "sandbox_evasion_stall": 32,
56
+ "self_destruct_cleanup": 28
57
+ },
58
+ "models": {
59
+ "xgboost": {
60
+ "architecture": "Gradient-boosted decision trees, multi:softprob, 10 classes",
61
+ "framework": "xgboost",
62
+ "test_metrics": {
63
+ "model": "xgboost",
64
+ "accuracy": 0.9177777777777778,
65
+ "macro_f1": 0.7780699645112974,
66
+ "weighted_f1": 0.9064879129227142,
67
+ "per_class_f1": {
68
+ "c2_communication": 1.0,
69
+ "data_exfiltration": 0.9699570815450643,
70
+ "dormancy_dwell": 0.5301204819277109,
71
+ "initial_drop": 0.9453125,
72
+ "lateral_movement": 0.9917355371900827,
73
+ "payload_execution": 0.963302752293578,
74
+ "persistence_establishment": 0.9918032786885246,
75
+ "privilege_escalation": 0.9907407407407407,
76
+ "sandbox_evasion_stall": 0.125,
77
+ "self_destruct_cleanup": 0.2727272727272727
78
+ },
79
+ "confusion_matrix": {
80
+ "labels": [
81
+ "c2_communication",
82
+ "data_exfiltration",
83
+ "dormancy_dwell",
84
+ "initial_drop",
85
+ "lateral_movement",
86
+ "payload_execution",
87
+ "persistence_establishment",
88
+ "privilege_escalation",
89
+ "sandbox_evasion_stall",
90
+ "self_destruct_cleanup"
91
+ ],
92
+ "matrix": [
93
+ [
94
+ 108,
95
+ 0,
96
+ 0,
97
+ 0,
98
+ 0,
99
+ 0,
100
+ 0,
101
+ 0,
102
+ 0,
103
+ 0
104
+ ],
105
+ [
106
+ 0,
107
+ 113,
108
+ 0,
109
+ 0,
110
+ 0,
111
+ 0,
112
+ 0,
113
+ 0,
114
+ 0,
115
+ 0
116
+ ],
117
+ [
118
+ 0,
119
+ 4,
120
+ 22,
121
+ 7,
122
+ 0,
123
+ 1,
124
+ 0,
125
+ 0,
126
+ 2,
127
+ 4
128
+ ],
129
+ [
130
+ 0,
131
+ 0,
132
+ 2,
133
+ 121,
134
+ 0,
135
+ 0,
136
+ 0,
137
+ 0,
138
+ 0,
139
+ 0
140
+ ],
141
+ [
142
+ 0,
143
+ 0,
144
+ 0,
145
+ 0,
146
+ 120,
147
+ 0,
148
+ 0,
149
+ 0,
150
+ 0,
151
+ 1
152
+ ],
153
+ [
154
+ 0,
155
+ 0,
156
+ 1,
157
+ 0,
158
+ 0,
159
+ 105,
160
+ 0,
161
+ 0,
162
+ 0,
163
+ 0
164
+ ],
165
+ [
166
+ 0,
167
+ 0,
168
+ 1,
169
+ 0,
170
+ 0,
171
+ 0,
172
+ 121,
173
+ 0,
174
+ 0,
175
+ 0
176
+ ],
177
+ [
178
+ 0,
179
+ 0,
180
+ 0,
181
+ 0,
182
+ 0,
183
+ 0,
184
+ 0,
185
+ 107,
186
+ 0,
187
+ 0
188
+ ],
189
+ [
190
+ 0,
191
+ 0,
192
+ 17,
193
+ 3,
194
+ 0,
195
+ 1,
196
+ 1,
197
+ 2,
198
+ 3,
199
+ 5
200
+ ],
201
+ [
202
+ 0,
203
+ 3,
204
+ 0,
205
+ 2,
206
+ 1,
207
+ 5,
208
+ 0,
209
+ 0,
210
+ 11,
211
+ 6
212
+ ]
213
+ ]
214
+ },
215
+ "macro_roc_auc_ovr": 0.979171667321058
216
+ }
217
+ },
218
+ "mlp": {
219
+ "architecture": "PyTorch MLP, 69 -> 128 -> 64 -> 10, BatchNorm1d + ReLU + Dropout, weighted cross-entropy loss",
220
+ "framework": "pytorch",
221
+ "test_metrics": {
222
+ "model": "mlp",
223
+ "accuracy": 0.8222222222222222,
224
+ "macro_f1": 0.7071652710164154,
225
+ "weighted_f1": 0.8217291149270296,
226
+ "per_class_f1": {
227
+ "c2_communication": 1.0,
228
+ "data_exfiltration": 0.9181818181818182,
229
+ "dormancy_dwell": 0.5194805194805194,
230
+ "initial_drop": 0.8854961832061069,
231
+ "lateral_movement": 0.9067796610169492,
232
+ "payload_execution": 0.6981132075471698,
233
+ "persistence_establishment": 0.8695652173913043,
234
+ "privilege_escalation": 0.9154228855721394,
235
+ "sandbox_evasion_stall": 0.07692307692307693,
236
+ "self_destruct_cleanup": 0.28169014084507044
237
+ },
238
+ "confusion_matrix": {
239
+ "labels": [
240
+ "c2_communication",
241
+ "data_exfiltration",
242
+ "dormancy_dwell",
243
+ "initial_drop",
244
+ "lateral_movement",
245
+ "payload_execution",
246
+ "persistence_establishment",
247
+ "privilege_escalation",
248
+ "sandbox_evasion_stall",
249
+ "self_destruct_cleanup"
250
+ ],
251
+ "matrix": [
252
+ [
253
+ 108,
254
+ 0,
255
+ 0,
256
+ 0,
257
+ 0,
258
+ 0,
259
+ 0,
260
+ 0,
261
+ 0,
262
+ 0
263
+ ],
264
+ [
265
+ 0,
266
+ 101,
267
+ 0,
268
+ 0,
269
+ 6,
270
+ 3,
271
+ 0,
272
+ 0,
273
+ 0,
274
+ 3
275
+ ],
276
+ [
277
+ 0,
278
+ 1,
279
+ 20,
280
+ 5,
281
+ 0,
282
+ 7,
283
+ 0,
284
+ 0,
285
+ 4,
286
+ 3
287
+ ],
288
+ [
289
+ 0,
290
+ 0,
291
+ 3,
292
+ 116,
293
+ 0,
294
+ 0,
295
+ 4,
296
+ 0,
297
+ 0,
298
+ 0
299
+ ],
300
+ [
301
+ 0,
302
+ 2,
303
+ 0,
304
+ 0,
305
+ 107,
306
+ 7,
307
+ 0,
308
+ 0,
309
+ 3,
310
+ 2
311
+ ],
312
+ [
313
+ 0,
314
+ 1,
315
+ 0,
316
+ 0,
317
+ 2,
318
+ 74,
319
+ 1,
320
+ 0,
321
+ 9,
322
+ 19
323
+ ],
324
+ [
325
+ 0,
326
+ 0,
327
+ 2,
328
+ 7,
329
+ 0,
330
+ 0,
331
+ 110,
332
+ 2,
333
+ 1,
334
+ 0
335
+ ],
336
+ [
337
+ 0,
338
+ 0,
339
+ 0,
340
+ 0,
341
+ 0,
342
+ 2,
343
+ 13,
344
+ 92,
345
+ 0,
346
+ 0
347
+ ],
348
+ [
349
+ 0,
350
+ 1,
351
+ 12,
352
+ 7,
353
+ 0,
354
+ 3,
355
+ 1,
356
+ 0,
357
+ 2,
358
+ 6
359
+ ],
360
+ [
361
+ 0,
362
+ 1,
363
+ 0,
364
+ 4,
365
+ 0,
366
+ 10,
367
+ 2,
368
+ 0,
369
+ 1,
370
+ 10
371
+ ]
372
+ ]
373
+ },
374
+ "macro_roc_auc_ovr": 0.9680976851704761
375
+ }
376
+ }
377
+ }
378
+ }
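The headline XGBoost numbers in `validation_results.json` follow directly from its confusion matrix. The sketch below recomputes accuracy, per-class F1, macro F1, and the majority-class baseline from the matrix as listed in the file, with rows read as true labels and columns as predictions. That orientation is an assumption, though it is consistent with the row sums matching `class_distribution_test`.

```python
labels = [
    "c2_communication", "data_exfiltration", "dormancy_dwell", "initial_drop",
    "lateral_movement", "payload_execution", "persistence_establishment",
    "privilege_escalation", "sandbox_evasion_stall", "self_destruct_cleanup",
]

# XGBoost test confusion matrix copied from validation_results.json.
# Assumed orientation: rows = true class, columns = predicted class.
matrix = [
    [108, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 113, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 4, 22, 7, 0, 1, 0, 0, 2, 4],
    [0, 0, 2, 121, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 120, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 105, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 121, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 107, 0, 0],
    [0, 0, 17, 3, 0, 1, 1, 2, 3, 5],
    [0, 3, 0, 2, 1, 5, 0, 0, 11, 6],
]

n = len(labels)
total = sum(sum(row) for row in matrix)
accuracy = sum(matrix[i][i] for i in range(n)) / total

per_class_f1 = {}
for i, name in enumerate(labels):
    tp = matrix[i][i]
    fn = sum(matrix[i]) - tp                       # true class i, predicted elsewhere
    fp = sum(matrix[r][i] for r in range(n)) - tp  # predicted i, true class elsewhere
    denom = 2 * tp + fp + fn
    per_class_f1[name] = 2 * tp / denom if denom else 0.0

macro_f1 = sum(per_class_f1.values()) / n
majority_baseline = max(sum(row) for row in matrix) / total

print(f"accuracy          : {accuracy:.4f}")
print(f"macro F1          : {macro_f1:.4f}")
print(f"majority baseline : {majority_baseline:.4f}")
for name, f1 in per_class_f1.items():
    print(f"  F1({name:26s}) = {f1:.4f}")
```

The gap between accuracy and macro F1 comes from the three low-support phases (`dormancy_dwell`, `sandbox_evasion_stall`, `self_destruct_cleanup`), which weigh equally in the macro average despite covering only 100 of the 900 test timesteps.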