Initial release: XGBoost + MLP for malware execution phase classification
Browse files- README.md +438 -0
- ablation_results.json +804 -0
- feature_engineering.py +325 -0
- feature_meta.json +182 -0
- feature_scaler.json +1 -0
- inference_example.ipynb +314 -0
- model_mlp.safetensors +3 -0
- model_xgb.json +0 -0
- multi_seed_results.json +98 -0
- validation_results.json +378 -0
README.md
ADDED
|
@@ -0,0 +1,438 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: cc-by-nc-4.0
|
| 3 |
+
library_name: pytorch
|
| 4 |
+
tags:
|
| 5 |
+
- cybersecurity
|
| 6 |
+
- malware
|
| 7 |
+
- malware-behaviour
|
| 8 |
+
- sandbox-analysis
|
| 9 |
+
- edr
|
| 10 |
+
- tabular-classification
|
| 11 |
+
- synthetic-data
|
| 12 |
+
- xgboost
|
| 13 |
+
- baseline
|
| 14 |
+
pipeline_tag: tabular-classification
|
| 15 |
+
base_model: []
|
| 16 |
+
datasets:
|
| 17 |
+
- xpertsystems/cyb003-sample
|
| 18 |
+
metrics:
|
| 19 |
+
- accuracy
|
| 20 |
+
- f1
|
| 21 |
+
- roc_auc
|
| 22 |
+
model-index:
|
| 23 |
+
- name: cyb003-baseline-classifier
|
| 24 |
+
results:
|
| 25 |
+
- task:
|
| 26 |
+
type: tabular-classification
|
| 27 |
+
name: 10-class malware execution phase classification
|
| 28 |
+
dataset:
|
| 29 |
+
type: xpertsystems/cyb003-sample
|
| 30 |
+
name: CYB003 Synthetic Malware Behaviour & Classification Dataset (Sample)
|
| 31 |
+
metrics:
|
| 32 |
+
- type: roc_auc
|
| 33 |
+
value: 0.9792
|
| 34 |
+
name: Test macro ROC-AUC OvR (XGBoost, seed 42)
|
| 35 |
+
- type: accuracy
|
| 36 |
+
value: 0.9178
|
| 37 |
+
name: Test accuracy (XGBoost, seed 42)
|
| 38 |
+
- type: f1
|
| 39 |
+
value: 0.7781
|
| 40 |
+
name: Test macro-F1 (XGBoost, seed 42)
|
| 41 |
+
- type: accuracy
|
| 42 |
+
value: 0.905
|
| 43 |
+
name: Multi-seed accuracy mean ± 0.010 (XGBoost, 10 seeds)
|
| 44 |
+
- type: roc_auc
|
| 45 |
+
value: 0.975
|
| 46 |
+
name: Multi-seed ROC-AUC mean ± 0.002 (XGBoost, 10 seeds)
|
| 47 |
+
- type: roc_auc
|
| 48 |
+
value: 0.9681
|
| 49 |
+
name: Test macro ROC-AUC OvR (MLP, seed 42)
|
| 50 |
+
- type: accuracy
|
| 51 |
+
value: 0.8222
|
| 52 |
+
name: Test accuracy (MLP, seed 42)
|
| 53 |
+
- type: f1
|
| 54 |
+
value: 0.7072
|
| 55 |
+
name: Test macro-F1 (MLP, seed 42)
|
| 56 |
+
---
|
| 57 |
+
|
| 58 |
+
# CYB003 Baseline Classifier
|
| 59 |
+
|
| 60 |
+
**Malware execution-phase classifier trained on the CYB003 synthetic
|
| 61 |
+
malware behaviour sample. Predicts which of 10 execution phases a
|
| 62 |
+
per-timestep telemetry record belongs to, from observable behavioural
|
| 63 |
+
and PE-static features.**
|
| 64 |
+
|
| 65 |
+
> **Baseline reference, not for production use.** This model demonstrates
|
| 66 |
+
> that the [CYB003 sample dataset](https://huggingface.co/datasets/xpertsystems/cyb003-sample)
|
| 67 |
+
> is learnable end-to-end and gives prospective buyers a working starting
|
| 68 |
+
> point. It is not a production sandbox, EDR, or threat-detection system.
|
| 69 |
+
> See [Limitations](#limitations).
|
| 70 |
+
|
| 71 |
+
## Model overview
|
| 72 |
+
|
| 73 |
+
| Property | Value |
|
| 74 |
+
|---|---|
|
| 75 |
+
| Task | 10-class execution_phase classification |
|
| 76 |
+
| Training data | `xpertsystems/cyb003-sample` (6,000 timesteps across 100 malware samples) |
|
| 77 |
+
| Models | XGBoost + PyTorch MLP |
|
| 78 |
+
| Input features | 69 (after one-hot encoding) |
|
| 79 |
+
| Split | **Group-aware by sample_id** (disjoint train/val/test samples) |
|
| 80 |
+
| Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds |
|
| 81 |
+
| License | CC-BY-NC-4.0 (matches dataset) |
|
| 82 |
+
| Status | Reference baseline |
|
| 83 |
+
|
| 84 |
+
## Why this task instead of malware family classification?
|
| 85 |
+
|
| 86 |
+
The CYB003 dataset README leads with "training malware family classifiers"
|
| 87 |
+
as a suggested use case. We piloted that target first and found it is
|
| 88 |
+
**not learnable from the sample dataset** under proper group-aware
|
| 89 |
+
evaluation: with only 100 unique samples spread across 10 families,
|
| 90 |
+
XGBoost on per-timestep features lands at ~15% accuracy and ROC-AUC ~0.58
|
| 91 |
+
— at majority baseline. Per-sample aggregation gives the same result.
|
| 92 |
+
|
| 93 |
+
This is a **sample-size constraint**, not a feature-engineering failure.
|
| 94 |
+
With ~7 samples per family on average, a held-out test set of 15 samples
|
| 95 |
+
covers at most ~8 families and yields a model that cannot generalize.
|
| 96 |
+
The full 280k-row CYB003 product, with ~28 samples per family at the
|
| 97 |
+
sample's distribution, will not have this constraint.
|
| 98 |
+
|
| 99 |
+
We pivoted to **execution_phase prediction**, which has 6,000 rows of
|
| 100 |
+
per-timestep data and learns cleanly: 91% accuracy, ROC-AUC 0.98, stable
|
| 101 |
+
across seeds. This is a legitimate SOC use case — dynamic-analysis tools
|
| 102 |
+
and EDR systems regularly need to tag what phase of execution observed
|
| 103 |
+
malware activity belongs to — and it shows the dataset is well-calibrated
|
| 104 |
+
even when the headline product use case needs more data.
|
| 105 |
+
|
| 106 |
+
Two model artifacts are published. They are designed to be used together — disagreement is a useful triage signal:
|
| 107 |
+
|
| 108 |
+
- `model_xgb.json` — gradient-boosted trees, primary recommendation
|
| 109 |
+
- `model_mlp.safetensors` — PyTorch MLP in SafeTensors format
|
| 110 |
+
|
| 111 |
+
## Quick start
|
| 112 |
+
|
| 113 |
+
```bash
|
| 114 |
+
pip install xgboost torch safetensors pandas huggingface_hub
|
| 115 |
+
```
|
| 116 |
+
|
| 117 |
+
```python
|
| 118 |
+
from huggingface_hub import hf_hub_download
|
| 119 |
+
import json, numpy as np, torch, xgboost as xgb
|
| 120 |
+
from safetensors.torch import load_file
|
| 121 |
+
|
| 122 |
+
REPO = "xpertsystems/cyb003-baseline-classifier"
|
| 123 |
+
|
| 124 |
+
paths = {n: hf_hub_download(REPO, n) for n in [
|
| 125 |
+
"model_xgb.json", "model_mlp.safetensors",
|
| 126 |
+
"feature_engineering.py", "feature_meta.json", "feature_scaler.json",
|
| 127 |
+
]}
|
| 128 |
+
|
| 129 |
+
import sys, os
|
| 130 |
+
sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"]))
|
| 131 |
+
from feature_engineering import transform_single, load_meta, INT_TO_LABEL
|
| 132 |
+
|
| 133 |
+
meta = load_meta(paths["feature_meta.json"])
|
| 134 |
+
xgb_model = xgb.XGBClassifier(); xgb_model.load_model(paths["model_xgb.json"])
|
| 135 |
+
|
| 136 |
+
# Predict (see inference_example.ipynb for the full pattern)
|
| 137 |
+
X = transform_single(my_timestep_record, meta)
|
| 138 |
+
proba = xgb_model.predict_proba(X)[0]
|
| 139 |
+
print(INT_TO_LABEL[int(np.argmax(proba))])
|
| 140 |
+
```
|
| 141 |
+
|
| 142 |
+
See [`inference_example.ipynb`](./inference_example.ipynb) for the full
|
| 143 |
+
copy-paste demo.
|
| 144 |
+
|
| 145 |
+
## Training data
|
| 146 |
+
|
| 147 |
+
Trained on the public sample of CYB003, 6,000 per-timestep telemetry
|
| 148 |
+
rows from 100 malware samples (60 timesteps per sample):
|
| 149 |
+
|
| 150 |
+
| Phase | Total rows | Train share | Test rows (seed 42) |
|
| 151 |
+
|---|---:|---:|---:|
|
| 152 |
+
| `initial_drop` | 801 | 13.4% | 120 |
|
| 153 |
+
| `lateral_movement` | 799 | 13.3% | 120 |
|
| 154 |
+
| `persistence_establishment` | 787 | 13.1% | 119 |
|
| 155 |
+
| `data_exfiltration` | 783 | 13.1% | 100 |
|
| 156 |
+
| `c2_communication` | 709 | 11.8% | 87 |
|
| 157 |
+
| `privilege_escalation` | 705 | 11.8% | 107 |
|
| 158 |
+
| `payload_execution` | 705 | 11.8% | 109 |
|
| 159 |
+
| `dormancy_dwell` | 250 | 4.2% | 83 |
|
| 160 |
+
| `sandbox_evasion_stall` | 234 | 3.9% | 32 |
|
| 161 |
+
| `self_destruct_cleanup` | 227 | 3.8% | 23 |
|
| 162 |
+
|
| 163 |
+
### Group-aware split
|
| 164 |
+
|
| 165 |
+
A single malware sample generates 60 highly-correlated timesteps. Random
|
| 166 |
+
row-level splitting would put timesteps from the same sample in both
|
| 167 |
+
train and test, inflating metrics in a way that does not generalize to
|
| 168 |
+
new samples.
|
| 169 |
+
|
| 170 |
+
This release uses **GroupShuffleSplit by `sample_id`** (nested, 70/15/15):
|
| 171 |
+
|
| 172 |
+
| Fold | Samples | Timesteps |
|
| 173 |
+
|---|---:|---:|
|
| 174 |
+
| Train | 69 | 4,140 |
|
| 175 |
+
| Validation | 16 | 960 |
|
| 176 |
+
| Test | 15 | 900 |
|
| 177 |
+
|
| 178 |
+
All test samples are completely unseen during training. Class imbalance
|
| 179 |
+
is addressed with `class_weight='balanced'` (XGBoost `sample_weight`) and
|
| 180 |
+
weighted cross-entropy (MLP).
|
| 181 |
+
|
| 182 |
+
## Feature pipeline
|
| 183 |
+
|
| 184 |
+
The bundled `feature_engineering.py` is the canonical feature recipe.
|
| 185 |
+
69 features survive after encoding, drawn from:
|
| 186 |
+
|
| 187 |
+
- **Per-timestep numeric** (10): `timestep`, `api_call_rate`, `registry_write_count`, `network_connection_count`, `process_injection_flag`, `c2_beacon_interval_sec`, `av_signature_hit_flag`, `sandbox_evasion_flag`, `lateral_propagation_count`, `privilege_escalation_flag`
|
| 188 |
+
- **PE static features** (11): `pe_entropy_mean`, `pe_entropy_std`, `import_hash_cluster`, `section_count`, `packed_section_ratio`, `string_entropy_mean`, `byte_histogram_chi2`, `code_section_rx_ratio`, `resource_section_entropy`, `suspicious_import_count`, `packer_detected_flag`
|
| 189 |
+
- **Categorical** (6, one-hot encoded): `malware_family`, `threat_actor_tier`, `target_platform`, `obfuscation_technique`, `detection_outcome`, `ep_stack`
|
| 190 |
+
- **Engineered** (6): `api_burst_score`, `is_c2_active`, `is_high_net_volume`, `is_stealth_step`, `is_destructive_step`, `lateral_activity_score`
|
| 191 |
+
|
| 192 |
+
### Leakage audit
|
| 193 |
+
|
| 194 |
+
No categorical feature has phase->phase purity above 0.17 (uniform
|
| 195 |
+
random baseline is 0.10), so nothing in the dataset is an oracle for
|
| 196 |
+
the target. The model relies on a mix of `timestep` (strong but not
|
| 197 |
+
deterministic) and behavioural features.
|
| 198 |
+
|
| 199 |
+
## Evaluation
|
| 200 |
+
|
| 201 |
+
### Test-set metrics, seed 42 (n = 900 timesteps from 15 disjoint samples)
|
| 202 |
+
|
| 203 |
+
**XGBoost** (the published `model_xgb.json` artifact)
|
| 204 |
+
|
| 205 |
+
| Metric | Value |
|
| 206 |
+
|---|---:|
|
| 207 |
+
| Macro ROC-AUC (OvR) | **0.9792** |
|
| 208 |
+
| Accuracy | **0.9178** |
|
| 209 |
+
| Macro-F1 | 0.7781 |
|
| 210 |
+
| Weighted-F1 | 0.9173 |
|
| 211 |
+
|
| 212 |
+
**MLP** (the published `model_mlp.safetensors` artifact)
|
| 213 |
+
|
| 214 |
+
| Metric | Value |
|
| 215 |
+
|---|---:|
|
| 216 |
+
| Macro ROC-AUC (OvR) | 0.9681 |
|
| 217 |
+
| Accuracy | 0.8222 |
|
| 218 |
+
| Macro-F1 | 0.7072 |
|
| 219 |
+
| Weighted-F1 | 0.8278 |
|
| 220 |
+
|
| 221 |
+
### Multi-seed robustness (XGBoost, 10 seeds)
|
| 222 |
+
|
| 223 |
+
Accuracy and ROC-AUC are tight across seeds — the task is genuinely
|
| 224 |
+
learnable, not seed-lucky:
|
| 225 |
+
|
| 226 |
+
| Metric | Mean | Std | Min | Max |
|
| 227 |
+
|---|---:|---:|---:|---:|
|
| 228 |
+
| Accuracy | 0.905 | 0.010 | 0.882 | 0.921 |
|
| 229 |
+
| Macro-F1 | 0.784 | 0.013 | 0.759 | 0.807 |
|
| 230 |
+
| Macro ROC-AUC OvR | 0.975 | 0.002 | 0.972 | 0.979 |
|
| 231 |
+
|
| 232 |
+
Full per-seed results in [`multi_seed_results.json`](./multi_seed_results.json).
|
| 233 |
+
All 10 seeds yielded all 10 classes in the test fold, supporting clean
|
| 234 |
+
multi-class ROC-AUC computation.
|
| 235 |
+
|
| 236 |
+
### Per-class F1 (seed 42) — where the signal is and isn't
|
| 237 |
+
|
| 238 |
+
| Phase | XGBoost F1 | MLP F1 | Note |
|
| 239 |
+
|---|---:|---:|---|
|
| 240 |
+
| `c2_communication` | **1.000** | 1.000 | Trivial: tight timestep window 52-59 + c2_beacon signal |
|
| 241 |
+
| `persistence_establishment` | **0.992** | 0.870 | Tight timestep window 9-17 + registry writes |
|
| 242 |
+
| `lateral_movement` | **0.992** | 0.907 | Tight timestep window 26-34 + lateral_propagation |
|
| 243 |
+
| `privilege_escalation` | **0.991** | 0.915 | Tight timestep window 18-25 + privilege flag |
|
| 244 |
+
| `data_exfiltration` | **0.970** | 0.918 | Tight timestep window 43-51 + network volume |
|
| 245 |
+
| `payload_execution` | **0.963** | 0.698 | Tight timestep window 35-42 + API bursts |
|
| 246 |
+
| `initial_drop` | **0.945** | 0.886 | Tight timestep window 0-8 |
|
| 247 |
+
| `dormancy_dwell` | 0.530 | 0.520 | Hard: spans full 0-59 timestep range |
|
| 248 |
+
| `self_destruct_cleanup` | 0.273 | 0.282 | Hard: spans full 0-59, low row count (227) |
|
| 249 |
+
| `sandbox_evasion_stall` | 0.125 | 0.077 | Hard: spans full 0-59, low row count (234) |
|
| 250 |
+
|
| 251 |
+
Seven phases are near-trivially classified because they sit in tight
|
| 252 |
+
timestep windows with characteristic behavioural signatures. **Three
|
| 253 |
+
phases — `dormancy_dwell`, `sandbox_evasion_stall`, `self_destruct_cleanup`
|
| 254 |
+
— scatter across the full 0–59 timestep range** and lack distinctive
|
| 255 |
+
behavioural features (idle/evasion phases have low activity by design),
|
| 256 |
+
so a flat-tabular event-level model can't reliably disambiguate them.
|
| 257 |
+
Sequence models that consider neighbouring timesteps would help here.
|
| 258 |
+
|
| 259 |
+
### Ablation: which feature groups matter
|
| 260 |
+
|
| 261 |
+
| Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy |
|
| 262 |
+
|---|---:|---:|---:|---:|
|
| 263 |
+
| Full feature set (published) | 0.9178 | 0.7781 | 0.9792 | — |
|
| 264 |
+
| No `timestep` | 0.6933 | 0.5963 | 0.9264 | **−0.2244** |
|
| 265 |
+
| No behavioural features | 0.9089 | 0.7579 | 0.9705 | −0.0089 |
|
| 266 |
+
| No PE static features | 0.9167 | 0.7808 | 0.9786 | −0.0011 |
|
| 267 |
+
| No engineered features | 0.9200 | 0.7931 | 0.9797 | +0.0022 |
|
| 268 |
+
|
| 269 |
+
Three clear findings:
|
| 270 |
+
|
| 271 |
+
1. **`timestep` is by far the dominant feature** (drops 22 pp when removed,
|
| 272 |
+
ROC-AUC still 0.93). Malware execution progresses in time, and where
|
| 273 |
+
you are in that timeline carries most of the phase signal.
|
| 274 |
+
2. **PE static features are barely used for phase prediction.** This is
|
| 275 |
+
honest: PE features (entropy, packed sections, import hashes) inform
|
| 276 |
+
family classification, not phase classification. A buyer doing family
|
| 277 |
+
work should expect to use them; for phase work they can be dropped.
|
| 278 |
+
3. **Engineered features and behavioural features each contribute ~1 pp.**
|
| 279 |
+
Trees recover most of the engineered features on their own.
|
| 280 |
+
|
| 281 |
+
### Architecture
|
| 282 |
+
|
| 283 |
+
**XGBoost:** multi-class gradient boosting (`multi:softprob`, 10 classes),
|
| 284 |
+
`hist` tree method, class-balanced sample weights, early stopping on
|
| 285 |
+
validation mlogloss.
|
| 286 |
+
|
| 287 |
+
**MLP:** `69 → 128 → 64 → 10`, each hidden layer followed by `BatchNorm1d`
|
| 288 |
+
→ `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss, AdamW optimizer,
|
| 289 |
+
early stopping on validation macro-F1.
|
| 290 |
+
|
| 291 |
+
Training hyperparameters (learning rate, batch size, n_estimators,
|
| 292 |
+
early-stopping patience, weight decay, class-weighting strategy) are
|
| 293 |
+
held internally by XpertSystems and are not part of this release.
|
| 294 |
+
|
| 295 |
+
## Limitations
|
| 296 |
+
|
| 297 |
+
**This is a baseline reference, not a production sandbox or threat detector.**
|
| 298 |
+
|
| 299 |
+
1. **Three phases are genuinely hard at sample size.** `dormancy_dwell`,
|
| 300 |
+
`sandbox_evasion_stall`, and `self_destruct_cleanup` span the full
|
| 301 |
+
0–59 timestep range and have low row counts. Per-class F1 = 0.13–0.53.
|
| 302 |
+
These are the phases by design lacking distinctive moment-to-moment
|
| 303 |
+
features (the malware is being quiet to evade detection). Sequence
|
| 304 |
+
models or per-sample aggregation would substantially improve these.
|
| 305 |
+
|
| 306 |
+
2. **The pivot away from malware family classification is dataset-limited,
|
| 307 |
+
not method-limited.** Family classification on 100 samples with 10
|
| 308 |
+
classes is at majority baseline. The full 280k-row CYB003 product
|
| 309 |
+
provides ~5,600 samples and supports proper family classification.
|
| 310 |
+
|
| 311 |
+
3. **Synthetic-vs-real transfer.** The dataset is synthetic and calibrated
|
| 312 |
+
to threat-intelligence and AV-testing benchmark targets (VirusTotal,
|
| 313 |
+
AV-TEST, MITRE ATT&CK Evaluations, Mandiant M-Trends, CrowdStrike GTR,
|
| 314 |
+
Verizon DBIR). Real malware telemetry has different noise
|
| 315 |
+
characteristics, adversary adaptation, and instrumentation gaps. Do
|
| 316 |
+
not assume metrics transfer.
|
| 317 |
+
|
| 318 |
+
4. **Adversarial robustness not evaluated.** The dataset is not
|
| 319 |
+
adversarially generated; the model has not been red-teamed against
|
| 320 |
+
evasive samples.
|
| 321 |
+
|
| 322 |
+
5. **MLP brittleness on OOD inputs.** With ~4k training timesteps, the
|
| 323 |
+
MLP can produce confidently-wrong predictions on hand-crafted records
|
| 324 |
+
far from the training manifold. XGBoost is more robust. Use both;
|
| 325 |
+
treat disagreement as a signal for human review.
|
| 326 |
+
|
| 327 |
+
6. **`timestep` dominance is a property of the dataset.** Real malware
|
| 328 |
+
in production doesn't have a clean "timestep" feature on a per-sample
|
| 329 |
+
60-step normalized timeline — that's a simulator artifact. A buyer
|
| 330 |
+
transferring this baseline to real sandbox traces would need to
|
| 331 |
+
recover an equivalent temporal-position feature from execution-trace
|
| 332 |
+
timestamps relative to detonation.
|
| 333 |
+
|
| 334 |
+
## Notes on dataset schema
|
| 335 |
+
|
| 336 |
+
The CYB003 sample dataset README describes some fields differently from
|
| 337 |
+
the actual schema. The model was trained on the actual schema; this note
|
| 338 |
+
helps buyers reconcile what they read with what they receive.
|
| 339 |
+
|
| 340 |
+
| What the README says | What the data actually contains |
|
| 341 |
+
|---|---|
|
| 342 |
+
| `pe_entropy` (one column) | `pe_entropy_mean` + `pe_entropy_std` (two columns) |
|
| 343 |
+
| `process_injection_count` | `process_injection_flag` (binary, not a count) |
|
| 344 |
+
| `c2_beacon_active` | `c2_beacon_interval_sec` (seconds, 0 when inactive) |
|
| 345 |
+
| `av_detected`, `edr_detected`, `sandbox_evaded`, `dwell_time_hours`, `persistence_mechanism`, `lotl_technique_used` (per-timestep) | None of these exist on per-timestep; equivalents (`av_signature_hit_flag`, `sandbox_evasion_flag`) do exist with different names |
|
| 346 |
+
| `ep_stack`: 3 values (`legacy_av`, `ngav_ml_based`, `edr_full`) | `ep_stack`: 8 values (`legacy_av_only`, `ngav_ml_based`, `edr_endpoint_detect`, `av_plus_firewall`, `xdr_extended_detect`, `managed_detection_response`, `deception_honeypot`, `no_protection`) |
|
| 347 |
+
| 9 malware families listed | 10 families in the data (`apt_implant` is the additional one) |
|
| 348 |
+
| `coordinated_campaign_flag` (described as a flag) | Constant = 1 for all rows in the sample (uninformative) |
|
| 349 |
+
|
| 350 |
+
The actual per-timestep table also contains rich PE-static features not
|
| 351 |
+
listed in the README: `import_hash_cluster`, `section_count`,
|
| 352 |
+
`packed_section_ratio`, `string_entropy_mean`, `byte_histogram_chi2`,
|
| 353 |
+
`code_section_rx_ratio`, `resource_section_entropy`,
|
| 354 |
+
`suspicious_import_count`. These are excellent features for family
|
| 355 |
+
classification work and are documented in the model's
|
| 356 |
+
`feature_engineering.py`.
|
| 357 |
+
|
| 358 |
+
None of these discrepancies affects model correctness — the feature
|
| 359 |
+
pipeline uses the actual column names. If you build your own pipeline
|
| 360 |
+
against the dataset, use the actual columns, not the README descriptions.
|
| 361 |
+
|
| 362 |
+
## Intended use
|
| 363 |
+
|
| 364 |
+
- **Evaluating fit** of the CYB003 dataset for your malware-analysis
|
| 365 |
+
or sandbox-detection research
|
| 366 |
+
- **Baseline reference** for new model architectures (especially sequence
|
| 367 |
+
models, which should beat this baseline on the late/scattered phases)
|
| 368 |
+
- **Teaching and demo** for tabular classification on malware telemetry
|
| 369 |
+
- **Feature engineering reference** for per-timestep behavioural data
|
| 370 |
+
|
| 371 |
+
## Out-of-scope use
|
| 372 |
+
|
| 373 |
+
- Production sandbox analysis on real malware
|
| 374 |
+
- EDR phase tagging on real systems
|
| 375 |
+
- Family attribution (this baseline does not address that task; see why above)
|
| 376 |
+
- Adversarial-evasion evaluation (dataset not adversarially generated)
|
| 377 |
+
- Any operational security decision
|
| 378 |
+
|
| 379 |
+
## Reproducibility
|
| 380 |
+
|
| 381 |
+
Outputs above were produced with `seed = 42` (published artifact),
|
| 382 |
+
group-aware nested `GroupShuffleSplit` (70/15/15 by sample_id), on the
|
| 383 |
+
published sample (`xpertsystems/cyb003-sample`, version 1.0.0, generated
|
| 384 |
+
2026-05-16). The feature pipeline in `feature_engineering.py` is
|
| 385 |
+
deterministic and the trained weights in this repo correspond exactly
|
| 386 |
+
to the metrics above.
|
| 387 |
+
|
| 388 |
+
Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200) in
|
| 389 |
+
`multi_seed_results.json` confirm robust performance across splits.
|
| 390 |
+
|
| 391 |
+
The training script itself is private to XpertSystems. The published
|
| 392 |
+
artifacts contain the feature pipeline, model weights, scaler, metadata,
|
| 393 |
+
and validation results — sufficient to reproduce inference but not
|
| 394 |
+
training.
|
| 395 |
+
|
| 396 |
+
## Files in this repo
|
| 397 |
+
|
| 398 |
+
| File | Purpose |
|
| 399 |
+
|---|---|
|
| 400 |
+
| `model_xgb.json` | XGBoost weights (seed 42) |
|
| 401 |
+
| `model_mlp.safetensors` | PyTorch MLP weights (seed 42) |
|
| 402 |
+
| `feature_engineering.py` | Feature pipeline (load → engineer → encode) |
|
| 403 |
+
| `feature_meta.json` | Feature column order + categorical levels |
|
| 404 |
+
| `feature_scaler.json` | MLP input mean/std (XGBoost ignores) |
|
| 405 |
+
| `validation_results.json` | Per-class metrics, confusion matrix, architecture |
|
| 406 |
+
| `ablation_results.json` | Per-feature-group ablation (timestep, behavioural, PE static, engineered) |
|
| 407 |
+
| `multi_seed_results.json` | XGBoost metrics across 10 seeds with aggregate statistics |
|
| 408 |
+
| `inference_example.ipynb` | End-to-end inference demo notebook |
|
| 409 |
+
| `README.md` | This file |
|
| 410 |
+
|
| 411 |
+
## Contact and full product
|
| 412 |
+
|
| 413 |
+
The full **CYB003** dataset contains ~349,000 rows across four files,
|
| 414 |
+
with calibrated benchmark validation against 12 metrics drawn from
|
| 415 |
+
authoritative threat intelligence and AV-testing sources (VirusTotal,
|
| 416 |
+
AV-TEST, MITRE ATT&CK Evaluations, Mandiant, CrowdStrike, Verizon).
|
| 417 |
+
The full XpertSystems.ai synthetic data catalogue spans 41 SKUs across
|
| 418 |
+
Cybersecurity, Healthcare, Insurance & Risk, Oil & Gas, and Materials
|
| 419 |
+
& Energy.
|
| 420 |
+
|
| 421 |
+
- 📧 **pradeep@xpertsystems.ai**
|
| 422 |
+
- 🌐 **https://xpertsystems.ai**
|
| 423 |
+
- 🗂 Dataset: https://huggingface.co/datasets/xpertsystems/cyb003-sample
|
| 424 |
+
- 🤖 Companion models:
|
| 425 |
+
- https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic)
|
| 426 |
+
- https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain)
|
| 427 |
+
|
| 428 |
+
## Citation
|
| 429 |
+
|
| 430 |
+
```bibtex
|
| 431 |
+
@misc{xpertsystems_cyb003_baseline_2026,
|
| 432 |
+
title = {CYB003 Baseline Classifier: XGBoost and MLP for Malware Execution Phase Classification},
|
| 433 |
+
author = {XpertSystems.ai},
|
| 434 |
+
year = {2026},
|
| 435 |
+
url = {https://huggingface.co/xpertsystems/cyb003-baseline-classifier},
|
| 436 |
+
note = {Baseline reference model trained on xpertsystems/cyb003-sample}
|
| 437 |
+
}
|
| 438 |
+
```
|
ablation_results.json
ADDED
|
@@ -0,0 +1,804 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"purpose": "Quantify how much each feature group contributes to the headline XGBoost score. Identical architecture, same group-aware split, with one feature group dropped at a time.",
|
| 3 |
+
"full_model_metrics": {
|
| 4 |
+
"model": "xgboost",
|
| 5 |
+
"accuracy": 0.9177777777777778,
|
| 6 |
+
"macro_f1": 0.7780699645112974,
|
| 7 |
+
"weighted_f1": 0.9064879129227142,
|
| 8 |
+
"per_class_f1": {
|
| 9 |
+
"c2_communication": 1.0,
|
| 10 |
+
"data_exfiltration": 0.9699570815450643,
|
| 11 |
+
"dormancy_dwell": 0.5301204819277109,
|
| 12 |
+
"initial_drop": 0.9453125,
|
| 13 |
+
"lateral_movement": 0.9917355371900827,
|
| 14 |
+
"payload_execution": 0.963302752293578,
|
| 15 |
+
"persistence_establishment": 0.9918032786885246,
|
| 16 |
+
"privilege_escalation": 0.9907407407407407,
|
| 17 |
+
"sandbox_evasion_stall": 0.125,
|
| 18 |
+
"self_destruct_cleanup": 0.2727272727272727
|
| 19 |
+
},
|
| 20 |
+
"confusion_matrix": {
|
| 21 |
+
"labels": [
|
| 22 |
+
"c2_communication",
|
| 23 |
+
"data_exfiltration",
|
| 24 |
+
"dormancy_dwell",
|
| 25 |
+
"initial_drop",
|
| 26 |
+
"lateral_movement",
|
| 27 |
+
"payload_execution",
|
| 28 |
+
"persistence_establishment",
|
| 29 |
+
"privilege_escalation",
|
| 30 |
+
"sandbox_evasion_stall",
|
| 31 |
+
"self_destruct_cleanup"
|
| 32 |
+
],
|
| 33 |
+
"matrix": [
|
| 34 |
+
[
|
| 35 |
+
108,
|
| 36 |
+
0,
|
| 37 |
+
0,
|
| 38 |
+
0,
|
| 39 |
+
0,
|
| 40 |
+
0,
|
| 41 |
+
0,
|
| 42 |
+
0,
|
| 43 |
+
0,
|
| 44 |
+
0
|
| 45 |
+
],
|
| 46 |
+
[
|
| 47 |
+
0,
|
| 48 |
+
113,
|
| 49 |
+
0,
|
| 50 |
+
0,
|
| 51 |
+
0,
|
| 52 |
+
0,
|
| 53 |
+
0,
|
| 54 |
+
0,
|
| 55 |
+
0,
|
| 56 |
+
0
|
| 57 |
+
],
|
| 58 |
+
[
|
| 59 |
+
0,
|
| 60 |
+
4,
|
| 61 |
+
22,
|
| 62 |
+
7,
|
| 63 |
+
0,
|
| 64 |
+
1,
|
| 65 |
+
0,
|
| 66 |
+
0,
|
| 67 |
+
2,
|
| 68 |
+
4
|
| 69 |
+
],
|
| 70 |
+
[
|
| 71 |
+
0,
|
| 72 |
+
0,
|
| 73 |
+
2,
|
| 74 |
+
121,
|
| 75 |
+
0,
|
| 76 |
+
0,
|
| 77 |
+
0,
|
| 78 |
+
0,
|
| 79 |
+
0,
|
| 80 |
+
0
|
| 81 |
+
],
|
| 82 |
+
[
|
| 83 |
+
0,
|
| 84 |
+
0,
|
| 85 |
+
0,
|
| 86 |
+
0,
|
| 87 |
+
120,
|
| 88 |
+
0,
|
| 89 |
+
0,
|
| 90 |
+
0,
|
| 91 |
+
0,
|
| 92 |
+
1
|
| 93 |
+
],
|
| 94 |
+
[
|
| 95 |
+
0,
|
| 96 |
+
0,
|
| 97 |
+
1,
|
| 98 |
+
0,
|
| 99 |
+
0,
|
| 100 |
+
105,
|
| 101 |
+
0,
|
| 102 |
+
0,
|
| 103 |
+
0,
|
| 104 |
+
0
|
| 105 |
+
],
|
| 106 |
+
[
|
| 107 |
+
0,
|
| 108 |
+
0,
|
| 109 |
+
1,
|
| 110 |
+
0,
|
| 111 |
+
0,
|
| 112 |
+
0,
|
| 113 |
+
121,
|
| 114 |
+
0,
|
| 115 |
+
0,
|
| 116 |
+
0
|
| 117 |
+
],
|
| 118 |
+
[
|
| 119 |
+
0,
|
| 120 |
+
0,
|
| 121 |
+
0,
|
| 122 |
+
0,
|
| 123 |
+
0,
|
| 124 |
+
0,
|
| 125 |
+
0,
|
| 126 |
+
107,
|
| 127 |
+
0,
|
| 128 |
+
0
|
| 129 |
+
],
|
| 130 |
+
[
|
| 131 |
+
0,
|
| 132 |
+
0,
|
| 133 |
+
17,
|
| 134 |
+
3,
|
| 135 |
+
0,
|
| 136 |
+
1,
|
| 137 |
+
1,
|
| 138 |
+
2,
|
| 139 |
+
3,
|
| 140 |
+
5
|
| 141 |
+
],
|
| 142 |
+
[
|
| 143 |
+
0,
|
| 144 |
+
3,
|
| 145 |
+
0,
|
| 146 |
+
2,
|
| 147 |
+
1,
|
| 148 |
+
5,
|
| 149 |
+
0,
|
| 150 |
+
0,
|
| 151 |
+
11,
|
| 152 |
+
6
|
| 153 |
+
]
|
| 154 |
+
]
|
| 155 |
+
},
|
| 156 |
+
"macro_roc_auc_ovr": 0.979171667321058
|
| 157 |
+
},
|
| 158 |
+
"ablations": {
|
| 159 |
+
"no_pe_static": {
|
| 160 |
+
"n_features": 58,
|
| 161 |
+
"dropped_count": 11,
|
| 162 |
+
"metrics": {
|
| 163 |
+
"model": "xgboost_no_pe_static",
|
| 164 |
+
"accuracy": 0.9166666666666666,
|
| 165 |
+
"macro_f1": 0.7808429949060417,
|
| 166 |
+
"weighted_f1": 0.9063054516980296,
|
| 167 |
+
"per_class_f1": {
|
| 168 |
+
"c2_communication": 1.0,
|
| 169 |
+
"data_exfiltration": 0.9783549783549783,
|
| 170 |
+
"dormancy_dwell": 0.4675324675324675,
|
| 171 |
+
"initial_drop": 0.9494163424124513,
|
| 172 |
+
"lateral_movement": 0.995850622406639,
|
| 173 |
+
"payload_execution": 0.963302752293578,
|
| 174 |
+
"persistence_establishment": 0.9836065573770492,
|
| 175 |
+
"privilege_escalation": 0.9771689497716894,
|
| 176 |
+
"sandbox_evasion_stall": 0.16666666666666666,
|
| 177 |
+
"self_destruct_cleanup": 0.32653061224489793
|
| 178 |
+
},
|
| 179 |
+
"confusion_matrix": {
|
| 180 |
+
"labels": [
|
| 181 |
+
"c2_communication",
|
| 182 |
+
"data_exfiltration",
|
| 183 |
+
"dormancy_dwell",
|
| 184 |
+
"initial_drop",
|
| 185 |
+
"lateral_movement",
|
| 186 |
+
"payload_execution",
|
| 187 |
+
"persistence_establishment",
|
| 188 |
+
"privilege_escalation",
|
| 189 |
+
"sandbox_evasion_stall",
|
| 190 |
+
"self_destruct_cleanup"
|
| 191 |
+
],
|
| 192 |
+
"matrix": [
|
| 193 |
+
[
|
| 194 |
+
108,
|
| 195 |
+
0,
|
| 196 |
+
0,
|
| 197 |
+
0,
|
| 198 |
+
0,
|
| 199 |
+
0,
|
| 200 |
+
0,
|
| 201 |
+
0,
|
| 202 |
+
0,
|
| 203 |
+
0
|
| 204 |
+
],
|
| 205 |
+
[
|
| 206 |
+
0,
|
| 207 |
+
113,
|
| 208 |
+
0,
|
| 209 |
+
0,
|
| 210 |
+
0,
|
| 211 |
+
0,
|
| 212 |
+
0,
|
| 213 |
+
0,
|
| 214 |
+
0,
|
| 215 |
+
0
|
| 216 |
+
],
|
| 217 |
+
[
|
| 218 |
+
0,
|
| 219 |
+
3,
|
| 220 |
+
18,
|
| 221 |
+
7,
|
| 222 |
+
0,
|
| 223 |
+
1,
|
| 224 |
+
0,
|
| 225 |
+
0,
|
| 226 |
+
6,
|
| 227 |
+
5
|
| 228 |
+
],
|
| 229 |
+
[
|
| 230 |
+
0,
|
| 231 |
+
0,
|
| 232 |
+
1,
|
| 233 |
+
122,
|
| 234 |
+
0,
|
| 235 |
+
0,
|
| 236 |
+
0,
|
| 237 |
+
0,
|
| 238 |
+
0,
|
| 239 |
+
0
|
| 240 |
+
],
|
| 241 |
+
[
|
| 242 |
+
0,
|
| 243 |
+
0,
|
| 244 |
+
0,
|
| 245 |
+
0,
|
| 246 |
+
120,
|
| 247 |
+
0,
|
| 248 |
+
0,
|
| 249 |
+
0,
|
| 250 |
+
0,
|
| 251 |
+
1
|
| 252 |
+
],
|
| 253 |
+
[
|
| 254 |
+
0,
|
| 255 |
+
0,
|
| 256 |
+
1,
|
| 257 |
+
0,
|
| 258 |
+
0,
|
| 259 |
+
105,
|
| 260 |
+
0,
|
| 261 |
+
0,
|
| 262 |
+
0,
|
| 263 |
+
0
|
| 264 |
+
],
|
| 265 |
+
[
|
| 266 |
+
0,
|
| 267 |
+
0,
|
| 268 |
+
1,
|
| 269 |
+
0,
|
| 270 |
+
0,
|
| 271 |
+
0,
|
| 272 |
+
120,
|
| 273 |
+
0,
|
| 274 |
+
0,
|
| 275 |
+
1
|
| 276 |
+
],
|
| 277 |
+
[
|
| 278 |
+
0,
|
| 279 |
+
0,
|
| 280 |
+
0,
|
| 281 |
+
0,
|
| 282 |
+
0,
|
| 283 |
+
0,
|
| 284 |
+
0,
|
| 285 |
+
107,
|
| 286 |
+
0,
|
| 287 |
+
0
|
| 288 |
+
],
|
| 289 |
+
[
|
| 290 |
+
0,
|
| 291 |
+
0,
|
| 292 |
+
15,
|
| 293 |
+
3,
|
| 294 |
+
0,
|
| 295 |
+
1,
|
| 296 |
+
1,
|
| 297 |
+
2,
|
| 298 |
+
4,
|
| 299 |
+
6
|
| 300 |
+
],
|
| 301 |
+
[
|
| 302 |
+
0,
|
| 303 |
+
2,
|
| 304 |
+
1,
|
| 305 |
+
2,
|
| 306 |
+
0,
|
| 307 |
+
5,
|
| 308 |
+
1,
|
| 309 |
+
3,
|
| 310 |
+
6,
|
| 311 |
+
8
|
| 312 |
+
]
|
| 313 |
+
]
|
| 314 |
+
},
|
| 315 |
+
"macro_roc_auc_ovr": 0.9785892106991877
|
| 316 |
+
},
|
| 317 |
+
"delta_accuracy": 0.0011111111111111738,
|
| 318 |
+
"delta_macro_f1": -0.0027730303947443025
|
| 319 |
+
},
|
| 320 |
+
"no_behavioural": {
|
| 321 |
+
"n_features": 60,
|
| 322 |
+
"dropped_count": 9,
|
| 323 |
+
"metrics": {
|
| 324 |
+
"model": "xgboost_no_behavioural",
|
| 325 |
+
"accuracy": 0.9088888888888889,
|
| 326 |
+
"macro_f1": 0.7578825763491894,
|
| 327 |
+
"weighted_f1": 0.8916039125438652,
|
| 328 |
+
"per_class_f1": {
|
| 329 |
+
"c2_communication": 1.0,
|
| 330 |
+
"data_exfiltration": 0.9372384937238494,
|
| 331 |
+
"dormancy_dwell": 0.463768115942029,
|
| 332 |
+
"initial_drop": 0.9494163424124513,
|
| 333 |
+
"lateral_movement": 0.9596774193548387,
|
| 334 |
+
"payload_execution": 0.9422222222222222,
|
| 335 |
+
"persistence_establishment": 0.9876543209876543,
|
| 336 |
+
"privilege_escalation": 0.9907407407407407,
|
| 337 |
+
"sandbox_evasion_stall": 0.24,
|
| 338 |
+
"self_destruct_cleanup": 0.10810810810810811
|
| 339 |
+
},
|
| 340 |
+
"confusion_matrix": {
|
| 341 |
+
"labels": [
|
| 342 |
+
"c2_communication",
|
| 343 |
+
"data_exfiltration",
|
| 344 |
+
"dormancy_dwell",
|
| 345 |
+
"initial_drop",
|
| 346 |
+
"lateral_movement",
|
| 347 |
+
"payload_execution",
|
| 348 |
+
"persistence_establishment",
|
| 349 |
+
"privilege_escalation",
|
| 350 |
+
"sandbox_evasion_stall",
|
| 351 |
+
"self_destruct_cleanup"
|
| 352 |
+
],
|
| 353 |
+
"matrix": [
|
| 354 |
+
[
|
| 355 |
+
108,
|
| 356 |
+
0,
|
| 357 |
+
0,
|
| 358 |
+
0,
|
| 359 |
+
0,
|
| 360 |
+
0,
|
| 361 |
+
0,
|
| 362 |
+
0,
|
| 363 |
+
0,
|
| 364 |
+
0
|
| 365 |
+
],
|
| 366 |
+
[
|
| 367 |
+
0,
|
| 368 |
+
112,
|
| 369 |
+
1,
|
| 370 |
+
0,
|
| 371 |
+
0,
|
| 372 |
+
0,
|
| 373 |
+
0,
|
| 374 |
+
0,
|
| 375 |
+
0,
|
| 376 |
+
0
|
| 377 |
+
],
|
| 378 |
+
[
|
| 379 |
+
0,
|
| 380 |
+
6,
|
| 381 |
+
16,
|
| 382 |
+
7,
|
| 383 |
+
2,
|
| 384 |
+
5,
|
| 385 |
+
0,
|
| 386 |
+
0,
|
| 387 |
+
3,
|
| 388 |
+
1
|
| 389 |
+
],
|
| 390 |
+
[
|
| 391 |
+
0,
|
| 392 |
+
0,
|
| 393 |
+
0,
|
| 394 |
+
122,
|
| 395 |
+
0,
|
| 396 |
+
0,
|
| 397 |
+
0,
|
| 398 |
+
0,
|
| 399 |
+
1,
|
| 400 |
+
0
|
| 401 |
+
],
|
| 402 |
+
[
|
| 403 |
+
0,
|
| 404 |
+
0,
|
| 405 |
+
0,
|
| 406 |
+
0,
|
| 407 |
+
119,
|
| 408 |
+
0,
|
| 409 |
+
0,
|
| 410 |
+
0,
|
| 411 |
+
1,
|
| 412 |
+
1
|
| 413 |
+
],
|
| 414 |
+
[
|
| 415 |
+
0,
|
| 416 |
+
0,
|
| 417 |
+
0,
|
| 418 |
+
0,
|
| 419 |
+
0,
|
| 420 |
+
106,
|
| 421 |
+
0,
|
| 422 |
+
0,
|
| 423 |
+
0,
|
| 424 |
+
0
|
| 425 |
+
],
|
| 426 |
+
[
|
| 427 |
+
0,
|
| 428 |
+
0,
|
| 429 |
+
2,
|
| 430 |
+
0,
|
| 431 |
+
0,
|
| 432 |
+
0,
|
| 433 |
+
120,
|
| 434 |
+
0,
|
| 435 |
+
0,
|
| 436 |
+
0
|
| 437 |
+
],
|
| 438 |
+
[
|
| 439 |
+
0,
|
| 440 |
+
0,
|
| 441 |
+
0,
|
| 442 |
+
0,
|
| 443 |
+
0,
|
| 444 |
+
0,
|
| 445 |
+
0,
|
| 446 |
+
107,
|
| 447 |
+
0,
|
| 448 |
+
0
|
| 449 |
+
],
|
| 450 |
+
[
|
| 451 |
+
0,
|
| 452 |
+
2,
|
| 453 |
+
8,
|
| 454 |
+
3,
|
| 455 |
+
2,
|
| 456 |
+
3,
|
| 457 |
+
1,
|
| 458 |
+
2,
|
| 459 |
+
6,
|
| 460 |
+
5
|
| 461 |
+
],
|
| 462 |
+
[
|
| 463 |
+
0,
|
| 464 |
+
6,
|
| 465 |
+
2,
|
| 466 |
+
2,
|
| 467 |
+
4,
|
| 468 |
+
5,
|
| 469 |
+
0,
|
| 470 |
+
0,
|
| 471 |
+
7,
|
| 472 |
+
2
|
| 473 |
+
]
|
| 474 |
+
]
|
| 475 |
+
},
|
| 476 |
+
"macro_roc_auc_ovr": 0.9704768382021074
|
| 477 |
+
},
|
| 478 |
+
"delta_accuracy": 0.008888888888888946,
|
| 479 |
+
"delta_macro_f1": 0.020187388162107966
|
| 480 |
+
},
|
| 481 |
+
"no_timestep": {
|
| 482 |
+
"n_features": 68,
|
| 483 |
+
"dropped_count": 1,
|
| 484 |
+
"metrics": {
|
| 485 |
+
"model": "xgboost_no_timestep",
|
| 486 |
+
"accuracy": 0.6933333333333334,
|
| 487 |
+
"macro_f1": 0.5963303534115096,
|
| 488 |
+
"weighted_f1": 0.6919482762076271,
|
| 489 |
+
"per_class_f1": {
|
| 490 |
+
"c2_communication": 1.0,
|
| 491 |
+
"data_exfiltration": 0.7619047619047619,
|
| 492 |
+
"dormancy_dwell": 0.5882352941176471,
|
| 493 |
+
"initial_drop": 0.5072463768115942,
|
| 494 |
+
"lateral_movement": 0.6985645933014354,
|
| 495 |
+
"payload_execution": 0.5106382978723404,
|
| 496 |
+
"persistence_establishment": 0.8433734939759037,
|
| 497 |
+
"privilege_escalation": 0.9047619047619048,
|
| 498 |
+
"sandbox_evasion_stall": 0.05555555555555555,
|
| 499 |
+
"self_destruct_cleanup": 0.09302325581395349
|
| 500 |
+
},
|
| 501 |
+
"confusion_matrix": {
|
| 502 |
+
"labels": [
|
| 503 |
+
"c2_communication",
|
| 504 |
+
"data_exfiltration",
|
| 505 |
+
"dormancy_dwell",
|
| 506 |
+
"initial_drop",
|
| 507 |
+
"lateral_movement",
|
| 508 |
+
"payload_execution",
|
| 509 |
+
"persistence_establishment",
|
| 510 |
+
"privilege_escalation",
|
| 511 |
+
"sandbox_evasion_stall",
|
| 512 |
+
"self_destruct_cleanup"
|
| 513 |
+
],
|
| 514 |
+
"matrix": [
|
| 515 |
+
[
|
| 516 |
+
108,
|
| 517 |
+
0,
|
| 518 |
+
0,
|
| 519 |
+
0,
|
| 520 |
+
0,
|
| 521 |
+
0,
|
| 522 |
+
0,
|
| 523 |
+
0,
|
| 524 |
+
0,
|
| 525 |
+
0
|
| 526 |
+
],
|
| 527 |
+
[
|
| 528 |
+
0,
|
| 529 |
+
96,
|
| 530 |
+
0,
|
| 531 |
+
4,
|
| 532 |
+
9,
|
| 533 |
+
2,
|
| 534 |
+
1,
|
| 535 |
+
0,
|
| 536 |
+
0,
|
| 537 |
+
1
|
| 538 |
+
],
|
| 539 |
+
[
|
| 540 |
+
0,
|
| 541 |
+
0,
|
| 542 |
+
25,
|
| 543 |
+
10,
|
| 544 |
+
0,
|
| 545 |
+
1,
|
| 546 |
+
0,
|
| 547 |
+
0,
|
| 548 |
+
4,
|
| 549 |
+
0
|
| 550 |
+
],
|
| 551 |
+
[
|
| 552 |
+
0,
|
| 553 |
+
2,
|
| 554 |
+
6,
|
| 555 |
+
70,
|
| 556 |
+
1,
|
| 557 |
+
12,
|
| 558 |
+
7,
|
| 559 |
+
0,
|
| 560 |
+
22,
|
| 561 |
+
3
|
| 562 |
+
],
|
| 563 |
+
[
|
| 564 |
+
0,
|
| 565 |
+
39,
|
| 566 |
+
0,
|
| 567 |
+
1,
|
| 568 |
+
73,
|
| 569 |
+
7,
|
| 570 |
+
0,
|
| 571 |
+
1,
|
| 572 |
+
0,
|
| 573 |
+
0
|
| 574 |
+
],
|
| 575 |
+
[
|
| 576 |
+
0,
|
| 577 |
+
1,
|
| 578 |
+
0,
|
| 579 |
+
37,
|
| 580 |
+
5,
|
| 581 |
+
48,
|
| 582 |
+
2,
|
| 583 |
+
1,
|
| 584 |
+
5,
|
| 585 |
+
7
|
| 586 |
+
],
|
| 587 |
+
[
|
| 588 |
+
0,
|
| 589 |
+
0,
|
| 590 |
+
1,
|
| 591 |
+
7,
|
| 592 |
+
0,
|
| 593 |
+
2,
|
| 594 |
+
105,
|
| 595 |
+
6,
|
| 596 |
+
1,
|
| 597 |
+
0
|
| 598 |
+
],
|
| 599 |
+
[
|
| 600 |
+
0,
|
| 601 |
+
0,
|
| 602 |
+
0,
|
| 603 |
+
0,
|
| 604 |
+
0,
|
| 605 |
+
2,
|
| 606 |
+
9,
|
| 607 |
+
95,
|
| 608 |
+
1,
|
| 609 |
+
0
|
| 610 |
+
],
|
| 611 |
+
[
|
| 612 |
+
0,
|
| 613 |
+
0,
|
| 614 |
+
13,
|
| 615 |
+
12,
|
| 616 |
+
0,
|
| 617 |
+
2,
|
| 618 |
+
1,
|
| 619 |
+
0,
|
| 620 |
+
2,
|
| 621 |
+
2
|
| 622 |
+
],
|
| 623 |
+
[
|
| 624 |
+
0,
|
| 625 |
+
1,
|
| 626 |
+
0,
|
| 627 |
+
12,
|
| 628 |
+
0,
|
| 629 |
+
6,
|
| 630 |
+
2,
|
| 631 |
+
0,
|
| 632 |
+
5,
|
| 633 |
+
2
|
| 634 |
+
]
|
| 635 |
+
]
|
| 636 |
+
},
|
| 637 |
+
"macro_roc_auc_ovr": 0.9263760295591874
|
| 638 |
+
},
|
| 639 |
+
"delta_accuracy": 0.22444444444444445,
|
| 640 |
+
"delta_macro_f1": 0.18173961109978776
|
| 641 |
+
},
|
| 642 |
+
"no_engineered": {
|
| 643 |
+
"n_features": 63,
|
| 644 |
+
"dropped_count": 6,
|
| 645 |
+
"metrics": {
|
| 646 |
+
"model": "xgboost_no_engineered",
|
| 647 |
+
"accuracy": 0.92,
|
| 648 |
+
"macro_f1": 0.7931081498668057,
|
| 649 |
+
"weighted_f1": 0.9099535506095557,
|
| 650 |
+
"per_class_f1": {
|
| 651 |
+
"c2_communication": 0.9906542056074766,
|
| 652 |
+
"data_exfiltration": 0.9617021276595744,
|
| 653 |
+
"dormancy_dwell": 0.5205479452054794,
|
| 654 |
+
"initial_drop": 0.9534883720930233,
|
| 655 |
+
"lateral_movement": 0.9958847736625515,
|
| 656 |
+
"payload_execution": 0.963302752293578,
|
| 657 |
+
"persistence_establishment": 0.9836065573770492,
|
| 658 |
+
"privilege_escalation": 0.9861751152073732,
|
| 659 |
+
"sandbox_evasion_stall": 0.23529411764705882,
|
| 660 |
+
"self_destruct_cleanup": 0.3404255319148936
|
| 661 |
+
},
|
| 662 |
+
"confusion_matrix": {
|
| 663 |
+
"labels": [
|
| 664 |
+
"c2_communication",
|
| 665 |
+
"data_exfiltration",
|
| 666 |
+
"dormancy_dwell",
|
| 667 |
+
"initial_drop",
|
| 668 |
+
"lateral_movement",
|
| 669 |
+
"payload_execution",
|
| 670 |
+
"persistence_establishment",
|
| 671 |
+
"privilege_escalation",
|
| 672 |
+
"sandbox_evasion_stall",
|
| 673 |
+
"self_destruct_cleanup"
|
| 674 |
+
],
|
| 675 |
+
"matrix": [
|
| 676 |
+
[
|
| 677 |
+
106,
|
| 678 |
+
2,
|
| 679 |
+
0,
|
| 680 |
+
0,
|
| 681 |
+
0,
|
| 682 |
+
0,
|
| 683 |
+
0,
|
| 684 |
+
0,
|
| 685 |
+
0,
|
| 686 |
+
0
|
| 687 |
+
],
|
| 688 |
+
[
|
| 689 |
+
0,
|
| 690 |
+
113,
|
| 691 |
+
0,
|
| 692 |
+
0,
|
| 693 |
+
0,
|
| 694 |
+
0,
|
| 695 |
+
0,
|
| 696 |
+
0,
|
| 697 |
+
0,
|
| 698 |
+
0
|
| 699 |
+
],
|
| 700 |
+
[
|
| 701 |
+
0,
|
| 702 |
+
4,
|
| 703 |
+
19,
|
| 704 |
+
7,
|
| 705 |
+
0,
|
| 706 |
+
1,
|
| 707 |
+
0,
|
| 708 |
+
0,
|
| 709 |
+
4,
|
| 710 |
+
5
|
| 711 |
+
],
|
| 712 |
+
[
|
| 713 |
+
0,
|
| 714 |
+
0,
|
| 715 |
+
0,
|
| 716 |
+
123,
|
| 717 |
+
0,
|
| 718 |
+
0,
|
| 719 |
+
0,
|
| 720 |
+
0,
|
| 721 |
+
0,
|
| 722 |
+
0
|
| 723 |
+
],
|
| 724 |
+
[
|
| 725 |
+
0,
|
| 726 |
+
0,
|
| 727 |
+
0,
|
| 728 |
+
0,
|
| 729 |
+
121,
|
| 730 |
+
0,
|
| 731 |
+
0,
|
| 732 |
+
0,
|
| 733 |
+
0,
|
| 734 |
+
0
|
| 735 |
+
],
|
| 736 |
+
[
|
| 737 |
+
0,
|
| 738 |
+
0,
|
| 739 |
+
1,
|
| 740 |
+
0,
|
| 741 |
+
0,
|
| 742 |
+
105,
|
| 743 |
+
0,
|
| 744 |
+
0,
|
| 745 |
+
0,
|
| 746 |
+
0
|
| 747 |
+
],
|
| 748 |
+
[
|
| 749 |
+
0,
|
| 750 |
+
0,
|
| 751 |
+
0,
|
| 752 |
+
0,
|
| 753 |
+
0,
|
| 754 |
+
0,
|
| 755 |
+
120,
|
| 756 |
+
0,
|
| 757 |
+
1,
|
| 758 |
+
1
|
| 759 |
+
],
|
| 760 |
+
[
|
| 761 |
+
0,
|
| 762 |
+
0,
|
| 763 |
+
0,
|
| 764 |
+
0,
|
| 765 |
+
0,
|
| 766 |
+
0,
|
| 767 |
+
0,
|
| 768 |
+
107,
|
| 769 |
+
0,
|
| 770 |
+
0
|
| 771 |
+
],
|
| 772 |
+
[
|
| 773 |
+
0,
|
| 774 |
+
0,
|
| 775 |
+
13,
|
| 776 |
+
3,
|
| 777 |
+
0,
|
| 778 |
+
1,
|
| 779 |
+
1,
|
| 780 |
+
3,
|
| 781 |
+
6,
|
| 782 |
+
5
|
| 783 |
+
],
|
| 784 |
+
[
|
| 785 |
+
0,
|
| 786 |
+
3,
|
| 787 |
+
0,
|
| 788 |
+
2,
|
| 789 |
+
1,
|
| 790 |
+
5,
|
| 791 |
+
1,
|
| 792 |
+
0,
|
| 793 |
+
8,
|
| 794 |
+
8
|
| 795 |
+
]
|
| 796 |
+
]
|
| 797 |
+
},
|
| 798 |
+
"macro_roc_auc_ovr": 0.9796965243561164
|
| 799 |
+
},
|
| 800 |
+
"delta_accuracy": -0.0022222222222222365,
|
| 801 |
+
"delta_macro_f1": -0.015038185355508271
|
| 802 |
+
}
|
| 803 |
+
}
|
| 804 |
+
}
|
feature_engineering.py
ADDED
|
@@ -0,0 +1,325 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
feature_engineering.py
|
| 3 |
+
======================
|
| 4 |
+
|
| 5 |
+
Feature pipeline for the CYB003 baseline classifier.
|
| 6 |
+
|
| 7 |
+
Predicts `execution_phase` (10-class) from per-timestep malware execution
|
| 8 |
+
telemetry on the CYB003 sample dataset.
|
| 9 |
+
|
| 10 |
+
CSV inputs:
|
| 11 |
+
malware_samples.csv (primary, one row per timestep, 60 timesteps
|
| 12 |
+
per sample, 100 samples = 6000 rows)
|
| 13 |
+
sample_summary.csv (per-sample aggregates; reserved for future
|
| 14 |
+
work — joining inflates per-sample features
|
| 15 |
+
across 60 identical replications, which hurt
|
| 16 |
+
the model in pilot experiments)
|
| 17 |
+
environment_profiles.csv (reserved for future work)
|
| 18 |
+
execution_events.csv (reserved for future work)
|
| 19 |
+
|
| 20 |
+
Target classes (10 execution phases observed in the sample):
|
| 21 |
+
initial_drop, persistence_establishment, privilege_escalation,
|
| 22 |
+
lateral_movement, payload_execution, data_exfiltration,
|
| 23 |
+
c2_communication, dormancy_dwell, sandbox_evasion_stall,
|
| 24 |
+
self_destruct_cleanup
|
| 25 |
+
|
| 26 |
+
This corresponds to the SOC / sandbox-analyst use case: given the malware's
|
| 27 |
+
current behavioural state, what phase of execution is it in? Useful for
|
| 28 |
+
dynamic-analysis tools, EDR phase tagging, and behavioural classifiers.
|
| 29 |
+
|
| 30 |
+
The pivot to execution_phase (away from malware_family) happened because
|
| 31 |
+
malware family classification on n=100 samples with group-aware splitting
|
| 32 |
+
landed at majority-baseline accuracy (~15%, ROC-AUC ~0.58). execution_phase
|
| 33 |
+
sits on 6,000 rows of per-timestep data with strong, stable signal across
|
| 34 |
+
seeds (~91% accuracy, ROC-AUC ~0.98). See the model card for details.
|
| 35 |
+
|
| 36 |
+
Leakage analysis
|
| 37 |
+
----------------
|
| 38 |
+
No categorical feature has phase->phase purity above 0.17 (uniform random
|
| 39 |
+
baseline is 0.10), so nothing in the data is an oracle for the target.
|
| 40 |
+
The model relies on a mix of `timestep` (strong but not deterministic —
|
| 41 |
+
most phases have tight timestep windows, but `dormancy_dwell`,
|
| 42 |
+
`sandbox_evasion_stall`, and `self_destruct_cleanup` span the full
|
| 43 |
+
0-59 range) and behavioural features.
|
| 44 |
+
|
| 45 |
+
Public API
|
| 46 |
+
----------
|
| 47 |
+
build_features(samples_path) -> (X, y, groups, meta)
|
| 48 |
+
transform_single(record, meta) -> np.ndarray
|
| 49 |
+
save_meta(meta, path) / load_meta(path)
|
| 50 |
+
|
| 51 |
+
License
|
| 52 |
+
-------
|
| 53 |
+
Ships with the public model on Hugging Face under CC-BY-NC-4.0, matching
|
| 54 |
+
the dataset license. See README.md.
|
| 55 |
+
"""
|
| 56 |
+
|
| 57 |
+
from __future__ import annotations
|
| 58 |
+
|
| 59 |
+
import json
|
| 60 |
+
from pathlib import Path
|
| 61 |
+
from typing import Any
|
| 62 |
+
|
| 63 |
+
import numpy as np
|
| 64 |
+
import pandas as pd
|
| 65 |
+
|
| 66 |
+
# ---------------------------------------------------------------------------
|
| 67 |
+
# Label space
|
| 68 |
+
# ---------------------------------------------------------------------------
|
| 69 |
+
|
| 70 |
+
# Alphabetical for stable indexing.
|
| 71 |
+
LABEL_ORDER = [
|
| 72 |
+
"c2_communication",
|
| 73 |
+
"data_exfiltration",
|
| 74 |
+
"dormancy_dwell",
|
| 75 |
+
"initial_drop",
|
| 76 |
+
"lateral_movement",
|
| 77 |
+
"payload_execution",
|
| 78 |
+
"persistence_establishment",
|
| 79 |
+
"privilege_escalation",
|
| 80 |
+
"sandbox_evasion_stall",
|
| 81 |
+
"self_destruct_cleanup",
|
| 82 |
+
]
|
| 83 |
+
LABEL_TO_INT = {lbl: i for i, lbl in enumerate(LABEL_ORDER)}
|
| 84 |
+
INT_TO_LABEL = {i: lbl for lbl, i in LABEL_TO_INT.items()}
|
| 85 |
+
|
| 86 |
+
# ---------------------------------------------------------------------------
|
| 87 |
+
# Identifier and target columns - not features
|
| 88 |
+
# ---------------------------------------------------------------------------
|
| 89 |
+
|
| 90 |
+
ID_COLUMNS = ["sample_id", "family_id", "threat_actor_id"]
|
| 91 |
+
TARGET_COLUMN = "execution_phase"
|
| 92 |
+
|
| 93 |
+
# Note: malware_family is kept as a FEATURE for phase prediction (family
|
| 94 |
+
# is a useful observable - a SOC analyst knows what family they're looking
|
| 95 |
+
# at). It's not a leakage source for phase since phase->family purity is
|
| 96 |
+
# only 0.16. Same logic for threat_actor_tier, ep_stack, target_platform -
|
| 97 |
+
# these are environmental context, not oracles for phase.
|
| 98 |
+
|
| 99 |
+
# ---------------------------------------------------------------------------
|
| 100 |
+
# Per-timestep numeric features
|
| 101 |
+
# ---------------------------------------------------------------------------
|
| 102 |
+
|
| 103 |
+
DIRECT_NUMERIC_TIMESTEP_FEATURES = [
|
| 104 |
+
"timestep", # strong but non-deterministic phase signal
|
| 105 |
+
"api_call_rate",
|
| 106 |
+
"registry_write_count",
|
| 107 |
+
"network_connection_count",
|
| 108 |
+
"process_injection_flag",
|
| 109 |
+
"c2_beacon_interval_sec",
|
| 110 |
+
"av_signature_hit_flag",
|
| 111 |
+
"sandbox_evasion_flag",
|
| 112 |
+
"lateral_propagation_count",
|
| 113 |
+
"privilege_escalation_flag",
|
| 114 |
+
# PE static features (constant per sample but informative for phase
|
| 115 |
+
# given that the model sees these alongside per-step behaviour)
|
| 116 |
+
"pe_entropy_mean",
|
| 117 |
+
"pe_entropy_std",
|
| 118 |
+
"import_hash_cluster",
|
| 119 |
+
"section_count",
|
| 120 |
+
"packed_section_ratio",
|
| 121 |
+
"string_entropy_mean",
|
| 122 |
+
"byte_histogram_chi2",
|
| 123 |
+
"code_section_rx_ratio",
|
| 124 |
+
"resource_section_entropy",
|
| 125 |
+
"suspicious_import_count",
|
| 126 |
+
"packer_detected_flag",
|
| 127 |
+
]
|
| 128 |
+
|
| 129 |
+
CATEGORICAL_TIMESTEP_FEATURES = [
|
| 130 |
+
"malware_family", # kept as feature: phase prediction conditions
|
| 131 |
+
# on family (a known observable in SOC workflows)
|
| 132 |
+
"threat_actor_tier",
|
| 133 |
+
"target_platform",
|
| 134 |
+
"obfuscation_technique",
|
| 135 |
+
"detection_outcome",
|
| 136 |
+
"ep_stack",
|
| 137 |
+
]
|
| 138 |
+
|
| 139 |
+
# ---------------------------------------------------------------------------
|
| 140 |
+
# Engineered features (none derived from phase or timestep alone)
|
| 141 |
+
# ---------------------------------------------------------------------------
|
| 142 |
+
|
| 143 |
+
def _add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
|
| 144 |
+
"""
|
| 145 |
+
Six engineered features. None directly encode phase (that would be
|
| 146 |
+
a tautology); each is a behavioural composite that disambiguates
|
| 147 |
+
phases sharing similar timestep ranges.
|
| 148 |
+
"""
|
| 149 |
+
df = df.copy()
|
| 150 |
+
|
| 151 |
+
# 1. API burst score: high for execution-heavy phases (payload_execution,
|
| 152 |
+
# privilege_escalation), low for stealth phases (dormancy, evasion).
|
| 153 |
+
df["api_burst_score"] = (
|
| 154 |
+
df["api_call_rate"] * df["registry_write_count"].clip(upper=50)
|
| 155 |
+
).astype(float)
|
| 156 |
+
|
| 157 |
+
# 2. C2 active flag: positive c2_beacon_interval_sec indicates active
|
| 158 |
+
# beaconing. Strongly correlates with c2_communication phase.
|
| 159 |
+
df["is_c2_active"] = (df["c2_beacon_interval_sec"] > 0).astype(int)
|
| 160 |
+
|
| 161 |
+
# 3. High network volume step: above-threshold connection count, common
|
| 162 |
+
# in lateral_movement, data_exfiltration, c2_communication.
|
| 163 |
+
df["is_high_net_volume"] = (df["network_connection_count"] > 5).astype(int)
|
| 164 |
+
|
| 165 |
+
# 4. Stealth indicator: low api_call_rate AND no AV/sandbox hit. Used
|
| 166 |
+
# to disambiguate dormancy_dwell / sandbox_evasion_stall from active
|
| 167 |
+
# phases that happen to land in similar timestep windows.
|
| 168 |
+
df["is_stealth_step"] = (
|
| 169 |
+
(df["api_call_rate"] < 5)
|
| 170 |
+
& (df["av_signature_hit_flag"] == 0)
|
| 171 |
+
& (df["sandbox_evasion_flag"] == 0)
|
| 172 |
+
).astype(int)
|
| 173 |
+
|
| 174 |
+
# 5. Destructive action indicator: combines privilege escalation flag
|
| 175 |
+
# and registry-write count. High in persistence_establishment and
|
| 176 |
+
# self_destruct_cleanup.
|
| 177 |
+
df["is_destructive_step"] = (
|
| 178 |
+
(df["privilege_escalation_flag"] == 1)
|
| 179 |
+
| (df["registry_write_count"] > 10)
|
| 180 |
+
).astype(int)
|
| 181 |
+
|
| 182 |
+
# 6. Lateral activity: network connections combined with lateral_propagation
|
| 183 |
+
# count > 0. Distinguishes lateral_movement from other network phases.
|
| 184 |
+
df["lateral_activity_score"] = (
|
| 185 |
+
df["lateral_propagation_count"] * df["network_connection_count"]
|
| 186 |
+
).astype(float)
|
| 187 |
+
|
| 188 |
+
return df
|
| 189 |
+
|
| 190 |
+
|
| 191 |
+
# ---------------------------------------------------------------------------
|
| 192 |
+
# Public API
|
| 193 |
+
# ---------------------------------------------------------------------------
|
| 194 |
+
|
| 195 |
+
def build_features(
|
| 196 |
+
samples_path: str | Path,
|
| 197 |
+
) -> tuple[pd.DataFrame, pd.Series, pd.Series, dict[str, Any]]:
|
| 198 |
+
"""
|
| 199 |
+
Load CSV, drop identifier columns and target, engineer features,
|
| 200 |
+
one-hot encode, return (X, y, groups, meta).
|
| 201 |
+
|
| 202 |
+
`groups` is a Series of sample_id values aligned with X. Use it
|
| 203 |
+
with GroupShuffleSplit / GroupKFold: a single sample contains 60
|
| 204 |
+
correlated timesteps, and row-level random splitting inflates metrics.
|
| 205 |
+
"""
|
| 206 |
+
samples = pd.read_csv(samples_path)
|
| 207 |
+
|
| 208 |
+
# Extract target + groups
|
| 209 |
+
y = samples[TARGET_COLUMN].map(LABEL_TO_INT)
|
| 210 |
+
if y.isna().any():
|
| 211 |
+
bad = samples.loc[y.isna(), TARGET_COLUMN].unique()
|
| 212 |
+
raise ValueError(f"Unknown execution_phase values: {bad}")
|
| 213 |
+
y = y.astype(int)
|
| 214 |
+
groups = samples["sample_id"].copy()
|
| 215 |
+
|
| 216 |
+
# Drop target + identifiers from feature pool
|
| 217 |
+
samples = samples.drop(columns=ID_COLUMNS + [TARGET_COLUMN], errors="ignore")
|
| 218 |
+
|
| 219 |
+
# Engineered features
|
| 220 |
+
samples = _add_engineered_features(samples)
|
| 221 |
+
|
| 222 |
+
# Numeric features
|
| 223 |
+
numeric_features = (
|
| 224 |
+
DIRECT_NUMERIC_TIMESTEP_FEATURES
|
| 225 |
+
+ [
|
| 226 |
+
"api_burst_score", "is_c2_active", "is_high_net_volume",
|
| 227 |
+
"is_stealth_step", "is_destructive_step", "lateral_activity_score",
|
| 228 |
+
]
|
| 229 |
+
)
|
| 230 |
+
X_numeric = samples[numeric_features].astype(float)
|
| 231 |
+
|
| 232 |
+
# One-hot categoricals
|
| 233 |
+
categorical_levels: dict[str, list[str]] = {}
|
| 234 |
+
blocks: list[pd.DataFrame] = []
|
| 235 |
+
for col in CATEGORICAL_TIMESTEP_FEATURES:
|
| 236 |
+
if col not in samples.columns:
|
| 237 |
+
continue
|
| 238 |
+
levels = sorted(samples[col].dropna().unique().tolist())
|
| 239 |
+
categorical_levels[col] = levels
|
| 240 |
+
block = pd.get_dummies(
|
| 241 |
+
samples[col].astype("category").cat.set_categories(levels),
|
| 242 |
+
prefix=col, dummy_na=False,
|
| 243 |
+
).astype(int)
|
| 244 |
+
blocks.append(block)
|
| 245 |
+
|
| 246 |
+
X = pd.concat(
|
| 247 |
+
[X_numeric.reset_index(drop=True)]
|
| 248 |
+
+ [b.reset_index(drop=True) for b in blocks],
|
| 249 |
+
axis=1,
|
| 250 |
+
).fillna(0.0)
|
| 251 |
+
|
| 252 |
+
meta = {
|
| 253 |
+
"feature_names": X.columns.tolist(),
|
| 254 |
+
"numeric_features": numeric_features,
|
| 255 |
+
"categorical_levels": categorical_levels,
|
| 256 |
+
"label_to_int": LABEL_TO_INT,
|
| 257 |
+
"int_to_label": INT_TO_LABEL,
|
| 258 |
+
}
|
| 259 |
+
return X, y, groups, meta
|
| 260 |
+
|
| 261 |
+
|
| 262 |
+
def transform_single(
|
| 263 |
+
record: dict | pd.DataFrame,
|
| 264 |
+
meta: dict[str, Any],
|
| 265 |
+
) -> np.ndarray:
|
| 266 |
+
"""Encode a single timestep record for inference."""
|
| 267 |
+
if isinstance(record, dict):
|
| 268 |
+
df = pd.DataFrame([record.copy()])
|
| 269 |
+
else:
|
| 270 |
+
df = record.copy()
|
| 271 |
+
|
| 272 |
+
df = _add_engineered_features(df)
|
| 273 |
+
|
| 274 |
+
numeric = pd.DataFrame({
|
| 275 |
+
col: df.get(col, pd.Series([0.0] * len(df))).astype(float).values
|
| 276 |
+
for col in meta["numeric_features"]
|
| 277 |
+
})
|
| 278 |
+
blocks: list[pd.DataFrame] = [numeric]
|
| 279 |
+
for col, levels in meta["categorical_levels"].items():
|
| 280 |
+
val = df.get(col, pd.Series([None] * len(df)))
|
| 281 |
+
block = pd.get_dummies(
|
| 282 |
+
val.astype("category").cat.set_categories(levels),
|
| 283 |
+
prefix=col, dummy_na=False,
|
| 284 |
+
).astype(int)
|
| 285 |
+
for lvl in levels:
|
| 286 |
+
cname = f"{col}_{lvl}"
|
| 287 |
+
if cname not in block.columns:
|
| 288 |
+
block[cname] = 0
|
| 289 |
+
block = block[[f"{col}_{lvl}" for lvl in levels]]
|
| 290 |
+
blocks.append(block)
|
| 291 |
+
|
| 292 |
+
X = pd.concat(blocks, axis=1).fillna(0.0)
|
| 293 |
+
X = X.reindex(columns=meta["feature_names"], fill_value=0.0)
|
| 294 |
+
return X.values.astype(np.float32)
|
| 295 |
+
|
| 296 |
+
|
| 297 |
+
def save_meta(meta: dict[str, Any], path: str | Path) -> None:
|
| 298 |
+
serializable = {
|
| 299 |
+
"feature_names": meta["feature_names"],
|
| 300 |
+
"numeric_features": meta["numeric_features"],
|
| 301 |
+
"categorical_levels": meta["categorical_levels"],
|
| 302 |
+
"label_to_int": meta["label_to_int"],
|
| 303 |
+
"int_to_label": {str(k): v for k, v in meta["int_to_label"].items()},
|
| 304 |
+
}
|
| 305 |
+
with open(path, "w") as f:
|
| 306 |
+
json.dump(serializable, f, indent=2)
|
| 307 |
+
|
| 308 |
+
|
| 309 |
+
def load_meta(path: str | Path) -> dict[str, Any]:
|
| 310 |
+
with open(path) as f:
|
| 311 |
+
meta = json.load(f)
|
| 312 |
+
meta["int_to_label"] = {int(k): v for k, v in meta["int_to_label"].items()}
|
| 313 |
+
return meta
|
| 314 |
+
|
| 315 |
+
|
| 316 |
+
if __name__ == "__main__":
|
| 317 |
+
import sys
|
| 318 |
+
base = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/mnt/user-data/uploads")
|
| 319 |
+
X, y, groups, meta = build_features(base / "malware_samples.csv")
|
| 320 |
+
print(f"X shape: {X.shape}")
|
| 321 |
+
print(f"y shape: {y.shape}")
|
| 322 |
+
print(f"groups: {groups.nunique()} samples")
|
| 323 |
+
print(f"n features: {len(meta['feature_names'])}")
|
| 324 |
+
print(f"label distribution:\n{y.map(INT_TO_LABEL).value_counts()}")
|
| 325 |
+
print(f"X has NaN: {X.isnull().any().any()}")
|
feature_meta.json
ADDED
|
@@ -0,0 +1,182 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"feature_names": [
|
| 3 |
+
"timestep",
|
| 4 |
+
"api_call_rate",
|
| 5 |
+
"registry_write_count",
|
| 6 |
+
"network_connection_count",
|
| 7 |
+
"process_injection_flag",
|
| 8 |
+
"c2_beacon_interval_sec",
|
| 9 |
+
"av_signature_hit_flag",
|
| 10 |
+
"sandbox_evasion_flag",
|
| 11 |
+
"lateral_propagation_count",
|
| 12 |
+
"privilege_escalation_flag",
|
| 13 |
+
"pe_entropy_mean",
|
| 14 |
+
"pe_entropy_std",
|
| 15 |
+
"import_hash_cluster",
|
| 16 |
+
"section_count",
|
| 17 |
+
"packed_section_ratio",
|
| 18 |
+
"string_entropy_mean",
|
| 19 |
+
"byte_histogram_chi2",
|
| 20 |
+
"code_section_rx_ratio",
|
| 21 |
+
"resource_section_entropy",
|
| 22 |
+
"suspicious_import_count",
|
| 23 |
+
"packer_detected_flag",
|
| 24 |
+
"api_burst_score",
|
| 25 |
+
"is_c2_active",
|
| 26 |
+
"is_high_net_volume",
|
| 27 |
+
"is_stealth_step",
|
| 28 |
+
"is_destructive_step",
|
| 29 |
+
"lateral_activity_score",
|
| 30 |
+
"malware_family_apt_implant",
|
| 31 |
+
"malware_family_botnet_agent",
|
| 32 |
+
"malware_family_cryptominer",
|
| 33 |
+
"malware_family_dropper",
|
| 34 |
+
"malware_family_fileless_malware",
|
| 35 |
+
"malware_family_ransomware",
|
| 36 |
+
"malware_family_rootkit",
|
| 37 |
+
"malware_family_spyware",
|
| 38 |
+
"malware_family_trojan",
|
| 39 |
+
"malware_family_worm",
|
| 40 |
+
"threat_actor_tier_apt",
|
| 41 |
+
"threat_actor_tier_commodity",
|
| 42 |
+
"threat_actor_tier_crimeware",
|
| 43 |
+
"threat_actor_tier_nation_state",
|
| 44 |
+
"target_platform_android_13",
|
| 45 |
+
"target_platform_embedded_ot_firmware",
|
| 46 |
+
"target_platform_linux_rhel_9",
|
| 47 |
+
"target_platform_linux_ubuntu_22",
|
| 48 |
+
"target_platform_macos_ventura",
|
| 49 |
+
"target_platform_windows_10_enterprise",
|
| 50 |
+
"target_platform_windows_11_pro",
|
| 51 |
+
"target_platform_windows_server_2022",
|
| 52 |
+
"obfuscation_technique_anti_analysis_stall",
|
| 53 |
+
"obfuscation_technique_code_signing_abuse",
|
| 54 |
+
"obfuscation_technique_lotl_binary",
|
| 55 |
+
"obfuscation_technique_packing",
|
| 56 |
+
"obfuscation_technique_polymorphic_mutation",
|
| 57 |
+
"obfuscation_technique_sandbox_evasion",
|
| 58 |
+
"obfuscation_technique_string_encryption",
|
| 59 |
+
"detection_outcome_behavioural_flag",
|
| 60 |
+
"detection_outcome_definitive_detection",
|
| 61 |
+
"detection_outcome_heuristic_alert",
|
| 62 |
+
"detection_outcome_sandbox_evasion_confirmed",
|
| 63 |
+
"detection_outcome_signature_miss",
|
| 64 |
+
"ep_stack_av_plus_firewall",
|
| 65 |
+
"ep_stack_deception_honeypot",
|
| 66 |
+
"ep_stack_edr_endpoint_detect",
|
| 67 |
+
"ep_stack_legacy_av_only",
|
| 68 |
+
"ep_stack_managed_detection_response",
|
| 69 |
+
"ep_stack_ngav_ml_based",
|
| 70 |
+
"ep_stack_no_protection",
|
| 71 |
+
"ep_stack_xdr_extended_detect"
|
| 72 |
+
],
|
| 73 |
+
"numeric_features": [
|
| 74 |
+
"timestep",
|
| 75 |
+
"api_call_rate",
|
| 76 |
+
"registry_write_count",
|
| 77 |
+
"network_connection_count",
|
| 78 |
+
"process_injection_flag",
|
| 79 |
+
"c2_beacon_interval_sec",
|
| 80 |
+
"av_signature_hit_flag",
|
| 81 |
+
"sandbox_evasion_flag",
|
| 82 |
+
"lateral_propagation_count",
|
| 83 |
+
"privilege_escalation_flag",
|
| 84 |
+
"pe_entropy_mean",
|
| 85 |
+
"pe_entropy_std",
|
| 86 |
+
"import_hash_cluster",
|
| 87 |
+
"section_count",
|
| 88 |
+
"packed_section_ratio",
|
| 89 |
+
"string_entropy_mean",
|
| 90 |
+
"byte_histogram_chi2",
|
| 91 |
+
"code_section_rx_ratio",
|
| 92 |
+
"resource_section_entropy",
|
| 93 |
+
"suspicious_import_count",
|
| 94 |
+
"packer_detected_flag",
|
| 95 |
+
"api_burst_score",
|
| 96 |
+
"is_c2_active",
|
| 97 |
+
"is_high_net_volume",
|
| 98 |
+
"is_stealth_step",
|
| 99 |
+
"is_destructive_step",
|
| 100 |
+
"lateral_activity_score"
|
| 101 |
+
],
|
| 102 |
+
"categorical_levels": {
|
| 103 |
+
"malware_family": [
|
| 104 |
+
"apt_implant",
|
| 105 |
+
"botnet_agent",
|
| 106 |
+
"cryptominer",
|
| 107 |
+
"dropper",
|
| 108 |
+
"fileless_malware",
|
| 109 |
+
"ransomware",
|
| 110 |
+
"rootkit",
|
| 111 |
+
"spyware",
|
| 112 |
+
"trojan",
|
| 113 |
+
"worm"
|
| 114 |
+
],
|
| 115 |
+
"threat_actor_tier": [
|
| 116 |
+
"apt",
|
| 117 |
+
"commodity",
|
| 118 |
+
"crimeware",
|
| 119 |
+
"nation_state"
|
| 120 |
+
],
|
| 121 |
+
"target_platform": [
|
| 122 |
+
"android_13",
|
| 123 |
+
"embedded_ot_firmware",
|
| 124 |
+
"linux_rhel_9",
|
| 125 |
+
"linux_ubuntu_22",
|
| 126 |
+
"macos_ventura",
|
| 127 |
+
"windows_10_enterprise",
|
| 128 |
+
"windows_11_pro",
|
| 129 |
+
"windows_server_2022"
|
| 130 |
+
],
|
| 131 |
+
"obfuscation_technique": [
|
| 132 |
+
"anti_analysis_stall",
|
| 133 |
+
"code_signing_abuse",
|
| 134 |
+
"lotl_binary",
|
| 135 |
+
"packing",
|
| 136 |
+
"polymorphic_mutation",
|
| 137 |
+
"sandbox_evasion",
|
| 138 |
+
"string_encryption"
|
| 139 |
+
],
|
| 140 |
+
"detection_outcome": [
|
| 141 |
+
"behavioural_flag",
|
| 142 |
+
"definitive_detection",
|
| 143 |
+
"heuristic_alert",
|
| 144 |
+
"sandbox_evasion_confirmed",
|
| 145 |
+
"signature_miss"
|
| 146 |
+
],
|
| 147 |
+
"ep_stack": [
|
| 148 |
+
"av_plus_firewall",
|
| 149 |
+
"deception_honeypot",
|
| 150 |
+
"edr_endpoint_detect",
|
| 151 |
+
"legacy_av_only",
|
| 152 |
+
"managed_detection_response",
|
| 153 |
+
"ngav_ml_based",
|
| 154 |
+
"no_protection",
|
| 155 |
+
"xdr_extended_detect"
|
| 156 |
+
]
|
| 157 |
+
},
|
| 158 |
+
"label_to_int": {
|
| 159 |
+
"c2_communication": 0,
|
| 160 |
+
"data_exfiltration": 1,
|
| 161 |
+
"dormancy_dwell": 2,
|
| 162 |
+
"initial_drop": 3,
|
| 163 |
+
"lateral_movement": 4,
|
| 164 |
+
"payload_execution": 5,
|
| 165 |
+
"persistence_establishment": 6,
|
| 166 |
+
"privilege_escalation": 7,
|
| 167 |
+
"sandbox_evasion_stall": 8,
|
| 168 |
+
"self_destruct_cleanup": 9
|
| 169 |
+
},
|
| 170 |
+
"int_to_label": {
|
| 171 |
+
"0": "c2_communication",
|
| 172 |
+
"1": "data_exfiltration",
|
| 173 |
+
"2": "dormancy_dwell",
|
| 174 |
+
"3": "initial_drop",
|
| 175 |
+
"4": "lateral_movement",
|
| 176 |
+
"5": "payload_execution",
|
| 177 |
+
"6": "persistence_establishment",
|
| 178 |
+
"7": "privilege_escalation",
|
| 179 |
+
"8": "sandbox_evasion_stall",
|
| 180 |
+
"9": "self_destruct_cleanup"
|
| 181 |
+
}
|
| 182 |
+
}
|
feature_scaler.json
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
{"mean": [29.5, 1.387591811594203, 2.5253623188405796, 4.403140096618357, 0.2543478260869565, 4.994391304347825, 0.29347826086956524, 0.34299516908212563, 0.03768115942028986, 0.08140096618357488, 0.8287420289855073, 0.18634782608695652, 274.6231884057971, 5.681159420289855, 0.42982463768115947, 0.5421188405797103, 41.10072463768116, 0.6250057971014492, 0.4523652173913043, 15.695652173913043, 0.463768115942029, 3.524582415458937, 0.11884057971014493, 0.33357487922705314, 0.45193236714975843, 0.0929951690821256, 0.3280193236714976, 0.13043478260869565, 0.13043478260869565, 0.13043478260869565, 0.07246376811594203, 0.057971014492753624, 0.08695652173913043, 0.08695652173913043, 0.13043478260869565, 0.08695652173913043, 0.08695652173913043, 0.21739130434782608, 0.3188405797101449, 0.42028985507246375, 0.043478260869565216, 0.08695652173913043, 0.057971014492753624, 0.07246376811594203, 0.11594202898550725, 0.057971014492753624, 0.3333333333333333, 0.13043478260869565, 0.14492753623188406, 0.14033816425120774, 0.14347826086956522, 0.1427536231884058, 0.14009661835748793, 0.15144927536231884, 0.14299516908212562, 0.1388888888888889, 0.0678743961352657, 0.17922705314009663, 0.08888888888888889, 0.10458937198067633, 0.5594202898550724, 0.11594202898550725, 0.11594202898550725, 0.08695652173913043, 0.15942028985507245, 0.14492753623188406, 0.15942028985507245, 0.13043478260869565, 0.08695652173913043], "std": [17.320194219715013, 0.13486579618110528, 2.8224558127303947, 3.855826464428149, 0.43554658867741924, 16.522749180589745, 0.45541065821011956, 0.4747672360871146, 0.22207359173815253, 0.2734829333055482, 0.13349684203848783, 0.0690646442535872, 164.83751594213814, 2.0467553940561625, 0.29063174139334635, 0.14071160667415852, 19.031317203687976, 0.16348965303394314, 0.17541357294450965, 5.309382618360122, 0.4987457613602604, 3.9756334300787786, 0.32363991799019004, 0.4715468040908369, 0.4977442571333736, 0.2904607481566321, 2.0197472660492055, 0.33682184196295206, 0.3368218419629521, 0.33682184196295206, 0.25928557483500797, 0.23371685876394413, 0.2818053712339797, 0.2818053712339797, 0.3368218419629521, 0.2818053712339797, 0.2818053712339797, 0.41252082351679387, 0.4660834006454619, 0.49366502689172936, 0.20395575381738024, 0.2818053712339797, 0.23371685876394416, 0.2592855748350079, 0.3201940649187907, 0.2337168587639441, 0.4714614640201808, 0.33682184196295206, 0.3520702854959198, 0.34737949256617373, 0.35060225443864834, 0.3498636771396811, 0.34712917169153373, 0.35852955456280744, 0.350110209377919, 0.34587231893054005, 0.25156062524270983, 0.3835886568811166, 0.2846176756328569, 0.30606055216695915, 0.4965166433941038, 0.3201940649187907, 0.32019406491879077, 0.2818053712339797, 0.3661117825566483, 0.3520702854959198, 0.36611178255664834, 0.3368218419629521, 0.2818053712339797]}
|
inference_example.ipynb
ADDED
|
@@ -0,0 +1,314 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"cells": [
|
| 3 |
+
{
|
| 4 |
+
"cell_type": "markdown",
|
| 5 |
+
"metadata": {},
|
| 6 |
+
"source": [
|
| 7 |
+
"# CYB003 Baseline Classifier — Inference Example\n",
|
| 8 |
+
"\n",
|
| 9 |
+
"End-to-end demo: load the trained XGBoost and PyTorch MLP models from the Hugging Face repo and predict the **malware execution phase** of a new per-timestep telemetry record.\n",
|
| 10 |
+
"\n",
|
| 11 |
+
"**Models predict one of 10 phases:** `c2_communication`, `data_exfiltration`, `dormancy_dwell`, `initial_drop`, `lateral_movement`, `payload_execution`, `persistence_establishment`, `privilege_escalation`, `sandbox_evasion_stall`, `self_destruct_cleanup`.\n",
|
| 12 |
+
"\n",
|
| 13 |
+
"**This is a baseline reference model**, not a production sandbox or EDR. See the model card for full metrics and limitations."
|
| 14 |
+
]
|
| 15 |
+
},
|
| 16 |
+
{
|
| 17 |
+
"cell_type": "markdown",
|
| 18 |
+
"metadata": {},
|
| 19 |
+
"source": [
|
| 20 |
+
"## 1. Install dependencies"
|
| 21 |
+
]
|
| 22 |
+
},
|
| 23 |
+
{
|
| 24 |
+
"cell_type": "code",
|
| 25 |
+
"execution_count": null,
|
| 26 |
+
"metadata": {},
|
| 27 |
+
"outputs": [],
|
| 28 |
+
"source": [
|
| 29 |
+
"%pip install --quiet xgboost torch safetensors pandas numpy huggingface_hub"
|
| 30 |
+
]
|
| 31 |
+
},
|
| 32 |
+
{
|
| 33 |
+
"cell_type": "markdown",
|
| 34 |
+
"metadata": {},
|
| 35 |
+
"source": [
|
| 36 |
+
"## 2. Download model artifacts from Hugging Face\n",
|
| 37 |
+
"\n",
|
| 38 |
+
"Five files are needed:\n",
|
| 39 |
+
"- `model_xgb.json` — XGBoost weights\n",
|
| 40 |
+
"- `model_mlp.safetensors` — PyTorch MLP weights\n",
|
| 41 |
+
"- `feature_engineering.py` — feature pipeline (must match training)\n",
|
| 42 |
+
"- `feature_meta.json` — feature column order + categorical levels\n",
|
| 43 |
+
"- `feature_scaler.json` — MLP input standardization"
|
| 44 |
+
]
|
| 45 |
+
},
|
| 46 |
+
{
|
| 47 |
+
"cell_type": "code",
|
| 48 |
+
"execution_count": null,
|
| 49 |
+
"metadata": {},
|
| 50 |
+
"outputs": [],
|
| 51 |
+
"source": [
|
| 52 |
+
"from huggingface_hub import hf_hub_download\n",
|
| 53 |
+
"\n",
|
| 54 |
+
"REPO_ID = \"xpertsystems/cyb003-baseline-classifier\"\n",
|
| 55 |
+
"\n",
|
| 56 |
+
"files = {}\n",
|
| 57 |
+
"for name in [\"model_xgb.json\", \"model_mlp.safetensors\",\n",
|
| 58 |
+
" \"feature_engineering.py\", \"feature_meta.json\",\n",
|
| 59 |
+
" \"feature_scaler.json\"]:\n",
|
| 60 |
+
" files[name] = hf_hub_download(repo_id=REPO_ID, filename=name)\n",
|
| 61 |
+
" print(f\" downloaded: {name}\")"
|
| 62 |
+
]
|
| 63 |
+
},
|
| 64 |
+
{
|
| 65 |
+
"cell_type": "code",
|
| 66 |
+
"execution_count": null,
|
| 67 |
+
"metadata": {},
|
| 68 |
+
"outputs": [],
|
| 69 |
+
"source": [
|
| 70 |
+
"import sys, os\n",
|
| 71 |
+
"fe_dir = os.path.dirname(files[\"feature_engineering.py\"])\n",
|
| 72 |
+
"if fe_dir not in sys.path:\n",
|
| 73 |
+
" sys.path.insert(0, fe_dir)\n",
|
| 74 |
+
"\n",
|
| 75 |
+
"from feature_engineering import transform_single, load_meta, INT_TO_LABEL"
|
| 76 |
+
]
|
| 77 |
+
},
|
| 78 |
+
{
|
| 79 |
+
"cell_type": "markdown",
|
| 80 |
+
"metadata": {},
|
| 81 |
+
"source": [
|
| 82 |
+
"## 3. Load models and metadata"
|
| 83 |
+
]
|
| 84 |
+
},
|
| 85 |
+
{
|
| 86 |
+
"cell_type": "code",
|
| 87 |
+
"execution_count": null,
|
| 88 |
+
"metadata": {},
|
| 89 |
+
"outputs": [],
|
| 90 |
+
"source": [
|
| 91 |
+
"import json\n",
|
| 92 |
+
"import numpy as np\n",
|
| 93 |
+
"import torch\n",
|
| 94 |
+
"import torch.nn as nn\n",
|
| 95 |
+
"import xgboost as xgb\n",
|
| 96 |
+
"from safetensors.torch import load_file\n",
|
| 97 |
+
"\n",
|
| 98 |
+
"meta = load_meta(files[\"feature_meta.json\"])\n",
|
| 99 |
+
"with open(files[\"feature_scaler.json\"]) as f:\n",
|
| 100 |
+
" scaler = json.load(f)\n",
|
| 101 |
+
"\n",
|
| 102 |
+
"N_FEATURES = len(meta[\"feature_names\"])\n",
|
| 103 |
+
"N_CLASSES = len(meta[\"int_to_label\"])\n",
|
| 104 |
+
"print(f\"feature count: {N_FEATURES}\")\n",
|
| 105 |
+
"print(f\"class count: {N_CLASSES}\")\n",
|
| 106 |
+
"print(f\"label classes: {list(meta['int_to_label'].values())}\")"
|
| 107 |
+
]
|
| 108 |
+
},
|
| 109 |
+
{
|
| 110 |
+
"cell_type": "code",
|
| 111 |
+
"execution_count": null,
|
| 112 |
+
"metadata": {},
|
| 113 |
+
"outputs": [],
|
| 114 |
+
"source": [
|
| 115 |
+
"# XGBoost\n",
|
| 116 |
+
"xgb_model = xgb.XGBClassifier()\n",
|
| 117 |
+
"xgb_model.load_model(files[\"model_xgb.json\"])\n",
|
| 118 |
+
"\n",
|
| 119 |
+
"# MLP architecture (must match training)\n",
|
| 120 |
+
"class PhaseMLP(nn.Module):\n",
|
| 121 |
+
" def __init__(self, n_features, n_classes=10, hidden1=128, hidden2=64, dropout=0.3):\n",
|
| 122 |
+
" super().__init__()\n",
|
| 123 |
+
" self.net = nn.Sequential(\n",
|
| 124 |
+
" nn.Linear(n_features, hidden1),\n",
|
| 125 |
+
" nn.BatchNorm1d(hidden1),\n",
|
| 126 |
+
" nn.ReLU(),\n",
|
| 127 |
+
" nn.Dropout(dropout),\n",
|
| 128 |
+
" nn.Linear(hidden1, hidden2),\n",
|
| 129 |
+
" nn.BatchNorm1d(hidden2),\n",
|
| 130 |
+
" nn.ReLU(),\n",
|
| 131 |
+
" nn.Dropout(dropout),\n",
|
| 132 |
+
" nn.Linear(hidden2, n_classes),\n",
|
| 133 |
+
" )\n",
|
| 134 |
+
" def forward(self, x):\n",
|
| 135 |
+
" return self.net(x)\n",
|
| 136 |
+
"\n",
|
| 137 |
+
"mlp_model = PhaseMLP(N_FEATURES, n_classes=N_CLASSES)\n",
|
| 138 |
+
"mlp_model.load_state_dict(load_file(files[\"model_mlp.safetensors\"]))\n",
|
| 139 |
+
"mlp_model.eval()\n",
|
| 140 |
+
"print(\"models loaded\")"
|
| 141 |
+
]
|
| 142 |
+
},
|
| 143 |
+
{
|
| 144 |
+
"cell_type": "markdown",
|
| 145 |
+
"metadata": {},
|
| 146 |
+
"source": [
|
| 147 |
+
"## 4. Prediction helper"
|
| 148 |
+
]
|
| 149 |
+
},
|
| 150 |
+
{
|
| 151 |
+
"cell_type": "code",
|
| 152 |
+
"execution_count": null,
|
| 153 |
+
"metadata": {},
|
| 154 |
+
"outputs": [],
|
| 155 |
+
"source": [
|
| 156 |
+
"MU = np.array(scaler[\"mean\"], dtype=np.float32)\n",
|
| 157 |
+
"SD = np.array(scaler[\"std\"], dtype=np.float32)\n",
|
| 158 |
+
"\n",
|
| 159 |
+
"def predict_phase(record: dict) -> dict:\n",
|
| 160 |
+
" \"\"\"Predict the execution phase for one per-timestep telemetry record.\n",
|
| 161 |
+
"\n",
|
| 162 |
+
" Returns a dict with both models' predictions and per-class probabilities.\n",
|
| 163 |
+
" \"\"\"\n",
|
| 164 |
+
" X = transform_single(record, meta)\n",
|
| 165 |
+
"\n",
|
| 166 |
+
" xgb_proba = xgb_model.predict_proba(X)[0]\n",
|
| 167 |
+
" xgb_label = INT_TO_LABEL[int(np.argmax(xgb_proba))]\n",
|
| 168 |
+
"\n",
|
| 169 |
+
" Xs = ((X - MU) / SD).astype(np.float32)\n",
|
| 170 |
+
" with torch.no_grad():\n",
|
| 171 |
+
" logits = mlp_model(torch.tensor(Xs))\n",
|
| 172 |
+
" mlp_proba = torch.softmax(logits, dim=1).numpy()[0]\n",
|
| 173 |
+
" mlp_label = INT_TO_LABEL[int(np.argmax(mlp_proba))]\n",
|
| 174 |
+
"\n",
|
| 175 |
+
" return {\n",
|
| 176 |
+
" \"xgboost\": {\n",
|
| 177 |
+
" \"label\": xgb_label,\n",
|
| 178 |
+
" \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(xgb_proba)},\n",
|
| 179 |
+
" },\n",
|
| 180 |
+
" \"mlp\": {\n",
|
| 181 |
+
" \"label\": mlp_label,\n",
|
| 182 |
+
" \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(mlp_proba)},\n",
|
| 183 |
+
" },\n",
|
| 184 |
+
" }"
|
| 185 |
+
]
|
| 186 |
+
},
|
| 187 |
+
{
|
| 188 |
+
"cell_type": "markdown",
|
| 189 |
+
"metadata": {},
|
| 190 |
+
"source": [
|
| 191 |
+
"## 5. Run on an example record\n",
|
| 192 |
+
"\n",
|
| 193 |
+
"Real `lateral_movement` event lifted from the sample dataset: an APT-tier cryptominer at timestep 26 propagating laterally with 2 propagation events and 10 network connections. Both models should predict `lateral_movement`."
|
| 194 |
+
]
|
| 195 |
+
},
|
| 196 |
+
{
|
| 197 |
+
"cell_type": "code",
|
| 198 |
+
"execution_count": null,
|
| 199 |
+
"metadata": {},
|
| 200 |
+
"outputs": [],
|
| 201 |
+
"source": [
|
| 202 |
+
"# Real timestep record from the sample dataset (true phase: lateral_movement)\n",
|
| 203 |
+
"example_record = {\n",
|
| 204 |
+
" \"timestep\": 26,\n",
|
| 205 |
+
" \"malware_family\": \"cryptominer\",\n",
|
| 206 |
+
" \"threat_actor_tier\": \"apt\",\n",
|
| 207 |
+
" \"target_platform\": \"windows_10_enterprise\",\n",
|
| 208 |
+
" \"obfuscation_technique\": \"code_signing_abuse\",\n",
|
| 209 |
+
" \"api_call_rate\": 1.4167,\n",
|
| 210 |
+
" \"registry_write_count\": 0,\n",
|
| 211 |
+
" \"network_connection_count\": 10,\n",
|
| 212 |
+
" \"process_injection_flag\": 1,\n",
|
| 213 |
+
" \"c2_beacon_interval_sec\": 0.0,\n",
|
| 214 |
+
" \"detection_outcome\": \"signature_miss\",\n",
|
| 215 |
+
" \"av_signature_hit_flag\": 0,\n",
|
| 216 |
+
" \"sandbox_evasion_flag\": 0,\n",
|
| 217 |
+
" \"lateral_propagation_count\": 2,\n",
|
| 218 |
+
" \"privilege_escalation_flag\": 0,\n",
|
| 219 |
+
" \"ep_stack\": \"deception_honeypot\",\n",
|
| 220 |
+
" \"pe_entropy_mean\": 0.8336,\n",
|
| 221 |
+
" \"pe_entropy_std\": 0.25,\n",
|
| 222 |
+
" \"import_hash_cluster\": 498,\n",
|
| 223 |
+
" \"section_count\": 2,\n",
|
| 224 |
+
" \"packed_section_ratio\": 0.7558,\n",
|
| 225 |
+
" \"string_entropy_mean\": 0.5727,\n",
|
| 226 |
+
" \"byte_histogram_chi2\": 45.52,\n",
|
| 227 |
+
" \"code_section_rx_ratio\": 0.3628,\n",
|
| 228 |
+
" \"resource_section_entropy\": 0.4418,\n",
|
| 229 |
+
" \"suspicious_import_count\": 11,\n",
|
| 230 |
+
" \"packer_detected_flag\": 1,\n",
|
| 231 |
+
"}\n",
|
| 232 |
+
"\n",
|
| 233 |
+
"result = predict_phase(example_record)\n",
|
| 234 |
+
"\n",
|
| 235 |
+
"print(f\"XGBoost -> {result['xgboost']['label']}\")\n",
|
| 236 |
+
"for lbl, p in sorted(result['xgboost']['probabilities'].items(), key=lambda x: -x[1])[:5]:\n",
|
| 237 |
+
" print(f\" P({lbl:30s}) = {p:.4f}\")\n",
|
| 238 |
+
"\n",
|
| 239 |
+
"print(f\"\\nMLP -> {result['mlp']['label']}\")\n",
|
| 240 |
+
"for lbl, p in sorted(result['mlp']['probabilities'].items(), key=lambda x: -x[1])[:5]:\n",
|
| 241 |
+
" print(f\" P({lbl:30s}) = {p:.4f}\")"
|
| 242 |
+
]
|
| 243 |
+
},
|
| 244 |
+
{
|
| 245 |
+
"cell_type": "markdown",
|
| 246 |
+
"metadata": {},
|
| 247 |
+
"source": [
|
| 248 |
+
"### Note: when the two models disagree\n",
|
| 249 |
+
"\n",
|
| 250 |
+
"XGBoost and the MLP can disagree on records far from the training-data manifold or in the three phases the baseline finds genuinely hard (`dormancy_dwell`, `sandbox_evasion_stall`, `self_destruct_cleanup`, each spanning the full timestep range). Disagreement is a useful signal: hand those cases to a human analyst or to a more expensive sequence-based detector."
|
| 251 |
+
]
|
| 252 |
+
},
|
| 253 |
+
{
|
| 254 |
+
"cell_type": "markdown",
|
| 255 |
+
"metadata": {},
|
| 256 |
+
"source": [
|
| 257 |
+
"## 6. Batch prediction on the sample dataset"
|
| 258 |
+
]
|
| 259 |
+
},
|
| 260 |
+
{
|
| 261 |
+
"cell_type": "code",
|
| 262 |
+
"execution_count": null,
|
| 263 |
+
"metadata": {},
|
| 264 |
+
"outputs": [],
|
| 265 |
+
"source": [
|
| 266 |
+
"from huggingface_hub import snapshot_download\n",
|
| 267 |
+
"import pandas as pd\n",
|
| 268 |
+
"\n",
|
| 269 |
+
"ds_path = snapshot_download(repo_id=\"xpertsystems/cyb003-sample\", repo_type=\"dataset\")\n",
|
| 270 |
+
"samples = pd.read_csv(f\"{ds_path}/malware_samples.csv\")\n",
|
| 271 |
+
"\n",
|
| 272 |
+
"# Score the first 200 timesteps\n",
|
| 273 |
+
"sample = samples.head(200).copy()\n",
|
| 274 |
+
"preds = [predict_phase(row.to_dict())[\"xgboost\"][\"label\"] for _, row in sample.iterrows()]\n",
|
| 275 |
+
"sample[\"xgb_pred\"] = preds\n",
|
| 276 |
+
"\n",
|
| 277 |
+
"ct = pd.crosstab(sample[\"execution_phase\"], sample[\"xgb_pred\"],\n",
|
| 278 |
+
" rownames=[\"true\"], colnames=[\"pred\"])\n",
|
| 279 |
+
"print(\"Confusion on first 200 sample rows (XGBoost):\")\n",
|
| 280 |
+
"print(ct)\n",
|
| 281 |
+
"acc = (sample[\"execution_phase\"] == sample[\"xgb_pred\"]).mean()\n",
|
| 282 |
+
"print(f\"\\nbatch accuracy on first 200 rows (in-distribution): {acc:.4f}\")\n",
|
| 283 |
+
"print(\"\\nNote: these rows include training-set samples. See validation_results.json\\n\"\n",
|
| 284 |
+
" \"for proper held-out test metrics from disjoint samples.\")"
|
| 285 |
+
]
|
| 286 |
+
},
|
| 287 |
+
{
|
| 288 |
+
"cell_type": "markdown",
|
| 289 |
+
"metadata": {},
|
| 290 |
+
"source": [
|
| 291 |
+
"## 7. Next steps\n",
|
| 292 |
+
"\n",
|
| 293 |
+
"- See `validation_results.json` for held-out test metrics (15 disjoint samples, 900 timesteps).\n",
|
| 294 |
+
"- See `multi_seed_results.json` for the across-10-seeds robustness picture (accuracy 0.905 ± 0.010).\n",
|
| 295 |
+
"- See `ablation_results.json` for per-feature-group contribution. `timestep` carries the dominant signal — kill chains progress in time, malware execution does too.\n",
|
| 296 |
+
"- The model card's **Limitations** section explains why `dormancy_dwell`, `sandbox_evasion_stall`, and `self_destruct_cleanup` are hard.\n",
|
| 297 |
+
"- For the full 280k-row CYB003 dataset and commercial licensing, contact **pradeep@xpertsystems.ai**."
|
| 298 |
+
]
|
| 299 |
+
}
|
| 300 |
+
],
|
| 301 |
+
"metadata": {
|
| 302 |
+
"kernelspec": {
|
| 303 |
+
"display_name": "Python 3",
|
| 304 |
+
"language": "python",
|
| 305 |
+
"name": "python3"
|
| 306 |
+
},
|
| 307 |
+
"language_info": {
|
| 308 |
+
"name": "python",
|
| 309 |
+
"version": "3.10"
|
| 310 |
+
}
|
| 311 |
+
},
|
| 312 |
+
"nbformat": 4,
|
| 313 |
+
"nbformat_minor": 5
|
| 314 |
+
}
|
model_mlp.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:5137ad720cf14877439db2fe50e5df589c6e2cbcc7598cc332548922bd5f8369
|
| 3 |
+
size 75760
|
model_xgb.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
multi_seed_results.json
ADDED
|
@@ -0,0 +1,98 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"purpose": "With n=100 samples and 10 classes, single-seed metrics carry test-fold variance. Multi-seed evaluation gives a more reliable performance picture.",
|
| 3 |
+
"seeds_evaluated": [
|
| 4 |
+
42,
|
| 5 |
+
7,
|
| 6 |
+
13,
|
| 7 |
+
17,
|
| 8 |
+
23,
|
| 9 |
+
31,
|
| 10 |
+
45,
|
| 11 |
+
99,
|
| 12 |
+
123,
|
| 13 |
+
200
|
| 14 |
+
],
|
| 15 |
+
"per_seed": [
|
| 16 |
+
{
|
| 17 |
+
"seed": 42,
|
| 18 |
+
"test_n_classes": 10,
|
| 19 |
+
"accuracy": 0.9177777777777778,
|
| 20 |
+
"macro_f1": 0.7780699645112974,
|
| 21 |
+
"macro_roc_auc_ovr": 0.979171667321058
|
| 22 |
+
},
|
| 23 |
+
{
|
| 24 |
+
"seed": 7,
|
| 25 |
+
"test_n_classes": 10,
|
| 26 |
+
"accuracy": 0.8988888888888888,
|
| 27 |
+
"macro_f1": 0.7959031264581272,
|
| 28 |
+
"macro_roc_auc_ovr": 0.9762003477988086
|
| 29 |
+
},
|
| 30 |
+
{
|
| 31 |
+
"seed": 13,
|
| 32 |
+
"test_n_classes": 10,
|
| 33 |
+
"accuracy": 0.9077777777777778,
|
| 34 |
+
"macro_f1": 0.7844193419282306,
|
| 35 |
+
"macro_roc_auc_ovr": 0.9756039083537456
|
| 36 |
+
},
|
| 37 |
+
{
|
| 38 |
+
"seed": 17,
|
| 39 |
+
"test_n_classes": 10,
|
| 40 |
+
"accuracy": 0.9055555555555556,
|
| 41 |
+
"macro_f1": 0.7793567708150484,
|
| 42 |
+
"macro_roc_auc_ovr": 0.9725864270053698
|
| 43 |
+
},
|
| 44 |
+
{
|
| 45 |
+
"seed": 23,
|
| 46 |
+
"test_n_classes": 10,
|
| 47 |
+
"accuracy": 0.9011111111111111,
|
| 48 |
+
"macro_f1": 0.7669056364325609,
|
| 49 |
+
"macro_roc_auc_ovr": 0.9731577510354572
|
| 50 |
+
},
|
| 51 |
+
{
|
| 52 |
+
"seed": 31,
|
| 53 |
+
"test_n_classes": 10,
|
| 54 |
+
"accuracy": 0.9055555555555556,
|
| 55 |
+
"macro_f1": 0.7825811291140096,
|
| 56 |
+
"macro_roc_auc_ovr": 0.9757878099386051
|
| 57 |
+
},
|
| 58 |
+
{
|
| 59 |
+
"seed": 45,
|
| 60 |
+
"test_n_classes": 10,
|
| 61 |
+
"accuracy": 0.9211111111111111,
|
| 62 |
+
"macro_f1": 0.8065645535880511,
|
| 63 |
+
"macro_roc_auc_ovr": 0.9754272516460774
|
| 64 |
+
},
|
| 65 |
+
{
|
| 66 |
+
"seed": 99,
|
| 67 |
+
"test_n_classes": 10,
|
| 68 |
+
"accuracy": 0.8822222222222222,
|
| 69 |
+
"macro_f1": 0.7589855352578547,
|
| 70 |
+
"macro_roc_auc_ovr": 0.9722896806606615
|
| 71 |
+
},
|
| 72 |
+
{
|
| 73 |
+
"seed": 123,
|
| 74 |
+
"test_n_classes": 10,
|
| 75 |
+
"accuracy": 0.9088888888888889,
|
| 76 |
+
"macro_f1": 0.7938334664931561,
|
| 77 |
+
"macro_roc_auc_ovr": 0.9790976919379577
|
| 78 |
+
},
|
| 79 |
+
{
|
| 80 |
+
"seed": 200,
|
| 81 |
+
"test_n_classes": 10,
|
| 82 |
+
"accuracy": 0.8977777777777778,
|
| 83 |
+
"macro_f1": 0.7938099428748325,
|
| 84 |
+
"macro_roc_auc_ovr": 0.9734976569094487
|
| 85 |
+
}
|
| 86 |
+
],
|
| 87 |
+
"aggregate": {
|
| 88 |
+
"accuracy_mean": 0.9046666666666667,
|
| 89 |
+
"accuracy_std": 0.010337514088544894,
|
| 90 |
+
"accuracy_min": 0.8822222222222222,
|
| 91 |
+
"accuracy_max": 0.9211111111111111,
|
| 92 |
+
"macro_f1_mean": 0.7840429467473169,
|
| 93 |
+
"macro_f1_std": 0.013493004664905476,
|
| 94 |
+
"roc_auc_mean": 0.9752820192607189,
|
| 95 |
+
"roc_auc_std": 0.0023415667609269276
|
| 96 |
+
},
|
| 97 |
+
"published_artifact_seed": 42
|
| 98 |
+
}
|
validation_results.json
ADDED
|
@@ -0,0 +1,378 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
{
|
| 2 |
+
"version": "1.0.0",
|
| 3 |
+
"dataset": "xpertsystems/cyb003-sample",
|
| 4 |
+
"task": "10-class execution_phase classification",
|
| 5 |
+
"baselines": {
|
| 6 |
+
"always_predict_majority_accuracy": 0.13666666666666666,
|
| 7 |
+
"majority_class": "initial_drop",
|
| 8 |
+
"random_guess_accuracy": 0.1
|
| 9 |
+
},
|
| 10 |
+
"split": {
|
| 11 |
+
"strategy": "group_aware (GroupShuffleSplit by sample_id, nested)",
|
| 12 |
+
"rationale": "100 unique malware samples generate 6,000 timesteps (60 per sample). Random row-split would leak per-sample correlations into the test fold. Group-aware split keeps train/val/test samples disjoint.",
|
| 13 |
+
"samples_train": 69,
|
| 14 |
+
"samples_val": 16,
|
| 15 |
+
"samples_test": 15,
|
| 16 |
+
"timesteps_train": 4140,
|
| 17 |
+
"timesteps_val": 960,
|
| 18 |
+
"timesteps_test": 900,
|
| 19 |
+
"seed": 42
|
| 20 |
+
},
|
| 21 |
+
"n_features": 69,
|
| 22 |
+
"label_classes": [
|
| 23 |
+
"c2_communication",
|
| 24 |
+
"data_exfiltration",
|
| 25 |
+
"dormancy_dwell",
|
| 26 |
+
"initial_drop",
|
| 27 |
+
"lateral_movement",
|
| 28 |
+
"payload_execution",
|
| 29 |
+
"persistence_establishment",
|
| 30 |
+
"privilege_escalation",
|
| 31 |
+
"sandbox_evasion_stall",
|
| 32 |
+
"self_destruct_cleanup"
|
| 33 |
+
],
|
| 34 |
+
"class_distribution_train": {
|
| 35 |
+
"lateral_movement": 550,
|
| 36 |
+
"initial_drop": 549,
|
| 37 |
+
"data_exfiltration": 543,
|
| 38 |
+
"persistence_establishment": 541,
|
| 39 |
+
"c2_communication": 492,
|
| 40 |
+
"privilege_escalation": 489,
|
| 41 |
+
"payload_execution": 487,
|
| 42 |
+
"dormancy_dwell": 168,
|
| 43 |
+
"sandbox_evasion_stall": 166,
|
| 44 |
+
"self_destruct_cleanup": 155
|
| 45 |
+
},
|
| 46 |
+
"class_distribution_test": {
|
| 47 |
+
"initial_drop": 123,
|
| 48 |
+
"persistence_establishment": 122,
|
| 49 |
+
"lateral_movement": 121,
|
| 50 |
+
"data_exfiltration": 113,
|
| 51 |
+
"c2_communication": 108,
|
| 52 |
+
"privilege_escalation": 107,
|
| 53 |
+
"payload_execution": 106,
|
| 54 |
+
"dormancy_dwell": 40,
|
| 55 |
+
"sandbox_evasion_stall": 32,
|
| 56 |
+
"self_destruct_cleanup": 28
|
| 57 |
+
},
|
| 58 |
+
"models": {
|
| 59 |
+
"xgboost": {
|
| 60 |
+
"architecture": "Gradient-boosted decision trees, multi:softprob, 10 classes",
|
| 61 |
+
"framework": "xgboost",
|
| 62 |
+
"test_metrics": {
|
| 63 |
+
"model": "xgboost",
|
| 64 |
+
"accuracy": 0.9177777777777778,
|
| 65 |
+
"macro_f1": 0.7780699645112974,
|
| 66 |
+
"weighted_f1": 0.9064879129227142,
|
| 67 |
+
"per_class_f1": {
|
| 68 |
+
"c2_communication": 1.0,
|
| 69 |
+
"data_exfiltration": 0.9699570815450643,
|
| 70 |
+
"dormancy_dwell": 0.5301204819277109,
|
| 71 |
+
"initial_drop": 0.9453125,
|
| 72 |
+
"lateral_movement": 0.9917355371900827,
|
| 73 |
+
"payload_execution": 0.963302752293578,
|
| 74 |
+
"persistence_establishment": 0.9918032786885246,
|
| 75 |
+
"privilege_escalation": 0.9907407407407407,
|
| 76 |
+
"sandbox_evasion_stall": 0.125,
|
| 77 |
+
"self_destruct_cleanup": 0.2727272727272727
|
| 78 |
+
},
|
| 79 |
+
"confusion_matrix": {
|
| 80 |
+
"labels": [
|
| 81 |
+
"c2_communication",
|
| 82 |
+
"data_exfiltration",
|
| 83 |
+
"dormancy_dwell",
|
| 84 |
+
"initial_drop",
|
| 85 |
+
"lateral_movement",
|
| 86 |
+
"payload_execution",
|
| 87 |
+
"persistence_establishment",
|
| 88 |
+
"privilege_escalation",
|
| 89 |
+
"sandbox_evasion_stall",
|
| 90 |
+
"self_destruct_cleanup"
|
| 91 |
+
],
|
| 92 |
+
"matrix": [
|
| 93 |
+
[
|
| 94 |
+
108,
|
| 95 |
+
0,
|
| 96 |
+
0,
|
| 97 |
+
0,
|
| 98 |
+
0,
|
| 99 |
+
0,
|
| 100 |
+
0,
|
| 101 |
+
0,
|
| 102 |
+
0,
|
| 103 |
+
0
|
| 104 |
+
],
|
| 105 |
+
[
|
| 106 |
+
0,
|
| 107 |
+
113,
|
| 108 |
+
0,
|
| 109 |
+
0,
|
| 110 |
+
0,
|
| 111 |
+
0,
|
| 112 |
+
0,
|
| 113 |
+
0,
|
| 114 |
+
0,
|
| 115 |
+
0
|
| 116 |
+
],
|
| 117 |
+
[
|
| 118 |
+
0,
|
| 119 |
+
4,
|
| 120 |
+
22,
|
| 121 |
+
7,
|
| 122 |
+
0,
|
| 123 |
+
1,
|
| 124 |
+
0,
|
| 125 |
+
0,
|
| 126 |
+
2,
|
| 127 |
+
4
|
| 128 |
+
],
|
| 129 |
+
[
|
| 130 |
+
0,
|
| 131 |
+
0,
|
| 132 |
+
2,
|
| 133 |
+
121,
|
| 134 |
+
0,
|
| 135 |
+
0,
|
| 136 |
+
0,
|
| 137 |
+
0,
|
| 138 |
+
0,
|
| 139 |
+
0
|
| 140 |
+
],
|
| 141 |
+
[
|
| 142 |
+
0,
|
| 143 |
+
0,
|
| 144 |
+
0,
|
| 145 |
+
0,
|
| 146 |
+
120,
|
| 147 |
+
0,
|
| 148 |
+
0,
|
| 149 |
+
0,
|
| 150 |
+
0,
|
| 151 |
+
1
|
| 152 |
+
],
|
| 153 |
+
[
|
| 154 |
+
0,
|
| 155 |
+
0,
|
| 156 |
+
1,
|
| 157 |
+
0,
|
| 158 |
+
0,
|
| 159 |
+
105,
|
| 160 |
+
0,
|
| 161 |
+
0,
|
| 162 |
+
0,
|
| 163 |
+
0
|
| 164 |
+
],
|
| 165 |
+
[
|
| 166 |
+
0,
|
| 167 |
+
0,
|
| 168 |
+
1,
|
| 169 |
+
0,
|
| 170 |
+
0,
|
| 171 |
+
0,
|
| 172 |
+
121,
|
| 173 |
+
0,
|
| 174 |
+
0,
|
| 175 |
+
0
|
| 176 |
+
],
|
| 177 |
+
[
|
| 178 |
+
0,
|
| 179 |
+
0,
|
| 180 |
+
0,
|
| 181 |
+
0,
|
| 182 |
+
0,
|
| 183 |
+
0,
|
| 184 |
+
0,
|
| 185 |
+
107,
|
| 186 |
+
0,
|
| 187 |
+
0
|
| 188 |
+
],
|
| 189 |
+
[
|
| 190 |
+
0,
|
| 191 |
+
0,
|
| 192 |
+
17,
|
| 193 |
+
3,
|
| 194 |
+
0,
|
| 195 |
+
1,
|
| 196 |
+
1,
|
| 197 |
+
2,
|
| 198 |
+
3,
|
| 199 |
+
5
|
| 200 |
+
],
|
| 201 |
+
[
|
| 202 |
+
0,
|
| 203 |
+
3,
|
| 204 |
+
0,
|
| 205 |
+
2,
|
| 206 |
+
1,
|
| 207 |
+
5,
|
| 208 |
+
0,
|
| 209 |
+
0,
|
| 210 |
+
11,
|
| 211 |
+
6
|
| 212 |
+
]
|
| 213 |
+
]
|
| 214 |
+
},
|
| 215 |
+
"macro_roc_auc_ovr": 0.979171667321058
|
| 216 |
+
}
|
| 217 |
+
},
|
| 218 |
+
"mlp": {
|
| 219 |
+
"architecture": "PyTorch MLP, 69 -> 128 -> 64 -> 10, BatchNorm1d + ReLU + Dropout, weighted cross-entropy loss",
|
| 220 |
+
"framework": "pytorch",
|
| 221 |
+
"test_metrics": {
|
| 222 |
+
"model": "mlp",
|
| 223 |
+
"accuracy": 0.8222222222222222,
|
| 224 |
+
"macro_f1": 0.7071652710164154,
|
| 225 |
+
"weighted_f1": 0.8217291149270296,
|
| 226 |
+
"per_class_f1": {
|
| 227 |
+
"c2_communication": 1.0,
|
| 228 |
+
"data_exfiltration": 0.9181818181818182,
|
| 229 |
+
"dormancy_dwell": 0.5194805194805194,
|
| 230 |
+
"initial_drop": 0.8854961832061069,
|
| 231 |
+
"lateral_movement": 0.9067796610169492,
|
| 232 |
+
"payload_execution": 0.6981132075471698,
|
| 233 |
+
"persistence_establishment": 0.8695652173913043,
|
| 234 |
+
"privilege_escalation": 0.9154228855721394,
|
| 235 |
+
"sandbox_evasion_stall": 0.07692307692307693,
|
| 236 |
+
"self_destruct_cleanup": 0.28169014084507044
|
| 237 |
+
},
|
| 238 |
+
"confusion_matrix": {
|
| 239 |
+
"labels": [
|
| 240 |
+
"c2_communication",
|
| 241 |
+
"data_exfiltration",
|
| 242 |
+
"dormancy_dwell",
|
| 243 |
+
"initial_drop",
|
| 244 |
+
"lateral_movement",
|
| 245 |
+
"payload_execution",
|
| 246 |
+
"persistence_establishment",
|
| 247 |
+
"privilege_escalation",
|
| 248 |
+
"sandbox_evasion_stall",
|
| 249 |
+
"self_destruct_cleanup"
|
| 250 |
+
],
|
| 251 |
+
"matrix": [
|
| 252 |
+
[
|
| 253 |
+
108,
|
| 254 |
+
0,
|
| 255 |
+
0,
|
| 256 |
+
0,
|
| 257 |
+
0,
|
| 258 |
+
0,
|
| 259 |
+
0,
|
| 260 |
+
0,
|
| 261 |
+
0,
|
| 262 |
+
0
|
| 263 |
+
],
|
| 264 |
+
[
|
| 265 |
+
0,
|
| 266 |
+
101,
|
| 267 |
+
0,
|
| 268 |
+
0,
|
| 269 |
+
6,
|
| 270 |
+
3,
|
| 271 |
+
0,
|
| 272 |
+
0,
|
| 273 |
+
0,
|
| 274 |
+
3
|
| 275 |
+
],
|
| 276 |
+
[
|
| 277 |
+
0,
|
| 278 |
+
1,
|
| 279 |
+
20,
|
| 280 |
+
5,
|
| 281 |
+
0,
|
| 282 |
+
7,
|
| 283 |
+
0,
|
| 284 |
+
0,
|
| 285 |
+
4,
|
| 286 |
+
3
|
| 287 |
+
],
|
| 288 |
+
[
|
| 289 |
+
0,
|
| 290 |
+
0,
|
| 291 |
+
3,
|
| 292 |
+
116,
|
| 293 |
+
0,
|
| 294 |
+
0,
|
| 295 |
+
4,
|
| 296 |
+
0,
|
| 297 |
+
0,
|
| 298 |
+
0
|
| 299 |
+
],
|
| 300 |
+
[
|
| 301 |
+
0,
|
| 302 |
+
2,
|
| 303 |
+
0,
|
| 304 |
+
0,
|
| 305 |
+
107,
|
| 306 |
+
7,
|
| 307 |
+
0,
|
| 308 |
+
0,
|
| 309 |
+
3,
|
| 310 |
+
2
|
| 311 |
+
],
|
| 312 |
+
[
|
| 313 |
+
0,
|
| 314 |
+
1,
|
| 315 |
+
0,
|
| 316 |
+
0,
|
| 317 |
+
2,
|
| 318 |
+
74,
|
| 319 |
+
1,
|
| 320 |
+
0,
|
| 321 |
+
9,
|
| 322 |
+
19
|
| 323 |
+
],
|
| 324 |
+
[
|
| 325 |
+
0,
|
| 326 |
+
0,
|
| 327 |
+
2,
|
| 328 |
+
7,
|
| 329 |
+
0,
|
| 330 |
+
0,
|
| 331 |
+
110,
|
| 332 |
+
2,
|
| 333 |
+
1,
|
| 334 |
+
0
|
| 335 |
+
],
|
| 336 |
+
[
|
| 337 |
+
0,
|
| 338 |
+
0,
|
| 339 |
+
0,
|
| 340 |
+
0,
|
| 341 |
+
0,
|
| 342 |
+
2,
|
| 343 |
+
13,
|
| 344 |
+
92,
|
| 345 |
+
0,
|
| 346 |
+
0
|
| 347 |
+
],
|
| 348 |
+
[
|
| 349 |
+
0,
|
| 350 |
+
1,
|
| 351 |
+
12,
|
| 352 |
+
7,
|
| 353 |
+
0,
|
| 354 |
+
3,
|
| 355 |
+
1,
|
| 356 |
+
0,
|
| 357 |
+
2,
|
| 358 |
+
6
|
| 359 |
+
],
|
| 360 |
+
[
|
| 361 |
+
0,
|
| 362 |
+
1,
|
| 363 |
+
0,
|
| 364 |
+
4,
|
| 365 |
+
0,
|
| 366 |
+
10,
|
| 367 |
+
2,
|
| 368 |
+
0,
|
| 369 |
+
1,
|
| 370 |
+
10
|
| 371 |
+
]
|
| 372 |
+
]
|
| 373 |
+
},
|
| 374 |
+
"macro_roc_auc_ovr": 0.9680976851704761
|
| 375 |
+
}
|
| 376 |
+
}
|
| 377 |
+
}
|
| 378 |
+
}
|