Initial release: XGBoost + MLP for phishing campaign-phase classification
- README.md +455 -0
- ablation_results.json +489 -0
- feature_engineering.py +341 -0
- feature_meta.json +149 -0
- feature_scaler.json +1 -0
- inference_example.ipynb +320 -0
- model_mlp.safetensors +3 -0
- model_xgb.json +0 -0
- multi_seed_results.json +98 -0
- validation_results.json +246 -0
README.md
ADDED
|
@@ -0,0 +1,455 @@
---
license: cc-by-nc-4.0
library_name: pytorch
tags:
- cybersecurity
- phishing
- email-security
- bec
- social-engineering
- tabular-classification
- synthetic-data
- xgboost
- baseline
pipeline_tag: tabular-classification
base_model: []
datasets:
- xpertsystems/cyb004-sample
metrics:
- accuracy
- f1
- roc_auc
model-index:
- name: cyb004-baseline-classifier
  results:
  - task:
      type: tabular-classification
      name: 7-class phishing campaign phase classification
    dataset:
      type: xpertsystems/cyb004-sample
      name: CYB004 Synthetic Phishing Campaign Dataset (Sample)
    metrics:
    - type: roc_auc
      value: 0.9356
      name: Test macro ROC-AUC OvR (XGBoost, seed 42)
    - type: accuracy
      value: 0.6547
      name: Test accuracy (XGBoost, seed 42)
    - type: f1
      value: 0.6401
      name: Test macro-F1 (XGBoost, seed 42)
    - type: accuracy
      value: 0.649
      name: Multi-seed accuracy mean ± 0.038 (XGBoost, 10 seeds)
    - type: roc_auc
      value: 0.937
      name: Multi-seed ROC-AUC mean ± 0.010 (XGBoost, 10 seeds)
    - type: roc_auc
      value: 0.9265
      name: Test macro ROC-AUC OvR (MLP, seed 42)
    - type: accuracy
      value: 0.6427
      name: Test accuracy (MLP, seed 42)
    - type: f1
      value: 0.6275
      name: Test macro-F1 (MLP, seed 42)
---

# CYB004 Baseline Classifier

**Phishing campaign phase classifier trained on the CYB004 synthetic
phishing campaign sample. Predicts which of 7 lifecycle phases a
per-timestep telemetry record belongs to, from observable trajectory
and victim-topology features.**

> **Baseline reference, not for production use.** This model demonstrates
> that the [CYB004 sample dataset](https://huggingface.co/datasets/xpertsystems/cyb004-sample)
> is learnable end-to-end and gives prospective buyers a working starting
> point. It is not a production email-security platform, SOAR component,
> or threat detector. See [Limitations](#limitations).

## Model overview

| Property | Value |
|---|---|
| Task | 7-class campaign_phase classification |
| Training data | `xpertsystems/cyb004-sample` (3,952 timesteps across 100 phishing campaigns) |
| Models | XGBoost + PyTorch MLP |
| Input features | 53 (after one-hot encoding) |
| Split | **Group-aware by campaign_id** (disjoint train/val/test campaigns) |
| Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds |
| License | CC-BY-NC-4.0 (matches dataset) |
| Status | Reference baseline |

## Why this task instead of actor-tier attribution?

The CYB004 dataset README leads with actor attribution modelling (4-tier
classification) as a suggested use case. We piloted that target first and
found a serious issue: four features in the dataset
(`lure_personalisation_score`, `click_through_rate`,
`credential_submission_rate`, `target_department_id`) are **constant per
campaign**, not per-timestep. They look like per-step features, but each
takes a single value across all ~40 timesteps of a given campaign.

Because these constants are tier-correlated (especially
`lure_personalisation_score`, which differs systematically across the
four actor tiers), they leak tier identity through the campaign-level
fingerprint they create. With a 15-campaign test fold, many test
campaigns land in the same feature ranges as training campaigns of the
same tier, and the model achieves a spurious 97%+ accuracy that does not
generalize. Removing those features (the honest fix) drops tier
prediction to **accuracy 0.45 and ROC-AUC 0.70, with accuracy below the
majority-class baseline of 0.59**. The full 335k-row CYB004 product, with
~4,800 campaigns, will not have this constraint; the sample at n=100
cannot support honest tier learning.

We pivoted to **campaign_phase prediction**, which has 3,952 rows of
per-timestep data spread across 7 phases with tight timestep windows.
It learns cleanly under the same group-aware split: 65% accuracy,
ROC-AUC 0.94, stable across 10 seeds. This is a legitimate
email-security use case: SOAR playbooks and threat-hunting workflows
need to tag which phase of a phishing campaign observed activity
belongs to.

Two model artifacts are published. They are designed to be used together, because disagreement between them is a useful triage signal:

- `model_xgb.json`: gradient-boosted trees, primary recommendation
- `model_mlp.safetensors`: PyTorch MLP in SafeTensors format

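One way to act on that disagreement signal is sketched below. It is a minimal illustration, assuming each model returns a 7-element probability vector in the same class order; the function name `triage`, the class order in `PHASES`, and the `min_confidence` threshold of 0.5 are hypothetical and not part of the released code.

```python
# Assumed class order; in practice take it from INT_TO_LABEL.
PHASES = [
    "target_reconnaissance", "infrastructure_setup", "lure_crafting",
    "email_delivery", "victim_engagement", "credential_harvesting",
    "post_compromise_escalation",
]

def triage(proba_xgb, proba_mlp, labels, min_confidence=0.5):
    """Compare two probability vectors and flag records for human review.

    A record is flagged when the two models pick different phases, or when
    the XGBoost top probability is below `min_confidence` (hypothetical value).
    """
    top_xgb = max(range(len(labels)), key=lambda i: proba_xgb[i])
    top_mlp = max(range(len(labels)), key=lambda i: proba_mlp[i])
    agree = top_xgb == top_mlp
    confident = proba_xgb[top_xgb] >= min_confidence
    return {
        "xgb_label": labels[top_xgb],
        "mlp_label": labels[top_mlp],
        "needs_review": not (agree and confident),
    }

# Made-up probabilities in which the two models disagree on a mid/late phase.
example = triage(
    [0.05, 0.05, 0.05, 0.10, 0.40, 0.30, 0.05],
    [0.02, 0.03, 0.05, 0.10, 0.30, 0.45, 0.05],
    PHASES,
)
```

The threshold would need tuning on the validation fold before it means anything; the sketch only shows the shape of the check.
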
## Quick start

```bash
pip install xgboost torch safetensors pandas huggingface_hub
```

```python
import os
import sys

import numpy as np
import xgboost as xgb
from huggingface_hub import hf_hub_download

REPO = "xpertsystems/cyb004-baseline-classifier"

# Download the artifacts. The MLP weights and scaler are fetched too,
# although this snippet only uses XGBoost.
paths = {n: hf_hub_download(REPO, n) for n in [
    "model_xgb.json", "model_mlp.safetensors",
    "feature_engineering.py", "feature_meta.json", "feature_scaler.json",
]}

# Make the bundled feature pipeline importable.
sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"]))
from feature_engineering import (
    transform_single, load_meta, INT_TO_LABEL, build_department_lookup
)

meta = load_meta(paths["feature_meta.json"])
xgb_model = xgb.XGBClassifier()
xgb_model.load_model(paths["model_xgb.json"])
dept_lookup = build_department_lookup("path/to/victim_topology.csv")

# Predict (see inference_example.ipynb for the full pattern).
# `my_record` is one raw per-timestep row that you supply.
dept_aggs = dept_lookup.get(my_record["target_department_id"], {})
X = transform_single(my_record, meta, victim_aggregates=dept_aggs)
proba = xgb_model.predict_proba(X)[0]
print(INT_TO_LABEL[int(np.argmax(proba))])
```

See [`inference_example.ipynb`](./inference_example.ipynb) for the full
copy-paste demo.

## Training data

Trained on the public sample of CYB004, 3,952 per-timestep trajectory
rows from 100 phishing campaigns (~40 timesteps per campaign):

| Phase | Total rows | Test rows (seed 42) |
|---|---:|---:|
| `email_delivery` | 919 | 134 |
| `victim_engagement` | 667 | 102 |
| `target_reconnaissance` | 558 | 89 |
| `post_compromise_escalation` | 533 | 50 |
| `credential_harvesting` | 494 | 91 |
| `lure_crafting` | 435 | 71 |
| `infrastructure_setup` | 346 | 48 |

### Group-aware split

A single campaign generates ~40 highly correlated timesteps. Random
row-level splitting would put timesteps from the same campaign in both
train and test, inflating metrics in a way that does not generalize to
new campaigns.

This release uses **GroupShuffleSplit by `campaign_id`** (nested,
70/15/15):

| Fold | Campaigns | Timesteps |
|---|---:|---:|
| Train | 69 | 2,792 |
| Validation | 16 | 575 |
| Test | 15 | 585 |

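The idea behind the split can be sketched without any dependencies. The code below is a simplified stand-in for scikit-learn's `GroupShuffleSplit`, not the private training script: it shuffles campaign IDs and cuts them 70/15/15, so its fold sizes will not necessarily match the 69/16/15 counts in the table. The function name `group_split`, the rounding rule, and the toy campaign IDs are assumptions.

```python
import random

def group_split(campaign_ids, seed=42, val_frac=0.15, test_frac=0.15):
    """Assign whole campaigns to train/val/test so no campaign spans folds.

    Returns a dict of row indices per fold.
    """
    groups = sorted(set(campaign_ids))
    random.Random(seed).shuffle(groups)
    n_test = round(len(groups) * test_frac)
    n_val = round(len(groups) * val_frac)
    test = set(groups[:n_test])
    val = set(groups[n_test:n_test + n_val])
    folds = {"train": [], "val": [], "test": []}
    for row_idx, cid in enumerate(campaign_ids):
        if cid in test:
            folds["test"].append(row_idx)
        elif cid in val:
            folds["val"].append(row_idx)
        else:
            folds["train"].append(row_idx)
    return folds

# Toy input: 100 hypothetical campaigns with 40 timesteps each.
ids = [f"camp_{c:03d}" for c in range(100) for _ in range(40)]
folds = group_split(ids)
```

Because assignment happens at the campaign level, every timestep of a campaign lands in exactly one fold, which is the property the row-level split lacks.
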
All test campaigns are completely unseen during training. Class imbalance
is addressed with balanced class weights (the `class_weight='balanced'`
heuristic), applied to XGBoost through `sample_weight` and to the MLP
through weighted cross-entropy.

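As a worked example, the "balanced" heuristic gives each class the weight `n_samples / (n_classes * class_count)`. The sketch below applies it to the total row counts from the phase table above. The released models would have derived their weights from the training fold only, so the actual values differ; the function name is illustrative.

```python
# Total rows per phase, copied from the training-data table.
PHASE_ROWS = {
    "email_delivery": 919,
    "victim_engagement": 667,
    "target_reconnaissance": 558,
    "post_compromise_escalation": 533,
    "credential_harvesting": 494,
    "lure_crafting": 435,
    "infrastructure_setup": 346,
}

def balanced_weights(counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}

weights = balanced_weights(PHASE_ROWS)
# Rare phases receive weights above 1, common phases below 1.
```

For a tree model that accepts per-row weights, each row is then given the weight of its own class.
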
## Feature pipeline

The bundled `feature_engineering.py` is the canonical feature recipe.
53 features survive after encoding, drawn from:

- **Per-timestep numeric** (7): `timestep`, `emails_sent_cumulative`, `click_through_rate`, `credential_submission_rate`, `gateway_detection_score`, `lure_personalisation_score`, `target_department_id`
- **Per-timestep categorical** (2, one-hot): `evasion_technique_active`, `actor_capability_tier`
- **Victim topology numeric** (5): `employee_count`, `privileged_account_density`, `mfa_enrollment_rate`, `click_susceptibility_base`, `email_volume_daily`
- **Victim topology categorical** (5, one-hot): `department_type`, `industry_sector`, `awareness_training_level`, `gateway_architecture`, `dmarc_enforcement_level`
- **Engineered** (6): `log_emails_sent`, `is_gateway_blocked_step`, `is_evasion_active`, `is_high_personalisation`, `has_credential_capture`, `has_user_engagement`

### Leakage audit

**One column dropped:** `delivery_outcome` (7-class categorical). Its
crosstab with `campaign_phase` shows that `no_delivery` appears only in
five phases (`target_reconnaissance`, `infrastructure_setup`,
`lure_crafting`, `credential_harvesting`, `post_compromise_escalation`)
and never in `email_delivery` or `victim_engagement`. Cell purity is 0.36
(uniform baseline 0.14). Keeping it would give the model a near-oracle
for separating those two phases from the rest.

**No oracle features remain.** All retained features have phase-purity
under 0.20.

### Per-campaign-constant features

Four features (`lure_personalisation_score`, `click_through_rate`,
`credential_submission_rate`, `target_department_id`) are constant
within each campaign. For **phase prediction** this is acceptable:
their phase-purity is low, so the model uses them as conditioning
context (similar to "we know this is an APT campaign targeting finance"
when reasoning about which phase we're in), not as oracle features.
They became a problem only for the abandoned actor-tier task.

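A check for this kind of hidden campaign-level constant can be sketched as follows. It is a minimal illustration on made-up rows; the helper name `constant_within_group` and the toy values are assumptions, and only the column names come from the dataset.

```python
from collections import defaultdict

def constant_within_group(rows, group_key, columns):
    """Return the columns that take exactly one value inside every group."""
    seen = {col: defaultdict(set) for col in columns}
    for row in rows:
        for col in columns:
            seen[col][row[group_key]].add(row[col])
    return [
        col for col in columns
        if all(len(values) == 1 for values in seen[col].values())
    ]

# Hypothetical rows: two campaigns, two timesteps each.
rows = [
    {"campaign_id": "c1", "timestep": 0, "lure_personalisation_score": 0.8, "gateway_detection_score": 0.1},
    {"campaign_id": "c1", "timestep": 1, "lure_personalisation_score": 0.8, "gateway_detection_score": 0.3},
    {"campaign_id": "c2", "timestep": 0, "lure_personalisation_score": 0.2, "gateway_detection_score": 0.2},
    {"campaign_id": "c2", "timestep": 1, "lure_personalisation_score": 0.2, "gateway_detection_score": 0.2},
]
constant_cols = constant_within_group(
    rows, "campaign_id",
    ["timestep", "lure_personalisation_score", "gateway_detection_score"],
)
```

Running such a check before choosing a target makes it easier to see which columns act as a campaign fingerprint rather than a per-step signal.
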
## Evaluation

### Test-set metrics, seed 42 (n = 585 timesteps from 15 disjoint campaigns)

**XGBoost** (the published `model_xgb.json` artifact)

| Metric | Value |
|---|---:|
| Macro ROC-AUC (OvR) | **0.9356** |
| Accuracy | **0.6547** |
| Macro-F1 | 0.6401 |
| Weighted-F1 | 0.6572 |

**MLP** (the published `model_mlp.safetensors` artifact)

| Metric | Value |
|---|---:|
| Macro ROC-AUC (OvR) | 0.9265 |
| Accuracy | 0.6427 |
| Macro-F1 | 0.6275 |
| Weighted-F1 | 0.6492 |

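The XGBoost accuracy and F1 figures can be recomputed from the confusion matrix stored under `full_model_metrics` in `ablation_results.json`. The sketch below does so in plain Python; it assumes rows are true labels and columns are predictions, and the function name `summarize` is illustrative.

```python
LABELS = [
    "target_reconnaissance", "infrastructure_setup", "lure_crafting",
    "email_delivery", "victim_engagement", "credential_harvesting",
    "post_compromise_escalation",
]
# XGBoost seed-42 confusion matrix copied from ablation_results.json
# (assumed orientation: rows = true phase, columns = predicted phase).
MATRIX = [
    [75, 0, 9, 0, 0, 0, 0],
    [0, 37, 16, 0, 0, 0, 0],
    [10, 10, 47, 0, 0, 0, 0],
    [0, 4, 0, 110, 28, 1, 0],
    [0, 0, 0, 21, 46, 24, 9],
    [0, 0, 0, 4, 16, 23, 20],
    [0, 0, 0, 0, 6, 24, 45],
]

def summarize(matrix):
    """Accuracy, per-class F1, macro-F1 and support-weighted F1."""
    k = len(matrix)
    total = sum(map(sum, matrix))
    support = [sum(row) for row in matrix]                      # true counts
    predicted = [sum(matrix[r][c] for r in range(k)) for c in range(k)]
    # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (support + predicted)
    f1 = [2 * matrix[i][i] / (support[i] + predicted[i]) for i in range(k)]
    return {
        "accuracy": sum(matrix[i][i] for i in range(k)) / total,
        "macro_f1": sum(f1) / k,
        "weighted_f1": sum(f * s for f, s in zip(f1, support)) / total,
        "per_class_f1": dict(zip(LABELS, f1)),
    }

summary = summarize(MATRIX)
```

The same function applies to any other confusion matrix in the results files, which is a quick way to cross-check reported numbers against the raw counts.
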
### Multi-seed robustness (XGBoost, 10 seeds)

Performance is stable across seeds, which indicates the task is learned consistently rather than through a lucky split:

| Metric | Mean | Std | Min | Max |
|---|---:|---:|---:|---:|
| Accuracy | 0.649 | 0.038 | 0.592 | 0.711 |
| Macro-F1 | 0.638 | 0.040 | 0.574 | 0.714 |
| Macro ROC-AUC OvR | 0.937 | 0.010 | 0.923 | 0.954 |

Full per-seed results are in [`multi_seed_results.json`](./multi_seed_results.json).
All 10 seeds yielded all 7 classes in the test fold.

### Per-class F1 (seed 42): where the signal is and isn't

| Phase | XGBoost F1 | MLP F1 | Note |
|---|---:|---:|---|
| `target_reconnaissance` | **0.888** | 0.831 | Tight early window (timesteps 0-7) |
| `email_delivery` | **0.791** | 0.761 | Tight window (8-30); gateway signals + email volume |
| `infrastructure_setup` | **0.712** | 0.702 | Tight window (5-18) |
| `lure_crafting` | **0.676** | 0.561 | Tight window (3-13) |
| `post_compromise_escalation` | 0.604 | 0.717 | Late window (22-52) |
| `victim_engagement` | 0.469 | 0.387 | Mid window (14-38), overlaps with adjacent phases |
| `credential_harvesting` | 0.341 | 0.434 | Mid-late (19-45), similar features to victim_engagement |

Four early phases (target_reconnaissance, infrastructure_setup,
lure_crafting, email_delivery) classify comparatively well because they
sit in tight early timestep windows with distinctive features.
Three later phases (victim_engagement, credential_harvesting,
post_compromise_escalation) overlap substantially in timestep range
(14-38, 19-45, 22-52) and share similar behavioural footprints
(non-zero click/credential rates, deployed evasion); these are
genuinely harder for a flat-tabular model. Sequence models with
campaign-level context would help here.

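The overlap argument can be made concrete with the windows listed in the table. The sketch below measures, for one phase, what fraction of its timestep window is shared with each other phase. It treats the windows as inclusive integer ranges and says nothing about how rows are distributed inside them; the helper names are illustrative.

```python
# Timestep windows (inclusive) copied from the per-class F1 table.
WINDOWS = {
    "target_reconnaissance": (0, 7),
    "lure_crafting": (3, 13),
    "infrastructure_setup": (5, 18),
    "email_delivery": (8, 30),
    "victim_engagement": (14, 38),
    "credential_harvesting": (19, 45),
    "post_compromise_escalation": (22, 52),
}

def overlap(a, b):
    """Number of integer timesteps shared by two inclusive windows."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def overlap_share(phase):
    """For each other phase, the fraction of `phase`'s window it covers."""
    lo, hi = WINDOWS[phase]
    width = hi - lo + 1
    return {
        other: overlap(WINDOWS[phase], window) / width
        for other, window in WINDOWS.items() if other != phase
    }

shares = overlap_share("credential_harvesting")
```

For `credential_harvesting`, both neighbouring late phases cover most of its window, which is consistent with it having the lowest XGBoost F1 in the table.
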
### Ablation: which feature groups matter

| Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy |
|---|---:|---:|---:|---:|
| Full feature set (published) | 0.6547 | 0.6401 | 0.9356 | n/a |
| No `timestep` | 0.3624 | 0.3139 | 0.8128 | **−0.2923** |
| No behavioural features | 0.5795 | 0.5735 | 0.9188 | −0.0752 |
| No topology features | 0.6410 | 0.6260 | 0.9342 | −0.0137 |
| No engineered features | 0.6581 | 0.6402 | 0.9370 | +0.0034 |

Three findings:

1. **`timestep` is by far the dominant feature** (accuracy drops 29 pp
   when it is removed, although ROC-AUC stays at 0.81). Phishing campaigns
   progress through phases over time; where you are in the campaign
   timeline carries most of the phase signal.
2. **Behavioural features contribute ~8 pp accuracy.** These are the
   per-timestep observables (emails sent, gateway score, click rate,
   evasion technique).
3. **Topology features contribute ~1 pp; engineered features contribute
   nothing measurable.** Removing the engineered features leaves accuracy
   essentially unchanged (+0.3 pp), because trees recover most of them on
   their own; topology provides modest conditioning context.

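The deltas in the table are simple differences against the full model. The sketch below recomputes them in percentage points for all three metrics, using the rounded values from the table; the dictionary keys are hypothetical labels, not the keys used in `ablation_results.json`.

```python
# Values copied from the ablation table (rounded to 4 decimals there).
FULL = {"accuracy": 0.6547, "macro_f1": 0.6401, "roc_auc": 0.9356}
ABLATIONS = {
    "no_timestep": {"accuracy": 0.3624, "macro_f1": 0.3139, "roc_auc": 0.8128},
    "no_behavioural": {"accuracy": 0.5795, "macro_f1": 0.5735, "roc_auc": 0.9188},
    "no_topology": {"accuracy": 0.6410, "macro_f1": 0.6260, "roc_auc": 0.9342},
    "no_engineered": {"accuracy": 0.6581, "macro_f1": 0.6402, "roc_auc": 0.9370},
}

def deltas_pp(full, ablated):
    """Change versus the full model in percentage points (negative = worse)."""
    return {m: round((ablated[m] - full[m]) * 100, 2) for m in full}

report = {name: deltas_pp(FULL, metrics) for name, metrics in ABLATIONS.items()}
```

Comparing the accuracy and ROC-AUC columns side by side shows that dropping `timestep` hurts hard-label accuracy far more than it hurts ranking quality.
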
### Architecture

**XGBoost:** multi-class gradient boosting (`multi:softprob`, 7 classes),
`hist` tree method, class-balanced sample weights, early stopping on
validation mlogloss.

**MLP:** `53 → 128 → 64 → 7`, each hidden layer followed by `BatchNorm1d`
→ `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss, AdamW optimizer,
early stopping on validation macro-F1.

Training hyperparameters (learning rate, batch size, n_estimators,
early-stopping patience, weight decay, class-weighting strategy) are
held internally by XpertSystems and are not part of this release.

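To give a sense of the MLP's size, the sketch below counts learnable parameters for the stated `53 → 128 → 64 → 7` layout. It assumes every linear layer has a bias and every batch-norm layer has a learnable scale and shift (the PyTorch defaults); the released weights are the authority if they differ.

```python
LAYER_SIZES = [53, 128, 64, 7]

def mlp_parameter_count(sizes):
    """Count learnable parameters for linear layers with bias, plus
    affine batch norm (scale + shift) after each hidden layer."""
    total = 0
    for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:])):
        total += n_in * n_out + n_out      # linear weights + bias
        is_hidden = i < len(sizes) - 2     # the last layer is the output
        if is_hidden:
            total += 2 * n_out             # batch-norm scale and shift
    return total

n_params = mlp_parameter_count(LAYER_SIZES)  # 16007 under these assumptions
```

Under these assumptions the network has about 16k parameters, several times the ~2.8k training timesteps, which is worth keeping in mind when reading the out-of-distribution caveat in the limitations.
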
## Limitations

**This is a baseline reference, not a production email-security system.**

1. **Mid- and late-phase confusion.** Per-class XGBoost F1 for
   `victim_engagement`, `credential_harvesting`, and
   `post_compromise_escalation` is 0.34 to 0.60. These phases overlap in
   timestep range and share similar behavioural signatures. Sequence
   models that consider campaign-level context would help substantially.

2. **The pivot away from actor-tier classification is dataset-limited,
   not method-limited.** With 100 campaigns and 4 tiers (some with only
   10 campaigns total), tier classification is below majority baseline
   once leakage-prone features are removed. The full 335k-row CYB004
   product provides ~4,800 campaigns; the sample does not.

3. **Synthetic-vs-real transfer.** The dataset is synthetic and
   calibrated to email-security and threat-intelligence benchmark
   targets (Proofpoint State of the Phish, KnowBe4 Industry Benchmark,
   Cofense PIQ, Mandiant M-Trends, FBI IC3 BEC Report, Verizon DBIR,
   CISA, APWG). Real phishing telemetry has different noise
   characteristics, adversary adaptation, and instrumentation gaps. Do
   not assume metrics transfer.

4. **Adversarial robustness not evaluated.** The dataset is not
   adversarially generated; the model has not been red-teamed against
   evasive lures or novel infrastructure.

5. **MLP brittleness on OOD inputs.** With ~2.8k training timesteps,
   the MLP can produce confidently wrong predictions on hand-crafted
   records far from the training manifold. XGBoost is more robust.
   Use both; treat disagreement as a signal for human review.

6. **`timestep` dominance is a property of the dataset.** Real
   phishing telemetry doesn't carry a clean per-campaign normalized
   timestep; that is a simulator artifact. A buyer transferring this
   baseline to real campaign telemetry would need to recover an
   equivalent temporal-position feature (e.g. hours since campaign
   first observation, position in stage-detection pipeline).

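For the last point, one possible temporal-position feature is the number of hours since a campaign was first observed. The sketch below derives it from timestamped events. The event schema (`campaign_id`, `observed_at` as an ISO-8601 string) and the function name are hypothetical, and how real events get grouped into campaigns is outside the scope of this example.

```python
from datetime import datetime

def hours_since_first_seen(events):
    """Attach `hours_since_first` to each event, computed per campaign.

    `events` is a list of dicts with hypothetical keys `campaign_id`
    and `observed_at` (ISO-8601 timestamp string).
    """
    first_seen = {}
    parsed = []
    for ev in events:
        ts = datetime.fromisoformat(ev["observed_at"])
        parsed.append(ts)
        cid = ev["campaign_id"]
        if cid not in first_seen or ts < first_seen[cid]:
            first_seen[cid] = ts
    return [
        {**ev, "hours_since_first":
            (ts - first_seen[ev["campaign_id"]]).total_seconds() / 3600}
        for ev, ts in zip(events, parsed)
    ]

# Made-up events for two campaigns.
events = [
    {"campaign_id": "c1", "observed_at": "2026-01-01T08:00:00"},
    {"campaign_id": "c1", "observed_at": "2026-01-01T20:30:00"},
    {"campaign_id": "c2", "observed_at": "2026-01-03T00:00:00"},
]
enriched = hours_since_first_seen(events)
```

Such a feature is measured in wall-clock hours rather than simulator steps, so a model trained on `timestep` would need retraining or rescaling before using it.
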
## Notes on dataset schema

The CYB004 sample dataset README describes some fields differently from
the actual schema. The model was trained on the actual schema; this note
helps buyers reconcile what they read with what they receive.

| What the dataset README says | What the data actually contains |
|---|---|
| "9 campaign phases" (reconnaissance, infrastructure_setup, lure_creation, send_wave, gateway_evaluation, user_interaction, credential_capture, lateral_pivot, exfiltration) | 7 phases with different names: target_reconnaissance, infrastructure_setup, lure_crafting, email_delivery, victim_engagement, credential_harvesting, post_compromise_escalation |
| 4 actor tiers: `opportunistic`, `organized_crime`, `targeted`, `nation_state_apt` | 4 tiers: `opportunistic`, `cybercriminal_gang`, `initial_access_broker`, `nation_state_apt` |
| 8 department types listed | 4 department types: `executive_leadership`, `finance_accounts_payable`, `human_resources`, `information_technology` |
| 4 gateway architectures | 8 gateway architectures including `ai_sender_reputation`, `integrated_cloud_defender`, `zero_trust_email_proxy` |
| Awareness training: none, annual, semi-annual, quarterly, monthly | annual, none, continuous, basic, quarterly (no semi-annual or monthly) |
| Per-timestep fields: `send_volume`, `gateway_blocked`, `emails_delivered`, `user_report_count`, `mfa_bypass_attempted`, `bec_attempt`, `lateral_pivot_attempted`, `operational_stealth_score`, `dmarc_enforcement_active` | None of these exist per-timestep. The actual per-timestep columns are: `emails_sent_cumulative`, `gateway_detection_score`, `delivery_outcome`, `lure_personalisation_score`, `evasion_technique_active`. BEC / MFA bypass / lateral phishing flags exist only at the campaign-summary level. |

None of these discrepancies affects model correctness, because the feature
pipeline uses the actual column names. If you build your own pipeline
against the dataset, use the actual columns.

## Intended use

- **Evaluating fit** of the CYB004 dataset for your email-security
  or threat-hunting research
- **Baseline reference** for new model architectures (especially
  sequence models, which should beat this baseline on the overlapping
  mid-late phases)
- **Teaching and demo** for tabular classification on phishing
  campaign telemetry
- **Feature engineering reference** for per-timestep campaign data

## Out-of-scope use

- Production email security on real campaign telemetry
- Threat hunting / SOAR playbooks on real systems
- Actor attribution (this baseline does not address that task; see why above)
- Adversarial-evasion evaluation (dataset not adversarially generated)
- Any operational security decision

## Reproducibility

Outputs above were produced with `seed = 42` (published artifact),
group-aware nested `GroupShuffleSplit` (70/15/15 by campaign_id), on the
published sample (`xpertsystems/cyb004-sample`, version 1.0.0, generated
2026-05-16). The feature pipeline in `feature_engineering.py` is
deterministic and the trained weights in this repo correspond exactly
to the metrics above.

Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200) in
`multi_seed_results.json` confirm robust performance across splits.

The training script itself is private to XpertSystems.

## Files in this repo

| File | Purpose |
|---|---|
| `model_xgb.json` | XGBoost weights (seed 42) |
| `model_mlp.safetensors` | PyTorch MLP weights (seed 42) |
| `feature_engineering.py` | Feature pipeline (load → join topology → engineer → encode) |
| `feature_meta.json` | Feature column order + categorical levels |
| `feature_scaler.json` | MLP input mean/std (XGBoost ignores) |
| `validation_results.json` | Per-class metrics, confusion matrix, architecture |
| `ablation_results.json` | Per-feature-group ablation |
| `multi_seed_results.json` | XGBoost metrics across 10 seeds with aggregate statistics |
| `inference_example.ipynb` | End-to-end inference demo notebook |
| `README.md` | This file |

## Contact and full product

The full **CYB004** dataset contains ~335,000 rows across four files,
with calibrated benchmark validation against 12 metrics from email
security and threat intelligence sources (Proofpoint, KnowBe4,
Cofense, Mandiant, FBI IC3, Verizon, CISA, APWG). The full
XpertSystems.ai synthetic data catalogue spans 41 SKUs across
Cybersecurity, Healthcare, Insurance & Risk, Oil & Gas, and Materials
& Energy.

- 📧 **pradeep@xpertsystems.ai**
- 🌐 **https://xpertsystems.ai**
- 🗂 Dataset: https://huggingface.co/datasets/xpertsystems/cyb004-sample
- 🤖 Companion models:
  - https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic)
  - https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain)
  - https://huggingface.co/xpertsystems/cyb003-baseline-classifier (malware execution phase)

## Citation

```bibtex
@misc{xpertsystems_cyb004_baseline_2026,
  title  = {CYB004 Baseline Classifier: XGBoost and MLP for Phishing Campaign Phase Classification},
  author = {XpertSystems.ai},
  year   = {2026},
  url    = {https://huggingface.co/xpertsystems/cyb004-baseline-classifier},
  note   = {Baseline reference model trained on xpertsystems/cyb004-sample}
}
```
ablation_results.json
ADDED
|
@@ -0,0 +1,489 @@
{
  "purpose": "Quantify how much each feature group contributes to the headline XGBoost score. Identical architecture, same group-aware split, with one feature group dropped at a time.",
  "full_model_metrics": {
    "model": "xgboost",
    "accuracy": 0.6547008547008547,
    "macro_f1": 0.6401276666852063,
    "weighted_f1": 0.657179533714298,
    "per_class_f1": {
      "target_reconnaissance": 0.8875739644970414,
      "infrastructure_setup": 0.7115384615384616,
      "lure_crafting": 0.6762589928057554,
      "email_delivery": 0.7913669064748201,
      "victim_engagement": 0.46938775510204084,
      "credential_harvesting": 0.34074074074074073,
      "post_compromise_escalation": 0.6040268456375839
    },
    "confusion_matrix": {
      "labels": [
        "target_reconnaissance",
        "infrastructure_setup",
        "lure_crafting",
        "email_delivery",
        "victim_engagement",
        "credential_harvesting",
        "post_compromise_escalation"
      ],
      "matrix": [
        [75, 0, 9, 0, 0, 0, 0],
        [0, 37, 16, 0, 0, 0, 0],
        [10, 10, 47, 0, 0, 0, 0],
        [0, 4, 0, 110, 28, 1, 0],
        [0, 0, 0, 21, 46, 24, 9],
        [0, 0, 0, 4, 16, 23, 20],
        [0, 0, 0, 0, 6, 24, 45]
      ]
    },
    "macro_roc_auc_ovr": 0.935584434710217
  },
  "ablations": {
    "no_topology": {
      "n_features": 23,
      "dropped_count": 30,
      "metrics": {
        "model": "xgboost_no_topology",
        "accuracy": 0.6410256410256411,
        "macro_f1": 0.626013906528604,
        "weighted_f1": 0.6377089952999916,
        "per_class_f1": {
          "target_reconnaissance": 0.891566265060241,
          "infrastructure_setup": 0.7586206896551724,
          "lure_crafting": 0.676923076923077,
          "email_delivery": 0.7598566308243727,
          "victim_engagement": 0.40609137055837563,
          "credential_harvesting": 0.2782608695652174,
          "post_compromise_escalation": 0.6107784431137725
        },
        "confusion_matrix": {
          "labels": [
            "target_reconnaissance",
            "infrastructure_setup",
            "lure_crafting",
            "email_delivery",
            "victim_engagement",
            "credential_harvesting",
            "post_compromise_escalation"
          ],
          "matrix": [
            [74, 0, 10, 0, 0, 0, 0],
            [0, 44, 9, 0, 0, 0, 0],
            [8, 15, 44, 0, 0, 0, 0],
            [0, 4, 0, 106, 30, 3, 0],
            [0, 0, 0, 26, 40, 16, 18],
            [0, 0, 0, 4, 20, 16, 23],
            [0, 0, 0, 0, 7, 17, 51]
          ]
        },
        "macro_roc_auc_ovr": 0.9341744835062434
      },
      "delta_accuracy": 0.013675213675213627,
      "delta_macro_f1": 0.014113760156602262
    },
    "no_behavioural": {
      "n_features": 36,
      "dropped_count": 17,
      "metrics": {
        "model": "xgboost_no_behavioural",
        "accuracy": 0.5794871794871795,
        "macro_f1": 0.5734830391013238,
        "weighted_f1": 0.5833619015067782,
        "per_class_f1": {
          "target_reconnaissance": 0.9024390243902439,
          "infrastructure_setup": 0.4745762711864407,
          "lure_crafting": 0.6619718309859155,
          "email_delivery": 0.6390977443609023,
          "victim_engagement": 0.3404255319148936,
          "credential_harvesting": 0.3472222222222222,
          "post_compromise_escalation": 0.6486486486486487
        },
        "confusion_matrix": {
          "labels": [
            "target_reconnaissance",
            "infrastructure_setup",
            "lure_crafting",
            "email_delivery",
            "victim_engagement",
            "credential_harvesting",
            "post_compromise_escalation"
          ],
          "matrix": [
            [74, 0, 10, 0, 0, 0, 0],
            [0, 28, 16, 9, 0, 0, 0],
            [6, 13, 47, 1, 0, 0, 0],
            [0, 23, 2, 85, 30, 3, 0],
            [0, 1, 0, 26, 32, 34, 7],
            [0, 0, 0, 2, 18, 25, 18],
            [0, 0, 0, 0, 8, 19, 48]
          ]
        },
        "macro_roc_auc_ovr": 0.9187512184393106
      },
      "delta_accuracy": 0.07521367521367517,
      "delta_macro_f1": 0.06664462758388245
    },
    "no_timestep": {
      "n_features": 52,
      "dropped_count": 1,
      "metrics": {
        "model": "xgboost_no_timestep",
        "accuracy": 0.3623931623931624,
        "macro_f1": 0.3138802646284953,
        "weighted_f1": 0.3500013055228507,
        "per_class_f1": {
          "target_reconnaissance": 0.4419889502762431,
          "infrastructure_setup": 0.24,
          "lure_crafting": 0.2748091603053435,
          "email_delivery": 0.5617283950617284,
          "victim_engagement": 0.26666666666666666,
          "credential_harvesting": 0.11666666666666667,
          "post_compromise_escalation": 0.2953020134228188
        },
        "confusion_matrix": {
          "labels": [
            "target_reconnaissance",
            "infrastructure_setup",
            "lure_crafting",
            "email_delivery",
            "victim_engagement",
            "credential_harvesting",
            "post_compromise_escalation"
          ],
          "matrix": [
            [40, 18, 26, 0, 0, 0, 0],
            [23, 12, 18, 0, 0, 0, 0],
            [32, 17, 18, 0, 0, 0, 0],
            [2, 0, 2, 91, 16, 17, 15],
            [0, 0, 0, 36, 22, 20, 22],
            [0, 0, 0, 25, 16, 7, 15],
            [0, 0, 0, 29, 11, 13, 22]
          ]
        },
        "macro_roc_auc_ovr": 0.8128267634071407
      },
      "delta_accuracy": 0.2923076923076923,
      "delta_macro_f1": 0.326247402056711
    },
    "no_engineered": {
      "n_features": 47,
      "dropped_count": 6,
      "metrics": {
        "model": "xgboost_no_engineered",
        "accuracy": 0.6581196581196581,
        "macro_f1": 0.6401951204875947,
        "weighted_f1": 0.6592473136316277,
        "per_class_f1": {
          "target_reconnaissance": 0.8809523809523809,
          "infrastructure_setup": 0.7155963302752294,
          "lure_crafting": 0.6518518518518519,
          "email_delivery": 0.8,
          "victim_engagement": 0.49473684210526314,
          "credential_harvesting": 0.3484848484848485,
          "post_compromise_escalation": 0.5897435897435898
        },
        "confusion_matrix": {
          "labels": [
            "target_reconnaissance",
            "infrastructure_setup",
            "lure_crafting",
            "email_delivery",
            "victim_engagement",
            "credential_harvesting",
            "post_compromise_escalation"
          ],
          "matrix": [
            [74, 0, 10, 0, 0, 0, 0],
            [0, 39, 14, 0, 0, 0, 0],
            [10, 13, 44, 0, 0, 0, 0],
            [0, 4, 0, 112, 26, 1, 0],
            [0, 0, 0, 20, 47, 22, 11],
            [0, 0, 0, 5, 11, 23, 24],
            [0, 0, 0, 0, 6, 23, 46]
          ]
        },
        "macro_roc_auc_ovr": 0.9369503919262667
      },
      "delta_accuracy": -0.0034188034188034067,
      "delta_macro_f1": -6.745380238848409e-05
    }
  }
}
|
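The `delta_*` fields in the ablation results appear to be the full-model score minus the ablated score, and each `accuracy` matches the confusion-matrix diagonal divided by the total row count. The sketch below re-derives both for the `no_topology` run using only the standard library. The names `FULL`, `NO_TOPOLOGY`, and `accuracy_from_matrix` are illustrative and not part of the repository.

```python
# Illustrative check; the two matrices are copied from ablation_results.json.
FULL = [
    [75, 0, 9, 0, 0, 0, 0],
    [0, 37, 16, 0, 0, 0, 0],
    [10, 10, 47, 0, 0, 0, 0],
    [0, 4, 0, 110, 28, 1, 0],
    [0, 0, 0, 21, 46, 24, 9],
    [0, 0, 0, 4, 16, 23, 20],
    [0, 0, 0, 0, 6, 24, 45],
]
NO_TOPOLOGY = [
    [74, 0, 10, 0, 0, 0, 0],
    [0, 44, 9, 0, 0, 0, 0],
    [8, 15, 44, 0, 0, 0, 0],
    [0, 4, 0, 106, 30, 3, 0],
    [0, 0, 0, 26, 40, 16, 18],
    [0, 0, 0, 4, 20, 16, 23],
    [0, 0, 0, 0, 7, 17, 51],
]


def accuracy_from_matrix(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


full_acc = accuracy_from_matrix(FULL)
ablated_acc = accuracy_from_matrix(NO_TOPOLOGY)
# Assumed sign convention: positive means dropping the group hurt the model.
delta_accuracy = full_acc - ablated_acc
print(full_acc, ablated_acc, delta_accuracy)
```

Under this reading, the slightly negative deltas for `no_engineered` mean the ablated model scored marginally higher than the full model on this split.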
feature_engineering.py
ADDED
@@ -0,0 +1,341 @@
"""
feature_engineering.py
======================

Feature pipeline for the CYB004 baseline classifier.

Predicts `campaign_phase` (7-class) from per-timestep phishing campaign
trajectory data on the CYB004 sample dataset.

CSV inputs:
    campaign_trajectories.csv   (primary, one row per timestep, 100
                                 campaigns x ~40 timesteps = 3,952 rows)
    victim_topology.csv         (per-department victim configuration,
                                 joined on target_department_id)
    campaign_summary.csv        (per-campaign aggregates; reserved for
                                 future work)
    campaign_events.csv         (discrete event log; reserved for
                                 future work)

Target classes (7 phases observed in the sample):
    target_reconnaissance, infrastructure_setup, lure_crafting,
    email_delivery, victim_engagement, credential_harvesting,
    post_compromise_escalation

This is the email-security / SOC use case: given the observable
campaign telemetry at a moment in time, what phase of the phishing
lifecycle is the campaign in?

The pivot to campaign_phase (away from actor_capability_tier, the
README's headline use case) happened because per-campaign-constant
features (lure_personalisation_score, click_through_rate,
credential_submission_rate, target_department_id) leak tier via the
small test fold under group-aware splitting. With those features
removed, honest tier prediction is below majority baseline. The full
335k-row CYB004 dataset would address this; the sample does not.
See the model card for full discussion.

Public API
----------
    build_features(trajectories_path, topology_path)
        -> (X, y, groups, meta)
    transform_single(record, meta, victim_aggregates=None) -> np.ndarray
    save_meta(meta, path) / load_meta(path)
    build_department_lookup(topology_path) -> dict

License
-------
Ships with the public model on Hugging Face under CC-BY-NC-4.0, matching
the dataset license. See README.md.
"""

from __future__ import annotations

import json
from pathlib import Path
from typing import Any

import numpy as np
import pandas as pd

# ---------------------------------------------------------------------------
# Label space
# ---------------------------------------------------------------------------

LABEL_ORDER = [
    "target_reconnaissance",
    "infrastructure_setup",
    "lure_crafting",
    "email_delivery",
    "victim_engagement",
    "credential_harvesting",
    "post_compromise_escalation",
]
LABEL_TO_INT = {lbl: i for i, lbl in enumerate(LABEL_ORDER)}
INT_TO_LABEL = {i: lbl for lbl, i in LABEL_TO_INT.items()}

# ---------------------------------------------------------------------------
# Identifier and target columns - not features
# ---------------------------------------------------------------------------

ID_COLUMNS = ["campaign_id", "actor_id"]
TARGET_COLUMN = "campaign_phase"

# `actor_capability_tier` is kept as a feature - it's a real SOC observable
# (analysts typically have an actor cluster hypothesis), and its
# purity-vs-phase is 0.18 (uniform baseline 0.14), so it isn't an oracle.

# `delivery_outcome` is dropped: its purity vs phase is much higher
# (0.36) - `no_delivery` appears only in early phases, effectively
# encoding phase position. Keeping it would give the model a near-oracle.
LEAKY_COLUMNS = [
    "delivery_outcome",
]

# ---------------------------------------------------------------------------
# Per-timestep numeric features
# ---------------------------------------------------------------------------

DIRECT_NUMERIC_TIMESTEP_FEATURES = [
    "timestep",                    # strong but non-deterministic phase signal
    "emails_sent_cumulative",      # increases through campaign; useful position proxy
    "click_through_rate",          # per-campaign constant; informative when combined with timestep
    "credential_submission_rate",  # per-campaign constant
    "gateway_detection_score",     # per-step variation
    "lure_personalisation_score",  # per-campaign constant; tier signal
    "target_department_id",        # per-campaign constant; treated as ordinal ID
]

# Per-timestep categoricals
CATEGORICAL_TIMESTEP_FEATURES = [
    "evasion_technique_active",  # 6 levels incl. "none" (82%); active evasion correlates with mid-late phases
    "actor_capability_tier",     # 4 levels; mostly per-campaign constant
]

# ---------------------------------------------------------------------------
# Victim topology features (joined on target_department_id)
# ---------------------------------------------------------------------------

TOPOLOGY_NUMERIC_FEATURES = [
    "employee_count",
    "privileged_account_density",
    "mfa_enrollment_rate",
    "click_susceptibility_base",
    "email_volume_daily",
]

TOPOLOGY_CATEGORICAL_FEATURES = [
    "department_type",
    "industry_sector",
    "awareness_training_level",
    "gateway_architecture",
    "dmarc_enforcement_level",
]


# ---------------------------------------------------------------------------
# Engineered features (none derived from phase or timestep alone)
# ---------------------------------------------------------------------------

def _add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """
    Six engineered features. None directly encode phase; each is a
    behavioural composite that helps disambiguate adjacent phases.
    """
    df = df.copy()

    # 1. Log-scaled email volume. emails_sent_cumulative is heavy-tailed
    #    (0 in recon, hundreds-to-thousands by post_compromise).
    df["log_emails_sent"] = np.log1p(df["emails_sent_cumulative"].clip(lower=0)).astype(float)

    # 2. Gateway-blocked step. gateway_detection_score > 0.7 marks
    #    high-confidence gateway intervention; common in email_delivery.
    df["is_gateway_blocked_step"] = (df["gateway_detection_score"] > 0.7).astype(int)

    # 3. Evasion-active flag. Non-"none" evasion_technique_active
    #    concentrates in lure_crafting and email_delivery.
    df["is_evasion_active"] = (df["evasion_technique_active"] != "none").astype(int)

    # 4. High-personalisation flag. lure_personalisation_score > 0.7 is
    #    an APT-tier signature.
    df["is_high_personalisation"] = (df["lure_personalisation_score"] > 0.7).astype(int)

    # 5. Has credential capture flag. credential_submission_rate > 0
    #    indicates the campaign has reached credential-capture phases.
    df["has_credential_capture"] = (df["credential_submission_rate"] > 0).astype(int)

    # 6. Engaged-victim flag. click_through_rate > 0 indicates
    #    victim_engagement or later phase.
    df["has_user_engagement"] = (df["click_through_rate"] > 0).astype(int)

    return df


# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------

def build_features(
    trajectories_path: str | Path,
    topology_path: str | Path,
) -> tuple[pd.DataFrame, pd.Series, pd.Series, dict[str, Any]]:
    """
    Load CSVs, join topology, drop target + leaky columns, engineer features,
    one-hot encode, return (X, y, groups, meta).

    `groups` is a Series of campaign_id values aligned with X. Use it with
    GroupShuffleSplit / GroupKFold: a single campaign generates ~40
    correlated timesteps; row-level random splitting inflates metrics.
    """
    traj = pd.read_csv(trajectories_path)
    topo = pd.read_csv(topology_path)

    y = traj[TARGET_COLUMN].map(LABEL_TO_INT)
    if y.isna().any():
        bad = traj.loc[y.isna(), TARGET_COLUMN].unique()
        raise ValueError(f"Unknown campaign_phase values: {bad}")
    y = y.astype(int)
    groups = traj["campaign_id"].copy()

    traj = traj.drop(columns=ID_COLUMNS + [TARGET_COLUMN] + LEAKY_COLUMNS,
                     errors="ignore")

    topo_cols_needed = (
        ["department_id"]
        + TOPOLOGY_NUMERIC_FEATURES
        + TOPOLOGY_CATEGORICAL_FEATURES
    )
    traj = traj.merge(
        topo[topo_cols_needed],
        left_on="target_department_id", right_on="department_id", how="left",
    ).drop(columns=["department_id"], errors="ignore")

    traj = _add_engineered_features(traj)

    numeric_features = (
        DIRECT_NUMERIC_TIMESTEP_FEATURES
        + TOPOLOGY_NUMERIC_FEATURES
        + [
            "log_emails_sent", "is_gateway_blocked_step", "is_evasion_active",
            "is_high_personalisation", "has_credential_capture", "has_user_engagement",
        ]
    )
    X_numeric = traj[numeric_features].astype(float)

    all_categorical = (
        [(col, "timestep") for col in CATEGORICAL_TIMESTEP_FEATURES]
        + [(col, "topology") for col in TOPOLOGY_CATEGORICAL_FEATURES]
    )
    categorical_levels: dict[str, list[str]] = {}
    blocks: list[pd.DataFrame] = []
    for col, _src in all_categorical:
        if col not in traj.columns:
            continue
        # Record the sorted levels so inference can reproduce the same columns.
        levels = sorted(traj[col].dropna().unique().tolist())
        categorical_levels[col] = levels
        block = pd.get_dummies(
            traj[col].astype("category").cat.set_categories(levels),
            prefix=col, dummy_na=False,
        ).astype(int)
        blocks.append(block)

    X = pd.concat(
        [X_numeric.reset_index(drop=True)]
        + [b.reset_index(drop=True) for b in blocks],
        axis=1,
    ).fillna(0.0)

    meta = {
        "feature_names": X.columns.tolist(),
        "numeric_features": numeric_features,
        "categorical_levels": categorical_levels,
        "label_to_int": LABEL_TO_INT,
        "int_to_label": INT_TO_LABEL,
        "leakage_excluded": LEAKY_COLUMNS,
    }
    return X, y, groups, meta


def transform_single(
    record: dict | pd.DataFrame,
    meta: dict[str, Any],
    victim_aggregates: dict | None = None,
) -> np.ndarray:
    """Encode a single timestep record for inference."""
    if isinstance(record, dict):
        df = pd.DataFrame([record.copy()])
    else:
        # Reset to a 0..n-1 index: the numeric block below is rebuilt from raw
        # values, so a non-default index would misalign rows in pd.concat.
        df = record.reset_index(drop=True)

    if victim_aggregates is not None:
        for k, v in victim_aggregates.items():
            df[k] = v

    df = _add_engineered_features(df)

    numeric = pd.DataFrame({
        col: df.get(col, pd.Series([0.0] * len(df))).astype(float).values
        for col in meta["numeric_features"]
    })
    blocks: list[pd.DataFrame] = [numeric]
    for col, levels in meta["categorical_levels"].items():
        val = df.get(col, pd.Series([None] * len(df)))
        # Levels unseen at training time fall outside `levels` and encode as all zeros.
        block = pd.get_dummies(
            val.astype("category").cat.set_categories(levels),
            prefix=col, dummy_na=False,
        ).astype(int)
        for lvl in levels:
            cname = f"{col}_{lvl}"
            if cname not in block.columns:
                block[cname] = 0
        block = block[[f"{col}_{lvl}" for lvl in levels]]
        blocks.append(block)

    X = pd.concat(blocks, axis=1).fillna(0.0)
    X = X.reindex(columns=meta["feature_names"], fill_value=0.0)
    return X.values.astype(np.float32)


def save_meta(meta: dict[str, Any], path: str | Path) -> None:
    """Write feature metadata as JSON (integer keys become strings)."""
    serializable = {
        "feature_names": meta["feature_names"],
        "numeric_features": meta["numeric_features"],
        "categorical_levels": meta["categorical_levels"],
        "label_to_int": meta["label_to_int"],
        "int_to_label": {str(k): v for k, v in meta["int_to_label"].items()},
        "leakage_excluded": meta.get("leakage_excluded", []),
    }
    with open(path, "w") as f:
        json.dump(serializable, f, indent=2)


def load_meta(path: str | Path) -> dict[str, Any]:
    """Read feature metadata and restore integer keys in `int_to_label`."""
    with open(path) as f:
        meta = json.load(f)
    meta["int_to_label"] = {int(k): v for k, v in meta["int_to_label"].items()}
    return meta


def build_department_lookup(topology_path: str | Path) -> dict[int, dict]:
    """Build {department_id: {topology features}} for inference-time lookup."""
    topo = pd.read_csv(topology_path)
    cols = TOPOLOGY_NUMERIC_FEATURES + TOPOLOGY_CATEGORICAL_FEATURES
    out = {}
    for _, row in topo.iterrows():
        out[int(row["department_id"])] = {c: row[c] for c in cols if c in topo.columns}
    return out


if __name__ == "__main__":
    import sys
    base = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/mnt/user-data/uploads")
    X, y, groups, meta = build_features(
        base / "campaign_trajectories.csv",
        base / "victim_topology.csv",
    )
    print(f"X shape: {X.shape}")
    print(f"y shape: {y.shape}")
    print(f"groups: {groups.nunique()} campaigns")
    print(f"n features: {len(meta['feature_names'])}")
    print(f"label distribution:\n{y.map(INT_TO_LABEL).value_counts()}")
    print(f"X has NaN: {X.isnull().any().any()}")
|
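The `build_features` docstring recommends splitting by `campaign_id` rather than by row, because the ~40 timesteps of one campaign are correlated. As a rough illustration of that idea, the sketch below holds out whole campaigns using only the standard library. `group_holdout_split` is a hypothetical helper standing in for scikit-learn's `GroupShuffleSplit`; it is not code from this repository, and the toy campaign IDs are made up.

```python
import random


def group_holdout_split(groups, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no group appears on both sides."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out at least one whole group.
    n_test = max(1, round(len(unique) * test_fraction))
    held_out = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in held_out]
    test_idx = [i for i, g in enumerate(groups) if g in held_out]
    return train_idx, test_idx


# Toy data: 5 campaigns x 4 timesteps each (hypothetical IDs).
groups = [f"C{c:03d}" for c in range(5) for _ in range(4)]
train_idx, test_idx = group_holdout_split(groups, test_fraction=0.2, seed=42)

train_groups = {groups[i] for i in train_idx}
test_groups = {groups[i] for i in test_idx}
print(len(train_idx), len(test_idx), train_groups & test_groups)
```

With the real data, the `groups` Series returned by `build_features` would take the place of the toy list, and the resulting index lists would select rows of `X` and `y`.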
feature_meta.json
ADDED
@@ -0,0 +1,149 @@
{
  "feature_names": [
    "timestep",
    "emails_sent_cumulative",
    "click_through_rate",
    "credential_submission_rate",
    "gateway_detection_score",
    "lure_personalisation_score",
    "target_department_id",
    "employee_count",
    "privileged_account_density",
    "mfa_enrollment_rate",
    "click_susceptibility_base",
    "email_volume_daily",
    "log_emails_sent",
    "is_gateway_blocked_step",
    "is_evasion_active",
    "is_high_personalisation",
    "has_credential_capture",
    "has_user_engagement",
    "evasion_technique_active_base64_payload_embedding",
    "evasion_technique_active_homoglyph_substitution",
    "evasion_technique_active_html_obfuscation",
    "evasion_technique_active_image_only_lure",
    "evasion_technique_active_none",
    "evasion_technique_active_redirect_chain",
    "actor_capability_tier_cybercriminal_gang",
    "actor_capability_tier_initial_access_broker",
    "actor_capability_tier_nation_state_apt",
    "actor_capability_tier_opportunistic",
    "department_type_executive_leadership",
    "department_type_finance_accounts_payable",
    "department_type_human_resources",
    "department_type_information_technology",
    "industry_sector_financial_services",
    "industry_sector_government_state_local",
    "industry_sector_retail_ecommerce",
    "industry_sector_technology",
    "awareness_training_level_annual",
    "awareness_training_level_basic",
    "awareness_training_level_continuous",
    "awareness_training_level_none",
    "awareness_training_level_quarterly",
    "gateway_architecture_ai_sender_reputation",
    "gateway_architecture_ensemble_layered_gateway",
    "gateway_architecture_integrated_cloud_defender",
    "gateway_architecture_legacy_spam_filter",
    "gateway_architecture_ml_classifier_gateway",
    "gateway_architecture_rule_based_filter",
    "gateway_architecture_sandbox_detonation",
    "gateway_architecture_zero_trust_email_proxy",
    "dmarc_enforcement_level_monitoring",
    "dmarc_enforcement_level_none",
    "dmarc_enforcement_level_quarantine",
    "dmarc_enforcement_level_reject"
  ],
  "numeric_features": [
    "timestep",
    "emails_sent_cumulative",
    "click_through_rate",
    "credential_submission_rate",
    "gateway_detection_score",
    "lure_personalisation_score",
    "target_department_id",
    "employee_count",
    "privileged_account_density",
    "mfa_enrollment_rate",
    "click_susceptibility_base",
    "email_volume_daily",
    "log_emails_sent",
    "is_gateway_blocked_step",
    "is_evasion_active",
    "is_high_personalisation",
    "has_credential_capture",
    "has_user_engagement"
  ],
  "categorical_levels": {
    "evasion_technique_active": [
      "base64_payload_embedding",
      "homoglyph_substitution",
      "html_obfuscation",
      "image_only_lure",
      "none",
      "redirect_chain"
    ],
    "actor_capability_tier": [
      "cybercriminal_gang",
      "initial_access_broker",
      "nation_state_apt",
      "opportunistic"
    ],
    "department_type": [
      "executive_leadership",
      "finance_accounts_payable",
      "human_resources",
      "information_technology"
    ],
    "industry_sector": [
      "financial_services",
      "government_state_local",
      "retail_ecommerce",
      "technology"
    ],
    "awareness_training_level": [
      "annual",
      "basic",
      "continuous",
      "none",
      "quarterly"
    ],
    "gateway_architecture": [
      "ai_sender_reputation",
      "ensemble_layered_gateway",
      "integrated_cloud_defender",
      "legacy_spam_filter",
      "ml_classifier_gateway",
      "rule_based_filter",
      "sandbox_detonation",
      "zero_trust_email_proxy"
    ],
    "dmarc_enforcement_level": [
      "monitoring",
      "none",
      "quarantine",
      "reject"
    ]
  },
  "label_to_int": {
    "target_reconnaissance": 0,
    "infrastructure_setup": 1,
    "lure_crafting": 2,
    "email_delivery": 3,
    "victim_engagement": 4,
    "credential_harvesting": 5,
    "post_compromise_escalation": 6
  },
  "int_to_label": {
    "0": "target_reconnaissance",
    "1": "infrastructure_setup",
    "2": "lure_crafting",
    "3": "email_delivery",
    "4": "victim_engagement",
    "5": "credential_harvesting",
    "6": "post_compromise_escalation"
  },
  "leakage_excluded": [
    "delivery_outcome"
  ]
}
|
feature_scaler.json
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
{"mean": [19.882267966775007, 264.6323582520766, 0.052154893463344176, 0.03361545684362586, 0.6047568436258577, 0.4326537739256049, 17.127121704586493, 172.1491513181654, 0.7521711809317442, 0.8172881906825569, 0.07423943661971831, 1151.8490429758035, 3.894440315600361, 0.30371975442397975, 0.1758757674250632, 0.11917659804983749, 1.0, 1.0, 0.030335861321776816, 0.053810039725532686, 0.04333694474539545, 0.025279884434814014, 0.8241242325749368, 0.02311303719754424, 0.18923799205489347, 0.10220296135789093, 0.10509209100758396, 0.6034669555796316, 0.27230046948356806, 0.26291079812206575, 0.1632358252076562, 0.30155290718671, 0.27230046948356806, 0.30155290718671, 0.1632358252076562, 0.26291079812206575, 0.30841459010473093, 0.16143011917659805, 0.20548934633441676, 0.29360780065005415, 0.031058143734200072, 0.12639942217407008, 0.13578909353557242, 0.14590104730949802, 0.11014806789454677, 0.06608884073672806, 0.09750812567713976, 0.13867822318526543, 0.1794871794871795, 0.09750812567713976, 0.11014806789454677, 0.2047670639219935, 0.58757674250632], "std": [12.12092281961143, 240.98788415799402, 0.020507195059365872, 0.012951632990740584, 0.16345254609210969, 0.1787513429787685, 9.161154583852591, 85.48823018511177, 0.13799067057693098, 0.10193473774948415, 0.02923768201623528, 772.2778476847263, 2.791161927013341, 0.45994615422530144, 0.38078320056479364, 0.3240547183619046, 1.0, 1.0, 0.1715407352835541, 0.22568321453759693, 0.20365125061044834, 0.15700227356694219, 0.38078320056479364, 0.15028965965395197, 0.3917683030033992, 0.30296974343852234, 0.30672743633583477, 0.4892658167708199, 0.4452241130504305, 0.4402939026663002, 0.3696474491122128, 0.45901507812196696, 0.4452241130504305, 0.45901507812196696, 0.3696474491122128, 0.4402939026663002, 0.4619221667863049, 0.3679936701948691, 0.40413173266488067, 0.4554966395152088, 0.17350621713351017, 0.3323589938739654, 0.34262634311115486, 0.35307074530111743, 0.31313077339534806, 0.24848220047758568, 
0.2967020106382041, 0.3456728601775323, 0.38382904647787225, 0.2967020106382041, 0.31313077339534806, 0.40360418981498464, 0.4923594837624244]}
|
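The `mean` and `std` arrays above hold one entry per model input (53 each), and the notebook standardizes MLP inputs as `(X - MU) / SD`. A minimal, dependency-free sketch of that step follows. It assumes the first three scaler entries correspond to `timestep`, `emails_sent_cumulative`, and `click_through_rate`, in that order; this matches the start of `numeric_features` but is not confirmed here. `standardize` is a hypothetical helper, not part of `feature_engineering.py`.

```python
def standardize(x, mean, std):
    """Z-score each feature: (x - mean) / std, element by element."""
    if not (len(x) == len(mean) == len(std)):
        raise ValueError("x, mean and std must have the same length")
    return [(xi - mi) / si for xi, mi, si in zip(x, mean, std)]


# First three entries of feature_scaler.json (feature order is an assumption).
MEAN = [19.882267966775007, 264.6323582520766, 0.052154893463344176]
STD = [12.12092281961143, 240.98788415799402, 0.020507195059365872]

# Raw values taken from the notebook's example record:
# timestep=13, emails_sent_cumulative=58, click_through_rate=0.1158.
raw = [13, 58, 0.1158]
z = standardize(raw, MEAN, STD)
print([round(v, 3) for v in z])
```

Two scaler entries have `mean` 1.0 and `std` 1.0. A unit standard deviation keeps the division well defined, presumably for features that were constant in the training fold.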
inference_example.ipynb
ADDED
|
@@ -0,0 +1,320 @@
|
| 1 |
+
{
|
| 2 |
+
"cells": [
|
| 3 |
+
{
|
| 4 |
+
"cell_type": "markdown",
|
| 5 |
+
"metadata": {},
|
| 6 |
+
"source": [
|
| 7 |
+
"# CYB004 Baseline Classifier — Inference Example\n",
|
| 8 |
+
"\n",
|
| 9 |
+
"End-to-end demo: load the trained XGBoost and PyTorch MLP models from the Hugging Face repo and predict the **phishing campaign phase** of a new per-timestep telemetry record.\n",
|
| 10 |
+
"\n",
|
| 11 |
+
"**Models predict one of 7 phases:** `target_reconnaissance`, `infrastructure_setup`, `lure_crafting`, `email_delivery`, `victim_engagement`, `credential_harvesting`, `post_compromise_escalation`.\n",
|
| 12 |
+
"\n",
|
| 13 |
+
"**This is a baseline reference model**, not a production email-security platform. See the model card for full metrics and limitations."
|
| 14 |
+
]
|
| 15 |
+
},
|
| 16 |
+
{
|
| 17 |
+
"cell_type": "markdown",
|
| 18 |
+
"metadata": {},
|
| 19 |
+
"source": [
|
| 20 |
+
"## 1. Install dependencies"
|
| 21 |
+
]
|
| 22 |
+
},
|
| 23 |
+
{
|
| 24 |
+
"cell_type": "code",
|
| 25 |
+
"execution_count": null,
|
| 26 |
+
"metadata": {},
|
| 27 |
+
"outputs": [],
|
| 28 |
+
"source": [
|
| 29 |
+
"%pip install --quiet xgboost torch safetensors pandas numpy huggingface_hub"
|
| 30 |
+
]
|
| 31 |
+
},
|
| 32 |
+
{
|
| 33 |
+
"cell_type": "markdown",
|
| 34 |
+
"metadata": {},
|
| 35 |
+
"source": [
|
| 36 |
+
"## 2. Download model artifacts from Hugging Face"
|
| 37 |
+
]
|
| 38 |
+
},
|
| 39 |
+
{
|
| 40 |
+
"cell_type": "code",
|
| 41 |
+
"execution_count": null,
|
| 42 |
+
"metadata": {},
|
| 43 |
+
"outputs": [],
|
| 44 |
+
"source": [
|
| 45 |
+
"from huggingface_hub import hf_hub_download\n",
|
| 46 |
+
"\n",
|
| 47 |
+
"REPO_ID = \"xpertsystems/cyb004-baseline-classifier\"\n",
|
| 48 |
+
"\n",
|
| 49 |
+
"files = {}\n",
|
| 50 |
+
"for name in [\"model_xgb.json\", \"model_mlp.safetensors\",\n",
|
| 51 |
+
" \"feature_engineering.py\", \"feature_meta.json\",\n",
|
| 52 |
+
" \"feature_scaler.json\"]:\n",
|
| 53 |
+
" files[name] = hf_hub_download(repo_id=REPO_ID, filename=name)\n",
|
| 54 |
+
" print(f\" downloaded: {name}\")"
|
| 55 |
+
]
|
| 56 |
+
},
|
| 57 |
+
{
|
| 58 |
+
"cell_type": "code",
|
| 59 |
+
"execution_count": null,
|
| 60 |
+
"metadata": {},
|
| 61 |
+
"outputs": [],
|
| 62 |
+
"source": [
|
| 63 |
+
"import sys, os\n",
|
| 64 |
+
"fe_dir = os.path.dirname(files[\"feature_engineering.py\"])\n",
|
| 65 |
+
"if fe_dir not in sys.path:\n",
|
| 66 |
+
" sys.path.insert(0, fe_dir)\n",
|
| 67 |
+
"\n",
|
| 68 |
+
"from feature_engineering import (\n",
|
| 69 |
+
" transform_single, load_meta, INT_TO_LABEL, build_department_lookup\n",
|
| 70 |
+
")"
|
| 71 |
+
]
|
| 72 |
+
},
|
| 73 |
+
{
|
| 74 |
+
"cell_type": "markdown",
|
| 75 |
+
"metadata": {},
|
| 76 |
+
"source": [
|
| 77 |
+
"## 3. Load models and metadata"
|
| 78 |
+
]
|
| 79 |
+
},
|
| 80 |
+
{
|
| 81 |
+
"cell_type": "code",
|
| 82 |
+
"execution_count": null,
|
| 83 |
+
"metadata": {},
|
| 84 |
+
"outputs": [],
|
| 85 |
+
"source": [
|
| 86 |
+
"import json\n",
|
| 87 |
+
"import numpy as np\n",
|
| 88 |
+
"import torch\n",
|
| 89 |
+
"import torch.nn as nn\n",
|
| 90 |
+
"import xgboost as xgb\n",
|
| 91 |
+
"from safetensors.torch import load_file\n",
|
| 92 |
+
"\n",
|
| 93 |
+
"meta = load_meta(files[\"feature_meta.json\"])\n",
|
| 94 |
+
"with open(files[\"feature_scaler.json\"]) as f:\n",
|
| 95 |
+
" scaler = json.load(f)\n",
|
| 96 |
+
"\n",
|
| 97 |
+
"N_FEATURES = len(meta[\"feature_names\"])\n",
|
| 98 |
+
"N_CLASSES = len(meta[\"int_to_label\"])\n",
|
| 99 |
+
"print(f\"feature count: {N_FEATURES}\")\n",
|
| 100 |
+
"print(f\"class count: {N_CLASSES}\")\n",
|
| 101 |
+
"print(f\"label classes: {list(meta['int_to_label'].values())}\")"
|
| 102 |
+
]
|
| 103 |
+
},
|
| 104 |
+
{
|
| 105 |
+
"cell_type": "code",
|
| 106 |
+
"execution_count": null,
|
| 107 |
+
"metadata": {},
|
| 108 |
+
"outputs": [],
|
| 109 |
+
"source": [
|
| 110 |
+
"# XGBoost\n",
|
| 111 |
+
"xgb_model = xgb.XGBClassifier()\n",
|
| 112 |
+
"xgb_model.load_model(files[\"model_xgb.json\"])\n",
|
| 113 |
+
"\n",
|
| 114 |
+
"# MLP architecture (must match training)\n",
|
| 115 |
+
"class PhaseMLP(nn.Module):\n",
|
| 116 |
+
" def __init__(self, n_features, n_classes=7, hidden1=128, hidden2=64, dropout=0.3):\n",
|
| 117 |
+
" super().__init__()\n",
|
| 118 |
+
" self.net = nn.Sequential(\n",
|
| 119 |
+
" nn.Linear(n_features, hidden1),\n",
|
| 120 |
+
" nn.BatchNorm1d(hidden1),\n",
|
| 121 |
+
" nn.ReLU(),\n",
|
| 122 |
+
" nn.Dropout(dropout),\n",
|
| 123 |
+
" nn.Linear(hidden1, hidden2),\n",
|
| 124 |
+
" nn.BatchNorm1d(hidden2),\n",
|
| 125 |
+
" nn.ReLU(),\n",
|
| 126 |
+
" nn.Dropout(dropout),\n",
|
| 127 |
+
" nn.Linear(hidden2, n_classes),\n",
|
| 128 |
+
" )\n",
|
| 129 |
+
" def forward(self, x):\n",
|
| 130 |
+
" return self.net(x)\n",
|
| 131 |
+
"\n",
|
| 132 |
+
"mlp_model = PhaseMLP(N_FEATURES, n_classes=N_CLASSES)\n",
|
| 133 |
+
"mlp_model.load_state_dict(load_file(files[\"model_mlp.safetensors\"]))\n",
|
| 134 |
+
"mlp_model.eval()\n",
|
| 135 |
+
"print(\"models loaded\")"
|
| 136 |
+
]
|
| 137 |
+
},
|
| 138 |
+
{
|
| 139 |
+
"cell_type": "markdown",
|
| 140 |
+
"metadata": {},
|
| 141 |
+
"source": [
|
| 142 |
+
"## 4. Build the department lookup\n",
|
| 143 |
+
"\n",
|
| 144 |
+
"Per-department topology features (employee_count, MFA enrollment, gateway architecture, DMARC level, etc.) are pulled from `victim_topology.csv` and merged into each timestep record by `target_department_id`."
|
| 145 |
+
]
|
| 146 |
+
},
|
| 147 |
+
{
|
| 148 |
+
"cell_type": "code",
|
| 149 |
+
"execution_count": null,
|
| 150 |
+
"metadata": {},
|
| 151 |
+
"outputs": [],
|
| 152 |
+
"source": [
|
| 153 |
+
"from huggingface_hub import snapshot_download\n",
|
| 154 |
+
"\n",
|
| 155 |
+
"ds_path = snapshot_download(repo_id=\"xpertsystems/cyb004-sample\", repo_type=\"dataset\")\n",
|
| 156 |
+
"\n",
|
| 157 |
+
"dept_lookup = build_department_lookup(\n",
|
| 158 |
+
" os.path.join(ds_path, \"victim_topology.csv\")\n",
|
| 159 |
+
")\n",
|
| 160 |
+
"print(f\"loaded {len(dept_lookup)} department profiles\")"
|
| 161 |
+
]
|
| 162 |
+
},
|
| 163 |
+
{
|
| 164 |
+
"cell_type": "markdown",
|
| 165 |
+
"metadata": {},
|
| 166 |
+
"source": [
|
| 167 |
+
"## 5. Prediction helper"
|
| 168 |
+
]
|
| 169 |
+
},
|
| 170 |
+
{
|
| 171 |
+
"cell_type": "code",
|
| 172 |
+
"execution_count": null,
|
| 173 |
+
"metadata": {},
|
| 174 |
+
"outputs": [],
|
| 175 |
+
"source": [
|
| 176 |
+
"MU = np.array(scaler[\"mean\"], dtype=np.float32)\n",
|
| 177 |
+
"SD = np.array(scaler[\"std\"], dtype=np.float32)\n",
|
| 178 |
+
"\n",
|
| 179 |
+
"def predict_phase(record: dict) -> dict:\n",
|
| 180 |
+
" \"\"\"Predict the campaign phase for one per-timestep telemetry record.\n",
|
| 181 |
+
"\n",
|
| 182 |
+
" Per-department topology features are pulled automatically via\n",
|
| 183 |
+
" `target_department_id` from the dept_lookup loaded above.\n",
|
| 184 |
+
" \"\"\"\n",
|
| 185 |
+
" dept_id = int(record.get(\"target_department_id\", -1))\n",
|
| 186 |
+
" dept_aggs = dept_lookup.get(dept_id, {})\n",
|
| 187 |
+
" X = transform_single(record, meta, victim_aggregates=dept_aggs)\n",
|
| 188 |
+
"\n",
|
| 189 |
+
" xgb_proba = xgb_model.predict_proba(X)[0]\n",
|
| 190 |
+
" xgb_label = INT_TO_LABEL[int(np.argmax(xgb_proba))]\n",
|
| 191 |
+
"\n",
|
| 192 |
+
" Xs = ((X - MU) / SD).astype(np.float32)\n",
|
| 193 |
+
" with torch.no_grad():\n",
|
| 194 |
+
" logits = mlp_model(torch.tensor(Xs))\n",
|
| 195 |
+
" mlp_proba = torch.softmax(logits, dim=1).numpy()[0]\n",
|
| 196 |
+
" mlp_label = INT_TO_LABEL[int(np.argmax(mlp_proba))]\n",
|
| 197 |
+
"\n",
|
| 198 |
+
" return {\n",
|
| 199 |
+
" \"xgboost\": {\n",
|
| 200 |
+
" \"label\": xgb_label,\n",
|
| 201 |
+
" \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(xgb_proba)},\n",
|
| 202 |
+
" },\n",
|
| 203 |
+
" \"mlp\": {\n",
|
| 204 |
+
" \"label\": mlp_label,\n",
|
| 205 |
+
" \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(mlp_proba)},\n",
|
| 206 |
+
" },\n",
|
| 207 |
+
" }"
|
| 208 |
+
]
|
| 209 |
+
},
|
| 210 |
+
{
|
| 211 |
+
"cell_type": "markdown",
|
| 212 |
+
"metadata": {},
|
| 213 |
+
"source": [
|
| 214 |
+
"## 6. Run on an example record\n",
|
| 215 |
+
"\n",
|
| 216 |
+
"Real `email_delivery` event lifted from the sample dataset: a nation-state APT campaign at timestep 13, with homoglyph substitution evasion active and 58 emails sent. Both models should predict `email_delivery`."
|
| 217 |
+
]
|
| 218 |
+
},
|
| 219 |
+
{
|
| 220 |
+
"cell_type": "code",
|
| 221 |
+
"execution_count": null,
|
| 222 |
+
"metadata": {},
|
| 223 |
+
"outputs": [],
|
| 224 |
+
"source": [
|
| 225 |
+
"# Real timestep record from the sample dataset (true phase: email_delivery)\n",
|
| 226 |
+
"example_record = {\n",
|
| 227 |
+
" \"timestep\": 13,\n",
|
| 228 |
+
" \"emails_sent_cumulative\": 58,\n",
|
| 229 |
+
" \"click_through_rate\": 0.1158,\n",
|
| 230 |
+
" \"credential_submission_rate\": 0.0713,\n",
|
| 231 |
+
" \"gateway_detection_score\": 0.7327,\n",
|
| 232 |
+
" \"lure_personalisation_score\": 0.7507,\n",
|
| 233 |
+
" \"evasion_technique_active\": \"homoglyph_substitution\",\n",
|
| 234 |
+
" \"target_department_id\": 10,\n",
|
| 235 |
+
" \"actor_capability_tier\": \"nation_state_apt\",\n",
|
| 236 |
+
"}\n",
|
| 237 |
+
"\n",
|
| 238 |
+
"result = predict_phase(example_record)\n",
|
| 239 |
+
"\n",
|
| 240 |
+
"print(f\"XGBoost -> {result['xgboost']['label']}\")\n",
|
| 241 |
+
"for lbl, p in sorted(result['xgboost']['probabilities'].items(), key=lambda x: -x[1])[:5]:\n",
|
| 242 |
+
" print(f\" P({lbl:30s}) = {p:.4f}\")\n",
|
| 243 |
+
"\n",
|
| 244 |
+
"print(f\"\\nMLP -> {result['mlp']['label']}\")\n",
|
| 245 |
+
"for lbl, p in sorted(result['mlp']['probabilities'].items(), key=lambda x: -x[1])[:5]:\n",
|
| 246 |
+
" print(f\" P({lbl:30s}) = {p:.4f}\")"
|
| 247 |
+
]
|
| 248 |
+
},
|
| 249 |
+
{
|
| 250 |
+
"cell_type": "markdown",
|
| 251 |
+
"metadata": {},
|
| 252 |
+
"source": [
|
| 253 |
+
"### Note: when the two models disagree\n",
|
| 254 |
+
"\n",
|
| 255 |
+
"XGBoost and the MLP can disagree on mid-pipeline phases (`victim_engagement`, `credential_harvesting`), where timestep windows overlap and held-out per-class F1 is lowest for both models (0.34 to 0.47 in `validation_results.json`). The per-class F1 in the model card shows which phases are predicted robustly and which are not. In a SOC workflow, conflicting predictions are worth surfacing for human review."
|
| 256 |
+
]
|
| 257 |
+
},
|
| 258 |
+
{
|
| 259 |
+
"cell_type": "markdown",
|
| 260 |
+
"metadata": {},
|
| 261 |
+
"source": [
|
| 262 |
+
"## 7. Batch prediction on the sample dataset"
|
| 263 |
+
]
|
| 264 |
+
},
|
| 265 |
+
{
|
| 266 |
+
"cell_type": "code",
|
| 267 |
+
"execution_count": null,
|
| 268 |
+
"metadata": {},
|
| 269 |
+
"outputs": [],
|
| 270 |
+
"source": [
|
| 271 |
+
"import pandas as pd\n",
|
| 272 |
+
"\n",
|
| 273 |
+
"traj = pd.read_csv(f\"{ds_path}/campaign_trajectories.csv\")\n",
|
| 274 |
+
"\n",
|
| 275 |
+
"# Drop the leaky column the model was never trained on\n",
|
| 276 |
+
"traj = traj.drop(columns=[\"delivery_outcome\"], errors=\"ignore\")\n",
|
| 277 |
+
"\n",
|
| 278 |
+
"# Score the first 200 timesteps\n",
|
| 279 |
+
"sample = traj.head(200).copy()\n",
|
| 280 |
+
"preds = [predict_phase(row.to_dict())[\"xgboost\"][\"label\"] for _, row in sample.iterrows()]\n",
|
| 281 |
+
"sample[\"xgb_pred\"] = preds\n",
|
| 282 |
+
"\n",
|
| 283 |
+
"ct = pd.crosstab(sample[\"campaign_phase\"], sample[\"xgb_pred\"],\n",
|
| 284 |
+
" rownames=[\"true\"], colnames=[\"pred\"])\n",
|
| 285 |
+
"print(\"Confusion on first 200 sample rows (XGBoost):\")\n",
|
| 286 |
+
"print(ct)\n",
|
| 287 |
+
"acc = (sample[\"campaign_phase\"] == sample[\"xgb_pred\"]).mean()\n",
|
| 288 |
+
"print(f\"\\nbatch accuracy on first 200 rows (in-distribution): {acc:.4f}\")\n",
|
| 289 |
+
"print(\"\\nNote: these rows include training-set campaigns. See validation_results.json\\n\"\n",
|
| 290 |
+
" \"for proper held-out test metrics from disjoint campaigns.\")"
|
| 291 |
+
]
|
| 292 |
+
},
|
| 293 |
+
{
|
| 294 |
+
"cell_type": "markdown",
|
| 295 |
+
"metadata": {},
|
| 296 |
+
"source": [
|
| 297 |
+
"## 8. Next steps\n",
|
| 298 |
+
"\n",
|
| 299 |
+
"- See `validation_results.json` for held-out test metrics (15 disjoint campaigns, 585 timesteps).\n",
|
| 300 |
+
"- See `multi_seed_results.json` for the across-10-seeds robustness picture (accuracy 0.649 ± 0.038, ROC-AUC 0.937 ± 0.010).\n",
|
| 301 |
+
"- See `ablation_results.json` for per-feature-group contribution. `timestep` carries the dominant signal.\n",
|
| 302 |
+
"- The model card explains why `actor_capability_tier` was *not* used as the target despite being the README's headline use case.\n",
|
| 303 |
+
"- For the full 335k-row CYB004 dataset and commercial licensing, contact **pradeep@xpertsystems.ai**."
|
| 304 |
+
]
|
| 305 |
+
}
|
| 306 |
+
],
|
| 307 |
+
"metadata": {
|
| 308 |
+
"kernelspec": {
|
| 309 |
+
"display_name": "Python 3",
|
| 310 |
+
"language": "python",
|
| 311 |
+
"name": "python3"
|
| 312 |
+
},
|
| 313 |
+
"language_info": {
|
| 314 |
+
"name": "python",
|
| 315 |
+
"version": "3.10"
|
| 316 |
+
}
|
| 317 |
+
},
|
| 318 |
+
"nbformat": 4,
|
| 319 |
+
"nbformat_minor": 5
|
| 320 |
+
}
|
model_mlp.safetensors
ADDED
|
@@ -0,0 +1,3 @@
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:072999e38cd542460473780a9c71164efc1a53a1037a4b579064cc93f3f5b4b8
|
| 3 |
+
size 66788
|
model_xgb.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
multi_seed_results.json
ADDED
|
@@ -0,0 +1,98 @@
|
| 1 |
+
{
|
| 2 |
+
"purpose": "With n=100 campaigns, single-seed metrics carry test-fold variance. Multi-seed evaluation gives a more reliable picture.",
|
| 3 |
+
"seeds_evaluated": [
|
| 4 |
+
42,
|
| 5 |
+
7,
|
| 6 |
+
13,
|
| 7 |
+
17,
|
| 8 |
+
23,
|
| 9 |
+
31,
|
| 10 |
+
45,
|
| 11 |
+
99,
|
| 12 |
+
123,
|
| 13 |
+
200
|
| 14 |
+
],
|
| 15 |
+
"per_seed": [
|
| 16 |
+
{
|
| 17 |
+
"seed": 42,
|
| 18 |
+
"test_n_classes": 7,
|
| 19 |
+
"accuracy": 0.6547008547008547,
|
| 20 |
+
"macro_f1": 0.6401276666852063,
|
| 21 |
+
"macro_roc_auc_ovr": 0.935584434710217
|
| 22 |
+
},
|
| 23 |
+
{
|
| 24 |
+
"seed": 7,
|
| 25 |
+
"test_n_classes": 7,
|
| 26 |
+
"accuracy": 0.6267123287671232,
|
| 27 |
+
"macro_f1": 0.6141815367358149,
|
| 28 |
+
"macro_roc_auc_ovr": 0.9256987657069029
|
| 29 |
+
},
|
| 30 |
+
{
|
| 31 |
+
"seed": 13,
|
| 32 |
+
"test_n_classes": 7,
|
| 33 |
+
"accuracy": 0.5983050847457627,
|
| 34 |
+
"macro_f1": 0.5953435905708684,
|
| 35 |
+
"macro_roc_auc_ovr": 0.9235372520169014
|
| 36 |
+
},
|
| 37 |
+
{
|
| 38 |
+
"seed": 17,
|
| 39 |
+
"test_n_classes": 7,
|
| 40 |
+
"accuracy": 0.64349376114082,
|
| 41 |
+
"macro_f1": 0.6328717716731788,
|
| 42 |
+
"macro_roc_auc_ovr": 0.9426545946495839
|
| 43 |
+
},
|
| 44 |
+
{
|
| 45 |
+
"seed": 23,
|
| 46 |
+
"test_n_classes": 7,
|
| 47 |
+
"accuracy": 0.5915254237288136,
|
| 48 |
+
"macro_f1": 0.5734921834318393,
|
| 49 |
+
"macro_roc_auc_ovr": 0.9245031023094512
|
| 50 |
+
},
|
| 51 |
+
{
|
| 52 |
+
"seed": 31,
|
| 53 |
+
"test_n_classes": 7,
|
| 54 |
+
"accuracy": 0.6220095693779905,
|
| 55 |
+
"macro_f1": 0.6103022022937624,
|
| 56 |
+
"macro_roc_auc_ovr": 0.9325576570435162
|
| 57 |
+
},
|
| 58 |
+
{
|
| 59 |
+
"seed": 45,
|
| 60 |
+
"test_n_classes": 7,
|
| 61 |
+
"accuracy": 0.6678082191780822,
|
| 62 |
+
"macro_f1": 0.655097964659693,
|
| 63 |
+
"macro_roc_auc_ovr": 0.9396074000285977
|
| 64 |
+
},
|
| 65 |
+
{
|
| 66 |
+
"seed": 99,
|
| 67 |
+
"test_n_classes": 7,
|
| 68 |
+
"accuracy": 0.7111111111111111,
|
| 69 |
+
"macro_f1": 0.7136854710276727,
|
| 70 |
+
"macro_roc_auc_ovr": 0.9538147161172963
|
| 71 |
+
},
|
| 72 |
+
{
|
| 73 |
+
"seed": 123,
|
| 74 |
+
"test_n_classes": 7,
|
| 75 |
+
"accuracy": 0.6823734729493892,
|
| 76 |
+
"macro_f1": 0.6727927606720584,
|
| 77 |
+
"macro_roc_auc_ovr": 0.9443324151480283
|
| 78 |
+
},
|
| 79 |
+
{
|
| 80 |
+
"seed": 200,
|
| 81 |
+
"test_n_classes": 7,
|
| 82 |
+
"accuracy": 0.6931407942238267,
|
| 83 |
+
"macro_f1": 0.6752712902262269,
|
| 84 |
+
"macro_roc_auc_ovr": 0.9450377543018418
|
| 85 |
+
}
|
| 86 |
+
],
|
| 87 |
+
"aggregate": {
|
| 88 |
+
"accuracy_mean": 0.6491180619923773,
|
| 89 |
+
"accuracy_std": 0.03799334369624316,
|
| 90 |
+
"accuracy_min": 0.5915254237288136,
|
| 91 |
+
"accuracy_max": 0.7111111111111111,
|
| 92 |
+
"macro_f1_mean": 0.638316643797632,
|
| 93 |
+
"macro_f1_std": 0.039956794294168915,
|
| 94 |
+
"roc_auc_mean": 0.9367328092032338,
|
| 95 |
+
"roc_auc_std": 0.009623085359130642
|
| 96 |
+
},
|
| 97 |
+
"published_artifact_seed": 42
|
| 98 |
+
}
|
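The `aggregate` block can be reproduced from the `per_seed` accuracies. The sketch below assumes the reported standard deviation is the population form (divide by n, as `numpy.std` does by default), because that is the form that matches the published `accuracy_std`. The sample form (divide by n - 1) would give roughly 0.040 instead.

```python
from statistics import fmean, pstdev

# Per-seed test accuracies copied from multi_seed_results.json,
# in the same order as "seeds_evaluated".
ACCURACIES = [
    0.6547008547008547,  # seed 42
    0.6267123287671232,  # seed 7
    0.5983050847457627,  # seed 13
    0.64349376114082,    # seed 17
    0.5915254237288136,  # seed 23
    0.6220095693779905,  # seed 31
    0.6678082191780822,  # seed 45
    0.7111111111111111,  # seed 99
    0.6823734729493892,  # seed 123
    0.6931407942238267,  # seed 200
]

accuracy_mean = fmean(ACCURACIES)
accuracy_std = pstdev(ACCURACIES)  # population standard deviation (ddof = 0)

print(f"accuracy: {accuracy_mean:.3f} +/- {accuracy_std:.3f}")
print(f"range:    {min(ACCURACIES):.3f} to {max(ACCURACIES):.3f}")
```

The spread of about 0.12 between the worst and best seed is the test-fold variance that the `purpose` field refers to. The published artifact (seed 42) sits close to the mean.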
validation_results.json
ADDED
|
@@ -0,0 +1,246 @@
|
| 1 |
+
{
|
| 2 |
+
"version": "1.0.0",
|
| 3 |
+
"dataset": "xpertsystems/cyb004-sample",
|
| 4 |
+
"task": "7-class campaign_phase classification",
|
| 5 |
+
"baselines": {
|
| 6 |
+
"always_predict_majority_accuracy": 0.24444444444444444,
|
| 7 |
+
"majority_class": "email_delivery",
|
| 8 |
+
"random_guess_accuracy": 0.14285714285714285
|
| 9 |
+
},
|
| 10 |
+
"split": {
|
| 11 |
+
"strategy": "group_aware (GroupShuffleSplit by campaign_id, nested)",
|
| 12 |
+
"rationale": "100 phishing campaigns generate 3,952 timesteps (~40 per campaign). A random row split would leak per-campaign correlations into the test fold. The group-aware split keeps train/val/test campaigns disjoint.",
|
| 13 |
+
"campaigns_train": 69,
|
| 14 |
+
"campaigns_val": 16,
|
| 15 |
+
"campaigns_test": 15,
|
| 16 |
+
"timesteps_train": 2769,
|
| 17 |
+
"timesteps_val": 598,
|
| 18 |
+
"timesteps_test": 585,
|
| 19 |
+
"seed": 42
|
| 20 |
+
},
|
| 21 |
+
"n_features": 53,
|
| 22 |
+
"label_classes": [
|
| 23 |
+
"target_reconnaissance",
|
| 24 |
+
"infrastructure_setup",
|
| 25 |
+
"lure_crafting",
|
| 26 |
+
"email_delivery",
|
| 27 |
+
"victim_engagement",
|
| 28 |
+
"credential_harvesting",
|
| 29 |
+
"post_compromise_escalation"
|
| 30 |
+
],
|
| 31 |
+
"class_distribution_train": {
|
| 32 |
+
"email_delivery": 655,
|
| 33 |
+
"victim_engagement": 459,
|
| 34 |
+
"post_compromise_escalation": 388,
|
| 35 |
+
"target_reconnaissance": 381,
|
| 36 |
+
"credential_harvesting": 352,
|
| 37 |
+
"lure_crafting": 300,
|
| 38 |
+
"infrastructure_setup": 234
|
| 39 |
+
},
|
| 40 |
+
"class_distribution_test": {
|
| 41 |
+
"email_delivery": 143,
|
| 42 |
+
"victim_engagement": 100,
|
| 43 |
+
"target_reconnaissance": 84,
|
| 44 |
+
"post_compromise_escalation": 75,
|
| 45 |
+
"lure_crafting": 67,
|
| 46 |
+
"credential_harvesting": 63,
|
| 47 |
+
"infrastructure_setup": 53
|
| 48 |
+
},
|
| 49 |
+
"leakage_excluded_features": [
|
| 50 |
+
"delivery_outcome (purity 0.36 vs phase; no_delivery appears only in early phases - near-oracle)"
|
| 51 |
+
],
|
| 52 |
+
"models": {
|
| 53 |
+
"xgboost": {
|
| 54 |
+
"architecture": "Gradient-boosted decision trees, multi:softprob, 7 classes",
|
| 55 |
+
"framework": "xgboost",
|
| 56 |
+
"test_metrics": {
|
| 57 |
+
"model": "xgboost",
|
| 58 |
+
"accuracy": 0.6547008547008547,
|
| 59 |
+
"macro_f1": 0.6401276666852063,
|
| 60 |
+
"weighted_f1": 0.657179533714298,
|
| 61 |
+
"per_class_f1": {
|
| 62 |
+
"target_reconnaissance": 0.8875739644970414,
|
| 63 |
+
"infrastructure_setup": 0.7115384615384616,
|
| 64 |
+
"lure_crafting": 0.6762589928057554,
|
| 65 |
+
"email_delivery": 0.7913669064748201,
|
| 66 |
+
"victim_engagement": 0.46938775510204084,
|
| 67 |
+
"credential_harvesting": 0.34074074074074073,
|
| 68 |
+
"post_compromise_escalation": 0.6040268456375839
|
| 69 |
+
},
|
| 70 |
+
"confusion_matrix": {
|
| 71 |
+
"labels": [
|
| 72 |
+
"target_reconnaissance",
|
| 73 |
+
"infrastructure_setup",
|
| 74 |
+
"lure_crafting",
|
| 75 |
+
"email_delivery",
|
| 76 |
+
"victim_engagement",
|
| 77 |
+
"credential_harvesting",
|
| 78 |
+
"post_compromise_escalation"
|
| 79 |
+
],
|
| 80 |
+
"matrix": [
|
| 81 |
+
[
|
| 82 |
+
75,
|
| 83 |
+
0,
|
| 84 |
+
9,
|
| 85 |
+
0,
|
| 86 |
+
0,
|
| 87 |
+
0,
|
| 88 |
+
0
|
| 89 |
+
],
|
| 90 |
+
[
|
| 91 |
+
0,
|
| 92 |
+
37,
|
| 93 |
+
16,
|
| 94 |
+
0,
|
| 95 |
+
0,
|
| 96 |
+
0,
|
| 97 |
+
0
|
| 98 |
+
],
|
| 99 |
+
[
|
| 100 |
+
10,
|
| 101 |
+
10,
|
| 102 |
+
47,
|
| 103 |
+
0,
|
| 104 |
+
0,
|
| 105 |
+
0,
|
| 106 |
+
0
|
| 107 |
+
],
|
| 108 |
+
[
|
| 109 |
+
0,
|
| 110 |
+
4,
|
| 111 |
+
0,
|
| 112 |
+
110,
|
| 113 |
+
28,
|
| 114 |
+
1,
|
| 115 |
+
0
|
| 116 |
+
],
|
| 117 |
+
[
|
| 118 |
+
0,
|
| 119 |
+
0,
|
| 120 |
+
0,
|
| 121 |
+
21,
|
| 122 |
+
46,
|
| 123 |
+
24,
|
| 124 |
+
9
|
| 125 |
+
],
|
| 126 |
+
[
|
| 127 |
+
0,
|
| 128 |
+
0,
|
| 129 |
+
0,
|
| 130 |
+
4,
|
| 131 |
+
16,
|
| 132 |
+
23,
|
| 133 |
+
20
|
| 134 |
+
],
|
| 135 |
+
[
|
| 136 |
+
0,
|
| 137 |
+
0,
|
| 138 |
+
0,
|
| 139 |
+
0,
|
| 140 |
+
6,
|
| 141 |
+
24,
|
| 142 |
+
45
|
| 143 |
+
]
|
| 144 |
+
]
|
| 145 |
+
},
|
| 146 |
+
"macro_roc_auc_ovr": 0.935584434710217
|
| 147 |
+
}
|
| 148 |
+
},
|
| 149 |
+
"mlp": {
|
| 150 |
+
"architecture": "PyTorch MLP, 53 -> 128 -> 64 -> 7, BatchNorm1d + ReLU + Dropout, weighted cross-entropy loss",
|
| 151 |
+
"framework": "pytorch",
|
| 152 |
+
"test_metrics": {
|
| 153 |
+
"model": "mlp",
|
| 154 |
+
"accuracy": 0.6427350427350428,
|
| 155 |
+
"macro_f1": 0.6275373447450349,
|
| 156 |
+
"weighted_f1": 0.6380162402905546,
|
| 157 |
+
"per_class_f1": {
|
| 158 |
+
"target_reconnaissance": 0.8313253012048193,
|
| 159 |
+
"infrastructure_setup": 0.7017543859649122,
|
| 160 |
+
"lure_crafting": 0.5606060606060606,
|
| 161 |
+
"email_delivery": 0.7612456747404844,
|
| 162 |
+
"victim_engagement": 0.3867403314917127,
|
| 163 |
+
"credential_harvesting": 0.43410852713178294,
|
| 164 |
+
"post_compromise_escalation": 0.7169811320754716
|
| 165 |
+
},
|
| 166 |
+
"confusion_matrix": {
|
| 167 |
+
"labels": [
|
| 168 |
+
"target_reconnaissance",
|
| 169 |
+
"infrastructure_setup",
|
| 170 |
+
"lure_crafting",
|
| 171 |
+
"email_delivery",
|
| 172 |
+
"victim_engagement",
|
| 173 |
+
"credential_harvesting",
|
| 174 |
+
"post_compromise_escalation"
|
| 175 |
+
],
|
| 176 |
+
"matrix": [
|
| 177 |
+
[
|
| 178 |
+
69,
|
| 179 |
+
1,
|
| 180 |
+
14,
|
| 181 |
+
0,
|
| 182 |
+
0,
|
| 183 |
+
0,
|
| 184 |
+
0
|
| 185 |
+
],
|
| 186 |
+
[
|
| 187 |
+
0,
|
| 188 |
+
40,
|
| 189 |
+
13,
|
| 190 |
+
0,
|
| 191 |
+
0,
|
| 192 |
+
0,
|
| 193 |
+
0
|
| 194 |
+
],
|
| 195 |
+
[
|
| 196 |
+
13,
|
| 197 |
+
17,
|
| 198 |
+
37,
|
| 199 |
+
0,
|
| 200 |
+
0,
|
| 201 |
+
0,
|
| 202 |
+
0
|
| 203 |
+
],
|
| 204 |
+
[
|
| 205 |
+
0,
|
| 206 |
+
3,
|
| 207 |
+
1,
|
| 208 |
+
110,
|
| 209 |
+
23,
|
| 210 |
+
6,
|
| 211 |
+
0
|
| 212 |
+
],
|
| 213 |
+
[
|
| 214 |
+
0,
|
| 215 |
+
0,
|
| 216 |
+
0,
|
| 217 |
+
32,
|
| 218 |
+
35,
|
| 219 |
+
21,
|
| 220 |
+
12
|
| 221 |
+
],
|
| 222 |
+
[
|
| 223 |
+
0,
|
| 224 |
+
0,
|
| 225 |
+
0,
|
| 226 |
+
4,
|
| 227 |
+
16,
|
| 228 |
+
28,
|
| 229 |
+
15
|
| 230 |
+
],
|
| 231 |
+
[
|
| 232 |
+
0,
|
| 233 |
+
0,
|
| 234 |
+
0,
|
| 235 |
+
0,
|
| 236 |
+
7,
|
| 237 |
+
11,
|
| 238 |
+
57
|
| 239 |
+
]
|
| 240 |
+
]
|
| 241 |
+
},
|
| 242 |
+
"macro_roc_auc_ovr": 0.9264812360054401
|
| 243 |
+
}
|
| 244 |
+
}
|
| 245 |
+
}
|
| 246 |
+
}
|
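The headline XGBoost metrics follow directly from its confusion matrix. The sketch below assumes rows are true labels and columns are predictions, which is consistent with the row sums matching `class_distribution_test`. It recomputes accuracy, per-class F1, and macro F1 in plain Python; the evaluation code that produced the file is not shown here, so this is a cross-check and not the original method.

```python
LABELS = [
    "target_reconnaissance", "infrastructure_setup", "lure_crafting",
    "email_delivery", "victim_engagement", "credential_harvesting",
    "post_compromise_escalation",
]

# XGBoost confusion matrix from validation_results.json
# (assumed layout: rows = true phase, columns = predicted phase).
MATRIX = [
    [75, 0, 9, 0, 0, 0, 0],
    [0, 37, 16, 0, 0, 0, 0],
    [10, 10, 47, 0, 0, 0, 0],
    [0, 4, 0, 110, 28, 1, 0],
    [0, 0, 0, 21, 46, 24, 9],
    [0, 0, 0, 4, 16, 23, 20],
    [0, 0, 0, 0, 6, 24, 45],
]


def per_class_f1(matrix):
    """F1 per class: 2*TP / (row sum + column sum)."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        actual = sum(matrix[k])                          # TP + FN
        predicted = sum(matrix[i][k] for i in range(n))  # TP + FP
        denom = actual + predicted
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


total = sum(sum(row) for row in MATRIX)
accuracy = sum(MATRIX[k][k] for k in range(len(MATRIX))) / total
f1 = per_class_f1(MATRIX)
macro_f1 = sum(f1) / len(f1)

print(f"accuracy = {accuracy:.4f}, macro F1 = {macro_f1:.4f}")
for label, score in zip(LABELS, f1):
    print(f"  {label:28s} F1 = {score:.4f}")
```

Most off-diagonal mass sits next to the diagonal, so errors are mainly confusions between neighbouring phases, most visibly among `victim_engagement`, `credential_harvesting`, and `post_compromise_escalation`.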