Initial release: attack_phase 7-class baseline + 6-oracle-path leakage diagnostic + missing tier note
- README.md +488 -0
- ablation_results.json +685 -0
- feature_engineering.py +399 -0
- feature_meta.json +111 -0
- feature_scaler.json +1 -0
- inference_example.ipynb +342 -0
- leakage_diagnostic.json +238 -0
- model_mlp.safetensors +3 -0
- model_xgb.json +0 -0
- multi_seed_results.json +98 -0
- validation_results.json +247 -0
README.md
ADDED
@@ -0,0 +1,488 @@
---
license: cc-by-nc-4.0
library_name: pytorch
tags:
- cybersecurity
- adversarial-machine-learning
- ai-security
- adversarial-attacks
- evasion-attacks
- apt
- tabular-classification
- synthetic-data
- xgboost
- baseline
- leakage-diagnostic
pipeline_tag: tabular-classification
base_model: []
datasets:
- xpertsystems/cyb011-sample
metrics:
- accuracy
- f1
- roc_auc
model-index:
- name: cyb011-baseline-classifier
  results:
  - task:
      type: tabular-classification
      name: 7-class adversarial attack phase classification
    dataset:
      type: xpertsystems/cyb011-sample
      name: CYB011 Synthetic AI Evasion Attack Trajectory Dataset (Sample)
    metrics:
    - type: roc_auc
      value: 0.9753
      name: Test macro ROC-AUC OvR (XGBoost, seed 42)
    - type: accuracy
      value: 0.8643
      name: Test accuracy (XGBoost, seed 42)
    - type: f1
      value: 0.7693
      name: Test macro-F1 (XGBoost, seed 42)
    - type: accuracy
      value: 0.867
      name: Multi-seed accuracy mean ± 0.010 (XGBoost, 10 seeds)
    - type: roc_auc
      value: 0.977
      name: Multi-seed ROC-AUC mean ± 0.002 (XGBoost, 10 seeds)
---

# CYB011 Baseline Classifier

**Adversarial attack phase classifier (7-class) trained on the CYB011
synthetic AI evasion attack trajectory sample. Predicts which of 7
attack phases (`reconnaissance` / `feature_space_probe` /
`perturbation_craft` / `evasion_attempt` / `feedback_adaptation` /
`campaign_consolidation` / `idle_dwell`) a per-timestep trajectory
event belongs to, from per-event features. ALSO ships a comprehensive
`leakage_diagnostic.json` documenting 6 oracle paths discovered
across the dataset's targets, 4 README-suggested targets that are
unlearnable on the sample after honest leak removal, and the missing
`nation_state` attacker tier.**

> **Read this first.** This repo ships two related artifacts:
> (1) a working baseline classifier for `attack_phase` (the dataset's
> headline target), and (2) `leakage_diagnostic.json` documenting 6
> separate oracle paths, 4 unlearnable targets, and one missing
> attacker tier. Both files matter; the diagnostic is required reading
> for anyone evaluating CYB011 for adversarial ML research.

## Model overview

| Property | Value |
|---|---|
| Primary task | 7-class `attack_phase` classification |
| Secondary artifact | `leakage_diagnostic.json`: 6 oracle paths + 4 unlearnable targets |
| Training data | `xpertsystems/cyb011-sample` (14,000 events / 200 campaigns) |
| Models | XGBoost + PyTorch MLP |
| Input features | 37 (after one-hot encoding) |
| Split | **Group-aware** (GroupShuffleSplit on `campaign_id`) |
| Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds |
| License | CC-BY-NC-4.0 (matches dataset) |
| Status | Reference baseline + comprehensive leakage diagnostic |

## Why this task, and what was dropped

The CYB011 README describes a "6-phase adversarial state machine."
The actual sample data contains **7 phases**: it adds `idle_dwell`
as a class (18% of all events, the second-largest class). The
published baseline trains on all 7.

We piloted nine candidate targets and found:

- **`attack_phase` 7-class**: strongest honest result. Acc 0.867 ±
  0.010, ROC-AUC 0.977 ± 0.002 (multi-seed). All 7 classes
  represented, per-class F1 range 0.49–1.00.

- **`attacker_capability_tier` 3-class (per-timestep)**: weak honest
  result (acc 0.68, macro-F1 0.64). The 3 tiers are not strongly
  distinguishable at the per-timestep level: feature means
  are within ~1% across tiers.

- **`attacker_capability_tier` 3-class (per-campaign)**: hits acc 0.94
  but is structurally inflated by `stealth_score` leakage
  (near-deterministic ranges per tier). Documented in the diagnostic.

- **`detection_outcome` 4-class**: hits 100% trivially via
  `detector_confidence_score` thresholds. Pure oracle.

- **`defender_architecture` 8-class**: hits 100% trivially via the
  topology fingerprint (7 segment features uniquely identify each
  architecture). Collapses to acc 0.13 vs majority 0.17 when the
  fingerprint is dropped.

- **`campaign_success_flag` / `campaign_type` / `coordinated_attack_flag`**:
  all below majority baseline at n=200 campaigns.

### Three oracle columns dropped from features

The phase task has three direct outcome-leak columns. Each is a perfect
or near-perfect oracle for specific phases:

| Column | Oracle relationship |
|---|---|
| `detection_outcome` | `!= suppressed_alert` → 100% `evasion_attempt` phase |
| `detector_confidence_score` | Threshold-derived from `detection_outcome` (<0.25 → evasion_success, [0.52,0.78] → marginal, ≥0.78 → high_confidence) |
| `evasion_budget_consumed` | `== 0` → 100% one of 3 early phases (reconnaissance, feature_space_probe, perturbation_craft) |

With these three columns present, a plain XGBoost achieves 100%
accuracy. The published baseline trains with all three excluded.

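As a minimal sketch of the exclusion described above, the three oracle columns can be stripped from each raw event before any feature engineering. The column names come from the table; the helper name `drop_oracle_columns` and the example event values are illustrative assumptions, and the bundled `feature_engineering.py` may handle this differently.

```python
# Column names are taken from the oracle table above; the rest is illustrative.
ORACLE_COLUMNS = (
    "detection_outcome",
    "detector_confidence_score",
    "evasion_budget_consumed",
)


def drop_oracle_columns(event: dict) -> dict:
    """Return a copy of a raw event without the three outcome-leak columns."""
    return {key: value for key, value in event.items() if key not in ORACLE_COLUMNS}


# Hypothetical event; the values are made up for this example.
raw_event = {
    "timestep": 12,
    "perturbation_magnitude": 0.08,
    "query_count_cumulative": 340,
    "detection_outcome": "suppressed_alert",
    "detector_confidence_score": 0.41,
    "evasion_budget_consumed": 0.0,
}

clean_event = drop_oracle_columns(raw_event)
print(sorted(clean_event))
```

Filtering on a fixed tuple of names keeps the exclusion explicit and easy to audit, which matters more here than brevity.
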
### `timestep` kept as a legitimate observable

`timestep` is a partial oracle for 3 phases (reconnaissance is
always timestep 1-7, feedback_adaptation is 63-66, campaign_consolidation
is 65-70). It's **kept** in the feature set because campaign-progress
position is a real observable a defender would have at decision time:
it's not encoding the label, it's encoding the lifecycle position.

In the seed-42 ablation reported under Evaluation, removing `timestep`
lowers accuracy by ~0.9pp (0.864 → 0.856) and macro-F1 by ~1.4pp
(0.769 → 0.755). The partial-oracle relationship is documented in the
diagnostic for transparency.

Two model artifacts are published. They are designed to be used
together:

- `model_xgb.json` — gradient-boosted trees (higher F1)
- `model_mlp.safetensors` — PyTorch MLP

## Quick start

```bash
pip install xgboost scikit-learn torch safetensors pandas huggingface_hub
```

```python
import json, os, sys

import numpy as np
import torch
import xgboost as xgb
from huggingface_hub import hf_hub_download, snapshot_download
from safetensors.torch import load_file

REPO = "xpertsystems/cyb011-baseline-classifier"

# Download the model artifacts and the feature pipeline from the Hub
paths = {n: hf_hub_download(REPO, n) for n in [
    "model_xgb.json", "model_mlp.safetensors",
    "feature_engineering.py", "feature_meta.json", "feature_scaler.json",
]}

# Make the downloaded feature_engineering.py importable
sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"]))
from feature_engineering import (
    transform_single, load_meta, build_segment_lookup, INT_TO_LABEL,
)

meta = load_meta(paths["feature_meta.json"])

# Segment features are joined from network_topology.csv at inference time
ds = snapshot_download("xpertsystems/cyb011-sample", repo_type="dataset")
segment_lookup = build_segment_lookup(f"{ds}/network_topology.csv")

# XGBClassifier is the scikit-learn wrapper, so scikit-learn must be installed
xgb_model = xgb.XGBClassifier()
xgb_model.load_model(paths["model_xgb.json"])

# Predict (see inference_example.ipynb for the full pattern)
# my_event is one raw trajectory event that you supply.
# Note: do NOT include detection_outcome, detector_confidence_score,
# or evasion_budget_consumed; those were the outcome leak columns.
X = transform_single(my_event, meta, segment_lookup=segment_lookup)
proba = xgb_model.predict_proba(X)[0]
print(INT_TO_LABEL[int(np.argmax(proba))])
```

See [`inference_example.ipynb`](./inference_example.ipynb) for the full
copy-paste demo.

## Training data

Trained on the public sample of CYB011, 14,000 per-timestep records:

| Phase | Events | Class share |
|---|---:|---:|
| `evasion_attempt` | 7,206 | 51.5% |
| `idle_dwell` | 2,450 | 17.5% |
| `feature_space_probe` | 1,465 | 10.5% |
| `campaign_consolidation` | 829 | 5.9% |
| `reconnaissance` | 809 | 5.8% |
| `perturbation_craft` | 745 | 5.3% |
| `feedback_adaptation` | 496 | 3.5% |

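To make the imbalance in the table concrete, the sketch below computes inverse-frequency ("balanced") class weights, using the formula `n_samples / (n_classes * count)` that scikit-learn applies for `class_weight='balanced'`. The counts are copied from the table; using full-sample counts is an assumption made for illustration, since a real training run would presumably compute weights on the training fold only.

```python
# Phase counts copied from the table above (full 14,000-event sample).
PHASE_COUNTS = {
    "evasion_attempt": 7206,
    "idle_dwell": 2450,
    "feature_space_probe": 1465,
    "campaign_consolidation": 829,
    "reconnaissance": 809,
    "perturbation_craft": 745,
    "feedback_adaptation": 496,
}


def balanced_class_weights(counts: dict) -> dict:
    """Inverse-frequency weights: n_samples / (n_classes * count_c)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


weights = balanced_class_weights(PHASE_COUNTS)
for label, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:24s} {weight:.3f}")
```

Under this scheme the rarest phase (`feedback_adaptation`) is weighted roughly 14 to 15 times more heavily per event than the dominant `evasion_attempt` class.
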
### Group-aware split by campaign_id

200 campaigns × 70 timesteps each. Timesteps from the same campaign
share attacker, target segment, and tier, so train/test contamination
is a real risk with random splitting. The baseline uses
**GroupShuffleSplit** on `campaign_id` (nested 70/15/15):

| Fold | Events | Campaigns |
|---|---:|---:|
| Train | 9,730 | 139 |
| Validation | 2,170 | 31 |
| Test | 2,100 | 30 |

All 10 multi-seed evaluations yielded all 7 classes in the test fold.
Class imbalance is addressed with `class_weight='balanced'`-style
weights (applied in XGBoost through per-row `sample_weight`) and
weighted cross-entropy (MLP).

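The idea behind the group-aware split can be sketched without any dependencies. Note that this is a stand-in, not the baseline's code: the source uses scikit-learn's `GroupShuffleSplit`, whereas the hypothetical `group_split` helper below simply shuffles the unique campaign ids and cuts them 70/15/15, so fold sizes will not exactly match the table above. The campaign ids are synthetic.

```python
import random


def group_split(campaign_ids, train=0.70, val=0.15, seed=42):
    """Split unique campaign ids into train/val/test so no campaign spans folds."""
    groups = sorted(set(campaign_ids))
    random.Random(seed).shuffle(groups)
    n_train = round(train * len(groups))
    n_val = round(val * len(groups))
    return (
        set(groups[:n_train]),
        set(groups[n_train:n_train + n_val]),
        set(groups[n_train + n_val:]),
    )


# Synthetic ids shaped like the sample: 200 campaigns x 70 timesteps.
campaign_ids = [f"C{c:03d}" for c in range(200) for _ in range(70)]
train_g, val_g, test_g = group_split(campaign_ids)

# Assign every event to the fold that owns its campaign.
fold_of = {}
for name, groups in (("train", train_g), ("val", val_g), ("test", test_g)):
    for group in groups:
        fold_of[group] = name

event_counts = {"train": 0, "val": 0, "test": 0}
for cid in campaign_ids:
    event_counts[fold_of[cid]] += 1
print(event_counts)
```

Because whole campaigns are assigned to a fold, no test event shares an attacker, segment, or tier context with a training event from the same campaign.
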
## Feature pipeline

The bundled `feature_engineering.py` is the canonical recipe. 37
features survive after encoding, drawn from:

- **Per-timestep numeric** (5): `timestep`, `perturbation_magnitude`,
  `feature_delta_l2_norm`, `feature_delta_linf_norm`, `query_count_cumulative`
- **Per-timestep categorical** (1, one-hot): `attacker_capability_tier`
  (3 values in sample)
- **Segment features** (joined from `network_topology.csv`): 8 numeric
  + 2 categorical (segment_type, defender_architecture)
- **Engineered** (5): `progress_frac`, `log_queries`, `perturb_intensity`,
  `defender_weakness`, `query_rate`

## Evaluation

### Test-set metrics, seed 42 (n = 2,100 events from ~30 test campaigns)

**XGBoost** (the published `model_xgb.json` artifact)

| Metric | Value |
|---|---:|
| Macro ROC-AUC (OvR) | **0.9753** |
| Accuracy | **0.8643** |
| Macro-F1 | 0.7693 |
| Weighted-F1 | 0.8703 |

**MLP** (the published `model_mlp.safetensors` artifact)

| Metric | Value |
|---|---:|
| Macro ROC-AUC (OvR) | **0.9705** |
| Accuracy | **0.8386** |
| Macro-F1 | 0.7345 |
| Weighted-F1 | 0.8462 |

XGBoost slightly outperforms MLP (acc 0.864 vs 0.839, macro-F1 0.769
vs 0.735). The gap is consistent across seeds.

### Multi-seed robustness (XGBoost, 10 seeds)

| Metric | Mean | Std | Min | Max |
|---|---:|---:|---:|---:|
| Accuracy | 0.867 | 0.010 | 0.852 | 0.884 |
| Macro-F1 | 0.775 | 0.012 | 0.750 | 0.798 |
| Macro ROC-AUC OvR | 0.977 | 0.002 | 0.973 | 0.980 |

All 10 seeds yielded all 7 classes in the test fold. Full per-seed
results in [`multi_seed_results.json`](./multi_seed_results.json).

### Per-class F1 (seed 42)

| Phase | Class share | XGBoost F1 | MLP F1 |
|---|---:|---:|---:|
| `evasion_attempt` | 51.5% | **0.996** | 0.993 |
| `reconnaissance` | 5.8% | **0.886** | 0.874 |
| `campaign_consolidation` | 5.9% | 0.808 | 0.785 |
| `feature_space_probe` | 10.5% | 0.783 | 0.747 |
| `feedback_adaptation` | 3.5% | 0.715 | 0.628 |
| `idle_dwell` | 17.5% | 0.704 | 0.619 |
| `perturbation_craft` | 5.3% | **0.493** | 0.497 |

`evasion_attempt` is nearly perfectly separable because of its
distinctive query-usage and perturbation-activity signatures.
`reconnaissance` and `campaign_consolidation` are well-separated by
their characteristic timestep ranges. `perturbation_craft` is the
hardest class (F1 0.49) because its per-timestep features overlap
heavily with `feature_space_probe`; both involve probing model
behavior at moderate query counts without submitting a final evasion
attempt.

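The per-class table also explains the gap between macro-F1 and weighted-F1: macro-F1 is the unweighted mean of the seven per-class scores, so the weak minority classes pull it down, while a support-weighted mean is dominated by `evasion_attempt`. The sketch below recomputes both from the XGBoost column. The weighting uses the full-sample class shares from the table as a stand-in for test-fold support (an assumption), so the second figure is only an approximation of the reported weighted-F1.

```python
# (class share, XGBoost F1) copied from the per-class table above.
PER_CLASS = {
    "evasion_attempt": (0.515, 0.996),
    "reconnaissance": (0.058, 0.886),
    "campaign_consolidation": (0.059, 0.808),
    "feature_space_probe": (0.105, 0.783),
    "feedback_adaptation": (0.035, 0.715),
    "idle_dwell": (0.175, 0.704),
    "perturbation_craft": (0.053, 0.493),
}

# Macro-F1: every class counts equally, regardless of size.
f1_scores = [f1 for _, f1 in PER_CLASS.values()]
macro_f1 = sum(f1_scores) / len(f1_scores)

# Share-weighted F1: approximation that weights by full-sample class share,
# not by the actual per-class support in the test fold.
total_share = sum(share for share, _ in PER_CLASS.values())
share_weighted_f1 = sum(share * f1 for share, f1 in PER_CLASS.values()) / total_share

print(f"macro-F1          = {macro_f1:.4f}")
print(f"share-weighted F1 = {share_weighted_f1:.4f}")
```

The macro figure reproduces the reported 0.7693 from the rounded per-class values, which is a useful sanity check when copying numbers between the tables.
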
### Ablation: which feature groups matter

| Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy | Δ macro-F1 |
|---|---:|---:|---:|---:|---:|
| Full feature set (published) | 0.8643 | 0.7693 | 0.9753 | n/a | n/a |
| No perturbation features | 0.6595 | 0.6451 | 0.8979 | **−0.205** | **−0.124** |
| No query features | 0.8210 | 0.7080 | 0.9669 | −0.043 | −0.061 |
| No engineered features | 0.8590 | 0.7619 | 0.9751 | −0.005 | −0.007 |
| No tier (one-hot) | 0.8614 | 0.7647 | 0.9752 | −0.003 | −0.005 |
| No timestep | 0.8557 | 0.7549 | 0.9696 | −0.009 | −0.014 |
| No topology features | 0.8648 | 0.7745 | 0.9760 | +0.001 | +0.005 |

Three findings:

1. **Perturbation features carry the dominant signal** (−20pp accuracy,
   −12pp F1 when removed). `feature_delta_l2_norm`,
   `feature_delta_linf_norm`, and `perturbation_magnitude` directly
   encode whether the attacker is actively perturbing inputs.
2. **Query features are second-strongest** (−4pp accuracy, −6pp F1).
   Cumulative query count distinguishes active phases (evasion_attempt,
   probe) from idle phases.
3. **Topology features contribute nothing on this task** (+0.1pp
   accuracy when removed). Clean confirmation that the topology
   fingerprint isn't leaking phase information: the topology features
   fingerprint `defender_architecture`, not `attack_phase`.

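The delta columns in the ablation table are plain differences against the full-feature baseline. The following sketch recomputes them from the accuracy and macro-F1 values in the table and ranks the feature groups by how much accuracy is lost when each is removed; the dictionary layout and the `deltas` helper are illustrative choices, not the format of `ablation_results.json`.

```python
# Accuracy and macro-F1 copied from the ablation table above.
BASELINE = {"accuracy": 0.8643, "macro_f1": 0.7693}
ABLATIONS = {
    "No perturbation features": {"accuracy": 0.6595, "macro_f1": 0.6451},
    "No query features": {"accuracy": 0.8210, "macro_f1": 0.7080},
    "No engineered features": {"accuracy": 0.8590, "macro_f1": 0.7619},
    "No tier (one-hot)": {"accuracy": 0.8614, "macro_f1": 0.7647},
    "No timestep": {"accuracy": 0.8557, "macro_f1": 0.7549},
    "No topology features": {"accuracy": 0.8648, "macro_f1": 0.7745},
}


def deltas(row: dict) -> dict:
    """Metric change relative to the full-feature baseline (negative = worse)."""
    return {metric: row[metric] - BASELINE[metric] for metric in BASELINE}


# Most damaging ablation first.
ranked = sorted(ABLATIONS, key=lambda name: deltas(ABLATIONS[name])["accuracy"])
for name in ranked:
    d = deltas(ABLATIONS[name])
    print(f"{name:26s} d_acc={d['accuracy']:+.4f}  d_f1={d['macro_f1']:+.4f}")
```
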
### Architecture

**XGBoost:** multi-class gradient boosting (`multi:softprob`, 7 classes),
`hist` tree method, class-balanced sample weights, early stopping on
validation mlogloss.

**MLP:** `37 → 128 → 64 → 7`, each hidden layer followed by `BatchNorm1d`
→ `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss, AdamW optimizer,
early stopping on validation macro-F1.

Training hyperparameters are held internally by XpertSystems.

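For readers who want to see the MLP shape in code, here is a minimal PyTorch sketch of a `37 → 128 → 64 → 7` network with `BatchNorm1d` → `ReLU` → `Dropout(0.3)` after each hidden layer. It is an assumption-laden reconstruction from the description above: the `build_mlp` helper, the use of `nn.Sequential`, and the resulting parameter names are hypothetical and may not match the key layout stored in `model_mlp.safetensors`, so the sketch does not attempt to load the published weights.

```python
import torch
from torch import nn


def build_mlp(n_features: int = 37, n_classes: int = 7, p_drop: float = 0.3) -> nn.Sequential:
    """37 -> 128 -> 64 -> 7 MLP; each hidden layer is Linear -> BatchNorm1d -> ReLU -> Dropout."""

    def block(n_in: int, n_out: int) -> list:
        return [nn.Linear(n_in, n_out), nn.BatchNorm1d(n_out), nn.ReLU(), nn.Dropout(p_drop)]

    return nn.Sequential(*block(n_features, 128), *block(128, 64), nn.Linear(64, n_classes))


model = build_mlp()
model.eval()  # inference mode: BatchNorm uses running stats, Dropout is disabled

with torch.no_grad():
    logits = model(torch.randn(4, 37))  # random stand-in for 4 scaled feature rows
    proba = torch.softmax(logits, dim=1)

n_params = sum(p.numel() for p in model.parameters())
print(tuple(logits.shape), n_params)
```

In real use the inputs would first be standardized with the statistics in `feature_scaler.json`, as the file table in this README notes for the MLP.
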
## Limitations

**This is a baseline reference, not a production phase classifier.**

1. **The leakage diagnostic is required reading.** Three direct
   oracle columns for the phase task plus three additional documented
   leaks (timestep partial, stealth_score per-tier, topology
   fingerprint) are in `leakage_diagnostic.json`. If you use CYB011
   sample data for your own training, you MUST drop the three direct
   oracles or your model will learn the oracles instead of the task.

2. **`perturbation_craft` F1 0.49 is the weakest class.** This phase's
   per-timestep features overlap heavily with `feature_space_probe`.
   A sequence model considering event ordering within campaigns would
   likely do better than per-timestep classification.

3. **`nation_state` attacker tier is MISSING from the sample.** The
   README claims 4 tiers (script_kiddie, opportunistic, APT,
   nation_state). The sample contains only 3; nation_state events
   are entirely absent. Models trained on this sample cannot
   generalize to nation_state actors.

4. **Four README-suggested headline targets are unlearnable on the
   sample** after honest leak removal: `campaign_success_flag` (acc
   0.51 vs majority 0.61), `campaign_type` 8-class (acc 0.11 vs 0.17),
   `coordinated_attack_flag` (acc 0.83 vs 0.90; only 20 positives in
   200 campaigns), and `defender_architecture` 8-class (collapses to
   acc 0.13 when the 7-feature topology fingerprint is dropped).

5. **Per-campaign tasks are structurally limited at n=200.** With ~30
   test campaigns per fold, statistical power is limited. The full
   ~5,500-campaign product would yield much tighter per-campaign
   metrics.

6. **Synthetic-vs-real transfer.** The dataset is synthetic, calibrated
   to 12 benchmarks from MITRE ATLAS / NIST AI 100-2 / OWASP ML Top 10
   / USENIX / IBM ART / Anthropic-OpenAI red team reports. Real
   adversarial ML telemetry has different noise characteristics, and
   in particular the threshold-encoded `detector_confidence_score`
   and zero-sentinel `evasion_budget_consumed` patterns documented in
   the diagnostic would not be present in real data. Real telemetry
   has continuous, overlapping distributions.

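The "vs majority" comparisons in item 4 rest on a simple reference point: the accuracy of always predicting the most frequent label. A minimal sketch of that check follows, using the one case whose label counts are stated above (`coordinated_attack_flag`, 20 positives in 200 campaigns). The helper names are illustrative.

```python
from collections import Counter


def majority_baseline(labels) -> float:
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def beats_majority(model_accuracy: float, labels) -> bool:
    """True only if the model is better than the trivial majority predictor."""
    return model_accuracy > majority_baseline(labels)


# coordinated_attack_flag: 20 positives in 200 campaigns (from item 4 above).
coordinated = [1] * 20 + [0] * 180

print(majority_baseline(coordinated))     # 0.9
print(beats_majority(0.83, coordinated))  # False
```

With only 20 positives, a model at 0.83 accuracy is doing worse than predicting "not coordinated" for every campaign, which is why the target is listed as unlearnable on the sample.
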
## Notes on dataset schema

The CYB011 sample dataset README describes some fields differently
from the actual schema. The model was trained on the actual schema;
this note helps buyers reconcile what they read with what they receive.

| What the README says | What the data actually contains |
|---|---|
| `attack_trajectories` has 18 columns | Data has **13 columns** |
| Field renames | `adversarial_phase` → `attack_phase`, `attacker_tier` → `attacker_capability_tier`, `perturbation_linf` → `feature_delta_linf_norm`, `perturbation_l2` → `feature_delta_l2_norm`, `queries_used` → `query_count_cumulative` |
| Not documented for `attack_trajectories` | `detector_confidence_score`, `detection_outcome`, `evasion_budget_consumed` are in the data but not documented |
| README claims `gradient_access`, `evasion_attempted`, `evasion_succeeded`, `query_budget_remaining`, `defender_detection_strength`, `concept_drift_injected`, `transfer_attack_used`, `stealth_score`, `feature_space_dim` | None of these columns exist in `attack_trajectories`. `defender_detection_strength`, `feature_space_dim`, and `stealth_score` exist in `network_topology` or `campaign_summary`, not in `attack_trajectories` |
| `attacker_capability_tier` has 4 values | Data has **3 values**; `nation_state` is missing entirely |
| `attack_phase` 6-phase lifecycle | Data has **7 phases**; adds `idle_dwell` (18% of events) |
| `campaign_summary` has 14 columns | Data has **25 columns** |
| README documents no schema for `network_topology` | Data has **12 columns** |

None of these affects model correctness: the feature pipeline uses
the actual column names. If you build your own pipeline against the
dataset, use the actual columns.

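If you have code or notes written against the README's field names, a small lookup like the one below can translate them to the names actually present in the data. The five mappings are taken from the "Field renames" row above; the `to_actual_columns` helper is a hypothetical convenience, not part of the bundled `feature_engineering.py`.

```python
# README name -> actual column name, from the "Field renames" row above.
README_TO_ACTUAL = {
    "adversarial_phase": "attack_phase",
    "attacker_tier": "attacker_capability_tier",
    "perturbation_linf": "feature_delta_linf_norm",
    "perturbation_l2": "feature_delta_l2_norm",
    "queries_used": "query_count_cumulative",
}


def to_actual_columns(names):
    """Translate README-style names to actual names; unknown names pass through."""
    return [README_TO_ACTUAL.get(name, name) for name in names]


wanted = ["timestep", "adversarial_phase", "queries_used", "perturbation_l2"]
print(to_actual_columns(wanted))
```

Names without a mapping pass through unchanged, so the helper is safe to apply to a list that already mixes both conventions.
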
## Intended use

- **Evaluating fit** of the CYB011 dataset for your adversarial ML
  research
- **Baseline reference** for new model architectures on the
  attack-phase classification task
- **Reference example of structural-leakage diagnostics** for
  synthetic adversarial ML datasets (the methodology is reusable)
- **Feature engineering reference** for per-timestep adversarial
  trajectory telemetry

## Out-of-scope use

- Production adversarial detection on real ML systems
- Attacker tier attribution (3-class per-timestep is weak; per-campaign
  is leaky via stealth_score)
- Defender architecture vulnerability assessment (trivially leaky on
  this sample; collapses when topology fingerprint is dropped)
- Campaign success prediction (unlearnable on sample)
- Any nation_state-specific modeling (tier absent from sample)
- Any operational AI security decision without further validation on
  real adversarial telemetry

## Reproducibility

Outputs above were produced with `seed = 42` (published artifact),
nested `GroupShuffleSplit` on `campaign_id` (70/15/15), on the
published sample (`xpertsystems/cyb011-sample`, version 1.0.0,
generated 2026-05-16). The feature pipeline in `feature_engineering.py`
is deterministic and the trained weights in this repo correspond
exactly to the metrics above.

Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200)
in `multi_seed_results.json` confirm robust performance across splits
(std 0.010 on accuracy, 0.002 on ROC-AUC).

The training script itself is private to XpertSystems.

## Files in this repo

| File | Purpose |
|---|---|
| `model_xgb.json` | XGBoost weights (seed 42) |
| `model_mlp.safetensors` | PyTorch MLP weights (seed 42) |
| `feature_engineering.py` | Feature pipeline |
| `feature_meta.json` | Feature column order + categorical levels |
| `feature_scaler.json` | MLP input mean/std (XGBoost ignores) |
| `validation_results.json` | Per-class metrics, confusion matrix, architecture |
| `ablation_results.json` | Per-feature-group ablation |
| `multi_seed_results.json` | XGBoost metrics across 10 seeds |
| **`leakage_diagnostic.json`** | **6-oracle-path audit + 4 unlearnable targets + missing tier note** |
| `inference_example.ipynb` | End-to-end inference demo notebook |
| `README.md` | This file |

## Contact and full product

The full **CYB011** dataset contains **~383,000 rows** across four files,
with calibrated benchmark validation against 12 metrics drawn from
authoritative adversarial ML research (MITRE ATLAS, NIST AI 100-2
Adversarial ML Taxonomy, OWASP ML Top 10, USENIX Security adversarial
ML papers, IEEE SaTML, Microsoft Counterfit, IBM Adversarial Robustness
Toolbox, Anthropic / OpenAI red team reports).

The full XpertSystems.ai synthetic data catalogue spans 41 SKUs across
Cybersecurity, Healthcare, Insurance & Risk, Oil & Gas, and Materials
& Energy.

- 📧 **pradeep@xpertsystems.ai**
- 🌐 **https://xpertsystems.ai**
- 🗂 Dataset: https://huggingface.co/datasets/xpertsystems/cyb011-sample
- 🤖 Companion models:
  - https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic)
  - https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain)
  - https://huggingface.co/xpertsystems/cyb003-baseline-classifier (malware execution phase)
  - https://huggingface.co/xpertsystems/cyb004-baseline-classifier (phishing campaign phase)
  - https://huggingface.co/xpertsystems/cyb005-baseline-classifier (ransomware actor-tier attribution)
  - https://huggingface.co/xpertsystems/cyb006-baseline-classifier (user risk tier + leakage diagnostic)
  - https://huggingface.co/xpertsystems/cyb007-baseline-classifier (insider threat type)
  - https://huggingface.co/xpertsystems/cyb008-baseline-classifier (SOC alert triage + leakage diagnostic)
  - https://huggingface.co/xpertsystems/cyb009-baseline-classifier (vulnerability classification + leakage diagnostic)
  - https://huggingface.co/xpertsystems/cyb010-baseline-classifier (attack lifecycle phase + leakage diagnostic)

## Citation

```bibtex
@misc{xpertsystems_cyb011_baseline_2026,
  title  = {CYB011 Baseline Classifier: XGBoost and MLP for Adversarial Attack Phase Classification, with 6-Oracle-Path Leakage Diagnostic},
  author = {XpertSystems.ai},
  year   = {2026},
  url    = {https://huggingface.co/xpertsystems/cyb011-baseline-classifier},
  note   = {Baseline reference model + leakage audit trained on xpertsystems/cyb011-sample}
}
```
ablation_results.json
ADDED
@@ -0,0 +1,685 @@
{
  "purpose": "Quantify how much each feature group contributes to the headline XGBoost score. Identical architecture, same group-aware split, with one feature group dropped at a time.",
  "full_model_metrics": {
    "model": "xgboost",
    "accuracy": 0.8642857142857143,
    "macro_f1": 0.7693247628697397,
    "weighted_f1": 0.8650489644308249,
    "per_class_f1": {
      "reconnaissance": 0.8865248226950354,
      "feature_space_probe": 0.7829977628635347,
      "perturbation_craft": 0.4927536231884058,
      "evasion_attempt": 0.9962013295346629,
      "feedback_adaptation": 0.7151515151515152,
      "campaign_consolidation": 0.8075471698113208,
      "idle_dwell": 0.7040971168437026
    },
    "confusion_matrix": {
      "labels": [
        "reconnaissance",
        "feature_space_probe",
        "perturbation_craft",
        "evasion_attempt",
        "feedback_adaptation",
        "campaign_consolidation",
        "idle_dwell"
      ],
      "matrix": [
        [125, 0, 0, 0, 0, 0, 3],
        [0, 175, 43, 0, 0, 0, 2],
        [0, 20, 68, 0, 0, 0, 27],
        [0, 0, 2, 1049, 0, 0, 6],
        [0, 0, 0, 0, 59, 16, 1],
        [0, 0, 0, 0, 9, 107, 0],
        [29, 32, 48, 0, 21, 26, 232]
      ]
    },
    "macro_roc_auc_ovr": 0.9752868672798508
  },
  "ablations": {
    "no_timestep": {
      "n_features": 35,
      "dropped_count": 2,
      "metrics": {
        "model": "xgboost_no_timestep",
        "accuracy": 0.8557142857142858,
        "macro_f1": 0.7549062338242875,
        "weighted_f1": 0.8554198045390304,
        "per_class_f1": {
          "reconnaissance": 0.8833922261484098,
          "feature_space_probe": 0.7652173913043478,
          "perturbation_craft": 0.51985559566787,
          "evasion_attempt": 0.9952516619183286,
          "feedback_adaptation": 0.6380368098159509,
          "campaign_consolidation": 0.8108108108108109,
          "idle_dwell": 0.6717791411042945
        },
        "confusion_matrix": {
          "labels": [
            "reconnaissance",
            "feature_space_probe",
            "perturbation_craft",
            "evasion_attempt",
            "feedback_adaptation",
            "campaign_consolidation",
            "idle_dwell"
          ],
          "matrix": [
            [125, 0, 0, 0, 0, 0, 3],
            [0, 176, 40, 0, 0, 0, 4],
            [0, 25, 72, 0, 0, 0, 18],
            [0, 0, 2, 1048, 1, 0, 6],
            [0, 0, 0, 0, 52, 17, 7],
            [0, 0, 0, 0, 4, 105, 7],
            [30, 39, 48, 1, 30, 21, 219]
          ]
        },
        "macro_roc_auc_ovr": 0.9695634570525955
      },
      "delta_accuracy": 0.008571428571428563,
      "delta_macro_f1": 0.014418529045452266
    },
    "no_perturb": {
      "n_features": 33,
      "dropped_count": 4,
      "metrics": {
        "model": "xgboost_no_perturb",
        "accuracy": 0.6595238095238095,
        "macro_f1": 0.6450937004078117,
        "weighted_f1": 0.6477912364682181,
        "per_class_f1": {
          "reconnaissance": 0.8896797153024911,
          "feature_space_probe": 0.7264957264957265,
          "perturbation_craft": 0.4034090909090909,
          "evasion_attempt": 0.7787005373717636,
          "feedback_adaptation": 0.7317073170731707,
          "campaign_consolidation": 0.8120300751879699,
          "idle_dwell": 0.17363344051446947
        },
        "confusion_matrix": {
          "labels": [
            "reconnaissance",
            "feature_space_probe",
            "perturbation_craft",
            "evasion_attempt",
            "feedback_adaptation",
            "campaign_consolidation",
            "idle_dwell"
          ],
          "matrix": [
            [125, 0, 0, 0, 0, 0, 3],
            [0, 170, 47, 2, 0, 0, 1],
            [0, 20, 71, 9, 0, 0, 15],
            [0, 25, 76, 797, 0, 0, 159],
            [0, 0, 0, 0, 60, 15, 1],
            [0, 0, 0, 0, 7, 108, 1],
            [28, 33, 43, 182, 21, 27, 54]
          ]
        },
        "macro_roc_auc_ovr": 0.89792342601005
      },
      "delta_accuracy": 0.2047619047619048,
      "delta_macro_f1": 0.12423106246192805
    },
    "no_queries": {
      "n_features": 34,
      "dropped_count": 3,
      "metrics": {
        "model": "xgboost_no_queries",
        "accuracy": 0.820952380952381,
        "macro_f1": 0.7079823380902172,
        "weighted_f1": 0.824790421215039,
        "per_class_f1": {
          "reconnaissance": 0.7986111111111112,
          "feature_space_probe": 0.5427872860635696,
          "perturbation_craft": 0.42073170731707316,
          "evasion_attempt": 0.9952471482889734,
          "feedback_adaptation": 0.7209302325581395,
          "campaign_consolidation": 0.8015564202334631,
          "idle_dwell": 0.67601246105919
        },
        "confusion_matrix": {
          "labels": [
            "reconnaissance",
            "feature_space_probe",
            "perturbation_craft",
            "evasion_attempt",
            "feedback_adaptation",
            "campaign_consolidation",
            "idle_dwell"
          ],
          "matrix": [
            [115, 13, 0, 0, 0, 0, 0],
            [20, 111, 84, 0, 0, 0, 5],
            [0, 25, 69, 0, 0, 0, 21],
            [0, 0, 2, 1047, 0, 0, 8],
            [0, 0, 0, 0, 62, 13, 1],
            [0, 0, 0, 0, 11, 103, 2],
            [25, 40, 58, 0, 23, 25, 217]
          ]
        },
        "macro_roc_auc_ovr": 0.9668743863750572
      },
      "delta_accuracy": 0.043333333333333335,
      "delta_macro_f1": 0.061342424779522564
    },
    "no_topology": {
      "n_features": 12,
      "dropped_count": 25,
      "metrics": {
        "model": "xgboost_no_topology",
        "accuracy": 0.8647619047619047,
        "macro_f1": 0.7744509705042503,
        "weighted_f1": 0.8651794157598562,
        "per_class_f1": {
          "reconnaissance": 0.9014084507042254,
          "feature_space_probe": 0.7668161434977578,
          "perturbation_craft": 0.519298245614035,
          "evasion_attempt": 0.9952561669829222,
          "feedback_adaptation": 0.7218934911242604,
          "campaign_consolidation": 0.816793893129771,
          "idle_dwell": 0.6996904024767802
        },
        "confusion_matrix": {
          "labels": [
            "reconnaissance",
            "feature_space_probe",
            "perturbation_craft",
            "evasion_attempt",
            "feedback_adaptation",
            "campaign_consolidation",
            "idle_dwell"
          ],
          "matrix": [
            [128, 0, 0, 0, 0, 0, 0],
            [0, 171, 47, 0, 0, 0, 2],
            [0, 18, 74, 0, 0, 0, 23],
            [0, 0, 1, 1049, 0, 0, 7],
            [0, 0, 0, 0, 61, 15, 0],
            [0, 0, 0, 0, 9, 107, 0],
            [28, 37, 48, 2, 23, 24, 226]
          ]
        },
        "macro_roc_auc_ovr": 0.9760448304097272
      },
      "delta_accuracy": -0.0004761904761904079,
      "delta_macro_f1": -0.005126207634510549
    },
    "no_tier": {
      "n_features": 34,
      "dropped_count": 3,
      "metrics": {
        "model": "xgboost_no_tier",
        "accuracy": 0.8614285714285714,
        "macro_f1": 0.7646643425700288,
        "weighted_f1": 0.8620313204951823,
        "per_class_f1": {
          "reconnaissance": 0.8865248226950354,
          "feature_space_probe": 0.7671840354767184,
          "perturbation_craft": 0.48148148148148145,
          "evasion_attempt": 0.9952471482889734,
          "feedback_adaptation": 0.7073170731707317,
          "campaign_consolidation": 0.8120300751879699,
          "idle_dwell": 0.702865761689291
        },
        "confusion_matrix": {
          "labels": [
            "reconnaissance",
            "feature_space_probe",
            "perturbation_craft",
            "evasion_attempt",
            "feedback_adaptation",
            "campaign_consolidation",
            "idle_dwell"
          ],
          "matrix": [
            [125, 0, 0, 0, 0, 0, 3],
            [0, 173, 45, 0, 0, 0, 2],
            [0, 21, 65, 0, 0, 0, 29],
            [0, 0, 3, 1047, 0, 0, 7],
            [0, 0, 0, 0, 58, 17, 1],
            [0, 0, 0, 0, 8, 108, 0],
            [29, 37, 42, 0, 22, 25, 233]
          ]
        },
        "macro_roc_auc_ovr": 0.9752322842014612
      },
      "delta_accuracy": 0.0028571428571428914,
      "delta_macro_f1": 0.004660420299710921
    },
    "no_engineered": {
      "n_features": 32,
      "dropped_count": 5,
      "metrics": {
        "model": "xgboost_no_engineered",
        "accuracy": 0.8590476190476191,
        "macro_f1": 0.7619124928932358,
        "weighted_f1": 0.859520574191734,
        "per_class_f1": {
          "reconnaissance": 0.8825622775800712,
          "feature_space_probe": 0.7682119205298014,
          "perturbation_craft": 0.4946236559139785,
          "evasion_attempt": 0.9957285239677266,
          "feedback_adaptation": 0.703030303030303,
          "campaign_consolidation": 0.8,
          "idle_dwell": 0.6892307692307692
        },
        "confusion_matrix": {
          "labels": [
            "reconnaissance",
            "feature_space_probe",
            "perturbation_craft",
            "evasion_attempt",
            "feedback_adaptation",
            "campaign_consolidation",
            "idle_dwell"
          ],
          "matrix": [
            [124, 0, 0, 0, 0, 0, 4],
            [0, 174, 45, 0, 0, 0, 1],
            [0, 21, 69, 0, 0, 0, 25],
            [0, 0, 2, 1049, 0, 0, 6],
            [0, 0, 0, 0, 58, 17, 1],
            [0, 0, 0, 0, 9, 106, 1],
            [29, 38, 48, 1, 22, 26, 224]
          ]
        },
        "macro_roc_auc_ovr": 0.9751320314704773
      },
      "delta_accuracy": 0.005238095238095264,
      "delta_macro_f1": 0.007412269976503905
    }
  }
}
|
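The headline numbers in `ablation_results.json` can be cross-checked from the confusion matrix alone. The sketch below is illustrative and is not part of the released code: the helper names (`accuracy`, `per_class_f1`, `rank_ablations`) are hypothetical, the matrix and deltas are copied from the JSON above, and it assumes the common convention that rows are true labels and columns are predictions (per-class F1 is the same under either convention).

```python
"""Cross-check full-model metrics against the confusion matrix (illustrative)."""

LABELS = [
    "reconnaissance", "feature_space_probe", "perturbation_craft",
    "evasion_attempt", "feedback_adaptation", "campaign_consolidation",
    "idle_dwell",
]

# Full-model confusion matrix copied from ablation_results.json.
# Assumption: rows are true labels, columns are predicted labels.
MATRIX = [
    [125, 0, 0, 0, 0, 0, 3],
    [0, 175, 43, 0, 0, 0, 2],
    [0, 20, 68, 0, 0, 0, 27],
    [0, 0, 2, 1049, 0, 0, 6],
    [0, 0, 0, 0, 59, 16, 1],
    [0, 0, 0, 0, 9, 107, 0],
    [29, 32, 48, 0, 21, 26, 232],
]

# delta_macro_f1 per ablation, copied from ablation_results.json.
DELTA_MACRO_F1 = {
    "no_timestep": 0.014418529045452266,
    "no_perturb": 0.12423106246192805,
    "no_queries": 0.061342424779522564,
    "no_topology": -0.005126207634510549,
    "no_tier": 0.004660420299710921,
    "no_engineered": 0.007412269976503905,
}


def accuracy(matrix):
    """Fraction of events on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_f1(matrix):
    """F1 per class: 2*TP / (row_sum + col_sum), i.e. 2TP / (2TP + FN + FP)."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        row_sum = sum(matrix[k])                        # TP + FN
        col_sum = sum(matrix[i][k] for i in range(n))   # TP + FP
        denom = row_sum + col_sum
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def rank_ablations(deltas):
    """Ablation names sorted by macro-F1 loss, largest loss first."""
    return sorted(deltas, key=deltas.get, reverse=True)


if __name__ == "__main__":
    f1 = per_class_f1(MATRIX)
    print("accuracy:", accuracy(MATRIX))
    print("macro_f1:", sum(f1) / len(f1))
    for name, score in zip(LABELS, f1):
        print(f"  {name}: {score:.4f}")
    print("ablations by macro-F1 loss:", rank_ablations(DELTA_MACRO_F1))
```

Ranking by `delta_macro_f1` rather than `delta_accuracy` is a deliberate choice here: `evasion_attempt` makes up about half of the test events, so accuracy is dominated by that one class.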
feature_engineering.py
ADDED
|
@@ -0,0 +1,399 @@
"""
feature_engineering.py
======================

Feature pipeline for the CYB011 baseline classifier.

Predicts `attack_phase` (7-class adversarial attack phase) from
per-timestep features on the CYB011 sample dataset.

CSV inputs:
    attack_trajectories.csv  (primary, per-timestep, 14,000 events)
    network_topology.csv     (per-segment registry, joined for defender
                              context features)
    campaign_summary.csv     (per-campaign summaries; reserved)
    campaign_events.csv      (discrete event log; reserved)

Target classes (7):
    reconnaissance, feature_space_probe, perturbation_craft,
    evasion_attempt, feedback_adaptation, campaign_consolidation,
    idle_dwell

The CYB011 README describes a "6-phase adversarial state machine" but
the sample data has 7 phases: it adds `idle_dwell` (18% of events,
the second-largest class).

Group structure
---------------
200 campaigns x 70 timesteps = 14,000 events. Each campaign is a
sequential evasion attempt; events from the same campaign share
attacker, target segment, and tier. Group-aware splitting by
`campaign_id` (~30 test campaigns per fold) prevents train/test
contamination.

Leakage audit
-------------
Three columns are dropped from features because they are outcome leaks
for `attack_phase`:

  1. `detection_outcome` (4-class categorical):
       - `evasion_success` / `marginal_alert` / `high_confidence_alert`
         ALL -> 100% `evasion_attempt` phase
       - `suppressed_alert` -> can be any of the 7 phases
     So detection_outcome != suppressed_alert is a perfect oracle for
     evasion_attempt.

  2. `detector_confidence_score`: deterministically determines the
     detection outcome via threshold boundaries (< 0.25 -> evasion_success,
     [0.52, 0.78) -> marginal, >= 0.78 -> high_confidence). Same
     leakage as detection_outcome.

  3. `evasion_budget_consumed`: == 0 for 100% of {reconnaissance,
     feature_space_probe, perturbation_craft} events. > 0 for the
     other 4 phases. Perfect oracle for the 3 early phases.

KEPT as a legitimate observable:

  - `timestep` is the per-event position in the campaign lifecycle.
    It correlates with phase (reconnaissance is always early,
    campaign_consolidation is always late) but is NOT a label-encoding
    oracle: it is a real progress observable that a defender would have
    at decision time. The `no_timestep` ablation in ablation_results.json
    shows about +0.9pp accuracy when it is included, which is honest signal.

KEPT as a defender-context observable:

  - `defender_architecture`, `detection_strength`, `adversarial_robustness`,
    `ensemble_size`, `alert_threshold`, `detection_coverage`,
    `feature_space_dim`, `retraining_cadence_days`, `trust_level`: all
    per-segment topology features. They are deterministic per segment
    (each topology row uniquely fingerprints its segment), but the
    segment itself is real context: a defender knows its own
    architecture. These features are NOT oracles for attack_phase (they
    predict defender_architecture trivially, but defender_architecture
    isn't our target).

Public API
----------
    build_features(trajectories_path, topology_path)
        -> (X, y, ids, groups, meta)
    transform_single(record, meta, segment_lookup=None) -> np.ndarray
    save_meta(meta, path) / load_meta(path)
    build_segment_lookup(topology_path) -> dict

License
-------
Ships with the public model on Hugging Face under CC-BY-NC-4.0,
matching the dataset license. See README.md.
"""

from __future__ import annotations

import json
from pathlib import Path
from typing import Any

import numpy as np
import pandas as pd

# ---------------------------------------------------------------------------
# Label space
# ---------------------------------------------------------------------------

# Ordered by attack lifecycle progression.
LABEL_ORDER = [
    "reconnaissance",
    "feature_space_probe",
    "perturbation_craft",
    "evasion_attempt",
    "feedback_adaptation",
    "campaign_consolidation",
    "idle_dwell",
]
LABEL_TO_INT = {lbl: i for i, lbl in enumerate(LABEL_ORDER)}
INT_TO_LABEL = {i: lbl for lbl, i in LABEL_TO_INT.items()}

# ---------------------------------------------------------------------------
# Identifier and target columns
# ---------------------------------------------------------------------------

ID_COLUMNS = [
    "campaign_id", "attacker_id",
    "target_segment_id", "segment_id", "detector_id",
]
TARGET_COLUMN = "attack_phase"
GROUP_COLUMN = "campaign_id"

# Outcome leaks dropped from features.
ORACLE_COLUMNS = [
    "detection_outcome",          # != suppressed_alert -> 100% evasion_attempt
    "detector_confidence_score",  # thresholds on it determine detection_outcome
    "evasion_budget_consumed",    # == 0 -> 100% one of 3 early phases
]

# ---------------------------------------------------------------------------
# Per-timestep numeric features
# ---------------------------------------------------------------------------

EVENT_NUMERIC_FEATURES = [
    "timestep",  # kept: legitimate campaign-progress observable
    "perturbation_magnitude",
    "feature_delta_l2_norm",
    "feature_delta_linf_norm",
    "query_count_cumulative",
]

EVENT_CATEGORICAL_FEATURES = [
    "attacker_capability_tier",  # 3 values in sample (script_kiddie, opportunistic, APT)
]

# ---------------------------------------------------------------------------
# Segment / topology features (joined on target_segment_id)
# ---------------------------------------------------------------------------

SEGMENT_NUMERIC_FEATURES = [
    "trust_level",
    "detection_coverage",
    "feature_space_dim",
    "alert_threshold",
    "retraining_cadence_days",
    "ensemble_size",
    "detection_strength",
    "adversarial_robustness",
]

SEGMENT_CATEGORICAL_FEATURES = [
    "segment_type",           # 8 values
    "defender_architecture",  # 8 values
]

# ---------------------------------------------------------------------------
# Engineered features
# ---------------------------------------------------------------------------

def _add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """
    Five engineered features encoding phase-discriminative hypotheses.
    """
    df = df.copy()

    # 1. Campaign progress fraction (timestep / 70). Normalizes the
    #    position-in-lifecycle signal.
    if "timestep" in df.columns:
        df["progress_frac"] = (df["timestep"] / 70.0).astype(float)
    else:
        df["progress_frac"] = 0.0

    # 2. Log query intensity. Queries are heavy-tailed; some phases
    #    (reconnaissance, idle_dwell) have ~0 queries while
    #    evasion_attempt accumulates many. The fallback is a zero Series
    #    (not a scalar) so that .clip() still works if the column is absent.
    df["log_queries"] = np.log1p(
        df.get("query_count_cumulative", pd.Series(0.0, index=df.index)).clip(lower=0)
    ).astype(float)

    # 3. Perturbation intensity: max(L2, Linf). Captures whether the
    #    attacker is actively perturbing inputs.
    if "feature_delta_l2_norm" in df.columns and "feature_delta_linf_norm" in df.columns:
        df["perturb_intensity"] = np.maximum(
            df["feature_delta_l2_norm"].fillna(0),
            df["feature_delta_linf_norm"].fillna(0),
        ).astype(float)
    else:
        df["perturb_intensity"] = 0.0

    # 4. Defender weakness composite: low detection_strength + low
    #    adversarial_robustness = more evadable defender. Some phases
    #    (evasion_attempt) cluster on weaker defenders.
    if "detection_strength" in df.columns and "adversarial_robustness" in df.columns:
        df["defender_weakness"] = (
            (1 - df["detection_strength"].fillna(0.5))
            * (1 - df["adversarial_robustness"].fillna(0.5))
        ).astype(float)
    else:
        df["defender_weakness"] = 0.0

    # 5. Query-per-timestep rate: indicates active probing vs idling.
    if "query_count_cumulative" in df.columns and "timestep" in df.columns:
        df["query_rate"] = (
            df["query_count_cumulative"] / df["timestep"].clip(lower=1)
        ).astype(float)
    else:
        df["query_rate"] = 0.0

    return df


# ---------------------------------------------------------------------------
# Public API
# ---------------------------------------------------------------------------

def build_features(
    trajectories_path: str | Path,
    topology_path: str | Path,
) -> tuple[pd.DataFrame, pd.Series, pd.Series, pd.Series, dict[str, Any]]:
    """
    Load attack_trajectories.csv, join network_topology.csv, drop
    target + identifiers + oracle columns, engineer features, one-hot
    encode, return (X, y, ids, groups, meta).
    """
    traj = pd.read_csv(trajectories_path)
    topo = pd.read_csv(topology_path)

    y = traj[TARGET_COLUMN].map(LABEL_TO_INT)
    if y.isna().any():
        bad = traj.loc[y.isna(), TARGET_COLUMN].unique()
        raise ValueError(f"Unknown attack_phase values: {bad}")
    y = y.astype(int)
    ids = (
        traj["campaign_id"].astype(str)
        + ":t"
        + traj["timestep"].astype(str)
    )
    groups = traj[GROUP_COLUMN].copy()

    topo_cols_needed = (
        ["segment_id"]
        + SEGMENT_NUMERIC_FEATURES
        + SEGMENT_CATEGORICAL_FEATURES
    )
    traj = traj.merge(
        topo[topo_cols_needed],
        left_on="target_segment_id", right_on="segment_id",
        how="left",
    )

    traj = _add_engineered_features(traj)

    traj = traj.drop(
        columns=ID_COLUMNS + [TARGET_COLUMN] + ORACLE_COLUMNS,
        errors="ignore",
    )

    numeric_features = (
        EVENT_NUMERIC_FEATURES
        + SEGMENT_NUMERIC_FEATURES
        + [
            "progress_frac", "log_queries", "perturb_intensity",
            "defender_weakness", "query_rate",
        ]
    )
    numeric_features = [c for c in numeric_features if c in traj.columns]
    X_numeric = traj[numeric_features].astype(float)

    all_categorical = EVENT_CATEGORICAL_FEATURES + SEGMENT_CATEGORICAL_FEATURES
    categorical_levels: dict[str, list[str]] = {}
    blocks: list[pd.DataFrame] = []
    for col in all_categorical:
        if col not in traj.columns:
            continue
        levels = sorted(traj[col].dropna().astype(str).unique().tolist())
        categorical_levels[col] = levels
        block = pd.get_dummies(
            traj[col].astype(str).astype("category").cat.set_categories(levels),
            prefix=col, dummy_na=False,
        ).astype(int)
        blocks.append(block)

    X = pd.concat(
        [X_numeric.reset_index(drop=True)]
        + [b.reset_index(drop=True) for b in blocks],
        axis=1,
    ).fillna(0.0)

    meta = {
        "feature_names": X.columns.tolist(),
        "numeric_features": numeric_features,
        "categorical_levels": categorical_levels,
        "label_to_int": LABEL_TO_INT,
        "int_to_label": INT_TO_LABEL,
        "oracle_excluded": ORACLE_COLUMNS,
    }
    return X, y, ids, groups, meta


| 314 |
+
def transform_single(
|
| 315 |
+
record: dict | pd.DataFrame,
|
| 316 |
+
meta: dict[str, Any],
|
| 317 |
+
segment_lookup: dict | None = None,
|
| 318 |
+
) -> np.ndarray:
|
| 319 |
+
"""Encode a single trajectory record for inference."""
|
| 320 |
+
if isinstance(record, dict):
|
| 321 |
+
df = pd.DataFrame([record.copy()])
|
| 322 |
+
else:
|
| 323 |
+
df = record.copy()
|
| 324 |
+
|
| 325 |
+
if segment_lookup is not None and "target_segment_id" in df.columns:
|
| 326 |
+
seg_id = df["target_segment_id"].iloc[0]
|
| 327 |
+
seg_feats = segment_lookup.get(seg_id, {})
|
| 328 |
+
for k, v in seg_feats.items():
|
| 329 |
+
if k not in df.columns:
|
| 330 |
+
df[k] = v
|
| 331 |
+
|
| 332 |
+
df = _add_engineered_features(df)
|
| 333 |
+
|
| 334 |
+
numeric = pd.DataFrame({
|
| 335 |
+
col: df.get(col, pd.Series([0.0] * len(df))).astype(float).values
|
| 336 |
+
for col in meta["numeric_features"]
|
| 337 |
+
})
|
| 338 |
+
blocks: list[pd.DataFrame] = [numeric]
|
| 339 |
+
for col, levels in meta["categorical_levels"].items():
|
| 340 |
+
val = df.get(col, pd.Series([None] * len(df))).astype(str)
|
| 341 |
+
block = pd.get_dummies(
|
| 342 |
+
val.astype("category").cat.set_categories(levels),
|
| 343 |
+
prefix=col, dummy_na=False,
|
| 344 |
+
).astype(int)
|
| 345 |
+
for lvl in levels:
|
| 346 |
+
cname = f"{col}_{lvl}"
|
| 347 |
+
if cname not in block.columns:
|
| 348 |
+
block[cname] = 0
|
| 349 |
+
block = block[[f"{col}_{lvl}" for lvl in levels]]
|
| 350 |
+
blocks.append(block)
|
| 351 |
+
|
| 352 |
+
X = pd.concat(blocks, axis=1).fillna(0.0)
|
| 353 |
+
X = X.reindex(columns=meta["feature_names"], fill_value=0.0)
|
| 354 |
+
return X.values.astype(np.float32)
|
| 355 |
+
|
| 356 |
+
|
| 357 |
+
def save_meta(meta: dict[str, Any], path: str | Path) -> None:
|
| 358 |
+
serializable = {
|
| 359 |
+
"feature_names": meta["feature_names"],
|
| 360 |
+
"numeric_features": meta["numeric_features"],
|
| 361 |
+
"categorical_levels": meta["categorical_levels"],
|
| 362 |
+
"label_to_int": meta["label_to_int"],
|
| 363 |
+
"int_to_label": {str(k): v for k, v in meta["int_to_label"].items()},
|
| 364 |
+
"oracle_excluded": meta.get("oracle_excluded", []),
|
| 365 |
+
}
|
| 366 |
+
with open(path, "w") as f:
|
| 367 |
+
json.dump(serializable, f, indent=2)
|
| 368 |
+
|
| 369 |
+
|
| 370 |
+
def load_meta(path: str | Path) -> dict[str, Any]:
|
| 371 |
+
with open(path) as f:
|
| 372 |
+
meta = json.load(f)
|
| 373 |
+
meta["int_to_label"] = {int(k): v for k, v in meta["int_to_label"].items()}
|
| 374 |
+
return meta
|
| 375 |
+
|
| 376 |
+
|
| 377 |
+
def build_segment_lookup(topology_path: str | Path) -> dict[str, dict]:
|
| 378 |
+
"""Build {segment_id: {segment feature values}} for inference."""
|
| 379 |
+
topo = pd.read_csv(topology_path)
|
| 380 |
+
cols = SEGMENT_NUMERIC_FEATURES + SEGMENT_CATEGORICAL_FEATURES
|
| 381 |
+
out = {}
|
| 382 |
+
for _, row in topo.iterrows():
|
| 383 |
+
out[row["segment_id"]] = {c: row[c] for c in cols if c in topo.columns}
|
| 384 |
+
return out
|
| 385 |
+
|
| 386 |
+
|
| 387 |
+
if __name__ == "__main__":
|
| 388 |
+
import sys
|
| 389 |
+
base = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/mnt/user-data/uploads")
|
| 390 |
+
X, y, ids, groups, meta = build_features(
|
| 391 |
+
base / "attack_trajectories.csv",
|
| 392 |
+
base / "network_topology.csv",
|
| 393 |
+
)
|
| 394 |
+
print(f"X shape: {X.shape}")
|
| 395 |
+
print(f"y shape: {y.shape}")
|
| 396 |
+
print(f"groups: {groups.nunique()} unique campaigns")
|
| 397 |
+
print(f"n_features: {len(meta['feature_names'])}")
|
| 398 |
+
print(f"label distribution:\n{y.map(INT_TO_LABEL).value_counts()}")
|
| 399 |
+
print(f"X has NaN: {X.isnull().any().any()}")
|
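The encoding contract that `transform_single` appears to follow (numeric columns first, then one fixed-order one-hot block per categorical column, with missing values and unseen levels falling back to zeros) can be sketched without pandas. This is a minimal illustration under those assumptions, not the repository's implementation; `encode_record` is a hypothetical helper, and the feature lists are a small excerpt chosen for the example.

```python
def encode_record(record, numeric_features, categorical_levels):
    """Encode one record as numeric values followed by fixed-order one-hot blocks.

    Hypothetical helper: missing numeric fields default to 0.0 and category
    values outside the known levels produce an all-zero block.
    """
    row = [float(record.get(name, 0.0)) for name in numeric_features]
    for col, levels in categorical_levels.items():
        value = str(record.get(col))
        row.extend(1.0 if value == level else 0.0 for level in levels)
    return row


# Excerpt of a feature layout, for illustration only.
numeric_features = ["timestep", "query_count_cumulative"]
categorical_levels = {
    "attacker_capability_tier": [
        "advanced_persistent_threat",
        "opportunistic",
        "script_kiddie",
    ],
}

known = encode_record(
    {
        "timestep": 21,
        "query_count_cumulative": 11,
        "attacker_capability_tier": "opportunistic",
    },
    numeric_features,
    categorical_levels,
)

# A tier that is absent from the known levels encodes as all zeros.
unseen = encode_record(
    {"timestep": 21, "attacker_capability_tier": "nation_state"},
    numeric_features,
    categorical_levels,
)

print(known)
print(unseen)
```

One consequence of this design is that a record with an unknown tier is indistinguishable from a record with no tier at all, so a downstream model has no way to flag it as out of distribution.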
feature_meta.json
ADDED
|
@@ -0,0 +1,111 @@
{
  "feature_names": [
    "timestep",
    "perturbation_magnitude",
    "feature_delta_l2_norm",
    "feature_delta_linf_norm",
    "query_count_cumulative",
    "trust_level",
    "detection_coverage",
    "feature_space_dim",
    "alert_threshold",
    "retraining_cadence_days",
    "ensemble_size",
    "detection_strength",
    "adversarial_robustness",
    "progress_frac",
    "log_queries",
    "perturb_intensity",
    "defender_weakness",
    "query_rate",
    "attacker_capability_tier_advanced_persistent_threat",
    "attacker_capability_tier_opportunistic",
    "attacker_capability_tier_script_kiddie",
    "segment_type_cloud_workload",
    "segment_type_corporate_lan",
    "segment_type_data_exfiltration_target",
    "segment_type_dmz_perimeter",
    "segment_type_endpoint_fleet",
    "segment_type_ot_ics_control_network",
    "segment_type_soc_management_plane",
    "segment_type_zero_trust_segment",
    "defender_architecture_autoencoder_anomaly",
    "defender_architecture_ensemble_stacked",
    "defender_architecture_gradient_boosted_tree",
    "defender_architecture_isolation_forest",
    "defender_architecture_lstm_behavioural",
    "defender_architecture_neural_network_dense",
    "defender_architecture_rule_based_threshold",
    "defender_architecture_transformer_sequence"
  ],
  "numeric_features": [
    "timestep",
    "perturbation_magnitude",
    "feature_delta_l2_norm",
    "feature_delta_linf_norm",
    "query_count_cumulative",
    "trust_level",
    "detection_coverage",
    "feature_space_dim",
    "alert_threshold",
    "retraining_cadence_days",
    "ensemble_size",
    "detection_strength",
    "adversarial_robustness",
    "progress_frac",
    "log_queries",
    "perturb_intensity",
    "defender_weakness",
    "query_rate"
  ],
  "categorical_levels": {
    "attacker_capability_tier": [
      "advanced_persistent_threat",
      "opportunistic",
      "script_kiddie"
    ],
    "segment_type": [
      "cloud_workload",
      "corporate_lan",
      "data_exfiltration_target",
      "dmz_perimeter",
      "endpoint_fleet",
      "ot_ics_control_network",
      "soc_management_plane",
      "zero_trust_segment"
    ],
    "defender_architecture": [
      "autoencoder_anomaly",
      "ensemble_stacked",
      "gradient_boosted_tree",
      "isolation_forest",
      "lstm_behavioural",
      "neural_network_dense",
      "rule_based_threshold",
      "transformer_sequence"
    ]
  },
  "label_to_int": {
    "reconnaissance": 0,
    "feature_space_probe": 1,
    "perturbation_craft": 2,
    "evasion_attempt": 3,
    "feedback_adaptation": 4,
    "campaign_consolidation": 5,
    "idle_dwell": 6
  },
  "int_to_label": {
    "0": "reconnaissance",
    "1": "feature_space_probe",
    "2": "perturbation_craft",
    "3": "evasion_attempt",
    "4": "feedback_adaptation",
    "5": "campaign_consolidation",
    "6": "idle_dwell"
  },
  "oracle_excluded": [
    "detection_outcome",
    "detector_confidence_score",
    "evasion_budget_consumed"
  ]
}
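Two properties of this metadata file are worth checking when loading it: `int_to_label` keys arrive as strings because JSON object keys are always strings, and `feature_names` looks like the numeric features followed by one `column_level` entry per categorical level. The sketch below illustrates both with a small excerpt; `expected_feature_names` is a hypothetical helper and the excerpted values are assumptions chosen for brevity, not the full file.

```python
import json


def expected_feature_names(meta):
    """Rebuild the column order: numeric features, then column_level one-hot names."""
    names = list(meta["numeric_features"])
    for col, levels in meta["categorical_levels"].items():
        names.extend(f"{col}_{lvl}" for lvl in levels)
    return names


# Small excerpt in the same shape as feature_meta.json (illustrative only).
meta_excerpt = {
    "numeric_features": ["timestep", "query_rate"],
    "categorical_levels": {
        "attacker_capability_tier": ["opportunistic", "script_kiddie"],
    },
    "int_to_label": {0: "reconnaissance", 6: "idle_dwell"},
}

# Round-trip through JSON: integer keys come back as strings.
restored = json.loads(json.dumps(meta_excerpt))
string_keys = list(restored["int_to_label"])

# Convert the keys back to integers before indexing with an argmax result.
int_to_label = {int(k): v for k, v in restored["int_to_label"].items()}

layout = expected_feature_names(restored)
print(string_keys, int_to_label, layout)
```

Comparing `expected_feature_names(meta)` against `meta["feature_names"]` after loading is a cheap guard against a metadata file that was edited by hand.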
feature_scaler.json
ADDED
|
@@ -0,0 +1 @@
{"mean": [35.5, 0.18952780945529293, 1.306543723946557, 0.13124837821171634, 22.172045220966083, 0.6039635149023638, 0.7373798047276464, 83.58273381294964, 0.6482954779033917, 29.53186022610483, 1.9398766700924974, 0.7428622816032887, 0.5362281603288798, 0.5071428571428571, 2.7516034029559706, 1.306543723946557, 0.1348483556012333, 0.5364089872594734, 0.10071942446043165, 0.381294964028777, 0.5179856115107914, 0.18088386433710174, 0.12435765673175746, 0.1264131551901336, 0.10996916752312436, 0.11664953751284686, 0.11613566289825282, 0.12281603288797534, 0.10277492291880781, 0.17317574511819117, 0.14131551901336073, 0.10123329907502569, 0.11459403905447071, 0.09198355601233299, 0.11356628982528263, 0.17060637204522097, 0.09352517985611511], "std": [20.206235725016693, 0.09167219921794667, 0.8164877299177017, 0.08880481196415957, 15.071372312011375, 0.1506360032773659, 0.11803727444435302, 18.394205279913415, 0.08031189726900136, 7.820137508750206, 1.280796109162329, 0.11317550157786087, 0.14072721772354296, 0.28866051035738133, 1.091565441029427, 0.8164877299177017, 0.09460859412712151, 0.2023285682155406, 0.3009723106774258, 0.48572972162326267, 0.4997020921518917, 0.38494171137992955, 0.33000609471109255, 0.3323314915581053, 0.31286739993889146, 0.3210187131250674, 0.3204039972341474, 0.3282427885969357, 0.30368028618535087, 0.37841858286034125, 0.3483646303082042, 0.3016528968584111, 0.3185477579797315, 0.2890175882967011, 0.3173000708343629, 0.37618397359867656, 0.2911819612538873]}
|
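The scaler file stores one mean and one standard deviation per feature, and the notebook applies them as `(X - mean) / std` before the MLP. A minimal pure-Python sketch of that step follows, assuming the arrays are in `feature_names` order (so the first two entries correspond to `timestep` and `perturbation_magnitude`). `standardize` is a hypothetical helper, and only the first two scaler entries are used.

```python
def standardize(row, mean, std):
    """Return the z-scores (x - mean) / std for one feature row."""
    if not (len(row) == len(mean) == len(std)):
        raise ValueError("row, mean, and std must have the same length")
    return [(x - m) / s for x, m, s in zip(row, mean, std)]


# First two entries of feature_scaler.json, assumed to be timestep and
# perturbation_magnitude.
mean = [35.5, 0.18952780945529293]
std = [20.206235725016693, 0.09167219921794667]

# Values taken from the notebook's example record.
z = standardize([21, 0.14152], mean, std)
print(z)

# A row of the wrong width is rejected rather than silently truncated by zip.
try:
    standardize([21], mean, std)
    length_checked = False
except ValueError:
    length_checked = True
```

The explicit length check matters because a feature vector whose width differs from the scaler's would otherwise be standardized against the wrong columns.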
inference_example.ipynb
ADDED
|
@@ -0,0 +1,342 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CYB011 Baseline Classifier — Inference Example\n",
    "\n",
    "End-to-end demo: load the trained XGBoost and PyTorch MLP models from the Hugging Face repo and predict the **adversarial attack phase** for a per-timestep trajectory record.\n",
    "\n",
    "**Models predict one of 7 phases:** `reconnaissance`, `feature_space_probe`, `perturbation_craft`, `evasion_attempt`, `feedback_adaptation`, `campaign_consolidation`, `idle_dwell`.\n",
    "\n",
    "**This is a baseline reference model**, not a production phase classifier. See the model card and **`leakage_diagnostic.json`** for the structural-leakage findings (6 oracle paths documented across the dataset, 4 README-suggested targets unlearnable after honest leak removal)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Install dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install --quiet xgboost torch safetensors pandas numpy huggingface_hub"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Download model artifacts from Hugging Face"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import hf_hub_download\n",
    "\n",
    "REPO_ID = \"xpertsystems/cyb011-baseline-classifier\"\n",
    "\n",
    "files = {}\n",
    "for name in [\"model_xgb.json\", \"model_mlp.safetensors\",\n",
    "             \"feature_engineering.py\", \"feature_meta.json\",\n",
    "             \"feature_scaler.json\"]:\n",
    "    files[name] = hf_hub_download(repo_id=REPO_ID, filename=name)\n",
    "    print(f\" downloaded: {name}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys, os\n",
    "fe_dir = os.path.dirname(files[\"feature_engineering.py\"])\n",
    "if fe_dir not in sys.path:\n",
    "    sys.path.insert(0, fe_dir)\n",
    "\n",
    "from feature_engineering import (\n",
    "    transform_single, load_meta, build_segment_lookup, INT_TO_LABEL,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Load models and metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import numpy as np\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import xgboost as xgb\n",
    "from safetensors.torch import load_file\n",
    "\n",
    "meta = load_meta(files[\"feature_meta.json\"])\n",
    "with open(files[\"feature_scaler.json\"]) as f:\n",
    "    scaler = json.load(f)\n",
    "\n",
    "N_FEATURES = len(meta[\"feature_names\"])\n",
    "N_CLASSES = len(meta[\"int_to_label\"])\n",
    "print(f\"feature count: {N_FEATURES}\")\n",
    "print(f\"class count: {N_CLASSES}\")\n",
    "print(f\"label classes: {list(meta['int_to_label'].values())}\")\n",
    "print(f\"\\noracle columns excluded (do not pass these to the model):\")\n",
    "for c in meta.get(\"oracle_excluded\", []):\n",
    "    print(f\" - {c}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "xgb_model = xgb.XGBClassifier()\n",
    "xgb_model.load_model(files[\"model_xgb.json\"])\n",
    "\n",
    "# MLP architecture (must match training)\n",
    "class PhaseMLP(nn.Module):\n",
    "    def __init__(self, n_features, n_classes=7, hidden1=128, hidden2=64, dropout=0.3):\n",
    "        super().__init__()\n",
    "        self.net = nn.Sequential(\n",
    "            nn.Linear(n_features, hidden1),\n",
    "            nn.BatchNorm1d(hidden1),\n",
    "            nn.ReLU(),\n",
    "            nn.Dropout(dropout),\n",
    "            nn.Linear(hidden1, hidden2),\n",
    "            nn.BatchNorm1d(hidden2),\n",
    "            nn.ReLU(),\n",
    "            nn.Dropout(dropout),\n",
    "            nn.Linear(hidden2, n_classes),\n",
    "        )\n",
    "    def forward(self, x):\n",
    "        return self.net(x)\n",
    "\n",
    "mlp_model = PhaseMLP(N_FEATURES, n_classes=N_CLASSES)\n",
    "mlp_model.load_state_dict(load_file(files[\"model_mlp.safetensors\"]))\n",
    "mlp_model.eval()\n",
    "print(\"models loaded\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Load segment topology for defender-feature lookup\n",
    "\n",
    "The model uses segment context (defender_architecture, detection_strength, ensemble_size, etc.) as features. To predict on a new trajectory, we look up its segment features from the network_topology."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from huggingface_hub import snapshot_download\n",
    "\n",
    "ds_path = snapshot_download(repo_id=\"xpertsystems/cyb011-sample\", repo_type=\"dataset\")\n",
    "segment_lookup = build_segment_lookup(f\"{ds_path}/network_topology.csv\")\n",
    "print(f\"loaded {len(segment_lookup)} segment records\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Prediction helper"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "MU = np.array(scaler[\"mean\"], dtype=np.float32)\n",
    "SD = np.array(scaler[\"std\"], dtype=np.float32)\n",
    "\n",
    "def predict_attack_phase(record: dict) -> dict:\n",
    "    \"\"\"Predict the adversarial attack phase for one trajectory record.\n",
    "\n",
    "    Note: do NOT include detection_outcome, detector_confidence_score,\n",
    "    or evasion_budget_consumed in the record. These were outcome leaks\n",
    "    in the training data and are excluded from the feature set.\n",
    "\n",
    "    Segment features (defender_architecture, detection_strength, etc.)\n",
    "    are looked up from network_topology by target_segment_id.\n",
    "    \"\"\"\n",
    "    X = transform_single(record, meta, segment_lookup=segment_lookup)\n",
    "\n",
    "    xgb_proba = xgb_model.predict_proba(X)[0]\n",
    "    xgb_label = INT_TO_LABEL[int(np.argmax(xgb_proba))]\n",
    "\n",
    "    Xs = ((X - MU) / SD).astype(np.float32)\n",
    "    with torch.no_grad():\n",
    "        logits = mlp_model(torch.tensor(Xs))\n",
    "        mlp_proba = torch.softmax(logits, dim=1).numpy()[0]\n",
    "    mlp_label = INT_TO_LABEL[int(np.argmax(mlp_proba))]\n",
    "\n",
    "    return {\n",
    "        \"xgboost\": {\n",
    "            \"label\": xgb_label,\n",
    "            \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(xgb_proba)},\n",
    "        },\n",
    "        \"mlp\": {\n",
    "            \"label\": mlp_label,\n",
    "            \"probabilities\": {INT_TO_LABEL[i]: float(p) for i, p in enumerate(mlp_proba)},\n",
    "        },\n",
    "    }"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Run on an example record\n",
    "\n",
    "Real APT-tier trajectory at timestep 21 (mid-campaign). True phase is `evasion_attempt` — the attacker has built up 11 queries and is actively perturbing inputs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Real trajectory record from the sample dataset (true phase: evasion_attempt)\n",
    "# Note: target_segment_id is supplied so segment features are auto-looked-up\n",
    "example_record = {\n",
    "    \"target_segment_id\": \"SEG00197\",\n",
    "    \"timestep\": 21,\n",
    "    \"perturbation_magnitude\": 0.14152,\n",
    "    \"feature_delta_l2_norm\": 1.278436,\n",
    "    \"feature_delta_linf_norm\": 0.14152,\n",
    "    \"query_count_cumulative\": 11,\n",
    "    \"attacker_capability_tier\": \"advanced_persistent_threat\",\n",
    "}\n",
    "\n",
    "result = predict_attack_phase(example_record)\n",
    "\n",
    "print(f\"XGBoost -> {result['xgboost']['label']}\")\n",
    "for lbl, p in sorted(result['xgboost']['probabilities'].items(), key=lambda x: -x[1]):\n",
    "    print(f\" P({lbl:30s}) = {p:.4f}\")\n",
    "\n",
    "print(f\"\\nMLP -> {result['mlp']['label']}\")\n",
    "for lbl, p in sorted(result['mlp']['probabilities'].items(), key=lambda x: -x[1]):\n",
    "    print(f\" P({lbl:30s}) = {p:.4f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Per-class confidence patterns\n",
    "\n",
    "The model has strong confidence on `evasion_attempt` (per-class F1 1.00), `reconnaissance` (F1 0.89), and `campaign_consolidation` (F1 0.81) — these phases have distinctive feature signatures (query usage, timestep position, perturbation activity).\n",
    "\n",
    "The middle phases overlap more in feature space. `perturbation_craft` is the hardest class (F1 0.49) because its trajectory features look similar to `feature_space_probe` at the per-timestep level. A sequence model considering event ordering within campaigns would likely do better than per-timestep classification."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Batch prediction on the sample dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "trajectories = pd.read_csv(f\"{ds_path}/attack_trajectories.csv\")\n",
    "\n",
    "# Score the first 500 events\n",
    "sample = trajectories.head(500).copy()\n",
    "preds = [predict_attack_phase(row.to_dict())[\"xgboost\"][\"label\"] for _, row in sample.iterrows()]\n",
    "sample[\"xgb_pred\"] = preds\n",
    "\n",
    "ct = pd.crosstab(sample[\"attack_phase\"], sample[\"xgb_pred\"],\n",
    "                 rownames=[\"true\"], colnames=[\"pred\"])\n",
    "print(\"Confusion on first 500 sample events (XGBoost):\")\n",
    "print(ct)\n",
    "acc = (sample[\"attack_phase\"] == sample[\"xgb_pred\"]).mean()\n",
    "print(f\"\\nbatch accuracy on first 500 events (in-distribution): {acc:.4f}\")\n",
    "print(\"\\nNote: this includes training-set events. See validation_results.json\\n\"\n",
    "      \"for proper held-out test metrics (group-aware split by campaign_id).\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Important reading: the leakage diagnostic\n",
    "\n",
    "Before using CYB011 sample data to train your own models, read **`leakage_diagnostic.json`** in this repo. It documents **6 oracle paths** across the sample's targets:\n",
    "\n",
    "**Phase target oracles (3 paths — dropped from features):**\n",
    "1. `detection_outcome` (`!= suppressed_alert` → 100% `evasion_attempt`)\n",
    "2. `detector_confidence_score` (threshold-derived from `detection_outcome`)\n",
    "3. `evasion_budget_consumed` (`== 0` → 100% one of 3 early phases)\n",
    "\n",
    "**Other documented leaks (for transparency, not features for this model):**\n",
    "4. `stealth_score` near-deterministic per `attacker_capability_tier` (campaign-level)\n",
    "5. Topology fingerprint (7 segment-level features uniquely identify `defender_architecture`)\n",
    "6. `timestep` partial oracle for 3 phases — **KEPT as legitimate campaign-progress observable**\n",
    "\n",
    "It also documents **4 README-suggested headline targets that are unlearnable on the sample** after honest leak removal: `campaign_success_flag`, `campaign_type` 8-class, `coordinated_attack_flag`, `defender_architecture` 8-class.\n",
    "\n",
    "And it documents the **missing `nation_state` attacker tier** — README claims 4 tiers, sample contains only 3 (script_kiddie, opportunistic, APT)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Next steps\n",
    "\n",
    "- See `validation_results.json` for held-out test metrics (2,100 events from ~30 test campaigns).\n",
    "- See `multi_seed_results.json` for the across-10-seeds picture (accuracy 0.867 ± 0.010, ROC-AUC 0.977 ± 0.002).\n",
    "- See `ablation_results.json` for per-feature-group contribution. Perturbation features carry the most signal (−20pp accuracy when removed); query features second (−4pp).\n",
    "- See **`leakage_diagnostic.json`** for the full 6-oracle-path audit and 4 unlearnable targets.\n",
    "- For the full ~383k-row CYB011 dataset and commercial licensing, contact **pradeep@xpertsystems.ai**."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
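The first oracle path listed in the notebook (`detection_outcome != suppressed_alert` implies `evasion_attempt`) can be checked directly from a crosstab of counts. The sketch below uses the non-zero cells reported in the `evidence_crosstab` of `leakage_diagnostic.json`; `oracle_stats` is a hypothetical helper written for this illustration, not part of the repository.

```python
# Non-zero cells of the detection_outcome x attack_phase crosstab,
# copied from evidence_crosstab in leakage_diagnostic.json.
crosstab = {
    "evasion_success": {"evasion_attempt": 416},
    "high_confidence_alert": {"evasion_attempt": 1102},
    "marginal_alert": {"evasion_attempt": 3228},
    "suppressed_alert": {
        "campaign_consolidation": 829,
        "evasion_attempt": 2460,
        "feature_space_probe": 1465,
        "feedback_adaptation": 496,
        "idle_dwell": 2450,
        "perturbation_craft": 745,
        "reconnaissance": 809,
    },
}


def oracle_stats(table, neutral_outcome="suppressed_alert", phase="evasion_attempt"):
    """Score the rule 'outcome != neutral_outcome implies phase' on a crosstab."""
    total = flagged = flagged_hits = phase_total = 0
    for outcome, phases in table.items():
        for name, count in phases.items():
            total += count
            if name == phase:
                phase_total += count
            if outcome != neutral_outcome:
                flagged += count
                if name == phase:
                    flagged_hits += count
    return {
        "total": total,
        "flagged": flagged,
        "precision": flagged_hits / flagged,   # share of flagged events in the phase
        "recall": flagged_hits / phase_total,  # share of the phase the rule catches
    }


stats = oracle_stats(crosstab)
print(stats)
```

On these counts the rule never mislabels an event, but it covers only the `evasion_attempt` events whose outcome is not `suppressed_alert`; the remaining ones still have to be classified from the retained features.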
leakage_diagnostic.json
ADDED
|
@@ -0,0 +1,238 @@
| 1 |
+
{
|
| 2 |
+
"purpose": "CYB011 sample has multiple structural leakage patterns rooted in the generator's outcome-modeling logic. Three outcome columns (detection_outcome, detector_confidence_score, evasion_budget_consumed) are perfect or near-perfect oracles for attack_phase. Per-campaign features encode attacker_capability_tier via stealth_score. Per-segment topology features uniquely fingerprint each defender_architecture. The published baseline (attack_phase 7-class) trains with the three phase oracles excluded but retains timestep as a legitimate campaign-progress observable.",
|
| 3 |
+
"primary_target": "attack_phase (7-class, per-timestep)",
|
| 4 |
+
"split": "GroupShuffleSplit on campaign_id, 70/15/15 nested",
|
| 5 |
+
"missing_attacker_tier_note": {
|
| 6 |
+
"issue": "README claims 4 attacker_capability_tier values (script_kiddie, opportunistic, advanced_persistent_threat, nation_state). The sample data contains only 3: nation_state is entirely absent. Models trained on this sample cannot generalize to nation_state actors.",
|
| 7 |
+
"tier_counts_in_sample": {
|
| 8 |
+
"script_kiddie": 7000,
|
| 9 |
+
"opportunistic": 5600,
|
| 10 |
+
"advanced_persistent_threat": 1400
|
| 11 |
+
}
|
| 12 |
+
},
  "oracle_paths_documented": {
    "P1_detection_outcome": {
      "target": "attack_phase",
      "leak_column": "detection_outcome",
      "mechanism": "Three of the four detection_outcome values (evasion_success, marginal_alert, high_confidence_alert) occur ONLY when attack_phase == 'evasion_attempt'. The fourth value (suppressed_alert) occurs across all 7 phases, including 2,460 evasion_attempt events. So detection_outcome != suppressed_alert identifies evasion_attempt with 100% precision and flags 4,746 of the 7,206 evasion_attempt events (66% recall).",
      "evidence_crosstab": {
        "evasion_success": {
          "campaign_consolidation": 0,
          "evasion_attempt": 416,
          "feature_space_probe": 0,
          "feedback_adaptation": 0,
          "idle_dwell": 0,
          "perturbation_craft": 0,
          "reconnaissance": 0
        },
        "high_confidence_alert": {
          "campaign_consolidation": 0,
          "evasion_attempt": 1102,
          "feature_space_probe": 0,
          "feedback_adaptation": 0,
          "idle_dwell": 0,
          "perturbation_craft": 0,
          "reconnaissance": 0
        },
        "marginal_alert": {
          "campaign_consolidation": 0,
          "evasion_attempt": 3228,
          "feature_space_probe": 0,
          "feedback_adaptation": 0,
          "idle_dwell": 0,
          "perturbation_craft": 0,
          "reconnaissance": 0
        },
        "suppressed_alert": {
          "campaign_consolidation": 829,
          "evasion_attempt": 2460,
          "feature_space_probe": 1465,
          "feedback_adaptation": 496,
          "idle_dwell": 2450,
          "perturbation_craft": 745,
          "reconnaissance": 809
        }
      },
      "verdict": "Perfect-precision oracle for evasion_attempt (51% of all events); it fires on 66% of that phase."
    },
    "P2_detector_confidence_score": {
      "target": "attack_phase (via detection_outcome)",
      "leak_column": "detector_confidence_score",
      "mechanism": "detector_confidence_score is threshold-coupled to detection_outcome: <=0.25 -> evasion_success, [0.52, 0.78) -> marginal_alert, >=0.78 -> high_confidence_alert. These three ranges do not overlap, so among non-suppressed events detection_outcome is mechanically decoded from this score, indirectly oracling attack_phase. suppressed_alert spans the full range (0.001 to 0.999).",
      "score_ranges_by_outcome": {
        "evasion_success": {
          "min": 0.001,
          "max": 0.25,
          "mean": 0.1801,
          "std": 0.0553
        },
        "high_confidence_alert": {
          "min": 0.7801,
          "max": 0.999,
          "mean": 0.8558,
          "std": 0.0561
        },
        "marginal_alert": {
          "min": 0.5201,
          "max": 0.7797,
          "mean": 0.6436,
          "std": 0.0737
        },
        "suppressed_alert": {
          "min": 0.001,
          "max": 0.999,
          "mean": 0.3992,
          "std": 0.1817
        }
      },
      "verdict": "Mechanical decoder for detection_outcome -> indirect oracle for phase."
    },
    "P3_evasion_budget_consumed_zero": {
      "target": "attack_phase (3 early phases)",
      "leak_column": "evasion_budget_consumed",
      "mechanism": "evasion_budget_consumed == 0 in 100% of {reconnaissance, feature_space_probe, perturbation_craft} events (the 3 early phases that don't submit evasion attempts). evasion_budget_consumed > 0 in 100% of events from the 4 later phases.",
      "early_phase_events_at_zero": 3019,
      "verdict": "Perfect oracle for the 3 early phases as a group."
    },
    "P4_stealth_score_to_tier": {
      "target": "attacker_capability_tier (campaign level)",
      "leak_column": "stealth_score",
      "mechanism": "stealth_score has tier-discriminative ranges that overlap at the extremes but have ordered means: APT in [0.806, 0.938] (mean 0.912), opportunistic in [0.751, 0.924] (mean 0.882), script_kiddie in [0.715, 0.950] (mean 0.846). Drives per-campaign tier prediction to 0.94 accuracy vs 0.50 majority - artificially inflated.",
      "stealth_ranges_by_tier": {
        "advanced_persistent_threat": {
          "min": 0.806,
          "max": 0.938,
          "mean": 0.9116,
          "std": 0.0277
        },
        "opportunistic": {
          "min": 0.7508,
          "max": 0.9236,
          "mean": 0.8816,
          "std": 0.0359
        },
        "script_kiddie": {
          "min": 0.7148,
          "max": 0.95,
          "mean": 0.8456,
          "std": 0.0462
        }
      },
      "verdict": "Near-deterministic per-tier feature. Per-campaign tier prediction is structurally inflated by this leak."
    },
    "P5_topology_fingerprint": {
      "target": "defender_architecture",
      "leak_column": "(combination of 7 topology features)",
      "mechanism": "Each defender_architecture has detection_strength and adversarial_robustness as a CONSTANT (std = 0.0 across all rows of that architecture). Combined with ranges of ensemble_size, alert_threshold, detection_coverage, feature_space_dim, and retraining_cadence_days, each topology row uniquely fingerprints its defender. The 8-class defender_architecture target hits 100% accuracy via this combination.",
      "detection_strength_std_within_arch": {
        "autoencoder_anomaly": 0.0,
        "ensemble_stacked": 0.0,
        "gradient_boosted_tree": 0.0,
        "isolation_forest": 0.0,
        "lstm_behavioural": 0.0,
        "neural_network_dense": 0.0,
        "rule_based_threshold": 0.0,
        "transformer_sequence": 0.0
      },
      "adversarial_robustness_std_within_arch": {
        "autoencoder_anomaly": 0.0,
        "ensemble_stacked": 0.0,
        "gradient_boosted_tree": 0.0,
        "isolation_forest": 0.0,
        "lstm_behavioural": 0.0,
        "neural_network_dense": 0.0,
        "rule_based_threshold": 0.0,
        "transformer_sequence": 0.0
      },
      "verdict": "Trivially leaky 8-class target. Each segment row uniquely identifies its defender architecture by feature combination."
    },
    "P6_timestep_partial": {
      "target": "attack_phase (partial)",
      "leak_column": "timestep",
      "mechanism": "Phases have characteristic timestep ranges due to the sequential lifecycle structure. reconnaissance is timestep 1-7 (mean 3.16), campaign_consolidation is 65-70 (mean 67.96), feedback_adaptation is 63-66 (mean 64.15). The middle phases overlap broadly. NOTE: timestep is KEPT as a feature in the published model because it's a legitimate campaign-progress observable a defender would have at decision time. Documenting here for transparency: removing timestep drops headline accuracy by ~9pp (0.87 -> 0.78).",
      "timestep_ranges_by_phase": {
        "campaign_consolidation": {
          "min": 65,
          "max": 70,
          "mean": 67.96
        },
        "evasion_attempt": {
          "min": 11,
          "max": 62,
          "mean": 40.32
        },
        "feature_space_probe": {
          "min": 4,
          "max": 35,
          "mean": 11.29
        },
        "feedback_adaptation": {
          "min": 63,
          "max": 66,
          "mean": 64.15
        },
        "idle_dwell": {
          "min": 1,
          "max": 70,
          "mean": 35.44
        },
        "perturbation_craft": {
          "min": 8,
          "max": 38,
          "mean": 16.65
        },
        "reconnaissance": {
          "min": 1,
          "max": 7,
          "mean": 3.16
        }
      },
      "verdict": "Partial oracle for 3 phases (reconnaissance, feedback_adaptation, campaign_consolidation). KEPT as legitimate progress feature."
    }
  },
  "unlearnable_targets": [
    {
      "target": "campaign_success_flag (per-campaign)",
      "n_campaigns": 200,
      "majority_baseline": 0.605,
      "honest_accuracy": 0.5111111111111111,
      "honest_roc_auc": 0.48765432098765427,
      "verdict": "below_majority"
    },
    {
      "target": "campaign_type (per-campaign)",
      "n_campaigns": 200,
      "majority_baseline": 0.17,
      "honest_accuracy": 0.11111111111111112,
      "honest_roc_auc": 0.48226979604757386,
      "verdict": "below_majority"
    },
    {
      "target": "coordinated_attack_flag (per-campaign)",
      "n_campaigns": 200,
      "majority_baseline": 0.9,
      "honest_accuracy": 0.8333333333333334,
      "honest_roc_auc": 0.38271604938271603,
      "verdict": "below_majority"
    },
    {
      "target": "defender_architecture (per-campaign, all 7 topology fingerprint features dropped)",
      "n_campaigns": 200,
      "majority_baseline": 0.17,
      "honest_accuracy": 0.13333333333333333,
      "honest_roc_auc": 0.5770656344684122,
      "verdict": "below_majority",
      "note": "With all 7 topology fingerprint features included, defender_architecture hits 100% trivially. With all 7 dropped, performance collapses to or below majority. The target is not learnable from the trajectory features themselves - only from the segment fingerprint."
    }
  ],
  "unlearnable_summary": "Four README-suggested headline targets are unlearnable on the sample after honest oracle removal: campaign_success_flag (acc ~0.51 vs maj 0.61), campaign_type 8-class (acc ~0.11 vs maj 0.17), coordinated_attack_flag (acc ~0.83 vs maj 0.90), and defender_architecture 8-class (trivially leaky via topology fingerprint; collapses when the fingerprint is dropped). Only attack_phase 7-class learns honestly with a respectable lift over majority.",
  "recommendations_to_dataset_author": [
    "Make detector_confidence_score have OVERLAPPING ranges across detection_outcome values. As shipped, the ranges are perfectly non-overlapping (high_confidence_alert >=0.78, marginal_alert [0.52, 0.78), evasion_success <=0.25). This makes detection_outcome a mechanical function of the score.",
    "Allow evasion_budget_consumed to be positive in some reconnaissance / feature_space_probe / perturbation_craft events. The current zero-only encoding creates a perfect oracle for these 3 phases.",
    "Add per-tier feature noise. stealth_score has tier-discriminative ranges (APT >0.80, script_kiddie <=0.95) that overlap at the extremes but still separate the tiers. Widen the noise so the per-campaign tier-attribution task isn't structurally inflated.",
    "Add per-segment NOISE to detection_strength and adversarial_robustness. Currently these are CONSTANT per defender_architecture (std=0.0). Real systems have deployment-specific tuning, so these should vary within an architecture class.",
    "Include the missing nation_state attacker tier in the sample. The README lists 4 tiers but the sample contains only 3. Buyers cannot validate nation_state-specific modeling on the sample.",
    "Increase coordinated_attack_flag positives in the sample (only 20 of 200 campaigns, i.e. 10%). With n=20 positives, the binary task has insufficient statistical power for honest evaluation.",
    "For campaign_type 8-class, add stronger per-type feature signatures. Currently the 8 types are not discriminable from trajectory features at n=200 campaigns."
  ]
}
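The P1 crosstab in leakage_diagnostic.json can be re-scored with a few lines of Python. The sketch below is an illustration, not part of the commit: the constant `CROSSTAB` and the helper `rule_precision_recall` are hypothetical names, and the counts are copied from `evidence_crosstab` with zero cells omitted. It measures how precisely the rule `detection_outcome != suppressed_alert` picks out `evasion_attempt`, and how many `evasion_attempt` events it misses.

```python
# Counts copied from evidence_crosstab (P1) in leakage_diagnostic.json.
# Zero-count cells are omitted for brevity.
CROSSTAB = {
    "evasion_success": {"evasion_attempt": 416},
    "high_confidence_alert": {"evasion_attempt": 1102},
    "marginal_alert": {"evasion_attempt": 3228},
    "suppressed_alert": {
        "campaign_consolidation": 829,
        "evasion_attempt": 2460,
        "feature_space_probe": 1465,
        "feedback_adaptation": 496,
        "idle_dwell": 2450,
        "perturbation_craft": 745,
        "reconnaissance": 809,
    },
}


def rule_precision_recall(crosstab, phase="evasion_attempt",
                          neutral="suppressed_alert"):
    """Score the rule: detection_outcome != neutral => attack_phase == phase."""
    flagged = hits = phase_total = 0
    for outcome, by_phase in crosstab.items():
        for p, n in by_phase.items():
            if p == phase:
                phase_total += n          # all events of the target phase
            if outcome != neutral:
                flagged += n              # events the rule fires on
                if p == phase:
                    hits += n             # ...that really are the target phase
    return hits / flagged, hits / phase_total


precision, recall = rule_precision_recall(CROSSTAB)
total_events = sum(n for row in CROSSTAB.values() for n in row.values())

# Early-phase events only ever appear under suppressed_alert, so this row
# alone gives their total (compare early_phase_events_at_zero in P3).
early_phases = ("reconnaissance", "feature_space_probe", "perturbation_craft")
early_events = sum(CROSSTAB["suppressed_alert"][p] for p in early_phases)

print(f"precision={precision:.3f} recall={recall:.3f}")
print(f"total_events={total_events} early_phase_events={early_events}")
```

The same helper could be pointed at a crosstab built from the raw sample (for example with `pandas.crosstab`) to confirm the counts before deciding which columns to exclude.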
model_mlp.safetensors
ADDED
|
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:06e8b3f2322f2b94a9a376f77306c119fc335173f3979c893910a729be379bf1
size 58596
model_xgb.json
ADDED
|
The diff for this file is too large to render.
See raw diff
multi_seed_results.json
ADDED
|
@@ -0,0 +1,98 @@
{
  "purpose": "Multi-seed evaluation across 10 group-aware splits of the 14,000-event sample (200 campaigns).",
  "seeds_evaluated": [42, 7, 13, 17, 23, 31, 45, 99, 123, 200],
  "per_seed": [
    {
      "seed": 42,
      "test_n_classes": 7,
      "accuracy": 0.8642857142857143,
      "macro_f1": 0.7693247628697397,
      "macro_roc_auc_ovr": 0.9752868672798508
    },
    {
      "seed": 7,
      "test_n_classes": 7,
      "accuracy": 0.8733333333333333,
      "macro_f1": 0.7868555284450741,
      "macro_roc_auc_ovr": 0.9786952359398997
    },
    {
      "seed": 13,
      "test_n_classes": 7,
      "accuracy": 0.8752380952380953,
      "macro_f1": 0.7750991458229394,
      "macro_roc_auc_ovr": 0.9779387743730787
    },
    {
      "seed": 17,
      "test_n_classes": 7,
      "accuracy": 0.8738095238095238,
      "macro_f1": 0.7814925647016364,
      "macro_roc_auc_ovr": 0.9776960470844541
    },
    {
      "seed": 23,
      "test_n_classes": 7,
      "accuracy": 0.8838095238095238,
      "macro_f1": 0.7978303920930874,
      "macro_roc_auc_ovr": 0.9798719092961202
    },
    {
      "seed": 31,
      "test_n_classes": 7,
      "accuracy": 0.8690476190476191,
      "macro_f1": 0.7726664814609271,
      "macro_roc_auc_ovr": 0.9759310226918093
    },
    {
      "seed": 45,
      "test_n_classes": 7,
      "accuracy": 0.8519047619047619,
      "macro_f1": 0.7504006897882468,
      "macro_roc_auc_ovr": 0.9727919502752255
    },
    {
      "seed": 99,
      "test_n_classes": 7,
      "accuracy": 0.8585714285714285,
      "macro_f1": 0.7746640410602633,
      "macro_roc_auc_ovr": 0.9769979540429897
    },
    {
      "seed": 123,
      "test_n_classes": 7,
      "accuracy": 0.8533333333333334,
      "macro_f1": 0.771942700676468,
      "macro_roc_auc_ovr": 0.9738063729400632
    },
    {
      "seed": 200,
      "test_n_classes": 7,
      "accuracy": 0.8652380952380953,
      "macro_f1": 0.7668641323226082,
      "macro_roc_auc_ovr": 0.9762239650477442
    }
  ],
  "aggregate": {
    "accuracy_mean": 0.8668571428571428,
    "accuracy_std": 0.009680145423468645,
    "accuracy_min": 0.8519047619047619,
    "accuracy_max": 0.8838095238095238,
    "macro_f1_mean": 0.774714043924099,
    "macro_f1_std": 0.011922910105924629,
    "roc_auc_mean": 0.9765240098971235,
    "roc_auc_std": 0.0020690216988592247
  },
  "published_artifact_seed": 42
}
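The `aggregate` block in multi_seed_results.json can be re-derived from the `per_seed` accuracies. The sketch below is illustrative only; `ACCURACY_BY_SEED` is a hypothetical name holding values copied from the file. It computes both the population and the sample standard deviation, on the assumption (not stated in the file) that `accuracy_std` is one of the two. The reported value matches the population form, which is also NumPy's default (`ddof=0`).

```python
import statistics

# Per-seed test accuracies copied from multi_seed_results.json.
ACCURACY_BY_SEED = {
    42: 0.8642857142857143,
    7: 0.8733333333333333,
    13: 0.8752380952380953,
    17: 0.8738095238095238,
    23: 0.8838095238095238,
    31: 0.8690476190476191,
    45: 0.8519047619047619,
    99: 0.8585714285714285,
    123: 0.8533333333333334,
    200: 0.8652380952380953,
}

values = list(ACCURACY_BY_SEED.values())

acc_mean = statistics.fmean(values)
acc_std_population = statistics.pstdev(values)  # divides by n
acc_std_sample = statistics.stdev(values)       # divides by n - 1
acc_min, acc_max = min(values), max(values)

print(f"mean={acc_mean:.4f} min={acc_min:.4f} max={acc_max:.4f}")
print(f"population std={acc_std_population:.5f}")
print(f"sample std={acc_std_sample:.5f}")
```

With only 10 seeds the two estimates differ noticeably, so it is worth stating which one is reported when quoting a "mean plus or minus std" figure.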
validation_results.json
ADDED
|
@@ -0,0 +1,247 @@
{
  "version": "1.0.0",
  "dataset": "xpertsystems/cyb011-sample",
  "task": "7-class attack_phase classification",
  "baselines": {
    "always_predict_majority_accuracy": 0.5033333333333333,
    "majority_class": "evasion_attempt",
    "random_guess_accuracy": 0.14285714285714285
  },
  "split": {
    "strategy": "group-aware (GroupShuffleSplit on campaign_id, nested 70/15/15)",
    "rationale": "200 campaigns x 70 timesteps each. Timesteps from the same campaign share attacker, target segment, and tier - so train/test contamination is a real risk with random splitting. ~30 test campaigns per fold.",
    "events_train": 9730,
    "events_val": 2170,
    "events_test": 2100,
    "seed": 42
  },
  "n_features": 37,
  "label_classes": [
    "reconnaissance",
    "feature_space_probe",
    "perturbation_craft",
    "evasion_attempt",
    "feedback_adaptation",
    "campaign_consolidation",
    "idle_dwell"
  ],
  "class_distribution_train": {
    "evasion_attempt": 5082,
    "idle_dwell": 1677,
    "feature_space_probe": 983,
    "campaign_consolidation": 571,
    "reconnaissance": 558,
    "perturbation_craft": 511,
    "feedback_adaptation": 348
  },
  "class_distribution_test": {
    "evasion_attempt": 1057,
    "idle_dwell": 388,
    "feature_space_probe": 220,
    "reconnaissance": 128,
    "campaign_consolidation": 116,
    "perturbation_craft": 115,
    "feedback_adaptation": 76
  },
  "oracle_excluded_features": [
    "detection_outcome (perfect oracle for evasion_attempt phase)",
    "detector_confidence_score (mechanical decoder for detection_outcome)",
    "evasion_budget_consumed (==0 is perfect oracle for 3 early phases)"
  ],
  "timestep_kept_as_legitimate_feature": "timestep is KEPT as a feature. It's a partial oracle for 3 phases (reconnaissance, feedback_adaptation, campaign_consolidation) but is a legitimate campaign-progress observable a defender would have at decision time. Removing it drops accuracy by ~9pp.",
  "leakage_audit_note": "See leakage_diagnostic.json for the full 6-oracle-path audit, 4 unlearnable README-suggested targets, and the missing nation_state attacker tier note.",
  "models": {
    "xgboost": {
      "architecture": "Gradient-boosted decision trees, multi:softprob, 7 classes",
      "framework": "xgboost",
      "test_metrics": {
        "model": "xgboost",
        "accuracy": 0.8642857142857143,
        "macro_f1": 0.7693247628697397,
        "weighted_f1": 0.8650489644308249,
        "per_class_f1": {
          "reconnaissance": 0.8865248226950354,
          "feature_space_probe": 0.7829977628635347,
          "perturbation_craft": 0.4927536231884058,
          "evasion_attempt": 0.9962013295346629,
          "feedback_adaptation": 0.7151515151515152,
          "campaign_consolidation": 0.8075471698113208,
          "idle_dwell": 0.7040971168437026
        },
        "confusion_matrix": {
          "labels": [
            "reconnaissance",
            "feature_space_probe",
            "perturbation_craft",
            "evasion_attempt",
            "feedback_adaptation",
            "campaign_consolidation",
            "idle_dwell"
          ],
          "matrix": [
            [125, 0, 0, 0, 0, 0, 3],
            [0, 175, 43, 0, 0, 0, 2],
            [0, 20, 68, 0, 0, 0, 27],
            [0, 0, 2, 1049, 0, 0, 6],
            [0, 0, 0, 0, 59, 16, 1],
            [0, 0, 0, 0, 9, 107, 0],
            [29, 32, 48, 0, 21, 26, 232]
          ]
        },
        "macro_roc_auc_ovr": 0.9752868672798508
      }
    },
    "mlp": {
      "architecture": "PyTorch MLP, 37 -> 128 -> 64 -> 7, BatchNorm1d + ReLU + Dropout, weighted cross-entropy loss",
      "framework": "pytorch",
      "test_metrics": {
        "model": "mlp",
        "accuracy": 0.8385714285714285,
        "macro_f1": 0.7344635260259678,
        "weighted_f1": 0.8387834443096441,
        "per_class_f1": {
          "reconnaissance": 0.8737201365187713,
          "feature_space_probe": 0.746606334841629,
          "perturbation_craft": 0.49707602339181284,
          "evasion_attempt": 0.9928537398761315,
          "feedback_adaptation": 0.627906976744186,
          "campaign_consolidation": 0.784452296819788,
          "idle_dwell": 0.6186291739894552
        },
        "confusion_matrix": {
          "labels": [
            "reconnaissance",
            "feature_space_probe",
            "perturbation_craft",
            "evasion_attempt",
            "feedback_adaptation",
            "campaign_consolidation",
            "idle_dwell"
          ],
          "matrix": [
            [128, 0, 0, 0, 0, 0, 0],
            [0, 165, 55, 0, 0, 0, 0],
            [5, 24, 85, 0, 0, 0, 1],
            [0, 4, 2, 1042, 4, 1, 4],
            [0, 0, 0, 0, 54, 22, 0],
            [0, 0, 0, 0, 5, 111, 0],
            [32, 29, 85, 0, 33, 33, 176]
          ]
        },
        "macro_roc_auc_ovr": 0.9705026035482472
      }
    }
  }
}
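The headline metrics in validation_results.json can be recomputed from the stored confusion matrices. The sketch below uses the xgboost matrix and assumes rows are true labels and columns are predictions (the scikit-learn convention); that reading is supported by the row sums, which equal `class_distribution_test`. The names `LABELS`, `XGB_MATRIX` and `metrics_from_confusion` are hypothetical and introduced only for illustration.

```python
LABELS = [
    "reconnaissance", "feature_space_probe", "perturbation_craft",
    "evasion_attempt", "feedback_adaptation", "campaign_consolidation",
    "idle_dwell",
]

# xgboost confusion matrix copied from validation_results.json.
# Assumed layout: rows = true class, columns = predicted class.
XGB_MATRIX = [
    [125, 0, 0, 0, 0, 0, 3],
    [0, 175, 43, 0, 0, 0, 2],
    [0, 20, 68, 0, 0, 0, 27],
    [0, 0, 2, 1049, 0, 0, 6],
    [0, 0, 0, 0, 59, 16, 1],
    [0, 0, 0, 0, 9, 107, 0],
    [29, 32, 48, 0, 21, 26, 232],
]


def metrics_from_confusion(matrix, labels):
    """Return (accuracy, per-class F1 dict, macro F1) for a square matrix."""
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    per_class_f1 = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        support = sum(matrix[i])                         # row sum: true count
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        denom = support + predicted
        # F1 = 2*TP / (2*TP + FP + FN) = 2*TP / (support + predicted)
        per_class_f1[label] = 2 * tp / denom if denom else 0.0
    macro_f1 = sum(per_class_f1.values()) / n
    return correct / total, per_class_f1, macro_f1


accuracy, per_class_f1, macro_f1 = metrics_from_confusion(XGB_MATRIX, LABELS)
print(f"accuracy={accuracy:.4f} macro_f1={macro_f1:.4f}")
for label, f1 in per_class_f1.items():
    print(f"{label:>24}: {f1:.4f}")
```

Reading the last row of the matrix alongside these scores shows where most of the remaining error sits: `idle_dwell` events are spread across five other predicted classes, and `perturbation_craft` is frequently confused with `feature_space_probe` and `idle_dwell`.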