	---
	license: cc-by-nc-4.0
	library_name: pytorch
	tags:
	- cybersecurity
	- adversarial-machine-learning
	- ai-security
	- adversarial-attacks
	- evasion-attacks
	- apt
	- tabular-classification
	- synthetic-data
	- xgboost
	- baseline
	- leakage-diagnostic
	pipeline_tag: tabular-classification
	base_model: []
	datasets:
	- xpertsystems/cyb011-sample
	metrics:
	- accuracy
	- f1
	- roc_auc
	model-index:
	- name: cyb011-baseline-classifier
	  results:
	  - task:
	      type: tabular-classification
	      name: 7-class adversarial attack phase classification
	    dataset:
	      type: xpertsystems/cyb011-sample
	      name: CYB011 Synthetic AI Evasion Attack Trajectory Dataset (Sample)
	    metrics:
	    - type: roc_auc
	      value: 0.9753
	      name: Test macro ROC-AUC OvR (XGBoost, seed 42)
	    - type: accuracy
	      value: 0.8643
	      name: Test accuracy (XGBoost, seed 42)
	    - type: f1
	      value: 0.7693
	      name: Test macro-F1 (XGBoost, seed 42)
	    - type: accuracy
	      value: 0.867
	      name: Multi-seed accuracy mean ± 0.010 (XGBoost, 10 seeds)
	    - type: roc_auc
	      value: 0.977
	      name: Multi-seed ROC-AUC mean ± 0.002 (XGBoost, 10 seeds)
	---

	# CYB011 Baseline Classifier

	**Adversarial attack phase classifier (7-class) trained on the CYB011
	synthetic AI evasion attack trajectory sample. It predicts which of 7
	attack phases (`reconnaissance` / `feature_space_probe` /
	`perturbation_craft` / `evasion_attempt` / `feedback_adaptation` /
	`campaign_consolidation` / `idle_dwell`) a per-timestep trajectory
	event belongs to, using per-event features. The repo also ships a
	`leakage_diagnostic.json` that documents 6 oracle paths discovered
	across the dataset's targets, 4 README-suggested targets that are
	unlearnable on the sample after honest leak removal, and the missing
	`nation_state` attacker tier.**

	> Read this first. This repo ships two related artifacts:
	> (1) a working baseline classifier for `attack_phase` (the dataset's
	> headline target), and (2) `leakage_diagnostic.json` documenting 6
	> separate oracle paths, 4 unlearnable targets, and one missing
	> attacker tier. Both files matter; the diagnostic is required reading
	> for anyone evaluating CYB011 for adversarial ML research.

	## Model overview

	| Property | Value |
	|---|---|
	| Primary task | 7-class `attack_phase` classification |
	| Secondary artifact | `leakage_diagnostic.json`: 6 oracle paths + 4 unlearnable targets |
	| Training data | `xpertsystems/cyb011-sample` (14,000 events / 200 campaigns) |
	| Models | XGBoost + PyTorch MLP |
	| Input features | 37 (after one-hot encoding) |
	| Split | Group-aware (GroupShuffleSplit on `campaign_id`) |
	| Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds |
	| License | CC-BY-NC-4.0 (matches dataset) |
	| Status | Reference baseline + comprehensive leakage diagnostic |

	## Why this task — and what was dropped

	The CYB011 README describes a "6-phase adversarial state machine."
	The actual sample data contains 7 phases — it adds `idle_dwell`
	as a class (18% of all events, the second-largest class). The
	published baseline trains on all 7.

	We piloted nine candidate targets and found:

	- `attack_phase` 7-class: strongest honest result. Acc 0.867 ±
	0.010, ROC-AUC 0.977 ± 0.002 (multi-seed). All 7 classes
	represented, per-class F1 range 0.49–1.00.

	- `attacker_capability_tier` 3-class (per-timestep): weak honest
	result (acc 0.68, macro-F1 0.64). The 3 tiers are barely
	distinguishable at the per-timestep level: feature means
	are within ~1% across tiers.

	- `attacker_capability_tier` 3-class (per-campaign): hits acc 0.94
	but is structurally inflated by `stealth_score` leakage
	(near-deterministic ranges per tier). Documented in the diagnostic.

	- `detection_outcome` 4-class: hits 100% trivially via
	`detector_confidence_score` thresholds. Pure oracle.

	- `defender_architecture` 8-class: hits 100% trivially via the
	topology fingerprint (7 segment features uniquely identify each
	architecture). Collapses to acc 0.13 vs majority 0.17 when the
	fingerprint is dropped.

	- `campaign_success_flag` / `campaign_type` / `coordinated_attack_flag`:
	all below majority baseline at n=200 campaigns.

	### Three oracle columns dropped from features

	The phase task has three direct outcome-leak columns. Each is a perfect
	or near-perfect oracle for specific phases:

	| Column | Oracle relationship |
	|---|---|
	| `detection_outcome` | `!= suppressed_alert` → 100% `evasion_attempt` phase |
	| `detector_confidence_score` | Threshold-derived from `detection_outcome` (<0.25 → evasion_success, [0.52, 0.78) → marginal, ≥0.78 → high_confidence) |
	| `evasion_budget_consumed` | `== 0` → 100% one of 3 early phases (reconnaissance, feature_space_probe, perturbation_craft) |

	With these three columns present, a plain XGBoost achieves 100%
	accuracy. The published baseline trains with all three excluded.
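
A minimal sketch of the exclusion step, assuming each event is a plain `dict` keyed by the dataset's column names. The helper name `drop_oracle_columns` and the sample values are hypothetical; the bundled `feature_engineering.py` remains the canonical recipe.

```python
# Columns named above as direct outcome leaks for the attack_phase task.
ORACLE_COLUMNS = frozenset({
    "detection_outcome",
    "detector_confidence_score",
    "evasion_budget_consumed",
})


def drop_oracle_columns(event: dict) -> dict:
    """Return a copy of `event` without the three outcome-leak columns."""
    return {k: v for k, v in event.items() if k not in ORACLE_COLUMNS}


# Hypothetical raw event; the values are made up for illustration.
raw_event = {
    "timestep": 12,
    "perturbation_magnitude": 0.08,
    "query_count_cumulative": 140,
    "detection_outcome": "suppressed_alert",
    "detector_confidence_score": 0.41,
    "evasion_budget_consumed": 0.0,
}
clean_event = drop_oracle_columns(raw_event)
print(sorted(clean_event))
```

Filtering on a fixed deny-list keeps the input record untouched and makes it easy to audit which columns never reach the model.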

	### `timestep` kept as a legitimate observable

	`timestep` is a partial oracle for 3 phases (reconnaissance is
	always timestep 1-7, feedback_adaptation is 63-66, campaign_consolidation
	is 65-70). It's kept in the feature set because campaign-progress
	position is a real observable a defender would have at decision time
	— it's not encoding the label, it's encoding the lifecycle position.

	Removing `timestep` costs about 0.9pp of headline accuracy at seed 42
	(0.864 → 0.856; see the ablation table below). This is documented in
	the diagnostic for transparency.
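
One way to surface a partial oracle like this is to tabulate the timestep range of each phase and look for phases whose range overlaps no other. The sketch below assumes events are dicts with `attack_phase` and `timestep` keys; the function names and the toy events are hypothetical and are not taken from `leakage_diagnostic.json`.

```python
from collections import defaultdict


def timestep_ranges(events):
    """Map each attack_phase to the (min, max) timestep at which it occurs."""
    seen = defaultdict(list)
    for event in events:
        seen[event["attack_phase"]].append(event["timestep"])
    return {phase: (min(ts), max(ts)) for phase, ts in seen.items()}


def exclusive_phases(ranges):
    """Phases whose timestep range overlaps no other phase's range."""
    out = []
    for phase, (lo, hi) in ranges.items():
        overlaps = any(
            lo <= other_hi and other_lo <= hi
            for other, (other_lo, other_hi) in ranges.items()
            if other != phase
        )
        if not overlaps:
            out.append(phase)
    return sorted(out)


# Toy events (hypothetical values, not rows from the dataset).
toy = [
    {"attack_phase": "reconnaissance", "timestep": 1},
    {"attack_phase": "reconnaissance", "timestep": 7},
    {"attack_phase": "evasion_attempt", "timestep": 20},
    {"attack_phase": "evasion_attempt", "timestep": 60},
    {"attack_phase": "idle_dwell", "timestep": 30},
    {"attack_phase": "feedback_adaptation", "timestep": 63},
    {"attack_phase": "feedback_adaptation", "timestep": 66},
]
ranges = timestep_ranges(toy)
print(ranges)
print(exclusive_phases(ranges))
```

A phase that appears in the exclusive list is fully determined by `timestep` inside its range, which is the pattern to weigh against whether the column is a legitimate observable.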

	Two model artifacts are published. They are designed to be used
	together:

	- `model_xgb.json` — gradient-boosted trees (higher F1)
	- `model_mlp.safetensors` — PyTorch MLP

	## Quick start

	```bash
	pip install xgboost torch safetensors pandas huggingface_hub
	```

	```python
	import json, os, sys

	import numpy as np, torch, xgboost as xgb
	from huggingface_hub import hf_hub_download, snapshot_download
	# json, torch, and load_file are only needed if you also load the MLP artifact
	from safetensors.torch import load_file

	REPO = "xpertsystems/cyb011-baseline-classifier"

	# Download the model artifacts and the bundled feature pipeline
	paths = {n: hf_hub_download(REPO, n) for n in [
	    "model_xgb.json", "model_mlp.safetensors",
	    "feature_engineering.py", "feature_meta.json", "feature_scaler.json",
	]}

	# Make the downloaded feature_engineering.py importable
	sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"]))
	from feature_engineering import (
	    transform_single, load_meta, build_segment_lookup, INT_TO_LABEL,
	)

	meta = load_meta(paths["feature_meta.json"])

	# Segment features are joined from network_topology.csv at inference time
	ds = snapshot_download("xpertsystems/cyb011-sample", repo_type="dataset")
	segment_lookup = build_segment_lookup(f"{ds}/network_topology.csv")

	xgb_model = xgb.XGBClassifier()
	xgb_model.load_model(paths["model_xgb.json"])

	# Predict (see inference_example.ipynb for the full pattern).
	# `my_event` is a placeholder for one raw event record that you supply.
	# Do NOT include detection_outcome, detector_confidence_score,
	# or evasion_budget_consumed: those are the outcome-leak columns.
	X = transform_single(my_event, meta, segment_lookup=segment_lookup)
	proba = xgb_model.predict_proba(X)[0]
	print(INT_TO_LABEL[int(np.argmax(proba))])
	```

	See [`inference_example.ipynb`](./inference_example.ipynb) for the full
	copy-paste demo.

	## Training data

	Trained on the public sample of CYB011, 14,000 per-timestep records:

	| Phase | Events | Class share |
	|---|---:|---:|
	| `evasion_attempt` | 7,206 | 51.5% |
	| `idle_dwell` | 2,450 | 17.5% |
	| `feature_space_probe` | 1,465 | 10.5% |
	| `campaign_consolidation` | 829 | 5.9% |
	| `reconnaissance` | 809 | 5.8% |
	| `perturbation_craft` | 745 | 5.3% |
	| `feedback_adaptation` | 496 | 3.5% |

	### Group-aware split by campaign_id

	200 campaigns × 70 timesteps each. Timesteps from the same campaign
	share attacker, target segment, and tier — so train/test contamination
	is a real risk with random splitting. The baseline uses
	GroupShuffleSplit on `campaign_id` (nested 70/15/15):

	| Fold | Events | Campaigns |
	|---|---:|---:|
	| Train | 9,730 | ~140 |
	| Validation | 2,170 | ~30 |
	| Test | 2,100 | ~30 |
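
The idea behind a group-aware split can be sketched without any dependencies: shuffle the unique campaign ids, cut them into three disjoint sets, and let every row follow its campaign. This is a standard-library stand-in for scikit-learn's `GroupShuffleSplit`, not the baseline's training code; the `group_split` helper and the `C000`-style ids are hypothetical, and fold sizes depend on rounding, so they need not match the table above exactly.

```python
import random


def group_split(campaign_ids, seed=42, val_frac=0.15, test_frac=0.15):
    """Split unique campaign ids into disjoint train/val/test sets."""
    groups = sorted(set(campaign_ids))
    random.Random(seed).shuffle(groups)
    n_test = round(len(groups) * test_frac)
    n_val = round(len(groups) * val_frac)
    test = set(groups[:n_test])
    val = set(groups[n_test:n_test + n_val])
    train = set(groups[n_test + n_val:])
    return train, val, test


# Toy stand-in for the sample: 200 campaigns x 70 timesteps each.
campaign_ids = [f"C{c:03d}" for c in range(200) for _ in range(70)]
train, val, test = group_split(campaign_ids)

# Row-level fold assignment follows the campaign, never the individual row.
fold_sizes = {
    name: sum(cid in fold for cid in campaign_ids)
    for name, fold in (("train", train), ("val", val), ("test", test))
}
print(fold_sizes)
```

Because whole campaigns move together, no attacker, segment, or tier seen in training can reappear in the test fold through a sibling timestep.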

	All 10 multi-seed evaluations yielded all 7 classes in the test fold.
	Class imbalance is addressed with balanced class weights, passed to
	XGBoost as per-row `sample_weight`, and with weighted cross-entropy
	for the MLP.
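
For reference, the usual "balanced" heuristic sets each class weight to `n_samples / (n_classes * n_c)`. The sketch below applies it to the full-sample phase counts from the table above purely as an illustration; it is an assumption that the baseline uses this exact formula, and in practice the weights would be computed on the training fold only.

```python
# Event counts per phase, copied from the class-share table above.
PHASE_COUNTS = {
    "evasion_attempt": 7206,
    "idle_dwell": 2450,
    "feature_space_probe": 1465,
    "campaign_consolidation": 829,
    "reconnaissance": 809,
    "perturbation_craft": 745,
    "feedback_adaptation": 496,
}


def balanced_class_weights(counts):
    """w_c = n_samples / (n_classes * n_c), the usual 'balanced' heuristic."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


weights = balanced_class_weights(PHASE_COUNTS)
for phase, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{phase:24s} {w:.3f}")
```

Each row then carries the weight of its own class, so rare phases such as `feedback_adaptation` count for more per event than `evasion_attempt`.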

	## Feature pipeline

	The bundled `feature_engineering.py` is the canonical recipe. 37
	features survive after encoding, drawn from:

	- Per-timestep numeric (5): `timestep`, `perturbation_magnitude`,
	`feature_delta_l2_norm`, `feature_delta_linf_norm`, `query_count_cumulative`
	- Per-timestep categorical (1, one-hot): `attacker_capability_tier`
	(3 values in sample)
	- Segment features (joined from `network_topology.csv`): 8 numeric
	+ 2 categorical (segment_type, defender_architecture)
	- Engineered (5): `progress_frac`, `log_queries`, `perturb_intensity`,
	`defender_weakness`, `query_rate`
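
To illustrate why a fixed column order and fixed categorical levels matter, here is a minimal sketch of one-hot encoding a single event against stored metadata. The `META` layout, the level spellings, and `encode_event` are hypothetical; the real order and levels live in `feature_meta.json` and are applied by `feature_engineering.py`.

```python
# Hypothetical metadata; the real column order and levels live in
# feature_meta.json and may differ.
META = {
    "numeric": ["timestep", "perturbation_magnitude"],
    "categorical": {
        "attacker_capability_tier": ["script_kiddie", "opportunistic", "APT"],
    },
}


def encode_event(event, meta):
    """Return (column_names, values) in a fixed, metadata-defined order."""
    names, values = [], []
    for col in meta["numeric"]:
        names.append(col)
        values.append(float(event[col]))
    for col, levels in meta["categorical"].items():
        for level in levels:
            names.append(f"{col}={level}")
            values.append(1.0 if event.get(col) == level else 0.0)
    return names, values


names, values = encode_event(
    {"timestep": 12, "perturbation_magnitude": 0.08,
     "attacker_capability_tier": "APT"},
    META,
)
print(dict(zip(names, values)))
```

With fixed levels, a tier never seen in training (for example `nation_state`) encodes as all zeros instead of silently adding a new column.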

	## Evaluation

	### Test-set metrics, seed 42 (n = 2,100 events from ~30 test campaigns)

	XGBoost (the published `model_xgb.json` artifact)

	| Metric | Value |
	|---|---:|
	| Macro ROC-AUC (OvR) | 0.9753 |
	| Accuracy | 0.8643 |
	| Macro-F1 | 0.7693 |
	| Weighted-F1 | 0.8703 |

	MLP (the published `model_mlp.safetensors` artifact)

	| Metric | Value |
	|---|---:|
	| Macro ROC-AUC (OvR) | 0.9705 |
	| Accuracy | 0.8386 |
	| Macro-F1 | 0.7345 |
	| Weighted-F1 | 0.8462 |

	XGBoost slightly outperforms MLP (acc 0.864 vs 0.839, macro-F1 0.769
	vs 0.735). The gap is consistent across seeds.

	### Multi-seed robustness (XGBoost, 10 seeds)

	| Metric | Mean | Std | Min | Max |
	|---|---:|---:|---:|---:|
	| Accuracy | 0.867 | 0.010 | 0.852 | 0.884 |
	| Macro-F1 | 0.775 | 0.012 | 0.750 | 0.798 |
	| Macro ROC-AUC OvR | 0.977 | 0.002 | 0.973 | 0.980 |

	All 10 seeds yielded all 7 classes in the test fold. Full per-seed
	results in [`multi_seed_results.json`](./multi_seed_results.json).
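
The aggregate columns can be reproduced from per-seed scores with a few lines of standard-library Python. The sketch below uses made-up accuracies, and it is an assumption that the published Std is the sample standard deviation; the real per-seed values and their layout are in `multi_seed_results.json`.

```python
import statistics


def summarize(values):
    """Mean, sample standard deviation, min, and max of per-seed scores."""
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical per-seed accuracies, for illustration only.
toy_accuracy = [0.86, 0.87, 0.88, 0.85, 0.87]
summary = summarize(toy_accuracy)
print({k: round(v, 3) for k, v in summary.items()})
```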

	### Per-class F1 (seed 42)

	| Phase | Class share | XGBoost F1 | MLP F1 |
	|---|---:|---:|---:|
	| `evasion_attempt` | 51.5% | 0.996 | 0.993 |
	| `reconnaissance` | 5.8% | 0.886 | 0.874 |
	| `campaign_consolidation` | 5.9% | 0.808 | 0.785 |
	| `feature_space_probe` | 10.5% | 0.783 | 0.747 |
	| `feedback_adaptation` | 3.5% | 0.715 | 0.628 |
	| `idle_dwell` | 17.5% | 0.704 | 0.619 |
	| `perturbation_craft` | 5.3% | 0.493 | 0.497 |

	`evasion_attempt` is nearly perfectly separable because of its
	distinctive query-usage and perturbation-activity signatures.
	`reconnaissance` and `campaign_consolidation` are well-separated by
	their characteristic timestep ranges. `perturbation_craft` is the
	hardest class (F1 0.49) because its per-timestep features overlap
	heavily with `feature_space_probe` — both involve probing model
	behavior at moderate query counts without submitting a final evasion
	attempt.
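
As a consistency check, the macro-F1 reported for XGBoost can be recomputed as the unweighted mean of the per-class F1 values in the table above. The share-weighted figure in this sketch uses overall class shares as weights, which only approximates the published weighted-F1 because that metric is weighted by test-fold support.

```python
# Per-class XGBoost F1 (seed 42) and overall class share, from the table above.
PER_CLASS = {
    "evasion_attempt": (0.996, 0.515),
    "reconnaissance": (0.886, 0.058),
    "campaign_consolidation": (0.808, 0.059),
    "feature_space_probe": (0.783, 0.105),
    "feedback_adaptation": (0.715, 0.035),
    "idle_dwell": (0.704, 0.175),
    "perturbation_craft": (0.493, 0.053),
}

# Macro-F1: every class counts equally, regardless of size.
macro_f1 = sum(f1 for f1, _ in PER_CLASS.values()) / len(PER_CLASS)

# Share-weighted F1: large classes dominate the average.
total_share = sum(share for _, share in PER_CLASS.values())
share_weighted_f1 = (
    sum(f1 * share for f1, share in PER_CLASS.values()) / total_share
)

print(f"macro-F1          {macro_f1:.4f}")
print(f"share-weighted F1 {share_weighted_f1:.4f}")
```

The gap between the two averages shows how much the dominant `evasion_attempt` class lifts any support-weighted metric, which is why macro-F1 is the more demanding number here.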

	### Ablation: which feature groups matter

	| Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy | Δ macro-F1 |
	|---|---:|---:|---:|---:|---:|
	| Full feature set (published) | 0.8643 | 0.7693 | 0.9753 | n/a | n/a |
	| No perturbation features | 0.6595 | 0.6451 | 0.8979 | −0.205 | −0.124 |
	| No query features | 0.8210 | 0.7080 | 0.9669 | −0.043 | −0.061 |
	| No engineered features | 0.8590 | 0.7619 | 0.9751 | −0.005 | −0.007 |
	| No tier (one-hot) | 0.8614 | 0.7647 | 0.9752 | −0.003 | −0.005 |
	| No timestep | 0.8557 | 0.7549 | 0.9696 | −0.009 | −0.014 |
	| No topology features | 0.8648 | 0.7745 | 0.9760 | +0.001 | +0.005 |

	Three findings:

	1. Perturbation features carry the dominant signal (−20pp accuracy,
	−12pp F1 when removed). `feature_delta_l2_norm`,
	`feature_delta_linf_norm`, and `perturbation_magnitude` directly
	encode whether the attacker is actively perturbing inputs.
	2. Query features are second-strongest (−4pp accuracy, −6pp F1).
	Cumulative query count distinguishes active phases (evasion_attempt,
	probe) from idle phases.
	3. Topology features contribute nothing on this task (accuracy moves
	by less than 0.1pp when they are removed). This confirms that the
	topology fingerprint is not leaking phase information: it
	identifies `defender_architecture`, not `attack_phase`.
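
The ablation procedure itself is a simple loop: score the full feature set, then rescore with one group of columns removed at a time. The sketch below shows that loop with a pluggable `fit_and_score` callable. The toy scorer and its signal values are hypothetical stand-ins for retraining XGBoost, and only the perturbation group's membership is taken from the text above.

```python
ALL_COLUMNS = [
    "timestep", "perturbation_magnitude", "feature_delta_l2_norm",
    "feature_delta_linf_norm", "query_count_cumulative",
]

FEATURE_GROUPS = {
    "perturbation": ["perturbation_magnitude", "feature_delta_l2_norm",
                     "feature_delta_linf_norm"],
    "query": ["query_count_cumulative"],   # assumed group membership
    "timestep": ["timestep"],
}


def run_ablation(all_columns, groups, fit_and_score):
    """Return {config: (score, delta_vs_full)} for each dropped group."""
    baseline = fit_and_score(all_columns)
    rows = {"full": (baseline, 0.0)}
    for name, cols in groups.items():
        kept = [c for c in all_columns if c not in cols]
        score = fit_and_score(kept)
        rows[f"no_{name}"] = (score, score - baseline)
    return rows


# Hypothetical scorer: pretends each column adds a fixed amount of accuracy.
TOY_SIGNAL = {"perturbation_magnitude": 0.10, "feature_delta_l2_norm": 0.06,
              "query_count_cumulative": 0.04}


def toy_score(columns):
    return 0.5 + sum(TOY_SIGNAL.get(c, 0.0) for c in columns)


rows = run_ablation(ALL_COLUMNS, FEATURE_GROUPS, toy_score)
for config, (score, delta) in rows.items():
    print(f"{config:16s} {score:.3f} {delta:+.3f}")
```

In a real run, `fit_and_score` would retrain on the training fold and evaluate on the same held-out campaigns each time, so that the deltas are comparable.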

	### Architecture

	XGBoost: multi-class gradient boosting (`multi:softprob`, 7 classes),
	`hist` tree method, class-balanced sample weights, early stopping on
	validation mlogloss.

	MLP: `37 → 128 → 64 → 7`, each hidden layer followed by `BatchNorm1d`
	→ `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss, AdamW optimizer,
	early stopping on validation macro-F1.
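
The MLP description above maps onto a small PyTorch module. This is a sketch built from that description only: the `build_mlp` helper is hypothetical, and the layer naming in the published `model_mlp.safetensors` state dict is not documented here, so loading those weights into this module may need key remapping.

```python
import torch
from torch import nn


def build_mlp(n_features: int = 37, n_classes: int = 7,
              p_drop: float = 0.3) -> nn.Sequential:
    """37 -> 128 -> 64 -> 7; each hidden layer: BatchNorm1d, ReLU, Dropout."""
    return nn.Sequential(
        nn.Linear(n_features, 128),
        nn.BatchNorm1d(128),
        nn.ReLU(),
        nn.Dropout(p_drop),
        nn.Linear(128, 64),
        nn.BatchNorm1d(64),
        nn.ReLU(),
        nn.Dropout(p_drop),
        nn.Linear(64, n_classes),   # raw logits; softmax is applied at inference
    )


# eval() turns dropout off and makes BatchNorm use its running statistics.
model = build_mlp().eval()
with torch.no_grad():
    logits = model(torch.zeros(4, 37))      # dummy batch of 4 standardized events
    proba = torch.softmax(logits, dim=1)
print(proba.shape)
```

The final layer emits logits rather than probabilities because cross-entropy loss in PyTorch expects unnormalized scores during training.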

	Training hyperparameters are held internally by XpertSystems.

	## Limitations

	This is a baseline reference, not a production phase classifier.

	1. The leakage diagnostic is required reading. Three direct
	oracle columns for the phase task plus three additional documented
	leaks (timestep partial, stealth_score per-tier, topology
	fingerprint) are in `leakage_diagnostic.json`. If you use CYB011
	sample data for your own training, you MUST drop the three direct
	oracles or your model will learn the oracles instead of the task.

	2. `perturbation_craft` F1 0.49 is the weakest class. This phase's
	per-timestep features overlap heavily with `feature_space_probe`.
	A sequence model considering event ordering within campaigns would
	likely do better than per-timestep classification.

	3. `nation_state` attacker tier is MISSING from the sample. The
	README claims 4 tiers (script_kiddie, opportunistic, APT,
	nation_state). The sample contains only 3 — nation_state events
	are entirely absent. Models trained on this sample cannot
	generalize to nation_state actors.

	4. **Four README-suggested headline targets are unlearnable on the
	sample** after honest leak removal: `campaign_success_flag` (acc
	0.51 vs majority 0.61), `campaign_type` 8-class (acc 0.11 vs 0.17),
	`coordinated_attack_flag` (acc 0.83 vs 0.90 — only 20 positives in
	200 campaigns), and `defender_architecture` 8-class (collapses to
	acc 0.13 when the 7-feature topology fingerprint is dropped).

	5. Per-campaign tasks are structurally limited at n=200. With ~30
	test campaigns per fold, statistical power is limited. The full
	~5,500-campaign product would yield much tighter per-campaign
	metrics.

	6. Synthetic-vs-real transfer. The dataset is synthetic, calibrated
	to 12 benchmarks from MITRE ATLAS / NIST AI 100-2 / OWASP ML Top 10
	/ USENIX / IBM ART / Anthropic-OpenAI red team reports. Real
	adversarial ML telemetry has different noise characteristics, and
	in particular the threshold-encoded `detector_confidence_score`
	and zero-sentinel `evasion_budget_consumed` patterns documented in
	the diagnostic would not be present in real data. Real telemetry
	has continuous, overlapping distributions.

	## Notes on dataset schema

	The CYB011 sample dataset README describes some fields differently
	from the actual schema. The model was trained on the actual schema;
	this note helps buyers reconcile what they read with what they receive.

	| What the README says | What the data actually contains |
	|---|---|
	| `attack_trajectories` has 18 columns | Data has 13 columns |
	| Field renames | `adversarial_phase` → `attack_phase`, `attacker_tier` → `attacker_capability_tier`, `perturbation_linf` → `feature_delta_linf_norm`, `perturbation_l2` → `feature_delta_l2_norm`, `queries_used` → `query_count_cumulative` |
	| README missing from `attack_trajectories` | `detector_confidence_score`, `detection_outcome`, `evasion_budget_consumed` are in the data but not documented |
	| README claims `gradient_access`, `evasion_attempted`, `evasion_succeeded`, `query_budget_remaining`, `defender_detection_strength`, `concept_drift_injected`, `transfer_attack_used`, `stealth_score`, `feature_space_dim` | None of these columns exist in `attack_trajectories`. `defender_detection_strength`, `feature_space_dim`, and `stealth_score` do exist, but in `network_topology` or `campaign_summary` rather than in `attack_trajectories` |
	| `attacker_capability_tier` has 4 values | Data has 3 values; `nation_state` is missing entirely |
	| `attack_phase` 6-phase lifecycle | Data has 7 phases; adds `idle_dwell` (18% of events) |
	| `campaign_summary` has 14 columns | Data has 25 columns |
	| README documents no schema for `network_topology` | Data has 12 columns |

	None of these affects model correctness — the feature pipeline uses
	the actual column names. If you build your own pipeline against the
	dataset, use the actual columns.
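
If you already have code written against the README's field names, a small translation table may be enough to bridge the gap. The mapping below is copied from the rename row above; the `to_actual_columns` helper is hypothetical.

```python
# README name -> actual column name, as listed in the table above.
README_TO_ACTUAL = {
    "adversarial_phase": "attack_phase",
    "attacker_tier": "attacker_capability_tier",
    "perturbation_linf": "feature_delta_linf_norm",
    "perturbation_l2": "feature_delta_l2_norm",
    "queries_used": "query_count_cumulative",
}


def to_actual_columns(names):
    """Translate README-era column names, leaving unknown names unchanged."""
    return [README_TO_ACTUAL.get(name, name) for name in names]


wanted = ["timestep", "adversarial_phase", "queries_used"]
print(to_actual_columns(wanted))
```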

	## Intended use

	- Evaluating fit of the CYB011 dataset for your adversarial ML
	research
	- Baseline reference for new model architectures on the
	attack-phase classification task
	- Reference example of structural-leakage diagnostics for
	synthetic adversarial ML datasets — the methodology is reusable
	- Feature engineering reference for per-timestep adversarial
	trajectory telemetry

	## Out-of-scope use

	- Production adversarial detection on real ML systems
	- Attacker tier attribution (3-class per-timestep is weak; per-campaign
	is leaky via stealth_score)
	- Defender architecture vulnerability assessment (trivially leaky on
	this sample; collapses when topology fingerprint is dropped)
	- Campaign success prediction (unlearnable on sample)
	- Any nation_state-specific modeling (tier absent from sample)
	- Any operational AI security decision without further validation on
	real adversarial telemetry

	## Reproducibility

	Outputs above were produced with `seed = 42` (published artifact),
	nested `GroupShuffleSplit` on `campaign_id` (70/15/15), on the
	published sample (`xpertsystems/cyb011-sample`, version 1.0.0,
	generated 2026-05-16). The feature pipeline in `feature_engineering.py`
	is deterministic and the trained weights in this repo correspond
	exactly to the metrics above.

	Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200)
	in `multi_seed_results.json` confirm robust performance across splits
	(std 0.010 on accuracy, 0.002 on ROC-AUC).

	The training script itself is private to XpertSystems.

	## Files in this repo

	| File | Purpose |
	|---|---|
	| `model_xgb.json` | XGBoost weights (seed 42) |
	| `model_mlp.safetensors` | PyTorch MLP weights (seed 42) |
	| `feature_engineering.py` | Feature pipeline |
	| `feature_meta.json` | Feature column order + categorical levels |
	| `feature_scaler.json` | MLP input mean/std (XGBoost ignores) |
	| `validation_results.json` | Per-class metrics, confusion matrix, architecture |
	| `ablation_results.json` | Per-feature-group ablation |
	| `multi_seed_results.json` | XGBoost metrics across 10 seeds |
	| `leakage_diagnostic.json` | 6-oracle-path audit + 4 unlearnable targets + missing tier note |
	| `inference_example.ipynb` | End-to-end inference demo notebook |
	| `README.md` | This file |

	## Contact and full product

	The full CYB011 dataset contains ~383,000 rows across four files,
	with calibrated benchmark validation against 12 metrics drawn from
	authoritative adversarial ML research (MITRE ATLAS, NIST AI 100-2
	Adversarial ML Taxonomy, OWASP ML Top 10, USENIX Security adversarial
	ML papers, IEEE SaTML, Microsoft Counterfit, IBM Adversarial Robustness
	Toolbox, Anthropic / OpenAI red team reports).

	The full XpertSystems.ai synthetic data catalogue spans 41 SKUs across
	Cybersecurity, Healthcare, Insurance & Risk, Oil & Gas, and Materials
	& Energy.

	- 📧 pradeep@xpertsystems.ai
	- 🌐 https://xpertsystems.ai
	- 🗂 Dataset: https://huggingface.co/datasets/xpertsystems/cyb011-sample
	- 🤖 Companion models:
	  - https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic)
	  - https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain)
	  - https://huggingface.co/xpertsystems/cyb003-baseline-classifier (malware execution phase)
	  - https://huggingface.co/xpertsystems/cyb004-baseline-classifier (phishing campaign phase)
	  - https://huggingface.co/xpertsystems/cyb005-baseline-classifier (ransomware actor-tier attribution)
	  - https://huggingface.co/xpertsystems/cyb006-baseline-classifier (user risk tier + leakage diagnostic)
	  - https://huggingface.co/xpertsystems/cyb007-baseline-classifier (insider threat type)
	  - https://huggingface.co/xpertsystems/cyb008-baseline-classifier (SOC alert triage + leakage diagnostic)
	  - https://huggingface.co/xpertsystems/cyb009-baseline-classifier (vulnerability classification + leakage diagnostic)
	  - https://huggingface.co/xpertsystems/cyb010-baseline-classifier (attack lifecycle phase + leakage diagnostic)

	## Citation

	```bibtex
	@misc{xpertsystems_cyb011_baseline_2026,
	title = {CYB011 Baseline Classifier: XGBoost and MLP for Adversarial Attack Phase Classification, with 6-Oracle-Path Leakage Diagnostic},
	author = {XpertSystems.ai},
	year = {2026},
	url = {https://huggingface.co/xpertsystems/cyb011-baseline-classifier},
	note = {Baseline reference model + leakage audit trained on xpertsystems/cyb011-sample}
	}
	```