---
license: cc-by-nc-4.0
library_name: pytorch
tags:
- cybersecurity
- adversarial-machine-learning
- ai-security
- adversarial-attacks
- evasion-attacks
- apt
- tabular-classification
- synthetic-data
- xgboost
- baseline
- leakage-diagnostic
pipeline_tag: tabular-classification
base_model: []
datasets:
- xpertsystems/cyb011-sample
metrics:
- accuracy
- f1
- roc_auc
model-index:
- name: cyb011-baseline-classifier
  results:
  - task:
      type: tabular-classification
      name: 7-class adversarial attack phase classification
    dataset:
      type: xpertsystems/cyb011-sample
      name: CYB011 Synthetic AI Evasion Attack Trajectory Dataset (Sample)
    metrics:
    - type: roc_auc
      value: 0.9753
      name: Test macro ROC-AUC OvR (XGBoost, seed 42)
    - type: accuracy
      value: 0.8643
      name: Test accuracy (XGBoost, seed 42)
    - type: f1
      value: 0.7693
      name: Test macro-F1 (XGBoost, seed 42)
    - type: accuracy
      value: 0.867
      name: Multi-seed accuracy mean ± 0.010 (XGBoost, 10 seeds)
    - type: roc_auc
      value: 0.977
      name: Multi-seed ROC-AUC mean ± 0.002 (XGBoost, 10 seeds)
---
# CYB011 Baseline Classifier
**Adversarial attack phase classifier (7-class) trained on the CYB011
synthetic AI evasion attack trajectory sample. Predicts which of 7
attack phases (`reconnaissance` / `feature_space_probe` /
`perturbation_craft` / `evasion_attempt` / `feedback_adaptation` /
`campaign_consolidation` / `idle_dwell`) a per-timestep trajectory
event belongs to, from per-event features. ALSO ships a comprehensive
`leakage_diagnostic.json` documenting 6 oracle paths discovered
across the dataset's targets, 4 README-suggested targets that are
unlearnable on the sample after honest leak removal, and the missing
`nation_state` attacker tier.**
> **Read this first.** This repo ships two related artifacts:
> (1) a working baseline classifier for `attack_phase` (the dataset's
> headline target), and (2) `leakage_diagnostic.json` documenting 6
> separate oracle paths, 4 unlearnable targets, and one missing
> attacker tier. Both files matter; the diagnostic is required reading
> for anyone evaluating CYB011 for adversarial ML research.
## Model overview
| Property | Value |
|---|---|
| Primary task | 7-class `attack_phase` classification |
| Secondary artifact | `leakage_diagnostic.json` — 6 oracle paths + 4 unlearnable targets |
| Training data | `xpertsystems/cyb011-sample` (14,000 events / 200 campaigns) |
| Models | XGBoost + PyTorch MLP |
| Input features | 37 (after one-hot encoding) |
| Split | **Group-aware** (GroupShuffleSplit on `campaign_id`) |
| Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds |
| License | CC-BY-NC-4.0 (matches dataset) |
| Status | Reference baseline + comprehensive leakage diagnostic |
## Why this task — and what was dropped
The CYB011 README describes a "6-phase adversarial state machine."
The actual sample data contains **7 phases** — it adds `idle_dwell`
as a class (18% of all events, the second-largest class). The
published baseline trains on all 7.
We piloted nine candidate targets and found:
- **`attack_phase` 7-class**: strongest honest result. Acc 0.867 ±
0.010, ROC-AUC 0.977 ± 0.002 (multi-seed). All 7 classes
represented, per-class F1 range 0.49–1.00.
- **`attacker_capability_tier` 3-class (per-timestep)**: weak honest
result (acc 0.68, mF1 0.64). The 3 tiers do not strongly
distinguish each other at the per-timestep level — feature means
are within ~1% across tiers.
- **`attacker_capability_tier` 3-class (per-campaign)**: hits acc 0.94
but is structurally inflated by `stealth_score` leakage
(near-deterministic ranges per tier). Documented in the diagnostic.
- **`detection_outcome` 4-class**: hits 100% trivially via
`detector_confidence_score` thresholds. Pure oracle.
- **`defender_architecture` 8-class**: hits 100% trivially via the
topology fingerprint (7 segment features uniquely identify each
architecture). Collapses to acc 0.13 vs majority 0.17 when the
fingerprint is dropped.
- **`campaign_success_flag` / `campaign_type` / `coordinated_attack_flag`**:
all below majority baseline at n=200 campaigns.
### Three oracle columns dropped from features
The phase task has three direct outcome-leak columns. Each is a perfect
or near-perfect oracle for specific phases:
| Column | Oracle relationship |
|---|---|
| `detection_outcome` | `!= suppressed_alert` → 100% `evasion_attempt` phase |
| `detector_confidence_score` | Threshold-derived from `detection_outcome` (<0.25 → evasion_success, [0.52,0.78] → marginal, ≥0.78 → high_confidence) |
| `evasion_budget_consumed` | `== 0` → 100% one of 3 early phases (reconnaissance, feature_space_probe, perturbation_craft) |
With these three columns present, a plain XGBoost achieves 100%
accuracy. The published baseline trains with all three excluded.
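The exclusion step can be sketched as follows. This is a minimal illustration that assumes each event is a plain dict keyed by the actual column names; the helper `drop_oracle_columns` is hypothetical and is not part of `feature_engineering.py`, and the row values are made up.

```python
# Outcome-leak columns named in the oracle table.
ORACLE_COLUMNS = frozenset({
    "detection_outcome",
    "detector_confidence_score",
    "evasion_budget_consumed",
})


def drop_oracle_columns(event: dict) -> dict:
    """Return a copy of one event row without the three oracle columns."""
    return {k: v for k, v in event.items() if k not in ORACLE_COLUMNS}


# Illustrative row only; values are invented for the example.
raw_event = {
    "timestep": 12,
    "query_count_cumulative": 340,
    "detection_outcome": "suppressed_alert",
    "detector_confidence_score": 0.41,
    "evasion_budget_consumed": 0.0,
}
clean_event = drop_oracle_columns(raw_event)
print(sorted(clean_event))
```

With a pandas DataFrame the same step would be a single `df.drop(columns=[...])` call over these three names.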
### `timestep` kept as a legitimate observable
`timestep` is a partial oracle for 3 phases (reconnaissance is
always timestep 1-7, feedback_adaptation is 63-66, campaign_consolidation
is 65-70). It's **kept** in the feature set because campaign-progress
position is a real observable a defender would have at decision time
— it's not encoding the label, it's encoding the lifecycle position.
Removing `timestep` lowers seed-42 accuracy by about 0.9pp (0.864 → 0.856; see the ablation table).
Documented in the diagnostic for transparency.
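To make "partial oracle" concrete, the sketch below reads the stated ranges as inclusive bounds and reports which of the three phases a given timestep rules out. The function name `phases_ruled_out` is hypothetical; the other four phases are not constrained by these ranges, so timestep alone never identifies the label.

```python
# Timestep ranges (inclusive) for the three phases where timestep
# acts as a partial oracle, as stated in this section.
PHASE_TIMESTEP_RANGES = {
    "reconnaissance": (1, 7),
    "feedback_adaptation": (63, 66),
    "campaign_consolidation": (65, 70),
}


def phases_ruled_out(timestep: int) -> set:
    """Phases whose documented range excludes this timestep."""
    return {
        phase
        for phase, (lo, hi) in PHASE_TIMESTEP_RANGES.items()
        if not lo <= timestep <= hi
    }


print(sorted(phases_ruled_out(30)))  # mid-campaign: all three excluded
print(sorted(phases_ruled_out(65)))  # late campaign: only reconnaissance excluded
```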
Two model artifacts are published. They are designed to be used
together:
- `model_xgb.json` — gradient-boosted trees (higher F1)
- `model_mlp.safetensors` — PyTorch MLP
## Quick start
```bash
pip install xgboost torch safetensors pandas huggingface_hub
```
```python
import json
import os
import sys

import numpy as np
import torch
import xgboost as xgb
from huggingface_hub import hf_hub_download, snapshot_download
from safetensors.torch import load_file

REPO = "xpertsystems/cyb011-baseline-classifier"
paths = {n: hf_hub_download(REPO, n) for n in [
    "model_xgb.json", "model_mlp.safetensors",
    "feature_engineering.py", "feature_meta.json", "feature_scaler.json",
]}

# Make the downloaded feature_engineering.py importable
sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"]))
from feature_engineering import (
    transform_single, load_meta, build_segment_lookup, INT_TO_LABEL,
)

meta = load_meta(paths["feature_meta.json"])

# Segment features are joined from network_topology.csv at inference time
ds = snapshot_download("xpertsystems/cyb011-sample", repo_type="dataset")
segment_lookup = build_segment_lookup(f"{ds}/network_topology.csv")

xgb_model = xgb.XGBClassifier()
xgb_model.load_model(paths["model_xgb.json"])

# Predict (see inference_example.ipynb for the full pattern)
# my_event is a placeholder for one raw trajectory event that you supply.
# Note: do NOT include detection_outcome, detector_confidence_score,
# or evasion_budget_consumed; those were the outcome leak columns.
X = transform_single(my_event, meta, segment_lookup=segment_lookup)
proba = xgb_model.predict_proba(X)[0]
print(INT_TO_LABEL[int(np.argmax(proba))])
```
See [`inference_example.ipynb`](./inference_example.ipynb) for the full
copy-paste demo.
## Training data
Trained on the public sample of CYB011, 14,000 per-timestep records:
| Phase | Events | Class share |
|---|---:|---:|
| `evasion_attempt` | 7,206 | 51.5% |
| `idle_dwell` | 2,450 | 17.5% |
| `feature_space_probe` | 1,465 | 10.5% |
| `campaign_consolidation` | 829 | 5.9% |
| `reconnaissance` | 809 | 5.8% |
| `perturbation_craft` | 745 | 5.3% |
| `feedback_adaptation` | 496 | 3.5% |
### Group-aware split by campaign_id
200 campaigns × 70 timesteps each. Timesteps from the same campaign
share attacker, target segment, and tier — so train/test contamination
is a real risk with random splitting. The baseline uses
**GroupShuffleSplit** on `campaign_id` (nested 70/15/15):
| Fold | Events | Campaigns |
|---|---:|---:|
| Train | 9,730 | 139 |
| Validation | 2,170 | 31 |
| Test | 2,100 | 30 |
All 10 multi-seed evaluations yielded all 7 classes in the test fold.
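The idea behind a group-aware split can be sketched without any dependencies. This is not the scikit-learn `GroupShuffleSplit` call the baseline uses, so fold membership and sizes may differ by a campaign from the table; the function `group_split` and the campaign IDs are hypothetical.

```python
import random


def group_split(campaign_ids, seed=42, val_frac=0.15, test_frac=0.15):
    """Assign whole campaigns to train/val/test so no campaign is split."""
    unique_ids = sorted(set(campaign_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = round(len(unique_ids) * test_frac)
    n_val = round(len(unique_ids) * val_frac)
    test_ids = set(unique_ids[:n_test])
    val_ids = set(unique_ids[n_test:n_test + n_val])
    train_ids = set(unique_ids[n_test + n_val:])
    return train_ids, val_ids, test_ids


# 200 campaigns x 70 timesteps, as in the sample; the IDs are made up.
event_campaign_ids = [f"C{c:03d}" for c in range(200) for _ in range(70)]
train_ids, val_ids, test_ids = group_split(event_campaign_ids)

# Row indices follow from campaign membership, never the other way round.
train_rows = [i for i, c in enumerate(event_campaign_ids) if c in train_ids]
print(len(train_ids), len(val_ids), len(test_ids), len(train_rows))
```

Because rows are selected by campaign membership, every timestep of a campaign lands in exactly one fold, which is the property that prevents train/test contamination.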
Class imbalance is addressed with `class_weight='balanced'` weights
(passed to XGBoost as per-row `sample_weight`) and with weighted
cross-entropy for the MLP.
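As a rough illustration of what balanced weighting does on this class distribution, the sketch below assumes "balanced" means the scikit-learn heuristic `n_samples / (n_classes * count)`. It uses the full-sample counts from the Training data table; the baseline would compute weights on the training fold only, so its actual values will differ slightly.

```python
# Event counts per phase from the Training data table (full sample).
PHASE_COUNTS = {
    "evasion_attempt": 7206,
    "idle_dwell": 2450,
    "feature_space_probe": 1465,
    "campaign_consolidation": 829,
    "reconnaissance": 809,
    "perturbation_craft": 745,
    "feedback_adaptation": 496,
}


def balanced_class_weights(counts: dict) -> dict:
    """weight_c = n_samples / (n_classes * count_c)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


weights = balanced_class_weights(PHASE_COUNTS)
for phase, w in weights.items():
    print(f"{phase:24s} {w:.3f}")
```

Under this heuristic every class contributes the same total weight, so the rarest phase (`feedback_adaptation`) is weighted roughly 14.5 times more per event than `evasion_attempt`.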
## Feature pipeline
The bundled `feature_engineering.py` is the canonical recipe. 37
features survive after encoding, drawn from:
- **Per-timestep numeric** (5): `timestep`, `perturbation_magnitude`,
`feature_delta_l2_norm`, `feature_delta_linf_norm`, `query_count_cumulative`
- **Per-timestep categorical** (1, one-hot): `attacker_capability_tier`
(3 values in sample)
- **Segment features** (joined from `network_topology.csv`): 8 numeric
+ 2 categorical (segment_type, defender_architecture)
- **Engineered** (5): `progress_frac`, `log_queries`, `perturb_intensity`,
`defender_weakness`, `query_rate`
## Evaluation
### Test-set metrics, seed 42 (n = 2,100 events from ~30 test campaigns)
**XGBoost** (the published `model_xgb.json` artifact)
| Metric | Value |
|---|---:|
| Macro ROC-AUC (OvR) | **0.9753** |
| Accuracy | **0.8643** |
| Macro-F1 | 0.7693 |
| Weighted-F1 | 0.8703 |
**MLP** (the published `model_mlp.safetensors` artifact)
| Metric | Value |
|---|---:|
| Macro ROC-AUC (OvR) | **0.9705** |
| Accuracy | **0.8386** |
| Macro-F1 | 0.7345 |
| Weighted-F1 | 0.8462 |
XGBoost slightly outperforms MLP (acc 0.864 vs 0.839, macro-F1 0.769
vs 0.735). The gap is consistent across seeds.
### Multi-seed robustness (XGBoost, 10 seeds)
| Metric | Mean | Std | Min | Max |
|---|---:|---:|---:|---:|
| Accuracy | 0.867 | 0.010 | 0.852 | 0.884 |
| Macro-F1 | 0.775 | 0.012 | 0.750 | 0.798 |
| Macro ROC-AUC OvR | 0.977 | 0.002 | 0.973 | 0.980 |
All 10 seeds yielded all 7 classes in the test fold. Full per-seed
results in [`multi_seed_results.json`](./multi_seed_results.json).
### Per-class F1 (seed 42)
| Phase | Class share | XGBoost F1 | MLP F1 |
|---|---:|---:|---:|
| `evasion_attempt` | 51.5% | **0.996** | 0.993 |
| `reconnaissance` | 5.8% | **0.886** | 0.874 |
| `campaign_consolidation` | 5.9% | 0.808 | 0.785 |
| `feature_space_probe` | 10.5% | 0.783 | 0.747 |
| `feedback_adaptation` | 3.5% | 0.715 | 0.628 |
| `idle_dwell` | 17.5% | 0.704 | 0.619 |
| `perturbation_craft` | 5.3% | **0.493** | 0.497 |
`evasion_attempt` is nearly perfectly separable because of its
distinctive query-usage and perturbation-activity signatures.
`reconnaissance` and `campaign_consolidation` are well-separated by
their characteristic timestep ranges. `perturbation_craft` is the
hardest class (F1 0.49) because its per-timestep features overlap
heavily with `feature_space_probe` — both involve probing model
behavior at moderate query counts without submitting a final evasion
attempt.
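Macro-F1 is the unweighted mean of the per-class F1 scores, which is why the weak minority classes pull it well below accuracy. The short check below recomputes it from the XGBoost column of the table above; the inputs are rounded to three decimals, so agreement with the reported value is only up to rounding.

```python
# Per-class XGBoost F1 scores from the per-class table (seed 42).
XGB_F1 = {
    "evasion_attempt": 0.996,
    "reconnaissance": 0.886,
    "campaign_consolidation": 0.808,
    "feature_space_probe": 0.783,
    "feedback_adaptation": 0.715,
    "idle_dwell": 0.704,
    "perturbation_craft": 0.493,
}

# Every class counts equally, regardless of how many events it has.
macro_f1 = sum(XGB_F1.values()) / len(XGB_F1)
hardest = min(XGB_F1, key=XGB_F1.get)
print(f"macro-F1 = {macro_f1:.4f}, hardest class = {hardest}")
```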
### Ablation: which feature groups matter
| Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy | Δ macro-F1 |
|---|---:|---:|---:|---:|---:|
| Full feature set (published) | 0.8643 | 0.7693 | 0.9753 | — | — |
| No perturbation features | 0.6595 | 0.6451 | 0.8979 | **−0.205** | **−0.124** |
| No query features | 0.8210 | 0.7080 | 0.9669 | −0.043 | −0.061 |
| No engineered features | 0.8590 | 0.7619 | 0.9751 | −0.005 | −0.007 |
| No tier (one-hot) | 0.8614 | 0.7647 | 0.9752 | −0.003 | −0.005 |
| No timestep | 0.8557 | 0.7549 | 0.9696 | −0.009 | −0.014 |
| No topology features | 0.8648 | 0.7745 | 0.9760 | +0.001 | +0.005 |
Three findings:
1. **Perturbation features carry the dominant signal** (−20pp accuracy,
−12pp F1 when removed). `feature_delta_l2_norm`,
`feature_delta_linf_norm`, and `perturbation_magnitude` directly
encode whether the attacker is actively perturbing inputs.
2. **Query features are second-strongest** (−4pp accuracy, −6pp F1).
Cumulative query count distinguishes active phases (evasion_attempt,
probe) from idle phases.
3. **Topology features contribute nothing on this task** (+0.1pp
accuracy when removed). Clean confirmation that the topology
fingerprint isn't leaking phase information — topology
fingerprints defender_architecture, not attack_phase.
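The Δ columns are simple differences against the full feature set. The sketch below recomputes them from the table values and ranks the feature groups by accuracy lost; the short configuration keys are our own labels for the table rows.

```python
# (accuracy, macro-F1) per configuration, copied from the ablation table.
ABLATION = {
    "full": (0.8643, 0.7693),
    "no_perturbation": (0.6595, 0.6451),
    "no_query": (0.8210, 0.7080),
    "no_engineered": (0.8590, 0.7619),
    "no_tier": (0.8614, 0.7647),
    "no_timestep": (0.8557, 0.7549),
    "no_topology": (0.8648, 0.7745),
}

full_acc, full_f1 = ABLATION["full"]
deltas = {
    name: (acc - full_acc, f1 - full_f1)
    for name, (acc, f1) in ABLATION.items()
    if name != "full"
}

# Most damaging removal first.
ranked = sorted(deltas, key=lambda name: deltas[name][0])
for name in ranked:
    d_acc, d_f1 = deltas[name]
    print(f"{name:16s} d_acc={d_acc:+.4f} d_f1={d_f1:+.4f}")
```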
### Architecture
**XGBoost:** multi-class gradient boosting (`multi:softprob`, 7 classes),
`hist` tree method, class-balanced sample weights, early stopping on
validation mlogloss.
**MLP:** `37 → 128 → 64 → 7`, each hidden layer followed by `BatchNorm1d`
→ `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss, AdamW optimizer,
early stopping on validation macro-F1.
Training hyperparameters are held internally by XpertSystems.
## Limitations
**This is a baseline reference, not a production phase classifier.**
1. **The leakage diagnostic is required reading.** Three direct
oracle columns for the phase task plus three additional documented
leaks (timestep partial, stealth_score per-tier, topology
fingerprint) are in `leakage_diagnostic.json`. If you use CYB011
sample data for your own training, you MUST drop the three direct
oracles or your model will learn the oracles instead of the task.
2. **`perturbation_craft` F1 0.49 is the weakest class.** This phase's
per-timestep features overlap heavily with `feature_space_probe`.
A sequence model considering event ordering within campaigns would
likely do better than per-timestep classification.
3. **`nation_state` attacker tier is MISSING from the sample.** The
README claims 4 tiers (script_kiddie, opportunistic, APT,
nation_state). The sample contains only 3 — nation_state events
are entirely absent. Models trained on this sample cannot
generalize to nation_state actors.
4. **Four README-suggested headline targets are unlearnable on the
sample** after honest leak removal: `campaign_success_flag` (acc
0.51 vs majority 0.61), `campaign_type` 8-class (acc 0.11 vs 0.17),
`coordinated_attack_flag` (acc 0.83 vs 0.90 — only 20 positives in
200 campaigns), and `defender_architecture` 8-class (collapses to
acc 0.13 when the 7-feature topology fingerprint is dropped).
5. **Per-campaign tasks are structurally limited at n=200.** With ~30
test campaigns per fold, statistical power is limited. The full
~5,500-campaign product would yield much tighter per-campaign
metrics.
6. **Synthetic-vs-real transfer.** The dataset is synthetic, calibrated
to 12 benchmarks from MITRE ATLAS / NIST AI 100-2 / OWASP ML Top 10
/ USENIX / IBM ART / Anthropic-OpenAI red team reports. Real
adversarial ML telemetry has different noise characteristics, and
in particular the threshold-encoded `detector_confidence_score`
and zero-sentinel `evasion_budget_consumed` patterns documented in
the diagnostic would not be present in real data. Real telemetry
has continuous, overlapping distributions.
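The "unlearnable" call in item 4 rests on a comparison against the majority-class baseline, that is, the accuracy of always predicting the most frequent class. A minimal sketch using the figures reported above (the helper `majority_baseline` is hypothetical):

```python
# (honest accuracy, majority-class baseline) as reported in item 4.
TARGETS = {
    "campaign_success_flag": (0.51, 0.61),
    "campaign_type": (0.11, 0.17),
    "coordinated_attack_flag": (0.83, 0.90),
    "defender_architecture": (0.13, 0.17),
}


def majority_baseline(class_counts):
    """Accuracy of always predicting the most frequent class."""
    return max(class_counts) / sum(class_counts)


# coordinated_attack_flag: 20 positives in 200 campaigns, so 180 negatives.
coordinated_baseline = majority_baseline([180, 20])

below_baseline = [t for t, (acc, base) in TARGETS.items() if acc < base]
print(coordinated_baseline, below_baseline)
```

A model that scores below this baseline is doing worse than a constant prediction, which is the sense in which these four targets are treated as unlearnable on the sample.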
## Notes on dataset schema
The CYB011 sample dataset README describes some fields differently
from the actual schema. The model was trained on the actual schema;
this note helps buyers reconcile what they read with what they receive.
| What the README says | What the data actually contains |
|---|---|
| `attack_trajectories` has 18 columns | Data has **13 columns** |
| Field renames | `adversarial_phase` → `attack_phase`, `attacker_tier` → `attacker_capability_tier`, `perturbation_linf` → `feature_delta_linf_norm`, `perturbation_l2` → `feature_delta_l2_norm`, `queries_used` → `query_count_cumulative` |
| README missing from `attack_trajectories` | `detector_confidence_score`, `detection_outcome`, `evasion_budget_consumed` are in data but not documented |
| README claims `gradient_access`, `evasion_attempted`, `evasion_succeeded`, `query_budget_remaining`, `defender_detection_strength`, `concept_drift_injected`, `transfer_attack_used`, `stealth_score`, `feature_space_dim` | None of these columns exist in `attack_trajectories`. `defender_detection_strength`, `feature_space_dim`, and `stealth_score` exist in `network_topology` or `campaign_summary` instead |
| `attacker_capability_tier` has 4 values | Data has **3 values**; `nation_state` is MISSING entirely |
| `attack_phase` 6-phase lifecycle | Data has **7 phases** — adds `idle_dwell` (18% of events) |
| `campaign_summary` has 14 columns | Data has **25 columns** |
| README documents no schema for `network_topology` | Data has **12 columns** |
None of these affects model correctness — the feature pipeline uses
the actual column names. If you build your own pipeline against the
dataset, use the actual columns.
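If you have code written against the README names, a small translation table is one way to reconcile it with the data. The mapping below is taken from the "Field renames" row; the helper `to_actual_columns` is hypothetical.

```python
# README field name -> actual column name in attack_trajectories.
README_TO_ACTUAL = {
    "adversarial_phase": "attack_phase",
    "attacker_tier": "attacker_capability_tier",
    "perturbation_linf": "feature_delta_linf_norm",
    "perturbation_l2": "feature_delta_l2_norm",
    "queries_used": "query_count_cumulative",
}


def to_actual_columns(readme_columns):
    """Translate README-style names; unknown names pass through unchanged."""
    return [README_TO_ACTUAL.get(name, name) for name in readme_columns]


print(to_actual_columns(["timestep", "adversarial_phase", "queries_used"]))
```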
## Intended use
- **Evaluating fit** of the CYB011 dataset for your adversarial ML
research
- **Baseline reference** for new model architectures on the
attack-phase classification task
- **Reference example of structural-leakage diagnostics** for
synthetic adversarial ML datasets — the methodology is reusable
- **Feature engineering reference** for per-timestep adversarial
trajectory telemetry
## Out-of-scope use
- Production adversarial detection on real ML systems
- Attacker tier attribution (3-class per-timestep is weak; per-campaign
is leaky via stealth_score)
- Defender architecture vulnerability assessment (trivially leaky on
this sample; collapses when topology fingerprint is dropped)
- Campaign success prediction (unlearnable on sample)
- Any nation_state-specific modeling (tier absent from sample)
- Any operational AI security decision without further validation on
real adversarial telemetry
## Reproducibility
Outputs above were produced with `seed = 42` (published artifact),
nested `GroupShuffleSplit` on `campaign_id` (70/15/15), on the
published sample (`xpertsystems/cyb011-sample`, version 1.0.0,
generated 2026-05-16). The feature pipeline in `feature_engineering.py`
is deterministic and the trained weights in this repo correspond
exactly to the metrics above.
Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200)
in `multi_seed_results.json` confirm robust performance across splits
(std 0.010 on accuracy, 0.002 on ROC-AUC).
The training script itself is private to XpertSystems.
## Files in this repo
| File | Purpose |
|---|---|
| `model_xgb.json` | XGBoost weights (seed 42) |
| `model_mlp.safetensors` | PyTorch MLP weights (seed 42) |
| `feature_engineering.py` | Feature pipeline |
| `feature_meta.json` | Feature column order + categorical levels |
| `feature_scaler.json` | MLP input mean/std (XGBoost ignores) |
| `validation_results.json` | Per-class metrics, confusion matrix, architecture |
| `ablation_results.json` | Per-feature-group ablation |
| `multi_seed_results.json` | XGBoost metrics across 10 seeds |
| **`leakage_diagnostic.json`** | **6-oracle-path audit + 4 unlearnable targets + missing tier note** |
| `inference_example.ipynb` | End-to-end inference demo notebook |
| `README.md` | This file |
## Contact and full product
The full **CYB011** dataset contains **~383,000 rows** across four files,
with calibrated benchmark validation against 12 metrics drawn from
authoritative adversarial ML research (MITRE ATLAS, NIST AI 100-2
Adversarial ML Taxonomy, OWASP ML Top 10, USENIX Security adversarial
ML papers, IEEE SaTML, Microsoft Counterfit, IBM Adversarial Robustness
Toolbox, Anthropic / OpenAI red team reports).
The full XpertSystems.ai synthetic data catalogue spans 41 SKUs across
Cybersecurity, Healthcare, Insurance & Risk, Oil & Gas, and Materials
& Energy.
- 📧 **pradeep@xpertsystems.ai**
- 🌐 **https://xpertsystems.ai**
- 🗂 Dataset: https://huggingface.co/datasets/xpertsystems/cyb011-sample
- 🤖 Companion models:
- https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic)
- https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain)
- https://huggingface.co/xpertsystems/cyb003-baseline-classifier (malware execution phase)
- https://huggingface.co/xpertsystems/cyb004-baseline-classifier (phishing campaign phase)
- https://huggingface.co/xpertsystems/cyb005-baseline-classifier (ransomware actor-tier attribution)
- https://huggingface.co/xpertsystems/cyb006-baseline-classifier (user risk tier + leakage diagnostic)
- https://huggingface.co/xpertsystems/cyb007-baseline-classifier (insider threat type)
- https://huggingface.co/xpertsystems/cyb008-baseline-classifier (SOC alert triage + leakage diagnostic)
- https://huggingface.co/xpertsystems/cyb009-baseline-classifier (vulnerability classification + leakage diagnostic)
- https://huggingface.co/xpertsystems/cyb010-baseline-classifier (attack lifecycle phase + leakage diagnostic)
## Citation
```bibtex
@misc{xpertsystems_cyb011_baseline_2026,
title = {CYB011 Baseline Classifier: XGBoost and MLP for Adversarial Attack Phase Classification, with 6-Oracle-Path Leakage Diagnostic},
author = {XpertSystems.ai},
year = {2026},
url = {https://huggingface.co/xpertsystems/cyb011-baseline-classifier},
note = {Baseline reference model + leakage audit trained on xpertsystems/cyb011-sample}
}
```