---
license: cc-by-nc-4.0
library_name: pytorch
tags:
  - cybersecurity
  - adversarial-machine-learning
  - ai-security
  - adversarial-attacks
  - evasion-attacks
  - apt
  - tabular-classification
  - synthetic-data
  - xgboost
  - baseline
  - leakage-diagnostic
pipeline_tag: tabular-classification
base_model: []
datasets:
  - xpertsystems/cyb011-sample
metrics:
  - accuracy
  - f1
  - roc_auc
model-index:
  - name: cyb011-baseline-classifier
    results:
      - task:
          type: tabular-classification
          name: 7-class adversarial attack phase classification
        dataset:
          type: xpertsystems/cyb011-sample
          name: CYB011 Synthetic AI Evasion Attack Trajectory Dataset (Sample)
        metrics:
          - type: roc_auc
            value: 0.9753
            name: Test macro ROC-AUC OvR (XGBoost, seed 42)
          - type: accuracy
            value: 0.8643
            name: Test accuracy (XGBoost, seed 42)
          - type: f1
            value: 0.7693
            name: Test macro-F1 (XGBoost, seed 42)
          - type: accuracy
            value: 0.867
            name: Multi-seed accuracy mean ± 0.010 (XGBoost, 10 seeds)
          - type: roc_auc
            value: 0.977
            name: Multi-seed ROC-AUC mean ± 0.002 (XGBoost, 10 seeds)
---

# CYB011 Baseline Classifier

**Adversarial attack phase classifier (7-class) trained on the CYB011
synthetic AI evasion attack trajectory sample. Predicts which of 7
attack phases (`reconnaissance` / `feature_space_probe` /
`perturbation_craft` / `evasion_attempt` / `feedback_adaptation` /
`campaign_consolidation` / `idle_dwell`) a per-timestep trajectory
event belongs to, from per-event features. The repo also ships a comprehensive
`leakage_diagnostic.json` documenting 6 oracle paths discovered
across the dataset's targets, 4 README-suggested targets that are
unlearnable on the sample after honest leak removal, and the missing
`nation_state` attacker tier.**

> **Read this first.** This repo ships two related artifacts:
> (1) a working baseline classifier for `attack_phase` (the dataset's
> headline target), and (2) `leakage_diagnostic.json` documenting 6
> separate oracle paths, 4 unlearnable targets, and one missing
> attacker tier. Both files matter; the diagnostic is required reading
> for anyone evaluating CYB011 for adversarial ML research.

## Model overview

| Property | Value |
|---|---|
| Primary task | 7-class `attack_phase` classification |
| Secondary artifact | `leakage_diagnostic.json` — 6 oracle paths + 4 unlearnable targets |
| Training data | `xpertsystems/cyb011-sample` (14,000 events / 200 campaigns) |
| Models | XGBoost + PyTorch MLP |
| Input features | 37 (after one-hot encoding) |
| Split | **Group-aware** (GroupShuffleSplit on `campaign_id`) |
| Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds |
| License | CC-BY-NC-4.0 (matches dataset) |
| Status | Reference baseline + comprehensive leakage diagnostic |

## Why this task — and what was dropped

The CYB011 dataset README describes a "6-phase adversarial state machine."
The sample data actually contains **7 phases**: it adds `idle_dwell`
as a class (17.5% of all events, the second-largest class). The
published baseline trains on all 7.

We piloted nine candidate targets and found:

- **`attack_phase` 7-class**: strongest honest result. Acc 0.867 ±
  0.010, ROC-AUC 0.977 ± 0.002 (multi-seed). All 7 classes
  represented, per-class F1 range 0.49–1.00.

- **`attacker_capability_tier` 3-class (per-timestep)**: weak honest
  result (acc 0.68, mF1 0.64). The 3 tiers do not strongly
  distinguish each other at the per-timestep level — feature means
  are within ~1% across tiers.

- **`attacker_capability_tier` 3-class (per-campaign)**: hits acc 0.94
  but is structurally inflated by `stealth_score` leakage
  (near-deterministic ranges per tier). Documented in the diagnostic.

- **`detection_outcome` 4-class**: hits 100% trivially via
  `detector_confidence_score` thresholds. Pure oracle.

- **`defender_architecture` 8-class**: hits 100% trivially via the
  topology fingerprint (7 segment features uniquely identify each
  architecture). Collapses to acc 0.13 vs majority 0.17 when the
  fingerprint is dropped.

- **`campaign_success_flag` / `campaign_type` / `coordinated_attack_flag`**:
  all below majority baseline at n=200 campaigns.
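
The "below majority baseline" verdicts compare a model's accuracy with the accuracy of always predicting the most frequent class. A minimal sketch of that comparison, using the `coordinated_attack_flag` counts quoted in the Limitations section (20 positives in 200 campaigns). The helper names are illustrative and not part of this repo:

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def beats_majority(model_accuracy, labels):
    """True only if the model is strictly better than the majority guess."""
    return model_accuracy > majority_baseline(labels)

# coordinated_attack_flag: 20 positives among 200 campaigns
flags = [1] * 20 + [0] * 180
baseline = majority_baseline(flags)
print(f"majority baseline = {baseline:.2f}")
print("model at 0.83 beats it:", beats_majority(0.83, flags))
```

A target that fails this check at n=200 may still be learnable on the full product; the sketch only shows why it was dropped here.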

### Three oracle columns dropped from features

The phase task has three direct outcome-leak columns. Each is a perfect
or near-perfect oracle for specific phases:

| Column | Oracle relationship |
|---|---|
| `detection_outcome` | `!= suppressed_alert` → 100% `evasion_attempt` phase |
| `detector_confidence_score` | Threshold-derived from `detection_outcome` (<0.25 → evasion_success, [0.52,0.78] → marginal, ≥0.78 → high_confidence) |
| `evasion_budget_consumed` | `== 0` → 100% one of 3 early phases (reconnaissance, feature_space_probe, perturbation_craft) |

With these three columns present, a plain XGBoost achieves 100%
accuracy. The published baseline trains with all three excluded.
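
One way to surface oracle columns like these is to measure, for each value of a candidate column, how often the single most common label occurs. A purity of 1.0 means that value alone determines the phase. The sketch below is an illustration only: the toy rows are invented, the helper names are hypothetical, and only the column names come from the dataset.

```python
from collections import Counter, defaultdict

LEAK_COLUMNS = ("detection_outcome", "detector_confidence_score",
                "evasion_budget_consumed")

def value_purity(rows, column, label="attack_phase"):
    """Map each value of `column` to the share of its dominant label."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[column]][row[label]] += 1
    return {v: max(c.values()) / sum(c.values()) for v, c in by_value.items()}

def drop_leaks(row):
    """Return a copy of the event without the three outcome-leak columns."""
    return {k: v for k, v in row.items() if k not in LEAK_COLUMNS}

# Toy rows with invented values, shaped like the oracle described above
rows = [
    {"timestep": 3, "detection_outcome": "suppressed_alert", "attack_phase": "reconnaissance"},
    {"timestep": 20, "detection_outcome": "suppressed_alert", "attack_phase": "idle_dwell"},
    {"timestep": 30, "detection_outcome": "evasion_success", "attack_phase": "evasion_attempt"},
    {"timestep": 31, "detection_outcome": "high_confidence", "attack_phase": "evasion_attempt"},
]

purity = value_purity(rows, "detection_outcome")
print(purity)               # per-value share of the dominant phase
print(drop_leaks(rows[0]))  # the same event with leak columns removed
```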

### `timestep` kept as a legitimate observable

`timestep` is a partial oracle for 3 phases (reconnaissance is
always timestep 1-7, feedback_adaptation is 63-66, campaign_consolidation
is 65-70). It's **kept** in the feature set because campaign-progress
position is a real observable a defender would have at decision time
— it's not encoding the label, it's encoding the lifecycle position.

Removing `timestep` costs about 0.9pp of accuracy on seed 42
(0.8643 → 0.8557, per the ablation table in the Evaluation section).
Documented in the diagnostic for transparency.

Two model artifacts are published. They are designed to be used
together:

- `model_xgb.json` — gradient-boosted trees (higher F1)
- `model_mlp.safetensors` — PyTorch MLP

## Quick start

```bash
pip install xgboost torch safetensors pandas huggingface_hub
```

```python
import os
import sys

import numpy as np
import xgboost as xgb
from huggingface_hub import hf_hub_download, snapshot_download

REPO = "xpertsystems/cyb011-baseline-classifier"

# Download the model files and the bundled feature pipeline
paths = {n: hf_hub_download(REPO, n) for n in [
    "model_xgb.json", "model_mlp.safetensors",
    "feature_engineering.py", "feature_meta.json", "feature_scaler.json",
]}

# Make the downloaded feature_engineering.py importable
sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"]))
from feature_engineering import (
    transform_single, load_meta, build_segment_lookup, INT_TO_LABEL,
)

meta = load_meta(paths["feature_meta.json"])

# Segment features are joined from network_topology.csv at inference time
ds = snapshot_download("xpertsystems/cyb011-sample", repo_type="dataset")
segment_lookup = build_segment_lookup(f"{ds}/network_topology.csv")

xgb_model = xgb.XGBClassifier()
xgb_model.load_model(paths["model_xgb.json"])

# Predict (see inference_example.ipynb for the full pattern).
# `my_event` is a placeholder for one raw trajectory event that you supply.
# Do NOT include detection_outcome, detector_confidence_score,
# or evasion_budget_consumed: those are the outcome-leak columns.
X = transform_single(my_event, meta, segment_lookup=segment_lookup)
proba = xgb_model.predict_proba(X)[0]
print(INT_TO_LABEL[int(np.argmax(proba))])
```

See [`inference_example.ipynb`](./inference_example.ipynb) for the full
copy-paste demo.
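
XGBoost consumes the transformed features directly, while `feature_scaler.json` holds the mean and standard deviation used for the MLP inputs. The sketch below shows per-feature standardization under two assumptions: that scaling is applied as a separate step, and that the file exposes `"mean"` and `"std"` lists (the key names and values here are invented). Check the notebook for the actual layout.

```python
import json

def standardize(x, mean, std, eps=1e-8):
    """Apply (x - mean) / std per feature; near-zero std falls back to 1.0."""
    return [(xi - m) / (s if abs(s) > eps else 1.0)
            for xi, m, s in zip(x, mean, std)]

# Stand-in for feature_scaler.json: key names assumed, values invented
scaler = json.loads('{"mean": [35.0, 0.2], "std": [20.0, 0.1]}')

scaled = standardize([55.0, 0.2], scaler["mean"], scaler["std"])
print(scaled)
```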

## Training data

Trained on the public sample of CYB011, 14,000 per-timestep records:

| Phase | Events | Class share |
|---|---:|---:|
| `evasion_attempt` | 7,206 | 51.5% |
| `idle_dwell` | 2,450 | 17.5% |
| `feature_space_probe` | 1,465 | 10.5% |
| `campaign_consolidation` | 829 | 5.9% |
| `reconnaissance` | 809 | 5.8% |
| `perturbation_craft` | 745 | 5.3% |
| `feedback_adaptation` | 496 | 3.5% |

### Group-aware split by campaign_id

200 campaigns × 70 timesteps each. Timesteps from the same campaign
share attacker, target segment, and tier — so train/test contamination
is a real risk with random splitting. The baseline uses
**GroupShuffleSplit** on `campaign_id` (nested 70/15/15):

| Fold | Events | Campaigns |
|---|---:|---:|
| Train | 9,730 | 139 |
| Validation | 2,170 | 31 |
| Test | 2,100 | 30 |
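
The idea behind the group-aware split can be sketched without any dependencies: shuffle the campaign ids, then assign whole campaigns to folds so that no campaign contributes events to more than one fold. This is an illustration of the principle, not the training code; the baseline uses scikit-learn's `GroupShuffleSplit`, and the fold sizes produced here can differ by a campaign or so from the table.

```python
import random

def group_split(group_ids, fractions=(0.70, 0.15, 0.15), seed=42):
    """Assign whole groups to train/val/test so no group spans two folds."""
    groups = sorted(set(group_ids))
    random.Random(seed).shuffle(groups)
    n_train = round(fractions[0] * len(groups))
    n_val = round(fractions[1] * len(groups))
    train = set(groups[:n_train])
    val = set(groups[n_train:n_train + n_val])
    test = set(groups[n_train + n_val:])
    return train, val, test

# Synthetic ids mirroring the sample's shape: 200 campaigns x 70 timesteps
campaign_ids = [f"C{c:03d}" for c in range(200) for _ in range(70)]
train, val, test = group_split(campaign_ids)

for name, fold in (("train", train), ("val", val), ("test", test)):
    n_events = sum(cid in fold for cid in campaign_ids)
    print(f"{name}: {len(fold)} campaigns, {n_events} events")
```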

All 10 multi-seed evaluations yielded all 7 classes in the test fold.
Class imbalance is addressed with balanced class weights (passed to
XGBoost as per-row `sample_weight`) and weighted cross-entropy (MLP).
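
For reference, the usual "balanced" heuristic (as defined in scikit-learn) sets each class weight to `n_samples / (n_classes * class_count)`. The sketch below applies it to the phase counts in the table above. It runs over the whole sample for illustration, whereas a real pipeline would compute the weights on the training fold only; the helper name is hypothetical.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Phase counts from the training-data table
phase_counts = {
    "evasion_attempt": 7206, "idle_dwell": 2450, "feature_space_probe": 1465,
    "campaign_consolidation": 829, "reconnaissance": 809,
    "perturbation_craft": 745, "feedback_adaptation": 496,
}
labels = [phase for phase, n in phase_counts.items() for _ in range(n)]

weights = balanced_class_weights(labels)
sample_weight = [weights[y] for y in labels]  # one weight per row

for phase, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{phase:24s} {w:.3f}")
```

Rare phases such as `feedback_adaptation` receive the largest weights, and the per-row weights sum to the number of rows.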

## Feature pipeline

The bundled `feature_engineering.py` is the canonical recipe. 37
features survive after encoding, drawn from:

- **Per-timestep numeric** (5): `timestep`, `perturbation_magnitude`,
  `feature_delta_l2_norm`, `feature_delta_linf_norm`, `query_count_cumulative`
- **Per-timestep categorical** (1, one-hot): `attacker_capability_tier`
  (3 values in sample)
- **Segment features** (joined from `network_topology.csv`): 8 numeric
  + 2 categorical (segment_type, defender_architecture)
- **Engineered** (5): `progress_frac`, `log_queries`, `perturb_intensity`,
  `defender_weakness`, `query_rate`
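
Because `feature_meta.json` stores the column order and categorical levels, one-hot encoding at inference time should use a fixed level list instead of whatever values happen to appear in the input. A minimal sketch of that idea follows. The level spellings and their order are assumptions, and `one_hot` is a hypothetical helper, not the function in `feature_engineering.py`.

```python
def one_hot(value, levels, prefix):
    """Encode `value` against a fixed, ordered list of levels."""
    return {f"{prefix}_{lvl}": 1.0 if value == lvl else 0.0 for lvl in levels}

# Assumed spellings and order; the real levels live in feature_meta.json
TIER_LEVELS = ["script_kiddie", "opportunistic", "APT"]

encoded = one_hot("APT", TIER_LEVELS, "attacker_capability_tier")
unseen = one_hot("nation_state", TIER_LEVELS, "attacker_capability_tier")

print(encoded)
print(unseen)  # a tier absent from the sample encodes as all zeros
```

The all-zero row for `nation_state` shows why a model trained on this sample has no signal for that tier.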

## Evaluation

### Test-set metrics, seed 42 (n = 2,100 events from 30 test campaigns)

**XGBoost** (the published `model_xgb.json` artifact)

| Metric | Value |
|---|---:|
| Macro ROC-AUC (OvR) | **0.9753** |
| Accuracy | **0.8643** |
| Macro-F1 | 0.7693 |
| Weighted-F1 | 0.8703 |

**MLP** (the published `model_mlp.safetensors` artifact)

| Metric | Value |
|---|---:|
| Macro ROC-AUC (OvR) | **0.9705** |
| Accuracy | **0.8386** |
| Macro-F1 | 0.7345 |
| Weighted-F1 | 0.8462 |

XGBoost slightly outperforms MLP (acc 0.864 vs 0.839, macro-F1 0.769
vs 0.735). The gap is consistent across seeds.

### Multi-seed robustness (XGBoost, 10 seeds)

| Metric | Mean | Std | Min | Max |
|---|---:|---:|---:|---:|
| Accuracy | 0.867 | 0.010 | 0.852 | 0.884 |
| Macro-F1 | 0.775 | 0.012 | 0.750 | 0.798 |
| Macro ROC-AUC OvR | 0.977 | 0.002 | 0.973 | 0.980 |

All 10 seeds yielded all 7 classes in the test fold. Full per-seed
results in [`multi_seed_results.json`](./multi_seed_results.json).
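
The aggregate columns in the table can be reproduced from per-seed values with a few lines. The accuracies below are placeholders, not the published per-seed results, and the card does not state whether its Std column is a sample or population standard deviation; `statistics.stdev` computes the sample version.

```python
from statistics import mean, stdev

def summarize(values):
    """Mean, sample standard deviation, min, and max of per-seed metrics."""
    return {
        "mean": mean(values),
        "std": stdev(values),
        "min": min(values),
        "max": max(values),
    }

# Placeholder accuracies for three hypothetical seeds
per_seed_accuracy = [0.86, 0.87, 0.88]

summary = summarize(per_seed_accuracy)
print({k: round(v, 3) for k, v in summary.items()})
```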

### Per-class F1 (seed 42)

| Phase | Class share | XGBoost F1 | MLP F1 |
|---|---:|---:|---:|
| `evasion_attempt` | 51.5% | **0.996** | 0.993 |
| `reconnaissance` | 5.8% | **0.886** | 0.874 |
| `campaign_consolidation` | 5.9% | 0.808 | 0.785 |
| `feature_space_probe` | 10.5% | 0.783 | 0.747 |
| `feedback_adaptation` | 3.5% | 0.715 | 0.628 |
| `idle_dwell` | 17.5% | 0.704 | 0.619 |
| `perturbation_craft` | 5.3% | **0.493** | 0.497 |

`evasion_attempt` is nearly perfectly separable because of its
distinctive query-usage and perturbation-activity signatures.
`reconnaissance` and `campaign_consolidation` are well-separated by
their characteristic timestep ranges. `perturbation_craft` is the
hardest class (F1 0.49) because its per-timestep features overlap
heavily with `feature_space_probe` — both involve probing model
behavior at moderate query counts without submitting a final evasion
attempt.
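
Macro-F1 is the unweighted mean of the per-class F1 scores, so the headline values can be checked against the table above. The sketch uses the rounded three-decimal entries, which is why the result can differ from the reported macro-F1 in the fourth decimal place.

```python
# Per-class F1 from the table above (seed 42)
xgb_f1 = {
    "evasion_attempt": 0.996, "reconnaissance": 0.886,
    "campaign_consolidation": 0.808, "feature_space_probe": 0.783,
    "feedback_adaptation": 0.715, "idle_dwell": 0.704,
    "perturbation_craft": 0.493,
}
mlp_f1 = {
    "evasion_attempt": 0.993, "reconnaissance": 0.874,
    "campaign_consolidation": 0.785, "feature_space_probe": 0.747,
    "feedback_adaptation": 0.628, "idle_dwell": 0.619,
    "perturbation_craft": 0.497,
}

def macro_f1(per_class):
    """Unweighted mean: every class counts equally, regardless of its share."""
    return sum(per_class.values()) / len(per_class)

print(f"XGBoost macro-F1 = {macro_f1(xgb_f1):.4f}")
print(f"MLP macro-F1     = {macro_f1(mlp_f1):.4f}")
```

Since every class counts equally, the weak `perturbation_craft` score pulls macro-F1 well below accuracy even though that class is only 5.3% of events.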

### Ablation: which feature groups matter

| Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy | Δ macro-F1 |
|---|---:|---:|---:|---:|---:|
| Full feature set (published) | 0.8643 | 0.7693 | 0.9753 | — | — |
| No perturbation features | 0.6595 | 0.6451 | 0.8979 | **−0.205** | **−0.124** |
| No query features | 0.8210 | 0.7080 | 0.9669 | −0.043 | −0.061 |
| No engineered features | 0.8590 | 0.7619 | 0.9751 | −0.005 | −0.007 |
| No tier (one-hot) | 0.8614 | 0.7647 | 0.9752 | −0.003 | −0.005 |
| No timestep | 0.8557 | 0.7549 | 0.9696 | −0.009 | −0.014 |
| No topology features | 0.8648 | 0.7745 | 0.9760 | +0.001 | +0.005 |

Three findings:

1. **Perturbation features carry the dominant signal** (−20pp accuracy,
   −12pp F1 when removed). `feature_delta_l2_norm`,
   `feature_delta_linf_norm`, and `perturbation_magnitude` directly
   encode whether the attacker is actively perturbing inputs.
2. **Query features are second-strongest** (−4pp accuracy, −6pp F1).
   Cumulative query count distinguishes active phases (evasion_attempt,
   probe) from idle phases.
3. **Topology features contribute nothing on this task** (+0.1pp
   accuracy when removed). Clean confirmation that the topology
   fingerprint isn't leaking phase information — topology
   fingerprints defender_architecture, not attack_phase.
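
The Δ columns are plain differences against the full feature set. A short sketch that recomputes them from the table and ranks the feature groups by accuracy impact (the dictionary keys are shorthand labels chosen for this example):

```python
FULL = {"accuracy": 0.8643, "macro_f1": 0.7693}

# Accuracy and macro-F1 from the ablation table
ABLATIONS = {
    "no_perturbation": {"accuracy": 0.6595, "macro_f1": 0.6451},
    "no_query":        {"accuracy": 0.8210, "macro_f1": 0.7080},
    "no_engineered":   {"accuracy": 0.8590, "macro_f1": 0.7619},
    "no_tier":         {"accuracy": 0.8614, "macro_f1": 0.7647},
    "no_timestep":     {"accuracy": 0.8557, "macro_f1": 0.7549},
    "no_topology":     {"accuracy": 0.8648, "macro_f1": 0.7745},
}

def delta(config, metric):
    """Change relative to the full feature set; negative means removal hurt."""
    return ABLATIONS[config][metric] - FULL[metric]

# Most damaging removal first
ranked = sorted(ABLATIONS, key=lambda name: delta(name, "accuracy"))
for name in ranked:
    print(f"{name:16s} dAcc={delta(name, 'accuracy'):+.3f} "
          f"dF1={delta(name, 'macro_f1'):+.3f}")
```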

### Architecture

**XGBoost:** multi-class gradient boosting (`multi:softprob`, 7 classes),
`hist` tree method, class-balanced sample weights, early stopping on
validation mlogloss.

**MLP:** `37 → 128 → 64 → 7`, each hidden layer followed by `BatchNorm1d`
→ `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss, AdamW optimizer,
early stopping on validation macro-F1.

Training hyperparameters are held internally by XpertSystems.
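
The MLP layout described above can be sketched in PyTorch as follows. This is a reconstruction from the description only: layer names and parameter keys in `model_mlp.safetensors` may differ, so this module is not guaranteed to load the published weights, and `build_mlp` is a hypothetical helper.

```python
import torch
from torch import nn

def build_mlp(n_features=37, hidden=(128, 64), n_classes=7, dropout=0.3):
    """Linear -> BatchNorm1d -> ReLU -> Dropout per hidden layer, then logits."""
    layers, width = [], n_features
    for h in hidden:
        layers += [nn.Linear(width, h), nn.BatchNorm1d(h), nn.ReLU(),
                   nn.Dropout(dropout)]
        width = h
    layers.append(nn.Linear(width, n_classes))
    return nn.Sequential(*layers)

# eval(): BatchNorm uses running statistics and Dropout is disabled
model = build_mlp().eval()

with torch.no_grad():
    logits = model(torch.zeros(4, 37))   # batch of 4 dummy events
    probs = torch.softmax(logits, dim=1)

print(probs.shape)
```

For the weighted cross-entropy mentioned above, `nn.CrossEntropyLoss(weight=...)` accepts a tensor with one weight per class.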

## Limitations

**This is a baseline reference, not a production phase classifier.**

1. **The leakage diagnostic is required reading.** Three direct
   oracle columns for the phase task plus three additional documented
   leaks (timestep partial, stealth_score per-tier, topology
   fingerprint) are in `leakage_diagnostic.json`. If you use CYB011
   sample data for your own training, you MUST drop the three direct
   oracles or your model will learn the oracles instead of the task.

2. **`perturbation_craft` F1 0.49 is the weakest class.** This phase's
   per-timestep features overlap heavily with `feature_space_probe`.
   A sequence model considering event ordering within campaigns would
   likely do better than per-timestep classification.

3. **`nation_state` attacker tier is MISSING from the sample.** The
   dataset README lists 4 tiers (script_kiddie, opportunistic, APT,
   nation_state). The sample contains only 3; nation_state events
   are entirely absent. Models trained on this sample cannot
   generalize to nation_state actors.

4. **Four README-suggested headline targets are unlearnable on the
   sample** after honest leak removal: `campaign_success_flag` (acc
   0.51 vs majority 0.61), `campaign_type` 8-class (acc 0.11 vs 0.17),
   `coordinated_attack_flag` (acc 0.83 vs 0.90 — only 20 positives in
   200 campaigns), and `defender_architecture` 8-class (collapses to
   acc 0.13 when the 7-feature topology fingerprint is dropped).

5. **Per-campaign tasks are structurally limited at n=200.** With ~30
   test campaigns per fold, statistical power is limited. The full
   ~5,500-campaign product would yield much tighter per-campaign
   metrics.

6. **Synthetic-vs-real transfer.** The dataset is synthetic, calibrated
   to 12 benchmarks from MITRE ATLAS / NIST AI 100-2 / OWASP ML Top 10
   / USENIX / IBM ART / Anthropic-OpenAI red team reports. Real
   adversarial ML telemetry has different noise characteristics, and
   in particular the threshold-encoded `detector_confidence_score`
   and zero-sentinel `evasion_budget_consumed` patterns documented in
   the diagnostic would not be present in real data. Real telemetry
   has continuous, overlapping distributions.

## Notes on dataset schema

The CYB011 sample dataset README describes some fields differently
from the actual schema. The model was trained on the actual schema;
this note helps buyers reconcile what they read with what they receive.

| What the README says | What the data actually contains |
|---|---|
| `attack_trajectories` has 18 columns | Data has **13 columns** |
| Field renames | `adversarial_phase` → `attack_phase`, `attacker_tier` → `attacker_capability_tier`, `perturbation_linf` → `feature_delta_linf_norm`, `perturbation_l2` → `feature_delta_l2_norm`, `queries_used` → `query_count_cumulative` |
| Columns the README omits for `attack_trajectories` | `detector_confidence_score`, `detection_outcome`, `evasion_budget_consumed` are in the data but undocumented |
| README lists `gradient_access`, `evasion_attempted`, `evasion_succeeded`, `query_budget_remaining`, `defender_detection_strength`, `concept_drift_injected`, `transfer_attack_used`, `stealth_score`, `feature_space_dim` | None of these columns exist in `attack_trajectories`. `defender_detection_strength`, `feature_space_dim`, and `stealth_score` exist in `network_topology` or `campaign_summary` instead |
| `attacker_capability_tier` has 4 values | Data has **3 values** — `nation_state` MISSING entirely |
| `attack_phase` 6-phase lifecycle | Data has **7 phases**, adding `idle_dwell` (17.5% of events) |
| `campaign_summary` has 14 columns | Data has **25 columns** |
| README documents no schema for `network_topology` | Data has **12 columns** |

None of these affects model correctness — the feature pipeline uses
the actual column names. If you build your own pipeline against the
dataset, use the actual columns.
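
If existing code was written against the README's field names, a small translation map built from the renames in the table above can bridge the gap. The helper name is hypothetical; names without a documented rename pass through unchanged.

```python
# README name -> actual column name, taken from the renames listed above
README_TO_ACTUAL = {
    "adversarial_phase": "attack_phase",
    "attacker_tier": "attacker_capability_tier",
    "perturbation_linf": "feature_delta_linf_norm",
    "perturbation_l2": "feature_delta_l2_norm",
    "queries_used": "query_count_cumulative",
}

def to_actual(columns):
    """Translate README-style column names to the names found in the data."""
    return [README_TO_ACTUAL.get(c, c) for c in columns]

wanted = ["timestep", "adversarial_phase", "queries_used"]
print(to_actual(wanted))
```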

## Intended use

- **Evaluating fit** of the CYB011 dataset for your adversarial ML
  research
- **Baseline reference** for new model architectures on the
  attack-phase classification task
- **Reference example of structural-leakage diagnostics** for
  synthetic adversarial ML datasets — the methodology is reusable
- **Feature engineering reference** for per-timestep adversarial
  trajectory telemetry

## Out-of-scope use

- Production adversarial detection on real ML systems
- Attacker tier attribution (3-class per-timestep is weak; per-campaign
  is leaky via stealth_score)
- Defender architecture vulnerability assessment (trivially leaky on
  this sample; collapses when topology fingerprint is dropped)
- Campaign success prediction (unlearnable on sample)
- Any nation_state-specific modeling (tier absent from sample)
- Any operational AI security decision without further validation on
  real adversarial telemetry

## Reproducibility

Outputs above were produced with `seed = 42` (published artifact),
nested `GroupShuffleSplit` on `campaign_id` (70/15/15), on the
published sample (`xpertsystems/cyb011-sample`, version 1.0.0,
generated 2026-05-16). The feature pipeline in `feature_engineering.py`
is deterministic and the trained weights in this repo correspond
exactly to the metrics above.

Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200)
in `multi_seed_results.json` confirm robust performance across splits
(std 0.010 on accuracy, 0.002 on ROC-AUC).

The training script itself is private to XpertSystems.

## Files in this repo

| File | Purpose |
|---|---|
| `model_xgb.json` | XGBoost weights (seed 42) |
| `model_mlp.safetensors` | PyTorch MLP weights (seed 42) |
| `feature_engineering.py` | Feature pipeline |
| `feature_meta.json` | Feature column order + categorical levels |
| `feature_scaler.json` | MLP input mean/std (XGBoost ignores) |
| `validation_results.json` | Per-class metrics, confusion matrix, architecture |
| `ablation_results.json` | Per-feature-group ablation |
| `multi_seed_results.json` | XGBoost metrics across 10 seeds |
| **`leakage_diagnostic.json`** | **6-oracle-path audit + 4 unlearnable targets + missing tier note** |
| `inference_example.ipynb` | End-to-end inference demo notebook |
| `README.md` | This file |

## Contact and full product

The full **CYB011** dataset contains **~383,000 rows** across four files,
with calibrated benchmark validation against 12 metrics drawn from
authoritative adversarial ML research (MITRE ATLAS, NIST AI 100-2
Adversarial ML Taxonomy, OWASP ML Top 10, USENIX Security adversarial
ML papers, IEEE SaTML, Microsoft Counterfit, IBM Adversarial Robustness
Toolbox, Anthropic / OpenAI red team reports).

The full XpertSystems.ai synthetic data catalogue spans 41 SKUs across
Cybersecurity, Healthcare, Insurance & Risk, Oil & Gas, and Materials
& Energy.

- 📧 **pradeep@xpertsystems.ai**
- 🌐 **https://xpertsystems.ai**
- 🗂  Dataset: https://huggingface.co/datasets/xpertsystems/cyb011-sample
- 🤖 Companion models:
  - https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic)
  - https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain)
  - https://huggingface.co/xpertsystems/cyb003-baseline-classifier (malware execution phase)
  - https://huggingface.co/xpertsystems/cyb004-baseline-classifier (phishing campaign phase)
  - https://huggingface.co/xpertsystems/cyb005-baseline-classifier (ransomware actor-tier attribution)
  - https://huggingface.co/xpertsystems/cyb006-baseline-classifier (user risk tier + leakage diagnostic)
  - https://huggingface.co/xpertsystems/cyb007-baseline-classifier (insider threat type)
  - https://huggingface.co/xpertsystems/cyb008-baseline-classifier (SOC alert triage + leakage diagnostic)
  - https://huggingface.co/xpertsystems/cyb009-baseline-classifier (vulnerability classification + leakage diagnostic)
  - https://huggingface.co/xpertsystems/cyb010-baseline-classifier (attack lifecycle phase + leakage diagnostic)

## Citation

```bibtex
@misc{xpertsystems_cyb011_baseline_2026,
  title  = {CYB011 Baseline Classifier: XGBoost and MLP for Adversarial Attack Phase Classification, with 6-Oracle-Path Leakage Diagnostic},
  author = {XpertSystems.ai},
  year   = {2026},
  url    = {https://huggingface.co/xpertsystems/cyb011-baseline-classifier},
  note   = {Baseline reference model + leakage audit trained on xpertsystems/cyb011-sample}
}
```