Initial release: attack_phase 7-class baseline + 6-oracle-path leakage diagnostic + missing tier note
03d64e5
{
  "purpose": "The CYB011 sample has multiple structural leakage patterns rooted in the generator's outcome-modeling logic. Three outcome columns (detection_outcome, detector_confidence_score, evasion_budget_consumed) are perfect or near-perfect oracles for parts of the attack_phase label space. Per-campaign features encode attacker_capability_tier via stealth_score. Per-segment topology features uniquely fingerprint each defender_architecture. The published baseline (attack_phase 7-class) is trained with the three phase oracles excluded but retains timestep as a legitimate campaign-progress observable.",
  "primary_target": "attack_phase (7-class, per-timestep)",
  "split": "GroupShuffleSplit on campaign_id, 70/15/15 nested",
  "missing_attacker_tier_note": {
    "issue": "README claims 4 attacker_capability_tier values (script_kiddie, opportunistic, advanced_persistent_threat, nation_state). The sample data contains only 3: nation_state is entirely absent. Models trained on this sample have never seen nation_state actors and cannot be validated on them. The counts below are per-timestep event rows, not campaigns; they sum to 14000, the same total as the P1 crosstab.",
    "tier_counts_in_sample": {
      "script_kiddie": 7000,
      "opportunistic": 5600,
      "advanced_persistent_threat": 1400
    }
  },
  "oracle_paths_documented": {
    "P1_detection_outcome": {
      "target": "attack_phase",
      "leak_column": "detection_outcome",
      "mechanism": "Three of the four detection_outcome values (evasion_success, marginal_alert, high_confidence_alert) occur ONLY when attack_phase == 'evasion_attempt'. The fourth value (suppressed_alert) occurs across all 7 phases, including 2460 evasion_attempt events. So detection_outcome != suppressed_alert is a perfect-precision indicator of the evasion_attempt phase: every such event is evasion_attempt, although not every evasion_attempt event is flagged.",
      "evidence_crosstab": {
        "evasion_success": {
          "campaign_consolidation": 0,
          "evasion_attempt": 416,
          "feature_space_probe": 0,
          "feedback_adaptation": 0,
          "idle_dwell": 0,
          "perturbation_craft": 0,
          "reconnaissance": 0
        },
        "high_confidence_alert": {
          "campaign_consolidation": 0,
          "evasion_attempt": 1102,
          "feature_space_probe": 0,
          "feedback_adaptation": 0,
          "idle_dwell": 0,
          "perturbation_craft": 0,
          "reconnaissance": 0
        },
        "marginal_alert": {
          "campaign_consolidation": 0,
          "evasion_attempt": 3228,
          "feature_space_probe": 0,
          "feedback_adaptation": 0,
          "idle_dwell": 0,
          "perturbation_craft": 0,
          "reconnaissance": 0
        },
        "suppressed_alert": {
          "campaign_consolidation": 829,
          "evasion_attempt": 2460,
          "feature_space_probe": 1465,
          "feedback_adaptation": 496,
          "idle_dwell": 2450,
          "perturbation_craft": 745,
          "reconnaissance": 809
        }
      },
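      "verdict": "Perfect-precision oracle for evasion_attempt: the 4746 non-suppressed events (33.9% of all 14000 events) are all evasion_attempt, covering 65.9% of the 7206 evasion_attempt events (51.5% of all events)."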
    },
    "P2_detector_confidence_score": {
      "target": "attack_phase (via detection_outcome)",
      "leak_column": "detector_confidence_score",
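      "mechanism": "detector_confidence_score falls in disjoint, threshold-aligned bands for the three non-suppressed detection_outcome values: <=0.25 -> evasion_success, [0.52, 0.78) -> marginal_alert, >=0.78 -> high_confidence_alert. Because these bands do not overlap, the score mechanically decodes which non-suppressed outcome occurred and therefore indirectly oracles attack_phase. suppressed_alert spans the full [0.001, 0.999] range, so the score alone does not separate suppressed from non-suppressed events. Scores in the (0.25, 0.52) gap are always suppressed_alert.",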
      "score_ranges_by_outcome": {
        "evasion_success": {
          "min": 0.001,
          "max": 0.25,
          "mean": 0.1801,
          "std": 0.0553
        },
        "high_confidence_alert": {
          "min": 0.7801,
          "max": 0.999,
          "mean": 0.8558,
          "std": 0.0561
        },
        "marginal_alert": {
          "min": 0.5201,
          "max": 0.7797,
          "mean": 0.6436,
          "std": 0.0737
        },
        "suppressed_alert": {
          "min": 0.001,
          "max": 0.999,
          "mean": 0.3992,
          "std": 0.1817
        }
      },
      "verdict": "Threshold-banded encoding of the non-suppressed detection_outcome values -> indirect oracle for phase."
    },
    "P3_evasion_budget_consumed_zero": {
      "target": "attack_phase (3 early phases)",
      "leak_column": "evasion_budget_consumed",
      "mechanism": "evasion_budget_consumed == 0 occurs in 100% of {reconnaissance, feature_space_probe, perturbation_craft} events (the 3 early phases, which do not submit evasion attempts), and evasion_budget_consumed > 0 occurs in 100% of events in the 4 later phases.",
      "early_phase_events_at_zero": 3019,
      "verdict": "Perfect oracle for the early-vs-later phase split: a zero budget identifies the 3 early phases as a group (3019 events, 21.6% of all events), though not which of the three."
    },
    "P4_stealth_score_to_tier": {
      "target": "attacker_capability_tier (campaign level)",
      "leak_column": "stealth_score",
      "mechanism": "stealth_score ranges overlap substantially across tiers: the APT and opportunistic ranges lie entirely inside the script_kiddie range. The tier means, however, are shifted: APT in [0.806, 0.938] (mean 0.912), opportunistic in [0.751, 0.924] (mean 0.882), script_kiddie in [0.715, 0.950] (mean 0.846). This drives per-campaign tier prediction to 0.94 accuracy vs a 0.50 majority baseline, an artificially inflated result.",
      "stealth_ranges_by_tier": {
        "advanced_persistent_threat": {
          "min": 0.806,
          "max": 0.938,
          "mean": 0.9116,
          "std": 0.0277
        },
        "opportunistic": {
          "min": 0.7508,
          "max": 0.9236,
          "mean": 0.8816,
          "std": 0.0359
        },
        "script_kiddie": {
          "min": 0.7148,
          "max": 0.95,
          "mean": 0.8456,
          "std": 0.0462
        }
      },
      "verdict": "Strongly tier-discriminative feature (shifted means rather than disjoint ranges). Per-campaign tier prediction is structurally inflated by this leak."
    },
    "P5_topology_fingerprint": {
      "target": "defender_architecture",
      "leak_column": "(combination of 7 topology features)",
      "mechanism": "Each defender_architecture has detection_strength and adversarial_robustness as a CONSTANT (std = 0.0 across all rows of that architecture). Combined with ranges of ensemble_size, alert_threshold, detection_coverage, feature_space_dim, and retraining_cadence_days, each topology row uniquely fingerprints its defender. The 8-class defender_architecture target hits 100% accuracy via this combination.",
      "detection_strength_std_within_arch": {
        "autoencoder_anomaly": 0.0,
        "ensemble_stacked": 0.0,
        "gradient_boosted_tree": 0.0,
        "isolation_forest": 0.0,
        "lstm_behavioural": 0.0,
        "neural_network_dense": 0.0,
        "rule_based_threshold": 0.0,
        "transformer_sequence": 0.0
      },
      "adversarial_robustness_std_within_arch": {
        "autoencoder_anomaly": 0.0,
        "ensemble_stacked": 0.0,
        "gradient_boosted_tree": 0.0,
        "isolation_forest": 0.0,
        "lstm_behavioural": 0.0,
        "neural_network_dense": 0.0,
        "rule_based_threshold": 0.0,
        "transformer_sequence": 0.0
      },
      "verdict": "Trivially leaky 8-class target. Each segment row uniquely identifies its defender architecture by feature combination."
    },
    "P6_timestep_partial": {
      "target": "attack_phase (partial)",
      "leak_column": "timestep",
      "mechanism": "Phases have characteristic timestep ranges due to the sequential lifecycle structure. reconnaissance is timestep 1-7 (mean 3.16), campaign_consolidation is 65-70 (mean 67.96), and feedback_adaptation is 63-66 (mean 64.15). The middle phases overlap broadly, and idle_dwell spans the full 1-70 range. NOTE: timestep is KEPT as a feature in the published model because it is a legitimate campaign-progress observable that a defender would have at decision time. It is documented here for transparency: removing timestep drops headline accuracy by ~9pp (0.87 -> 0.78).",
      "timestep_ranges_by_phase": {
        "campaign_consolidation": {
          "min": 65,
          "max": 70,
          "mean": 67.96
        },
        "evasion_attempt": {
          "min": 11,
          "max": 62,
          "mean": 40.32
        },
        "feature_space_probe": {
          "min": 4,
          "max": 35,
          "mean": 11.29
        },
        "feedback_adaptation": {
          "min": 63,
          "max": 66,
          "mean": 64.15
        },
        "idle_dwell": {
          "min": 1,
          "max": 70,
          "mean": 35.44
        },
        "perturbation_craft": {
          "min": 8,
          "max": 38,
          "mean": 16.65
        },
        "reconnaissance": {
          "min": 1,
          "max": 7,
          "mean": 3.16
        }
      },
      "verdict": "Partial oracle for 3 phases (reconnaissance, feedback_adaptation, campaign_consolidation). KEPT as legitimate progress feature."
    }
  },
  "unlearnable_targets": [
    {
      "target": "campaign_success_flag (per-campaign)",
      "n_campaigns": 200,
      "majority_baseline": 0.605,
      "honest_accuracy": 0.5111111111111111,
      "honest_roc_auc": 0.48765432098765427,
      "verdict": "below_majority"
    },
    {
      "target": "campaign_type (per-campaign)",
      "n_campaigns": 200,
      "majority_baseline": 0.17,
      "honest_accuracy": 0.11111111111111112,
      "honest_roc_auc": 0.48226979604757386,
      "verdict": "below_majority"
    },
    {
      "target": "coordinated_attack_flag (per-campaign)",
      "n_campaigns": 200,
      "majority_baseline": 0.9,
      "honest_accuracy": 0.8333333333333334,
      "honest_roc_auc": 0.38271604938271603,
      "verdict": "below_majority"
    },
    {
      "target": "defender_architecture (per-campaign, all 7 topology fingerprint features dropped)",
      "n_campaigns": 200,
      "majority_baseline": 0.17,
      "honest_accuracy": 0.13333333333333333,
      "honest_roc_auc": 0.5770656344684122,
      "verdict": "below_majority",
      "note": "With all 7 topology fingerprint features included, defender_architecture hits 100% trivially. With all 7 dropped, performance collapses to or below majority. The target is not learnable from the trajectory features themselves, only from the segment fingerprint."
    }
  ],
  "unlearnable_summary": "Four README-suggested headline targets are unlearnable on the sample after honest oracle removal: campaign_success_flag (acc ~0.51 vs maj ~0.61), campaign_type 8-class (acc ~0.11 vs maj 0.17), coordinated_attack_flag (acc ~0.83 vs maj 0.90), and defender_architecture 8-class (trivially leaky via the topology fingerprint; acc ~0.13 vs maj 0.17 once the fingerprint is dropped). Only attack_phase 7-class learns honestly, with a clear lift over majority: ~0.87 headline accuracy vs the ~0.51 evasion_attempt share of all events.",
  "recommendations_to_dataset_author": [
    "Make detector_confidence_score have OVERLAPPING ranges across detection_outcome values. As shipped, the ranges for the three non-suppressed outcomes are perfectly non-overlapping (high_confidence_alert >=0.78, marginal_alert [0.52, 0.78), evasion_success <=0.25). This makes those outcomes a mechanical function of the score.",
    "Allow evasion_budget_consumed to be positive in some reconnaissance / feature_space_probe / perturbation_craft events. The current zero-only encoding creates a perfect oracle for these 3 phases.",
    "Add per-tier feature noise. stealth_score has tier-shifted means (0.912 for APT vs 0.846 for script_kiddie) despite substantial range overlap. Increase within-tier noise so the per-campaign tier-attribution task is not structurally inflated.",
    "Add per-segment NOISE to detection_strength and adversarial_robustness. Currently these are CONSTANT per defender_architecture (std=0.0). Real systems have deployment-specific tuning, so these should vary within an architecture class.",
    "Include the missing nation_state attacker tier in the sample. The README lists 4 tiers but the sample contains only 3. Buyers cannot validate nation_state-specific modeling on the sample.",
    "Increase coordinated_attack_flag positives in the sample (only 20 of 200 campaigns, 10%). With n=20 positives, the binary task has insufficient statistical power for honest evaluation.",
    "For campaign_type 8-class, add stronger per-type feature signatures. Currently the 8 types are not discriminable from trajectory features at n=200 campaigns."
  ]
}
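The P1 and P3 counts in the report can be rechecked from the P1 evidence_crosstab alone. The following is a minimal, stdlib-only Python sketch of that check. It assumes the crosstab counts exactly as reported, with zero-count cells omitted for brevity. The names CROSSTAB, EARLY_PHASES, single_class_values, phase_totals and audit are hypothetical and not part of the dataset or its tooling.

The sketch treats any detection_outcome value observed with exactly one attack_phase as a perfect-precision indicator for that phase. It then reports how much of the phase those values cover. It does not read the CYB011 data itself.

```python
# Minimal sketch (stdlib only): re-derive the P1/P3 counts from the P1 evidence_crosstab.
# All names here are hypothetical; the counts are copied from the report.
# Zero-count cells are omitted for brevity (they do not affect any result).

CROSSTAB = {  # detection_outcome -> {attack_phase: event count}
    "evasion_success": {"evasion_attempt": 416},
    "high_confidence_alert": {"evasion_attempt": 1102},
    "marginal_alert": {"evasion_attempt": 3228},
    "suppressed_alert": {
        "campaign_consolidation": 829,
        "evasion_attempt": 2460,
        "feature_space_probe": 1465,
        "feedback_adaptation": 496,
        "idle_dwell": 2450,
        "perturbation_craft": 745,
        "reconnaissance": 809,
    },
}
EARLY_PHASES = ("reconnaissance", "feature_space_probe", "perturbation_craft")


def single_class_values(crosstab):
    """Map each column value seen with exactly one target class to that class.

    Such a value is a perfect-precision indicator: every row carrying it has that class.
    """
    result = {}
    for value, counts in crosstab.items():
        present = [cls for cls, n in counts.items() if n > 0]
        if len(present) == 1:
            result[value] = present[0]
    return result


def phase_totals(crosstab):
    """Sum event counts per attack_phase across all detection_outcome values."""
    totals = {}
    for counts in crosstab.values():
        for cls, n in counts.items():
            totals[cls] = totals.get(cls, 0) + n
    return totals


def audit(crosstab, phase="evasion_attempt"):
    """Report how much of `phase` the precision-1 values cover, plus the early-phase total."""
    totals = phase_totals(crosstab)
    n_events = sum(totals.values())
    oracle = single_class_values(crosstab)
    # Rows whose outcome value points unambiguously at `phase`.
    flagged = sum(
        sum(crosstab[value].values()) for value, cls in oracle.items() if cls == phase
    )
    return {
        "n_events": n_events,
        "oracle_values": sorted(v for v, cls in oracle.items() if cls == phase),
        "phase_total": totals[phase],
        "phase_flagged": flagged,
        "coverage_of_phase": flagged / totals[phase],
        "flagged_share_of_events": flagged / n_events,
        "early_phase_events": sum(totals.get(p, 0) for p in EARLY_PHASES),
    }


if __name__ == "__main__":
    for key, val in audit(CROSSTAB).items():
        print(key, round(val, 3) if isinstance(val, float) else val)
```

Run as a script, it prints the following:

- 14000 events in total.
- The three non-suppressed outcomes as the only single-phase values.
- 4746 of the 7206 evasion_attempt events flagged (coverage 0.659, or 0.339 of all events).
- 3019 early-phase events, matching early_phase_events_at_zero.

The zero-budget claim in P3 needs row-level evasion_budget_consumed values, so this crosstab-only check does not test it.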