{ "purpose": "Document the three structural oracle columns dropped from the primary feature pipeline, and the unlearnable-target finding for mitre_tactic. CYB008 is calibrated against 12 SOC-operations benchmarks but encodes the resolution_outcome label structurally into alert_lifecycle_phase, automation_resolved, and escalation_flag. Real SOC telemetry has substantial overlap between these signals; the sample does not.", "primary_target": "resolution_outcome (5-class)", "split": "StratifiedShuffleSplit, 70/15/15 nested", "oracle_structural_findings": { "alert_lifecycle_phase": { "deterministic_mapping": { "auto_closed": "100% -> auto_resolved_soar", "escalated": "100% -> true_positive_escalated", "suppressed_duplicate": "100% -> duplicate_merged", "resolved": "splits ~62/38 false_positive_closed / true_positive_remediated" }, "note": "3 of 4 lifecycle phases are perfect class oracles. Drop required to evaluate honest learning." }, "automation_resolved": { "deterministic_mapping": { "1": "100% -> auto_resolved_soar", "0": "0 cases of auto_resolved_soar" }, "note": "Exact 1:1 oracle with auto_resolved_soar outcome class." }, "escalation_flag": { "deterministic_mapping": { "1 (n=1875)": "1319 true_positive_escalated + 556 auto_resolved_soar", "0 (n=7325)": "0 cases of true_positive_escalated" }, "note": "Near-perfect oracle for true_positive_escalated outcome." } }, "ablation_experiments": [ { "config": "full features (all oracles intact)", "n_features": 53, "accuracy": 1.0, "roc_auc": 1.0 }, { "config": "cumulative drop through alert_lifecycle_phase", "dropped_so_far": [ "alert_lifecycle_phase" ], "n_features": 49, "accuracy": 1.0, "roc_auc": 1.0 }, { "config": "cumulative drop through automation_resolved", "dropped_so_far": [ "alert_lifecycle_phase", "automation_resolved" ], "n_features": 48, "accuracy": 0.8388888888888889, "roc_auc": 0.9726930344746759 }, { "config": "cumulative drop through escalation_flag", "dropped_so_far": [ "alert_lifecycle_phase", "automation_resolved", "escalation_flag" ], "n_features": 47, "accuracy": 0.7898550724637681, "roc_auc": 0.9562439021017856 } ], "conclusion": "With all three oracle columns dropped, test accuracy is 0.79 (vs 1.00 with oracles intact, and 0.33 majority baseline). The honest model still ROC-AUC 0.96 on a 5-class task - real learning, real signal, no mechanical leakage. The published baseline trains with the three oracle columns excluded.", "mitre_tactic_unlearnable": { "purpose": "The CYB008 README's first suggested use case is 'MITRE ATT&CK tactic classification from alert features'. We test this on the sample dataset and find it is NOT LEARNABLE - features do not distinguish tactics, the model performs below majority baseline.", "task": "mitre_tactic 12-class (with mitre_technique_id excluded - it would be a perfect ATT&CK oracle)", "majority_baseline_accuracy": 0.14097826086956522, "xgboost_accuracy_mean_3seeds": 0.07971014492753624, "interpretation": "Per-tactic feature distributions are nearly identical (raw_score 0.37-0.39, enriched_score similar, fatigue 0.64 across all 12 tactics). Without mitre_technique_id (which is a 100% ATT&CK-by-design oracle), alert_source is the only discriminating signal, and it has cross-tactic purity of 0.14 - close to random. Real SOC telemetry has stronger source-to-tactic associations and per-tactic feature distributions; the sample does not reproduce these.", "recommendation_to_dataset_author": "To make tactic classification a viable benchmark, the generator should produce stronger per-tactic feature signatures (differentiated raw_score / enriched_score distributions per tactic, source-tactic affinity > 0.3 purity, characteristic MTTD / MTTR per tactic)." } }