---
license: cc-by-nc-4.0
library_name: pytorch
tags:
- cybersecurity
- soc-operations
- alert-triage
- mitre-attack
- soar
- siem
- tabular-classification
- synthetic-data
- xgboost
- baseline
- leakage-diagnostic
pipeline_tag: tabular-classification
base_model: []
datasets:
- xpertsystems/cyb008-sample
metrics:
- accuracy
- f1
- roc_auc
model-index:
- name: cyb008-baseline-classifier
  results:
  - task:
      type: tabular-classification
      name: 5-class SOC alert triage outcome classification
    dataset:
      type: xpertsystems/cyb008-sample
      name: CYB008 Synthetic SOC Alert Dataset (Sample)
    metrics:
    - type: roc_auc
      value: 0.9522
      name: Test macro ROC-AUC OvR (XGBoost, seed 42)
    - type: accuracy
      value: 0.7659
      name: Test accuracy (XGBoost, seed 42)
    - type: f1
      value: 0.7430
      name: Test macro-F1 (XGBoost, seed 42)
    - type: accuracy
      value: 0.777
      name: Multi-seed accuracy mean ± 0.007 (XGBoost, 10 seeds)
    - type: roc_auc
      value: 0.955
      name: Multi-seed ROC-AUC mean ± 0.003 (XGBoost, 10 seeds)
    - type: roc_auc
      value: 0.9552
      name: Test macro ROC-AUC OvR (MLP, seed 42)
    - type: accuracy
      value: 0.7674
      name: Test accuracy (MLP, seed 42)
    - type: f1
      value: 0.7510
      name: Test macro-F1 (MLP, seed 42)
---
| # CYB008 Baseline Classifier | |
| **SOC alert triage classifier trained on the CYB008 synthetic SOC alert | |
| sample. Predicts which of 5 triage outcome classes | |
| (`auto_resolved_soar` / `duplicate_merged` / `false_positive_closed` / | |
| `true_positive_remediated` / `true_positive_escalated`) an alert | |
| will reach, from per-alert features. ALSO ships a leakage diagnostic | |
| for the three structural-oracle columns dropped from the feature | |
| pipeline.** | |
| > **Read this first.** This repo ships two related artifacts: | |
| > (1) a working baseline classifier for `resolution_outcome` (the | |
| > primary product), and (2) a `leakage_diagnostic.json` file | |
| > documenting (a) the three structural oracle columns that were | |
| > dropped from the feature set, and (b) the separate finding that | |
| > one of the dataset README's suggested use cases, MITRE ATT&CK tactic | |
| > classification, is **not learnable** on this sample. Both files | |
| > matter; the diagnostic is required reading for anyone evaluating | |
| > CYB008 for a triage product. | |
| ## Model overview | |
| | Property | Value | | |
| |---|---| | |
| | Primary task | 5-class `resolution_outcome` classification (SOC alert triage) | | |
| | Secondary artifact | `leakage_diagnostic.json` — structural oracle + unlearnable-target audit | | |
| | Training data | `xpertsystems/cyb008-sample` (9,200 alerts) | | |
| | Models | XGBoost + PyTorch MLP | | |
| | Input features | 53 (after one-hot encoding) | | |
| | Split | **Stratified random** (no natural group key in this dataset — see rationale below) | | |
| | Validation | Single seed (artifact) + multi-seed aggregate across 10 seeds | | |
| | License | CC-BY-NC-4.0 (matches dataset) | | |
| | Status | Reference baseline + leakage diagnostic | | |
| ## Why this task — and what was dropped | |
| The CYB008 README lists **alert triage (TP vs FP prediction)** as its | |
| first suggested use case and **MITRE ATT&CK tactic classification** as | |
| its second. We piloted both on the sample dataset: | |
| - **Triage outcome:** works honestly. After dropping 3 structural | |
| oracle columns, the model achieves **acc 0.777 ± 0.007, ROC-AUC | |
| 0.955 ± 0.003** on 5-class classification. This is the primary | |
| baseline. | |
| - **MITRE tactic classification:** **does NOT work on this sample.** | |
| Without `mitre_technique_id` (which is a perfect ATT&CK-by-design | |
| oracle), the per-tactic feature distributions are nearly identical | |
| (raw_score 0.37–0.39 across all 12 tactics, similar for enriched | |
| score and fatigue). A trained XGBoost achieves accuracy 0.08, | |
| below the majority baseline of 0.14. The README's stated use case | |
| cannot be honestly demonstrated on the sample. See | |
| [`leakage_diagnostic.json`](./leakage_diagnostic.json) for the full | |
| finding and our recommendation to the dataset author. | |
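The comparison against a majority baseline can be reproduced from class counts alone. The sketch below is illustrative: `majority_baseline` is a hypothetical helper, and the counts are the `resolution_outcome` counts from the Training data table in this card. Per-tactic counts are not listed here, so the 0.14 tactic baseline is not recomputed.

```python
# Majority-class baseline: accuracy of always predicting the most common label.
# Counts are the resolution_outcome counts reported in this card.
OUTCOME_COUNTS = {
    "false_positive_closed": 2996,
    "auto_resolved_soar": 2642,
    "true_positive_remediated": 1848,
    "true_positive_escalated": 1319,
    "duplicate_merged": 395,
}

def majority_baseline(counts):
    """Return (most frequent label, its share of all rows)."""
    total = sum(counts.values())
    label = max(counts, key=counts.get)
    return label, counts[label] / total

label, share = majority_baseline(OUTCOME_COUNTS)
uniform_12 = 1 / 12  # chance level for 12 equally likely tactics
print(label, round(share, 3), round(uniform_12, 3))
```

A model is only informative if it clears this floor. The tactic model's 0.08 accuracy sits at roughly the uniform-chance level for 12 classes.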
| ### The three structural oracle columns (dropped) | |
| CYB008 has three columns that structurally encode the | |
| `resolution_outcome` label: | |
| | Column | Oracle relationship | | |
| |---|---| | |
| | `alert_lifecycle_phase` | 3 of 4 values deterministically map to specific outcomes (auto_closed → auto_resolved_soar; escalated → true_positive_escalated; suppressed_duplicate → duplicate_merged) | | |
| | `automation_resolved` | Exact 1:1 with `auto_resolved_soar` outcome | | |
| | `escalation_flag` | 1319 escalation flags = 1319 `true_positive_escalated` outcomes (near-1:1) | | |
| With all three present, plain XGBoost achieves **100% test accuracy | |
| across all seeds**, which is mechanical rather than learned. With all | |
| three dropped, accuracy is **0.777 ± 0.007 with ROC-AUC 0.955 ± 0.003** | |
| across 10 seeds: real learning on a non-trivial 5-class task. The | |
| published baseline trains with these three columns excluded. | |
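One way to surface this kind of structural oracle is to measure, for each value of a column, how often rows with that value share a single label. The sketch below is a minimal, dependency-free illustration on made-up rows. `value_purity` is a hypothetical helper, not the audit code behind `leakage_diagnostic.json`, and the `triaged` phase value is invented for the example.

```python
from collections import Counter, defaultdict

def value_purity(rows, column, label="resolution_outcome"):
    """For each value of `column`, return (majority label, purity, row count)."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[column]][row[label]] += 1
    report = {}
    for value, labels in by_value.items():
        top_label, top_count = labels.most_common(1)[0]
        n = sum(labels.values())
        report[value] = (top_label, top_count / n, n)
    return report

# Toy rows (invented for illustration, not taken from CYB008).
rows = [
    {"alert_lifecycle_phase": "auto_closed", "resolution_outcome": "auto_resolved_soar"},
    {"alert_lifecycle_phase": "auto_closed", "resolution_outcome": "auto_resolved_soar"},
    {"alert_lifecycle_phase": "escalated", "resolution_outcome": "true_positive_escalated"},
    {"alert_lifecycle_phase": "triaged", "resolution_outcome": "false_positive_closed"},
    {"alert_lifecycle_phase": "triaged", "resolution_outcome": "true_positive_remediated"},
]

report = value_purity(rows, "alert_lifecycle_phase")
for value, (top, purity, n) in report.items():
    # A value seen more than once that always maps to one label is suspicious.
    flag = "ORACLE?" if purity == 1.0 and n > 1 else ""
    print(f"{value:12s} -> {top:26s} purity={purity:.2f} n={n} {flag}")
```

Values with purity 1.0 over many rows deserve a closer look before the column is allowed into a feature set.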
| Two model artifacts are published. They are designed to be used | |
| together — disagreement is a useful triage signal: | |
| - `model_xgb.json` — gradient-boosted trees | |
| - `model_mlp.safetensors` — PyTorch MLP in SafeTensors format | |
| On CYB008 the MLP slightly outperforms XGBoost on the test fold | |
| (0.767 vs 0.766 accuracy, 0.955 vs 0.952 ROC-AUC at seed 42) — only | |
| the second SKU in the XpertSystems baseline catalog where this | |
| happens (after CYB007). | |
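Treating disagreement as a triage signal can be as simple as comparing the two models' top classes. The sketch below assumes you already have one probability vector per model for the same alert, in the same class order. `disagreement` and the example vectors are hypothetical, and the label order shown is an assumption (in practice, use `INT_TO_LABEL` from `feature_engineering.py`).

```python
# Label order here is an assumption for illustration only.
LABELS = [
    "auto_resolved_soar", "duplicate_merged", "false_positive_closed",
    "true_positive_escalated", "true_positive_remediated",
]

def disagreement(p_xgb, p_mlp, labels=LABELS):
    """Compare the top-1 predictions of two models for a single alert."""
    top_xgb = max(range(len(labels)), key=lambda i: p_xgb[i])
    top_mlp = max(range(len(labels)), key=lambda i: p_mlp[i])
    return {
        "xgb": labels[top_xgb],
        "mlp": labels[top_mlp],
        "disagree": top_xgb != top_mlp,
        # Largest per-class probability gap: a rough measure of how far apart they are.
        "max_gap": max(abs(a - b) for a, b in zip(p_xgb, p_mlp)),
    }

# Made-up probabilities for one alert.
result = disagreement([0.05, 0.02, 0.08, 0.45, 0.40],
                      [0.04, 0.03, 0.07, 0.38, 0.48])
print(result)
```

Alerts where `disagree` is true could be routed to an analyst rather than handled automatically.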
| ## Quick start | |
| ```bash | |
| pip install xgboost torch safetensors pandas huggingface_hub | |
| ``` | |
| ```python | |
| from huggingface_hub import hf_hub_download | |
| import json, numpy as np, torch, xgboost as xgb | |
| from safetensors.torch import load_file | |
| REPO = "xpertsystems/cyb008-baseline-classifier" | |
| paths = {n: hf_hub_download(REPO, n) for n in [ | |
| "model_xgb.json", "model_mlp.safetensors", | |
| "feature_engineering.py", "feature_meta.json", "feature_scaler.json", | |
| ]} | |
| import sys, os | |
| sys.path.insert(0, os.path.dirname(paths["feature_engineering.py"])) | |
| from feature_engineering import transform_single, load_meta, INT_TO_LABEL | |
| meta = load_meta(paths["feature_meta.json"]) | |
| xgb_model = xgb.XGBClassifier(); xgb_model.load_model(paths["model_xgb.json"]) | |
| # Predict (see inference_example.ipynb for the full pattern). | |
| # `my_alert_record` is a placeholder for your own raw alert record. | |
| # Do NOT include alert_lifecycle_phase, automation_resolved, or | |
| # escalation_flag in your record: those were the oracle columns. | |
| X = transform_single(my_alert_record, meta) | |
| proba = xgb_model.predict_proba(X)[0] | |
| print(INT_TO_LABEL[int(np.argmax(proba))]) | |
| ``` | |
| See [`inference_example.ipynb`](./inference_example.ipynb) for the full | |
| copy-paste demo. | |
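Because the three oracle columns must not reach the model, it can help to strip them defensively before calling the feature pipeline. A minimal sketch, assuming your alert is a plain dict of raw columns. `drop_oracle_columns` is a hypothetical helper and the example record is made up.

```python
ORACLE_COLUMNS = ("alert_lifecycle_phase", "automation_resolved", "escalation_flag")

def drop_oracle_columns(record):
    """Return a copy of the record without the structural oracle columns."""
    return {k: v for k, v in record.items() if k not in ORACLE_COLUMNS}

# Made-up record for illustration; real records carry the full CYB008 schema.
raw_record = {
    "raw_score": 0.41,
    "enriched_score": 0.57,
    "alert_severity": "high_severity",
    "escalation_flag": 1,
    "automation_resolved": 0,
}
clean_record = drop_oracle_columns(raw_record)
print(sorted(clean_record))
```

Whether `transform_single` expects a dict or another structure should be checked against `inference_example.ipynb`.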
| ## Training data | |
| Trained on the public sample of CYB008, 9,200 per-alert records: | |
| | Outcome | Alerts | Class share | | |
| |---|---:|---:| | |
| | `false_positive_closed` | 2,996 | 32.6% | | |
| | `auto_resolved_soar` | 2,642 | 28.7% | | |
| | `true_positive_remediated` | 1,848 | 20.1% | | |
| | `true_positive_escalated` | 1,319 | 14.3% | | |
| | `duplicate_merged` | 395 | 4.3% | | |
| ### Stratified split (no natural group key) | |
| CYB008 does not have a natural row-level group key for group-aware | |
| splitting: | |
| - 25 analysts — group-aware split would yield only ~4 test analysts | |
| - 5 SOCs — would yield 1 test SOC | |
| - 589 incidents — only 9% of alerts have a non-null `incident_id` | |
| Alerts are essentially independent given features, so we use | |
| **StratifiedShuffleSplit** (nested 70/15/15), the same approach as | |
| CYB001 for network flow classification: | |
| | Fold | Alerts | | |
| |---|---:| | |
| | Train | 6,440 | | |
| | Validation | 1,380 | | |
| | Test | 1,380 | | |
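The nested 70/15/15 split can be pictured as a per-class shuffle followed by slicing. The sketch below is a dependency-free approximation for illustration only. It is not scikit-learn's `StratifiedShuffleSplit` and not the training script, and `stratified_three_way` is a hypothetical helper. Per-class rounding means fold sizes can differ by a row or two from the table above.

```python
import random
from collections import defaultdict

def stratified_three_way(labels, seed=42, train=0.70, val=0.15):
    """Split row indices into train/val/test while keeping class shares similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = {"train": [], "val": [], "test": []}
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train)
        n_val = round(len(idxs) * val)
        folds["train"] += idxs[:n_train]
        folds["val"] += idxs[n_train:n_train + n_val]
        folds["test"] += idxs[n_train + n_val:]  # remainder goes to test
    return folds

# Labels rebuilt from the class counts in this card (row order is arbitrary).
counts = {"false_positive_closed": 2996, "auto_resolved_soar": 2642,
          "true_positive_remediated": 1848, "true_positive_escalated": 1319,
          "duplicate_merged": 395}
labels = [name for name, n in counts.items() for _ in range(n)]
folds = stratified_three_way(labels)
print({k: len(v) for k, v in folds.items()})
```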
| Class imbalance is addressed with balanced class weights, passed to | |
| XGBoost as per-row `sample_weight`, and with weighted cross-entropy | |
| for the MLP. | |
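Balanced weighting usually follows the heuristic `w_c = N / (K * n_c)`, where `N` is the row count, `K` the number of classes, and `n_c` the count of class `c`. As a rough worked example, assuming that heuristic applies here and using the full-sample counts from this card rather than the training fold:

```python
# Balanced class weights: w_c = N / (K * n_c).
# Assumption: full-sample counts are used for illustration; in training the
# weights would normally be derived from the training fold only.
counts = {
    "false_positive_closed": 2996,
    "auto_resolved_soar": 2642,
    "true_positive_remediated": 1848,
    "true_positive_escalated": 1319,
    "duplicate_merged": 395,
}
N = sum(counts.values())
K = len(counts)
weights = {c: N / (K * n) for c, n in counts.items()}

for c, w in weights.items():
    print(f"{c:26s} {w:.3f}")
```

Under this heuristic the rarest class, `duplicate_merged`, is weighted roughly 7.6 times more heavily than the most common class.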
| ## Feature pipeline | |
| The bundled `feature_engineering.py` is the canonical feature recipe. | |
| 53 features survive after encoding, drawn from: | |
| - **Per-alert numeric** (9): `raw_score`, `enriched_score`, `time_in_phase_minutes`, `queue_depth_at_ingestion`, `soar_playbook_triggered`, `sla_breached_flag`, `mttd_minutes`, `mttr_minutes`, `fatigue_score_at_alert` | |
| - **Per-alert categorical** (5, one-hot): `alert_severity` (7 values), `alert_source` (8 values), `mitre_tactic` (12 values), `analyst_tier` (3 values), `siem_platform` (8 values) | |
| - **Engineered** (6): `enrichment_lift`, `log_mttr`, `log_mttd`, `queue_pressure`, `enrichment_per_minute`, `is_high_confidence` | |
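The 53-feature total follows from the counts above: 9 numeric columns, 38 one-hot indicator columns, and 6 engineered columns. The sketch below checks the arithmetic and shows a generic one-hot encoding against a fixed level list. `one_hot` is a hypothetical helper, and the canonical encoding remains the one in `feature_engineering.py`.

```python
# Number of levels per categorical column, as listed in this card.
CATEGORICAL_LEVELS = {
    "alert_severity": 7,
    "alert_source": 8,
    "mitre_tactic": 12,
    "analyst_tier": 3,
    "siem_platform": 8,
}
N_NUMERIC, N_ENGINEERED = 9, 6

n_one_hot = sum(CATEGORICAL_LEVELS.values())
n_features = N_NUMERIC + n_one_hot + N_ENGINEERED

def one_hot(value, levels):
    """Indicator list over a fixed, ordered level list; unseen values give all zeros."""
    return [1 if value == level else 0 for level in levels]

ANALYST_TIERS = ["L1_junior", "L2_senior", "L3_threat_hunter"]
print(n_one_hot, n_features, one_hot("L2_senior", ANALYST_TIERS))
```

Fixing the level list up front keeps the column order stable between training and inference, which is the role `feature_meta.json` plays in this repo.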
| ### Excluded columns | |
| **Oracle columns** (dropped to allow honest evaluation): | |
| | Column | Why excluded | | |
| |---|---| | |
| | `alert_lifecycle_phase` | 3 of 4 values are deterministic outcome oracles | | |
| | `automation_resolved` | 1:1 with `auto_resolved_soar` outcome | | |
| | `escalation_flag` | Near-1:1 with `true_positive_escalated` outcome | | |
| **High-cardinality columns** (dropped for tractability): | |
| | Column | Why excluded | | |
| |---|---| | |
| | `mitre_technique_id` | 36 unique values; perfect oracle for `mitre_tactic` but unrelated to this target | | |
| | `detection_rule_id` | 656 unique values; one-hot explosion with no real per-tactic affinity (only 5% of rules map to a single tactic) | | |
| ### Partial-oracle features (kept as legitimate observables) | |
| `soar_playbook_triggered` is a *necessary but not sufficient* condition | |
| for `auto_resolved_soar` — when 0, the alert is never auto-resolved; | |
| when 1, the outcome is auto-resolved 68% of the time but can also be | |
| TP-remediated, TP-escalated, FP-closed, or duplicate-merged. This is | |
| a legitimate observable that downstream operators would already have | |
| on hand at decision time. KEPT in the pipeline. | |
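The "necessary but not sufficient" relationship can be checked directly with two conditional rates. A minimal sketch on invented rows; the 68% figure above comes from the real sample and is not reproduced here. `conditional_rate` is a hypothetical helper.

```python
def conditional_rate(rows, flag_value, outcome="auto_resolved_soar"):
    """P(resolution_outcome == outcome | soar_playbook_triggered == flag_value)."""
    subset = [r for r in rows if r["soar_playbook_triggered"] == flag_value]
    if not subset:
        return None
    return sum(r["resolution_outcome"] == outcome for r in subset) / len(subset)

# Toy rows (invented for illustration, not taken from CYB008).
rows = [
    {"soar_playbook_triggered": 1, "resolution_outcome": "auto_resolved_soar"},
    {"soar_playbook_triggered": 1, "resolution_outcome": "auto_resolved_soar"},
    {"soar_playbook_triggered": 1, "resolution_outcome": "true_positive_remediated"},
    {"soar_playbook_triggered": 0, "resolution_outcome": "false_positive_closed"},
    {"soar_playbook_triggered": 0, "resolution_outcome": "true_positive_escalated"},
]

rate_off = conditional_rate(rows, 0)  # necessary: expected to be 0.0
rate_on = conditional_rate(rows, 1)   # not sufficient: expected to be below 1.0
print(rate_off, rate_on)
```

A full oracle would show a rate of 1.0 when the flag is set; a partial one, as here, leaves real uncertainty for the model to resolve.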
| ## Evaluation | |
| ### Test-set metrics, seed 42 (n = 1,380 alerts) | |
| **XGBoost** (the published `model_xgb.json` artifact) | |
| | Metric | Value | | |
| |---|---:| | |
| | Macro ROC-AUC (OvR) | **0.9522** | | |
| | Accuracy | **0.7659** | | |
| | Macro-F1 | 0.7430 | | |
| | Weighted-F1 | 0.7672 | | |
| **MLP** (the published `model_mlp.safetensors` artifact) — **slightly outperforms XGBoost** | |
| | Metric | Value | | |
| |---|---:| | |
| | Macro ROC-AUC (OvR) | **0.9552** | | |
| | Accuracy | **0.7674** | | |
| | Macro-F1 | 0.7510 | | |
| | Weighted-F1 | 0.7691 | | |
| With 6,440 training rows and 53 features, the MLP has enough data to | |
| compete favorably with boosted trees. Both models are published. | |
| ### Multi-seed robustness (XGBoost, 10 seeds) | |
| Very stable performance — std 0.007 on accuracy is among the tightest | |
| in the XpertSystems catalog: | |
| | Metric | Mean | Std | Min | Max | | |
| |---|---:|---:|---:|---:| | |
| | Accuracy | 0.777 | 0.007 | 0.766 | 0.792 | | |
| | Macro-F1 | 0.765 | 0.011 | 0.743 | 0.783 | | |
| | Macro ROC-AUC OvR | 0.955 | 0.003 | 0.950 | 0.960 | | |
| Full per-seed results in [`multi_seed_results.json`](./multi_seed_results.json). | |
| All 10 seeds yielded all 5 classes in the test fold (stratified split | |
| guarantees this). | |
| ### Per-class F1 (seed 42) | |
| | Outcome | Class share | XGBoost F1 | MLP F1 | | |
| |---|---:|---:|---:| | |
| | `false_positive_closed` | 32.6% | **0.904** | 0.910 | | |
| | `duplicate_merged` | 4.3% | 0.794 | 0.825 | | |
| | `auto_resolved_soar` | 28.7% | 0.757 | 0.751 | | |
| | `true_positive_remediated` | 20.1% | 0.701 | 0.698 | | |
| | `true_positive_escalated` | 14.3% | 0.559 | 0.571 | | |
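As a consistency check, the macro-F1 values reported earlier are the unweighted means of the per-class F1 scores in this table. Weighted-F1 can be approximated by weighting each class by its share. This is only approximate, because the exact test-fold supports are in `validation_results.json` rather than in this table.

```python
# Class shares (percent) and per-class F1 values copied from the table above.
SHARE = {"false_positive_closed": 32.6, "duplicate_merged": 4.3,
         "auto_resolved_soar": 28.7, "true_positive_remediated": 20.1,
         "true_positive_escalated": 14.3}
F1_XGB = {"false_positive_closed": 0.904, "duplicate_merged": 0.794,
          "auto_resolved_soar": 0.757, "true_positive_remediated": 0.701,
          "true_positive_escalated": 0.559}
F1_MLP = {"false_positive_closed": 0.910, "duplicate_merged": 0.825,
          "auto_resolved_soar": 0.751, "true_positive_remediated": 0.698,
          "true_positive_escalated": 0.571}

def macro_f1(f1):
    """Unweighted mean of per-class F1."""
    return sum(f1.values()) / len(f1)

def weighted_f1(f1, share):
    """Share-weighted mean of per-class F1."""
    return sum(f1[c] * share[c] for c in f1) / sum(share.values())

for name, f1 in (("XGBoost", F1_XGB), ("MLP", F1_MLP)):
    print(name, round(macro_f1(f1), 3), round(weighted_f1(f1, SHARE), 3))
```

The macro-F1 values come out at 0.743 and 0.751, matching the seed-42 tables, and the share-weighted approximations land within about 0.001 of the reported weighted-F1 values.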
| The model performs best on `false_positive_closed` (clearest behavioural | |
| profile — low scores, fast resolution by L1 analysts) and | |
| `duplicate_merged` (smallest class but distinctive — duplicate-suppressed | |
| severity is a strong tell). The hardest discrimination is between | |
| `true_positive_remediated` and `true_positive_escalated` — both are | |
| genuine threats, differing primarily by whether the alert was closed | |
| by the original analyst or passed to a higher tier. In production this | |
| matters less because both are TP outcomes; binary TP-vs-FP recall is | |
| much higher. | |
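Collapsing the five outcomes into a binary true-positive score is one way to use the model when only TP-vs-rest matters. The sketch below sums the probabilities of the two `true_positive_*` classes. Treating the other three outcomes as non-TP is an assumption of this example, as are the helper name `tp_probability`, the 0.5 threshold, and the sample probabilities.

```python
TP_CLASSES = {"true_positive_remediated", "true_positive_escalated"}

def tp_probability(proba_by_label):
    """Sum the predicted probability mass assigned to the true-positive classes."""
    return sum(p for label, p in proba_by_label.items() if label in TP_CLASSES)

# Made-up 5-class probabilities for one alert.
proba = {
    "auto_resolved_soar": 0.05,
    "duplicate_merged": 0.02,
    "false_positive_closed": 0.13,
    "true_positive_remediated": 0.42,
    "true_positive_escalated": 0.38,
}
score = tp_probability(proba)
# The 0.5 cut-off is arbitrary and should be tuned to your own cost trade-off.
print(round(score, 2), "TP" if score >= 0.5 else "not TP")
```

In this example neither TP class exceeds 0.5 on its own, yet the combined TP score is 0.80, which is why confusion between the two TP classes costs little in a binary setting.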
| ### Ablation: which feature groups matter | |
| | Configuration | Accuracy | Macro-F1 | ROC-AUC | Δ accuracy | | |
| |---|---:|---:|---:|---:| | |
| | Full feature set (published) | 0.7659 | 0.7430 | 0.9522 | — | | |
| | No alert severity | 0.5138 | 0.3933 | 0.7304 | **−0.2522** | | |
| | No `soar_playbook_triggered` | 0.6188 | 0.5773 | 0.8369 | **−0.1471** | | |
| | No analyst tier | 0.7717 | 0.7471 | 0.9524 | +0.0058 | | |
| | No siem platform | 0.7681 | 0.7474 | 0.9522 | +0.0022 | | |
| | No alert source | 0.7638 | 0.7406 | 0.9511 | −0.0022 | | |
| | No engineered features | 0.7681 | 0.7480 | 0.9533 | +0.0022 | | |
| | No mitre_tactic | 0.7812 | 0.7656 | 0.9530 | +0.0152 | | |
| | No timing features | 0.7775 | 0.7572 | 0.9547 | +0.0116 | | |
| | No score features | 0.7710 | 0.7569 | 0.9541 | +0.0051 | | |
| Four findings: | |
| 1. **Alert severity carries the dominant signal** (drops 25 pp | |
| accuracy, 22 pp ROC-AUC). This is intuitive: severity directly | |
| drives triage priority, which drives outcome. `false_positive` | |
| severity → `false_positive_closed`; `duplicate_suppressed` severity | |
| → `duplicate_merged`. | |
| 2. **`soar_playbook_triggered` is the second-strongest signal** | |
| (drops 15 pp accuracy). It's a partial oracle for the | |
| `auto_resolved_soar` outcome class. | |
| 3. **MITRE tactic and analyst tier contribute essentially nothing.** | |
| The model performs marginally *better* without them — they add | |
| noise that the trees over-fit on the training set. | |
| 4. **Engineered features and timing features are near-flat.** The | |
| trees recover composites from raw inputs. Kept in the pipeline as | |
| a documented baseline reference. | |
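The Δ accuracy column can be recomputed from the ablation table. Because the table values are already rounded to four decimals, recomputed deltas may differ from the published column in the last digit. `accuracy_deltas` is a hypothetical helper.

```python
# Accuracy per configuration, copied from the ablation table above.
ABLATION_ACCURACY = {
    "Full feature set (published)": 0.7659,
    "No alert severity": 0.5138,
    "No soar_playbook_triggered": 0.6188,
    "No analyst tier": 0.7717,
    "No siem platform": 0.7681,
    "No alert source": 0.7638,
    "No engineered features": 0.7681,
    "No mitre_tactic": 0.7812,
    "No timing features": 0.7775,
    "No score features": 0.7710,
}
BASELINE = "Full feature set (published)"

def accuracy_deltas(results, baseline=BASELINE):
    """Return (configuration, delta vs baseline) pairs, most harmful removal first."""
    base = results[baseline]
    deltas = {name: acc - base for name, acc in results.items() if name != baseline}
    return sorted(deltas.items(), key=lambda item: item[1])

ranked = accuracy_deltas(ABLATION_ACCURACY)
for name, delta in ranked:
    print(f"{name:30s} {delta:+.4f}")
```

Sorting by delta makes the ranking in the findings explicit: severity first, `soar_playbook_triggered` second, and everything else within about ±0.015 of the baseline.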
| ### Architecture | |
| **XGBoost:** multi-class gradient boosting (`multi:softprob`, 5 classes), | |
| `hist` tree method, class-balanced sample weights, early stopping on | |
| validation mlogloss. | |
| **MLP:** `53 → 128 → 64 → 5`, each hidden layer followed by `BatchNorm1d` | |
| → `ReLU` → `Dropout(0.3)`, weighted cross-entropy loss, AdamW optimizer, | |
| early stopping on validation macro-F1. | |
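For a sense of scale, the stated layer widths imply a small network. The count below assumes bias terms on every linear layer and affine `BatchNorm1d` on the two hidden layers (the PyTorch defaults). The card does not state these details, so treat the total as an estimate. `count_parameters` is a hypothetical helper.

```python
LAYER_WIDTHS = [53, 128, 64, 5]  # input -> hidden -> hidden -> output

def count_parameters(widths):
    """Trainable parameters for Linear layers plus BatchNorm1d on hidden layers."""
    total = 0
    last = len(widths) - 2  # index of the output layer
    for i, (n_in, n_out) in enumerate(zip(widths, widths[1:])):
        total += n_in * n_out + n_out  # Linear weights + biases
        if i < last:
            total += 2 * n_out         # BatchNorm1d scale + shift (hidden layers only)
    return total

print(count_parameters(LAYER_WIDTHS))
```

Under these assumptions the MLP has on the order of 16 thousand trainable parameters.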
| Training hyperparameters are held internally by XpertSystems. | |
| ## Limitations | |
| **This is a baseline reference, not a production SOC triage system.** | |
| 1. **MITRE tactic classification is unlearnable on this sample.** The | |
| README lists it as a suggested use case but the per-tactic feature | |
| distributions are too similar (raw_score 0.37–0.39 across all 12 | |
| tactics). See [`leakage_diagnostic.json`](./leakage_diagnostic.json) | |
| for the full audit. Real SOC data has stronger per-tactic feature | |
| signatures. | |
| 2. **TP-remediated vs TP-escalated is the hardest discrimination.** | |
| F1 0.56 on TP-escalated is the weakest per-class result. Both are | |
| genuine threats; the difference is workflow rather than threat | |
| nature. For most operational uses (TP-vs-FP recall, SLA-breach | |
| reduction), this confusion does not matter. | |
| 3. **MLP modestly outperforms XGBoost.** Both are shipped; we | |
| recommend running both and treating disagreement as a triage | |
| signal. The gap is small enough that, for production deployment, | |
| the choice between them is essentially an engineering preference. | |
| 4. **Synthetic-vs-real transfer.** The dataset is synthetic and | |
| calibrated to 12 benchmark metrics from SOC-operations sources (SANS SOC Survey, IBM | |
| Cost of Data Breach, Mandiant M-Trends, Forrester Wave SOAR, | |
| Gartner SIEM Magic Quadrant, SOC.OS, CrowdStrike, Splunk State of | |
| Security, Verizon DBIR). Real SOC telemetry has different noise | |
| characteristics and the structural-oracle pattern documented | |
| above (alert_lifecycle_phase deterministically encoding outcome) | |
| would not be present in real data — real lifecycle phases | |
| transition stochastically. Do not assume metrics transfer | |
| end-to-end. | |
| 5. **9,200 alerts is a modest training set.** The 1,380-alert test | |
| fold yields stable multi-seed metrics (std 0.007), but full | |
| confidence intervals for downstream production decisions should | |
| come from the full ~280k-alert product. | |
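To put the test-fold size in context, a normal-approximation interval for an accuracy `p` measured on `n` independent alerts has half-width `1.96 * sqrt(p * (1 - p) / n)`. A rough sketch using the seed-42 XGBoost accuracy; it treats test rows as independent and is not a substitute for the multi-seed spread. `accuracy_interval` is a hypothetical helper.

```python
import math

def accuracy_interval(p, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion."""
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

low, high = accuracy_interval(0.7659, 1380)
print(f"{low:.3f} to {high:.3f}")
```

On this fold the half-width is about 2.2 percentage points, so accuracy differences smaller than that between single-seed runs should be read with caution.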
| ## Notes on dataset schema | |
| The CYB008 sample dataset README describes some fields differently | |
| from the actual schema. The model was trained on the actual schema; | |
| this note helps buyers reconcile what they read with what they receive. | |
| | What the README says | What the data actually contains | | |
| |---|---| | |
| | `incident_summary` has 8 columns | Data has **23 columns** including incident_type, kill_chain_stages_observed, false_positive_rate, soar_actions_taken, etc. | | |
| | `alert_severity` has 6 values (info / low / medium / high / critical / false_positive) | **7 values**: adds `duplicate_suppressed`. All values are suffixed (`high_severity`, `low_severity`, `critical_confirmed`, `informational`). | | |
| | `analyst_tier` has 4 values (tier_1 / tier_2 / tier_3 / manager) | 3 values on alerts (`L1_junior`, `L2_senior`, `L3_threat_hunter`); 4 on `soc_topology` (adds `L4_incident_commander`). | | |
| | 14 MITRE ATT&CK tactics | 12 tactics in the data (no `reconnaissance` or `resource_development` from PRE-ATT&CK). | | |
| | Detection source mix: edr, siem, ndr, ids, ueba, casb, deception, threat intel | Field is `alert_source` (not `detection_source`); 8 values: `edr_behavioural_engine`, `nids_signature`, `ueba_user_anomaly`, `cspm_cloud_rule`, `siem_correlation_rule`, `threat_intel_ioc_match`, `honeypot_trigger`, `itdr_identity_anomaly`. | | |
| | `triage_score` / `enrichment_score` columns | Actual names: `raw_score` / `enriched_score`. | | |
| | `alert_timestamp` (ISO string) | Actual: `alert_timestamp_min` (integer minutes from epoch). | | |
| | `kill_chain_stage`, `storm_event_flag` columns on alerts | Not present in the data. | | |
| | `resolution_outcome` values (true_positive / false_positive / duplicate / suppressed) | Actual 5 values: `auto_resolved_soar`, `duplicate_merged`, `false_positive_closed`, `true_positive_escalated`, `true_positive_remediated`. | | |
| | Extra columns in data not in README | `shift_id`, `time_in_phase_minutes`, `queue_depth_at_ingestion`, `fatigue_score_at_alert`, `siem_platform`, `soar_playbook_id`, `detection_rule_id`, `alert_lifecycle_phase` | | |
| None of these affects model correctness — the feature pipeline uses | |
| the actual column names. If you build your own pipeline against the | |
| dataset, use the actual columns. | |
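If you have code written against the dataset README's field names, a small rename map can bridge to the actual schema. The mapping below covers only renames listed in the table above. Pairing `triage_score` with `raw_score` and `enrichment_score` with `enriched_score` follows the order in which the table lists them and should be confirmed against the data. `to_actual_schema` is a hypothetical helper.

```python
# README name -> actual column name, taken from the reconciliation table.
# `alert_timestamp` is left out on purpose: the actual column
# (`alert_timestamp_min`) also changes type, so a rename alone is not enough.
README_TO_ACTUAL = {
    "triage_score": "raw_score",
    "enrichment_score": "enriched_score",
    "detection_source": "alert_source",
}

def to_actual_schema(record):
    """Return a copy of the record with README-style keys renamed."""
    return {README_TO_ACTUAL.get(key, key): value for key, value in record.items()}

# Made-up record using README-style names.
example = {"triage_score": 0.4, "detection_source": "nids_signature", "siem_platform": "x"}
print(to_actual_schema(example))
```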
| ## Intended use | |
| - **Evaluating fit** of the CYB008 dataset for your SOC-triage research | |
| - **Baseline reference** for new model architectures | |
| - **Reference example of structural-leakage diagnostics** in | |
| synthetic SOC datasets — the diagnostic methodology is reusable | |
| - **Feature engineering reference** for per-alert SOC telemetry | |
| ## Out-of-scope use | |
| - Production SOC triage decisions on real telemetry | |
| - MITRE ATT&CK tactic prediction (this baseline establishes that | |
| task is unlearnable on the sample) | |
| - SLA-breach prediction (also tested as unlearnable on the sample — | |
| acc 0.68 vs majority 0.82) | |
| - Any operational decision affecting actual security operations | |
| without further validation on your own data | |
| ## Reproducibility | |
| Outputs above were produced with `seed = 42` (published artifact), | |
| nested `StratifiedShuffleSplit` (70/15/15), on the published sample | |
| (`xpertsystems/cyb008-sample`, version 1.0.0, generated 2026-05-16). | |
| The feature pipeline in `feature_engineering.py` is deterministic and | |
| the trained weights in this repo correspond exactly to the metrics | |
| above. | |
| Multi-seed results (seeds 42, 7, 13, 17, 23, 31, 45, 99, 123, 200) | |
| in `multi_seed_results.json` confirm robust performance across splits. | |
| The training script itself is private to XpertSystems. | |
| ## Files in this repo | |
| | File | Purpose | | |
| |---|---| | |
| | `model_xgb.json` | XGBoost weights (seed 42) | | |
| | `model_mlp.safetensors` | PyTorch MLP weights (seed 42) | | |
| | `feature_engineering.py` | Feature pipeline | | |
| | `feature_meta.json` | Feature column order + categorical levels | | |
| | `feature_scaler.json` | MLP input mean/std (XGBoost ignores) | | |
| | `validation_results.json` | Per-class metrics, confusion matrix, architecture | | |
| | `ablation_results.json` | Per-feature-group ablation | | |
| | `multi_seed_results.json` | XGBoost metrics across 10 seeds | | |
| | `leakage_diagnostic.json` | **Structural-oracle audit + unlearnable-target finding** | | |
| | `inference_example.ipynb` | End-to-end inference demo notebook | | |
| | `README.md` | This file | | |
| ## Contact and full product | |
| The full **CYB008** dataset contains ~335,000 rows across four files, | |
| with calibrated benchmark validation against 12 metrics drawn from | |
| authoritative SOC operations and threat intelligence sources (SANS | |
| SOC Survey, IBM Cost of Data Breach, Mandiant M-Trends, Forrester | |
| Wave SOAR, Gartner SIEM Magic Quadrant, SOC.OS, CrowdStrike, Splunk | |
| State of Security, Verizon DBIR). The full XpertSystems.ai synthetic | |
| data catalogue spans 41 SKUs across Cybersecurity, Healthcare, | |
| Insurance & Risk, Oil & Gas, and Materials & Energy. | |
| - 📧 **pradeep@xpertsystems.ai** | |
| - 🌐 **https://xpertsystems.ai** | |
| - 🗂 Dataset: https://huggingface.co/datasets/xpertsystems/cyb008-sample | |
| - 🤖 Companion models: | |
| - https://huggingface.co/xpertsystems/cyb001-baseline-classifier (network traffic) | |
| - https://huggingface.co/xpertsystems/cyb002-baseline-classifier (ATT&CK kill-chain) | |
| - https://huggingface.co/xpertsystems/cyb003-baseline-classifier (malware execution phase) | |
| - https://huggingface.co/xpertsystems/cyb004-baseline-classifier (phishing campaign phase) | |
| - https://huggingface.co/xpertsystems/cyb005-baseline-classifier (ransomware actor-tier attribution) | |
| - https://huggingface.co/xpertsystems/cyb006-baseline-classifier (user risk tier + leakage diagnostic) | |
| - https://huggingface.co/xpertsystems/cyb007-baseline-classifier (insider threat type) | |
| ## Citation | |
| ```bibtex | |
| @misc{xpertsystems_cyb008_baseline_2026, | |
| title = {CYB008 Baseline Classifier: XGBoost and MLP for SOC Alert Triage Outcome Classification, with Structural-Leakage and Unlearnable-Target Diagnostic}, | |
| author = {XpertSystems.ai}, | |
| year = {2026}, | |
| url = {https://huggingface.co/xpertsystems/cyb008-baseline-classifier}, | |
| note = {Baseline reference model trained on xpertsystems/cyb008-sample} | |
| } | |
| ``` | |