# Agentic SOC: Autonomous Security Operations Center
## Research & Architecture Design Document

**Objective:** Design a fully autonomous Security Operations Center powered by LLM-based reasoning agents, starting with AWS CloudTrail log ingestion. The system builds behavioral baselines without storing raw logs, detects anomalies, enriches alerts with threat intelligence and TTPs, classifies true/false positives, and either auto-remediates or escalates to humans.

**Core Innovation:** Store the *model*, not the *data*. Normal logs update the baseline model and are discarded. Only anomaly logs are retained for investigation. This reduces storage costs by orders of magnitude compared to traditional SIEMs.

---

## Table of Contents

1. [Problem Statement & Vision](#1-problem-statement--vision)
2. [Why Traditional SIEMs Fail](#2-why-traditional-siems-fail)
3. [System Architecture Overview](#3-system-architecture-overview)
4. [Layer 1: CloudTrail Ingestion & Feature Extraction](#4-layer-1-cloudtrail-ingestion--feature-extraction)
5. [Layer 2: Baseline Accumulation (Store Model, Not Logs)](#5-layer-2-baseline-accumulation-store-model-not-logs)
6. [Layer 3: Anomaly Detection & Scoring](#6-layer-3-anomaly-detection--scoring)
7. [Layer 4: Multi-Agent Triage Pipeline](#7-layer-4-multi-agent-triage-pipeline)
8. [Layer 5: Threat Intelligence Enrichment & TTP Mapping](#8-layer-5-threat-intelligence-enrichment--ttp-mapping)
9. [Layer 6: Verdict & Response (The Three-Way Decision)](#9-layer-6-verdict--response-the-three-way-decision)
10. [Layer 7: Automated Remediation Actions](#10-layer-7-automated-remediation-actions)
11. [Storage Economics: Quantifying the Savings](#11-storage-economics-quantifying-the-savings)
12. [CloudTrail → MITRE ATT&CK Mapping Reference](#12-cloudtrail--mitre-attck-mapping-reference)
13. [Open-Source Building Blocks](#13-open-source-building-blocks)
14. [Implementation Roadmap](#14-implementation-roadmap)
15. [Research Papers & References](#15-research-papers--references)

---

## 1. Problem Statement & Vision

### The Speed Gap

Attackers using agentic AI systems can discover and exploit vulnerabilities at machine speed. A human SOC analyst processing 50-100 alerts/day cannot match an adversary generating thousands of attack variations per hour. The only defense that scales is an agentic defense: AI systems that detect, investigate, and respond at the same speed threats are delivered.

### The Vision: End-to-End Autonomous SOC

```
CloudTrail Event Stream (thousands/second)
                    │
                    ▼
┌───────────────────────────────────────────────┐
│ DETECT: Statistical baseline + ML scoring     │ ← No raw log storage
│ (milliseconds per event)                      │
└───────────────────┬───────────────────────────┘
                    │ anomaly detected
                    ▼
┌───────────────────────────────────────────────┐
│ INVESTIGATE: Multi-agent LLM reasoning        │ ← Store only anomaly logs
│ Enrich → Classify → Map TTPs → Verdict        │
│ (seconds per alert)                           │
└───────────────────┬───────────────────────────┘
                    │
         ┌──────────┼──────────┐
         ▼          ▼          ▼
    FALSE POS   AUTO-ACT    ESCALATE
    (dismiss)   (remediate) (human)
```

### Three Decision Outcomes

Every alert terminates in exactly one of three states:

| Outcome | Condition | Action |
|---------|-----------|--------|
| **False Positive** | Alert does not represent a real threat | Dismiss; update baseline to widen normal bounds |
| **True Positive — Auto-Remediate** | Real threat; known remediation within safe parameters | Execute automated response (revoke creds, isolate, block) |
| **True Positive — Escalate** | Real threat; unknown or risky remediation | Alert human analyst with full investigation report |

The system's value scales with the percentage of alerts that can be confidently resolved without human intervention.

---

## 2. Why Traditional SIEMs Fail

### The Storage Trap

Traditional SIEMs (Splunk, Elastic, QRadar, Sentinel) follow a **store-then-query** model:

```
All Logs → Index → Store (90-365 days) → Query for anomalies
```

**Problems:**
- **Cost:** Enterprise CloudTrail generates 100M-1B+ events/day. At $2-5/GB ingestion (Splunk pricing), costs reach $50K-500K+/month just for CloudTrail
- **Latency:** Detection queries run against stored data, adding minutes to hours of delay
- **Noise:** 99.9%+ of stored logs are normal activity that will never be queried
- **Context Window:** Analysts drown in data. A single investigation might require correlating events across millions of log entries

### The IBM Insight

IBM Cloud research (arXiv:2411.09047) demonstrated that, of 413 million raw telemetry rows collected over 4.5 months for a single system, only **39,000 rows of aggregated statistics** were needed for anomaly detection: a **10,000× compression ratio**. The raw data served no purpose beyond computing the statistics.

### Our Approach: Accumulate the Baseline, Discard the Logs

```
Event → Extract Features → Update Baseline Model → Discard Event
                                   │
                          Is this anomalous?
                          ├── No  → event discarded (baseline updated)
                          └── Yes → event STORED for investigation
```

**Storage model:** O(entities × model_size) instead of O(events × retention_period)

For a typical AWS environment with 10,000 entities (users, roles, services) and 1MB per entity model (see the back-of-envelope check below):
- Our approach: **~10 GB** (constant, regardless of time)
- Traditional SIEM: **10-100+ TB/year** (linear growth)
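
A quick sanity check of the two storage models, reusing the 500M events/day volume assumed in Section 11 at roughly 1 KB per event. This is a sketch of the arithmetic only; the constants are the stated assumptions:

```python
# Storage model comparison under the stated assumptions.
entities = 10_000             # users, roles, services
model_mb_per_entity = 1.0     # per-entity behavioral model
events_per_day = 500e6        # CloudTrail volume (Section 11 assumption)
bytes_per_event = 1_000       # ~1 KB per CloudTrail event

model_storage_gb = entities * model_mb_per_entity / 1_024
siem_tb_per_year = events_per_day * bytes_per_event * 365 / 1e12

print(f"baseline models: ~{model_storage_gb:.1f} GB, constant over time")
print(f"store-everything SIEM: ~{siem_tb_per_year:.0f} TB/year, growing linearly")
# → ~9.8 GB vs ~182 TB/year
```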

---

## 3. System Architecture Overview

```
═══════════════════════════════════════════════════════════════════════════
                         AGENTIC SOC ARCHITECTURE
═══════════════════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────────────────┐
│                         LAYER 1: DATA INGESTION                         │
│                                                                         │
│  AWS CloudTrail ──→ S3 Bucket ──→ SQS Queue ──→ Event Consumer          │
│       │                                              │                  │
│       │  (Future: VPC Flow Logs, GuardDuty,          │                  │
│       │   Email/Phishing Logs, Endpoint Logs)        │                  │
│       │                                              ▼                  │
│       └──────────────────────────────────────→ Feature Extractor        │
│                                                (logem 0.6B model        │
│                                                 + rule-based parser)    │
└────────────────────────────────────┬────────────────────────────────────┘
                                     │ feature_vector + raw_event
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                       LAYER 2: BASELINE ACCUMULATOR                     │
│                                                                         │
│  ┌──────────────────────┐  ┌───────────────────┐  ┌─────────────────┐   │
│  │ Per-Entity Profiles  │  │ Count-Min Sketch  │  │ Online iForest  │   │
│  │ (EMA μ/σ per feature)│  │ (frequency/burst) │  │ (structural)    │   │
│  │ ~1KB per entity      │  │ ~80KB total       │  │ ~2MB total      │   │
│  └──────────┬───────────┘  └─────────┬─────────┘  └────────┬────────┘   │
│             │                        │                     │            │
│             └───────────┬────────────┴─────────────────────┘            │
│                         │                                               │
│              Composite Anomaly Score                                    │
│                score > threshold?                                       │
│                │               │                                        │
│               NO              YES                                       │
│                │               │                                        │
│        Update baseline   Store anomaly log ──→ Anomaly Store            │
│        Discard raw log                                                  │
└────────────────────────────────────┬────────────────────────────────────┘
                                     │ anomaly event
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                   LAYER 3: MULTI-AGENT TRIAGE PIPELINE                  │
│                        (LangGraph Orchestration)                        │
│                                                                         │
│  ┌─────────────┐   ┌──────────────────┐   ┌───────────────────────┐     │
│  │ Orchestrator│──→│ Behavior Analysis│──→│ Evidence Acquisition  │     │
│  │ Agent       │   │ Agent            │   │ Agents (per-workflow) │     │
│  │             │   │                  │   │                       │     │
│  │ Routes alert│   │ Classifies into: │   │ Tools:                │     │
│  │ Controls    │   │ • CredChange     │   │ • queryCloudTrail()   │     │
│  │ flow        │   │ • IAMPolicyMod   │   │ • getIAMUser()        │     │
│  │ Consistency │   │ • GeoAnomaly     │   │ • lookupIP()          │     │
│  │ checks      │   │ • UnusualAPI     │   │ • getAssetRecord()    │     │
│  └─────────────┘   │ • DataExfil      │   │ • queryAthena()       │     │
│                    │ • PrivEsc        │   └───────────┬───────────┘     │
│                    │ • Recon          │               │                 │
│                    └──────────────────┘               ▼                 │
│                                        ┌───────────────────────┐        │
│                                        │ Symbolic Verifier     │        │
│                                        │ (deterministic rules  │        │
│                                        │  to ground LLM output)│        │
│                                        └───────────┬───────────┘        │
│                                                    ▼                    │
│                                        ┌───────────────────────┐        │
│                                        │ Reasoning & Synthesis │        │
│                                        │ Agent                 │        │
│                                        │                       │        │
│                                        │ + RAG CTI Enrichment  │        │
│                                        │ + MITRE ATT&CK Map    │        │
│                                        │ + Severity Scoring    │        │
│                                        │ → Structured Report   │        │
│                                        └───────────┬───────────┘        │
└────────────────────────────────────────────────────┬────────────────────┘
                                                     │
                          ┌──────────────────────────┼──────────────────────┐
                          ▼                          ▼                      ▼
                  ┌──────────────┐       ┌──────────────────┐   ┌──────────────────┐
                  │FALSE POSITIVE│       │TRUE POS: AUTO-ACT│   │TRUE POS: ESCALATE│
                  │              │       │                  │   │                  │
                  │Update        │       │Execute playbook: │   │Create case in    │
                  │baseline      │       │• Revoke creds    │   │TheHive           │
                  │Widen normal  │       │• Block IP        │   │Page analyst      │
                  │bounds        │       │• Isolate instance│   │Full report +     │
                  │Log dismissal │       │• Revert IAM      │   │evidence attached │
                  │reason        │       │                  │   │                  │
                  └──────────────┘       └──────────────────┘   └──────────────────┘
```

---

## 4. Layer 1: CloudTrail Ingestion & Feature Extraction

### CloudTrail Event Schema

Every AWS API call generates a CloudTrail event with this structure:

```json
{
  "eventVersion": "1.08",
  "userIdentity": {
    "type": "IAMUser | AssumedRole | Root | FederatedUser | AWSService",
    "principalId": "AIDACKCEVSQ6C2EXAMPLE",
    "arn": "arn:aws:iam::123456789012:user/alice",
    "accountId": "123456789012",
    "accessKeyId": "ASIAIOSFODNN7EXAMPLE",
    "userName": "alice",
    "sessionContext": {
      "mfaAuthenticated": "true",
      "creationDate": "2024-01-15T10:30:00Z"
    }
  },
  "eventTime": "2024-01-15T14:22:33Z",
  "eventSource": "iam.amazonaws.com",
  "eventName": "CreateAccessKey",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.50",
  "userAgent": "aws-cli/2.13.0 Python/3.11.4",
  "requestParameters": {
    "userName": "bob"
  },
  "responseElements": {
    "accessKey": {
      "accessKeyId": "AKIAIOSFODNN7EXAMPLE",
      "status": "Active",
      "userName": "bob"
    }
  },
  "errorCode": null,
  "errorMessage": null,
  "readOnly": false,
  "eventType": "AwsApiCall",
  "managementEvent": true,
  "recipientAccountId": "123456789012"
}
```

### Feature Extraction Pipeline

Transform each raw CloudTrail JSON event into a numerical feature vector for the baseline model:

```python
def extract_features(event: dict) -> dict:
    """Extract security-relevant features from a CloudTrail event."""

    identity = event.get("userIdentity", {})

    return {
        # Identity features
        "principal_hash": hash(identity.get("arn", "")),
        "identity_type": encode_category(identity.get("type")),
        "mfa_authenticated": 1 if identity.get("sessionContext", {})
                                          .get("mfaAuthenticated") == "true" else 0,

        # Action features
        "event_source_hash": hash(event.get("eventSource", "")),
        "event_name_hash": hash(event.get("eventName", "")),
        "is_write_event": 0 if event.get("readOnly") else 1,
        "is_management_event": 1 if event.get("managementEvent") else 0,
        "has_error": 1 if event.get("errorCode") else 0,
        "error_code_hash": hash(event.get("errorCode") or ""),  # errorCode may be null

        # Context features
        "hour_of_day": parse_hour(event["eventTime"]),
        "day_of_week": parse_dow(event["eventTime"]),
        "region_hash": hash(event.get("awsRegion", "")),
        "source_ip_hash": hash(event.get("sourceIPAddress", "")),
        "user_agent_hash": hash(event.get("userAgent", "")),

        # Behavioral features (computed from recent window)
        "api_calls_last_5min": count_recent(identity["arn"], minutes=5),
        "unique_services_last_hour": count_unique_services(identity["arn"], hours=1),
        "unique_regions_last_hour": count_unique_regions(identity["arn"], hours=1),
        "error_rate_last_hour": error_rate(identity["arn"], hours=1),
        "new_api_call": is_first_time_api(identity["arn"], event["eventName"]),
    }
```
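
The extractor assumes several helpers (`encode_category`, `parse_hour`, `count_recent`, and friends) that are not shown. A minimal sketch of plausible implementations follows; the module-level state and per-entity sliding windows are assumptions, and note that Python's built-in `hash()` is randomized per process, so a real deployment would substitute a stable hash:

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

# Bounded per-entity metadata for the windowed counters; small tuples,
# not raw logs. (Sketch only; the remaining counters follow the same pattern.)
_recent = defaultdict(deque)   # arn -> deque of (ts, event_name, region, is_error)
_seen_apis = defaultdict(set)  # arn -> set of API names ever observed

def observe(arn: str, event_name: str, region: str, is_error: bool):
    """Call once per event to feed the sliding windows."""
    _recent[arn].append((datetime.now(timezone.utc), event_name, region, is_error))

def encode_category(value, buckets: int = 64) -> int:
    """Map a categorical string to a small integer bucket."""
    return hash(value) % buckets if value else 0

def _parse(event_time: str) -> datetime:
    return datetime.fromisoformat(event_time.replace("Z", "+00:00"))

def parse_hour(event_time: str) -> int:
    return _parse(event_time).hour

def parse_dow(event_time: str) -> int:
    return _parse(event_time).weekday()

def _window(arn: str, minutes: int) -> deque:
    """Evict entries older than the window, then return what remains."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=minutes)
    dq = _recent[arn]
    while dq and dq[0][0] < cutoff:
        dq.popleft()
    return dq

def count_recent(arn: str, minutes: int) -> int:
    return len(_window(arn, minutes))

def count_unique_regions(arn: str, hours: int) -> int:
    return len({e[2] for e in _window(arn, hours * 60)})

def error_rate(arn: str, hours: int) -> float:
    dq = _window(arn, hours * 60)
    return sum(e[3] for e in dq) / len(dq) if dq else 0.0

def is_first_time_api(arn: str, event_name: str) -> int:
    first = event_name not in _seen_apis[arn]
    _seen_apis[arn].add(event_name)
    return int(first)
```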

### Using logem (0.6B) for Structured Extraction

For complex or non-standard log formats (future expansion beyond CloudTrail), use the fine-tuned `HassanShehata/logem` model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# 0.6B params, 396MB quantized; runs on CPU
tokenizer = AutoTokenizer.from_pretrained("HassanShehata/logem")
model = AutoModelForCausalLM.from_pretrained("HassanShehata/logem")

# Achieves F1=0.833 on SIEM field extraction (beats Gemma 12B)
```

For CloudTrail specifically, rule-based JSON parsing is faster and deterministic. Reserve the LLM parser for unstructured logs (syslog, application logs, email headers) when the system expands.

---

## 5. Layer 2: Baseline Accumulation (Store Model, Not Logs)

This is the core innovation. Three complementary models operate in parallel, each maintaining a compact representation of "normal" behavior:

### Tier 1: Per-Entity Statistical Profiles (Fastest, Smallest)

For each entity (IAM user, role, service), maintain rolling statistics:

```python
from collections import defaultdict
import math

class EntityProfile:
    """Compact behavioral profile. ~1KB per entity. No raw log storage."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha                    # EMA decay rate
        # Per-feature exponential moving average
        self.mu = defaultdict(float)          # running mean
        self.var = defaultdict(lambda: 1.0)   # running variance
        # Categorical frequency distributions
        self.api_freq = {}                    # {event_name: count} (top-K only)
        self.region_freq = {}                 # {region: count}
        self.hour_dist = [0] * 24             # hourly activity distribution
        self.ip_set_size = 0                  # HyperLogLog cardinality estimate
        # Metadata
        self.event_count = 0
        self.last_seen = None
        self.first_seen = None

    def update(self, features: dict):
        """O(1) update. No raw data retained."""
        self.event_count += 1

        for key, value in features.items():
            if isinstance(value, (int, float)):
                # Exponentially weighted update of mean and variance
                # (an EMA analogue of Welford's online algorithm)
                old_mu = self.mu[key]
                self.mu[key] = self.alpha * value + (1 - self.alpha) * old_mu
                self.var[key] = (self.alpha * (value - self.mu[key])**2
                                 + (1 - self.alpha) * self.var[key])

    def anomaly_score(self, features: dict) -> float:
        """Z-score based anomaly scoring."""
        scores = []
        for key, value in features.items():
            if isinstance(value, (int, float)) and key in self.mu:
                sigma = math.sqrt(self.var[key] + 1e-8)
                z = abs(value - self.mu[key]) / sigma
                scores.append(z)
        return max(scores) if scores else 0.0

    def memory_bytes(self) -> int:
        """Total memory footprint of this profile."""
        return (len(self.mu) * 16             # 8 bytes key + 8 bytes float
                + len(self.var) * 16
                + len(self.api_freq) * 40
                + len(self.region_freq) * 40
                + 24 * 8                      # hour_dist
                + 64)                         # metadata
        # Typically ~500-2000 bytes per entity
```
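
A minimal usage sketch (names like `normal_feature_stream` and `new_event` are placeholders). The important detail, which `BaselineAccumulator` below also follows, is to score an event against the existing baseline *before* folding it in:

```python
profile = EntityProfile(alpha=0.01)

# Warm the profile on previously extracted feature dicts.
for features in normal_feature_stream:   # placeholder iterable of dicts
    profile.update(features)

# Score a new event against the learned baseline, then absorb it.
features = extract_features(new_event)   # new_event: a CloudTrail dict
z = profile.anomaly_score(features)
profile.update(features)

if z > 3.0:   # ~3 sigma under a rough Gaussian assumption
    print(f"anomalous (max z={z:.1f}); profile is ~{profile.memory_bytes()} B")
```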

### Tier 2: Count-Min Sketch (Frequency/Burst Detection)

Detect unusual frequencies of (entity, action) pairs without storing any raw events:

```python
import hashlib
import numpy as np

class CountMinSketch:
    """Fixed-size frequency tracker. 80KB regardless of stream length."""

    def __init__(self, depth=5, width=2048):
        self.depth = depth
        self.width = width
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.hash_seeds = [i * 0x9e3779b9 for i in range(depth)]

    def _hash(self, key: str, seed: int) -> int:
        h = hashlib.md5(f"{seed}:{key}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, key: str, count: int = 1):
        for i in range(self.depth):
            j = self._hash(key, self.hash_seeds[i])
            self.table[i][j] += count

    def estimate(self, key: str) -> int:
        return min(
            self.table[i][self._hash(key, self.hash_seeds[i])]
            for i in range(self.depth)
        )

    def memory_bytes(self) -> int:
        return self.depth * self.width * 8   # 80KB for default params


class BurstDetector:
    """Detect unusual bursts using CMS + time windows."""

    def __init__(self):
        self.current_window = CountMinSketch()    # current time window
        self.baseline_window = CountMinSketch()   # historical baseline
        self.window_count = 0

    def process(self, entity: str, event_name: str) -> float:
        key = f"{entity}:{event_name}"
        self.current_window.add(key)

        current = self.current_window.estimate(key)
        baseline = max(self.baseline_window.estimate(key), 1)

        # Chi-squared style anomaly score (from MIDAS, AAAI 2020)
        expected = baseline * (1.0 / max(self.window_count, 1))
        score = (current - expected)**2 / (expected + 1e-8)

        return score

    def rotate_window(self):
        """Call periodically (e.g., every 5 minutes)."""
        # Merge current into baseline with decay
        self.baseline_window.table = (
            0.95 * self.baseline_window.table
            + 0.05 * self.current_window.table
        ).astype(np.int64)
        self.current_window = CountMinSketch()
        self.window_count += 1
```
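
`rotate_window()` has to be driven by a clock. A minimal single-process scheduling sketch using `threading.Timer`; a real deployment would rotate from the consumer loop or a scheduler instead, since the timer below is not synchronized with `process()`:

```python
import threading

burst_detector = BurstDetector()

def _rotate_every(seconds: float = 300.0):
    """Re-arm a timer that rotates the CMS window every 5 minutes."""
    burst_detector.rotate_window()
    threading.Timer(seconds, _rotate_every, args=(seconds,)).start()

_rotate_every()

# Per event, on the hot path:
score = burst_detector.process(
    "arn:aws:iam::123456789012:user/alice", "GetSecretValue"
)
```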

### Tier 3: Online Isolation Forest (Structural Anomaly Detection)

For detecting complex, multi-feature anomalies that simple statistics miss:

```python
# Using the PySAD library (pip install pysad)
import numpy as np
from pysad.models import HalfSpaceTrees

class StructuralAnomalyDetector:
    """Streaming half-space trees. ~2MB fixed memory. No raw data storage."""

    def __init__(self, n_features: int):
        # Half-Space Trees: 32 trees, depth 15, window 250. PySAD requires the
        # feature ranges up front; this assumes features are scaled into [0, 1]
        # before scoring.
        self.model = HalfSpaceTrees(
            feature_mins=np.zeros(n_features),
            feature_maxes=np.ones(n_features),
            num_trees=32,
            max_depth=15,
            window_size=250,
        )
        self.is_warm = False
        self.warmup_count = 0
        self.warmup_threshold = 500   # events before scoring is reliable

    def process(self, feature_vector) -> float:
        """Process a single event. Returns an anomaly score."""
        x = np.asarray(feature_vector, dtype=float)
        score = self.model.fit_score_partial(x)
        self.warmup_count += 1
        if self.warmup_count >= self.warmup_threshold:
            self.is_warm = True
        return score if self.is_warm else 0.0   # don't score during warmup
```

### Composite Scoring

```python
class BaselineAccumulator:
    """Orchestrates all three tiers. Decides: store or discard."""

    def __init__(self, anomaly_threshold=3.0):
        self.entity_profiles = {}   # arn -> EntityProfile
        self.burst_detector = BurstDetector()
        # 19 = number of features produced by extract_features()
        self.structural_detector = StructuralAnomalyDetector(n_features=19)
        self.threshold = anomaly_threshold

    def process_event(self, event: dict) -> tuple:
        """
        Returns: (is_anomaly: bool, scores: dict, raw_event_or_none)

        If normal:  returns (False, scores, None); the raw event can be discarded
        If anomaly: returns (True, scores, event); the raw event is retained
        """
        features = extract_features(event)
        entity_arn = event["userIdentity"]["arn"]

        # Get or create entity profile
        if entity_arn not in self.entity_profiles:
            self.entity_profiles[entity_arn] = EntityProfile()
        profile = self.entity_profiles[entity_arn]

        # Score across all three tiers
        stat_score = profile.anomaly_score(features)
        burst_score = self.burst_detector.process(
            entity_arn, event["eventName"]
        )
        structural_score = self.structural_detector.process(
            list(features.values())
        )

        composite = max(stat_score, burst_score, structural_score)

        # Always update the baseline (even for anomalies)
        profile.update(features)

        scores = {
            "statistical": stat_score,
            "burst": burst_score,
            "structural": structural_score,
            "composite": composite
        }

        if composite > self.threshold:
            return (True, scores, event)    # STORE anomaly log
        else:
            return (False, scores, None)    # DISCARD normal log

    def total_memory(self) -> str:
        entity_mem = sum(p.memory_bytes() for p in self.entity_profiles.values())
        sketch_mem = self.burst_detector.current_window.memory_bytes() * 2
        structural_mem = 2 * 1024 * 1024   # ~2MB for HST
        total = entity_mem + sketch_mem + structural_mem
        return f"{total / 1024 / 1024:.1f} MB for {len(self.entity_profiles)} entities"
```
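
The glue between Layer 1 and Layer 2 is a consumer loop. A minimal boto3 sketch, assuming S3 event notifications are wired to the SQS queue; the queue URL and anomaly bucket are placeholders, and CloudTrail's gzipped `{"Records": [...]}` log-file format is the documented delivery format:

```python
import boto3, gzip, json

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
accumulator = BaselineAccumulator(anomaly_threshold=3.0)

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/cloudtrail-events"  # placeholder
ANOMALY_BUCKET = "soc-anomaly-store"  # placeholder

def consume_forever():
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            note = json.loads(msg["Body"])          # S3 event notification
            for rec in note.get("Records", []):
                bucket = rec["s3"]["bucket"]["name"]
                key = rec["s3"]["object"]["key"]
                body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
                for event in json.loads(gzip.decompress(body))["Records"]:
                    is_anomaly, scores, raw = accumulator.process_event(event)
                    if is_anomaly:
                        # Only anomalies are ever persisted
                        s3.put_object(
                            Bucket=ANOMALY_BUCKET,
                            Key=f"anomalies/{event['eventID']}.json",
                            Body=json.dumps({"event": raw, "scores": scores}),
                        )
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
```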

### Concept Drift Handling

Normal behavior changes over time (employees change roles, new services get deployed). Use ADWIN (Adaptive Windowing) to detect drift and re-initialize:

```python
from river import drift

class DriftAwareBaseline(BaselineAccumulator):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.drift_detectors = {}   # per-entity ADWIN instances

    def process_event(self, event):
        result = super().process_event(event)
        entity = event["userIdentity"]["arn"]

        if entity not in self.drift_detectors:
            self.drift_detectors[entity] = drift.ADWIN(delta=0.002)

        self.drift_detectors[entity].update(result[1]["composite"])

        if self.drift_detectors[entity].drift_detected:
            # Behavior has fundamentally changed: reset the entity profile
            self.entity_profiles[entity] = EntityProfile()
            self.drift_detectors[entity] = drift.ADWIN(delta=0.002)
            # Log the drift event (this IS stored as an anomaly)
            return (True, {"drift": True}, event)

        return result
```

---

## 6. Layer 3: Anomaly Detection & Scoring

### Warm-Up Period

All models need a learning period before thresholds are reliable:

| Phase | Duration | Raw Log Storage | Behavior |
|-------|----------|-----------------|----------|
| **Cold Start** | First 24 hours per entity | YES (stored temporarily) | Build initial profile |
| **Warm-Up** | Hours 24-72 | Selective (high-score only) | Calibrate thresholds |
| **Operational** | Day 3+ | Anomalies only | Full pipeline active |

During cold start, raw logs are temporarily stored and replayed to build the initial baseline. After warm-up, the temporary logs are deleted.
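
A minimal sketch of the cold-start gate. The event-count proxy for the 24-hour window and the in-memory `temp_store` are assumptions; production would park the raw events in S3 or Redis instead:

```python
temp_store = {}   # arn -> list of raw events kept only during cold start

def process_with_cold_start(acc: BaselineAccumulator, event: dict,
                            min_events: int = 500):
    """Learn silently while an entity is new; score only once warmed up."""
    arn = event["userIdentity"]["arn"]
    profile = acc.entity_profiles.setdefault(arn, EntityProfile())

    if profile.event_count < min_events:               # still cold
        temp_store.setdefault(arn, []).append(event)   # temporary raw retention
        profile.update(extract_features(event))
        return (False, {"cold_start": True}, None)

    temp_store.pop(arn, None)   # warmed up: delete the temporary raw logs
    return acc.process_event(event)
```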

### Threshold Calibration

Use Gaussian tail probabilities to convert raw anomaly scores into p-values for consistent false-positive control:

```python
from scipy import stats
import numpy as np

class ThresholdCalibrator:
    """Adaptive thresholds based on score distributions."""

    def __init__(self, target_fpr=0.001):   # 0.1% false positive rate
        self.target_fpr = target_fpr
        self.score_buffer = []               # rolling window of recent scores
        self.buffer_size = 10000
        self.threshold = 3.0                 # initial Z-score threshold

    def update(self, score: float):
        self.score_buffer.append(score)
        if len(self.score_buffer) > self.buffer_size:
            self.score_buffer.pop(0)

        if len(self.score_buffer) >= 1000:
            # Fit a Gaussian to the score distribution
            mu = np.mean(self.score_buffer)
            sigma = np.std(self.score_buffer) + 1e-8
            # Set the threshold at the target FPR
            self.threshold = mu + sigma * stats.norm.ppf(1 - self.target_fpr)
```
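
A sketch of how the calibrator plugs into the accumulator so the store/discard cutoff tracks the live score distribution (one event of lag between scoring and threshold adaptation is accepted here):

```python
calibrator = ThresholdCalibrator(target_fpr=0.001)
acc = BaselineAccumulator()

def process_calibrated(event: dict):
    is_anomaly, scores, raw = acc.process_event(event)   # uses current threshold
    calibrator.update(scores["composite"])               # learn the distribution
    acc.threshold = calibrator.threshold                 # adapt for the next event
    return is_anomaly, scores, raw
```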

---

## 7. Layer 4: Multi-Agent Triage Pipeline

Based on the CORTEX architecture (arXiv:2510.00311), which achieved F1 = 0.78 and reduced false positives by 10.7 percentage points over single-agent approaches.

### Why Multi-Agent?

| Approach | F1 Score | FPR | Failure Mode |
|----------|----------|-----|--------------|
| Single LLM agent | 0.66 | 24.9% | Context cramming, hallucination |
| **Multi-agent (CORTEX)** | **0.78** | **14.2%** | N/A (divide-and-conquer eliminates it) |

### Agent Definitions
```python
from langgraph.graph import StateGraph, END
from typing import TypedDict, List

class SOCState(TypedDict):
    """State passed between agents in the triage pipeline."""
    alert: dict                      # Raw anomaly event + scores
    entity_profile: dict             # Historical profile summary
    workflow: str                    # Classified workflow type
    evidence: List[dict]             # Gathered evidence
    enrichment: dict                 # CTI, TTP, severity
    symbolic_check: dict             # Deterministic rule validation
    verdict: str                     # FP | TP_AUTO | TP_ESCALATE
    confidence: float                # 0.0 - 1.0
    reasoning: str                   # Natural language explanation
    recommended_actions: List[str]   # Specific remediation steps
    triage_report: dict              # Final structured report


# Agent 1: Orchestrator
ORCHESTRATOR_PROMPT = """You are the SOC Orchestrator Agent. Your role is to:
1. Validate the incoming alert has all required fields
2. Route to the Behavior Analysis Agent
3. Ensure all pipeline stages complete
4. Perform consistency checks on the final report
5. If any agent returns an error, retry or escalate

You do NOT make triage decisions. You manage the process."""


# Agent 2: Behavior Analysis
BEHAVIOR_ANALYST_PROMPT = """You are the Behavior Analysis Agent. Given a CloudTrail
anomaly event and entity profile, classify it into exactly ONE workflow:

- CREDENTIAL_CHANGE: CreateAccessKey, UpdateAccessKey, CreateLoginProfile,
  ChangePassword for another user
- IAM_POLICY_MOD: PutRolePolicy, AttachUserPolicy, CreatePolicy,
  PutGroupPolicy with overly permissive policies
- GEO_ANOMALY: API calls from IP/region never seen for this entity
- UNUSUAL_API: API call this entity has never made before, especially
  sensitive APIs (GetSecretValue, GetPasswordData, etc.)
- DATA_EXFIL: High-volume S3 GetObject, unusual data transfer patterns,
  copy to external accounts
- PRIVILEGE_ESCALATION: AssumeRole to higher-privilege role, iam:PassRole
  to sensitive service
- RECONNAISSANCE: Describe*, List*, Get* calls across multiple services
  in rapid succession
- DEFENSE_EVASION: StopLogging, DeleteTrail, DisableAlarmActions,
  PutBucketPolicy reducing restrictions

Output JSON: {"workflow": "...", "confidence": 0.0-1.0, "reasoning": "..."}"""


# Agent 3: Evidence Acquisition (per-workflow)
EVIDENCE_TOOLS = {
    "queryCloudTrailEvents": "Query recent CloudTrail events for an entity within a time range",
    "getIAMUser": "Get IAM user details including policies, groups, MFA status",
    "getIAMRole": "Get IAM role details including trust policy and permissions",
    "lookupIP": "Look up an IP address in threat intelligence databases (AbuseIPDB, VirusTotal)",
    "getEntityProfile": "Retrieve the baseline behavioral profile for an entity",
    "queryAthena": "Run SQL query against CloudTrail logs in Athena (anomaly store)",
    "getAssetRecord": "Get EC2 instance, Lambda function, or S3 bucket details",
    "getGuardDutyFindings": "Check if GuardDuty has related findings",
    "getSecurityHubFindings": "Check Security Hub for related compliance findings",
}


# Agent 4: Reasoning & Synthesis
REASONING_PROMPT = """You are the Reasoning & Synthesis Agent. Given:
- The classified workflow
- All gathered evidence
- Threat intelligence enrichment
- Symbolic verification results

You must produce a structured triage report with:

1. VERDICT: One of:
   - FALSE_POSITIVE: This is normal/expected behavior. Explain why.
   - TRUE_POSITIVE_AUTO: This is a real threat AND safe to auto-remediate.
     Specify exact remediation actions.
   - TRUE_POSITIVE_ESCALATE: This is a real threat BUT requires human judgment.
     Explain what is uncertain.

2. CONFIDENCE: 0.0-1.0 (must be >0.9 for AUTO remediation)

3. MITRE_TTPS: List of applicable MITRE ATT&CK technique IDs

4. SEVERITY: CRITICAL / HIGH / MEDIUM / LOW

5. EVIDENCE_SUMMARY: Key evidence points that support the verdict

6. REASONING_CHAIN: Step-by-step logic leading to the verdict

CRITICAL RULES:
- When in doubt, ESCALATE. Never auto-remediate with confidence < 0.9
- Always check if the action was performed by a known automation/service role
- Consider time of day, historical patterns, and business context
- A single unusual action is not necessarily malicious: look for chains"""
```

### LangGraph Pipeline

```python
def build_soc_pipeline():
    """Build the multi-agent SOC triage pipeline."""

    workflow = StateGraph(SOCState)

    # Add nodes
    workflow.add_node("orchestrator", orchestrator_agent)
    workflow.add_node("behavior_analysis", behavior_analysis_agent)
    workflow.add_node("evidence_gathering", evidence_gathering_agent)
    workflow.add_node("symbolic_verification", symbolic_verifier)
    workflow.add_node("reasoning", reasoning_agent)
    workflow.add_node("response_executor", response_executor)
    workflow.add_node("update_baseline", update_baseline_node)
    workflow.add_node("create_case", create_case_node)

    # Define edges
    workflow.set_entry_point("orchestrator")
    workflow.add_edge("orchestrator", "behavior_analysis")
    workflow.add_edge("behavior_analysis", "evidence_gathering")
    workflow.add_edge("evidence_gathering", "symbolic_verification")
    workflow.add_edge("symbolic_verification", "reasoning")

    # Conditional routing based on verdict
    workflow.add_conditional_edges(
        "reasoning",
        route_verdict,
        {
            "false_positive": "update_baseline",
            "auto_remediate": "response_executor",
            "escalate": "create_case",
            "retry": "evidence_gathering",   # Need more evidence
        }
    )

    workflow.add_edge("update_baseline", END)
    workflow.add_edge("response_executor", END)
    workflow.add_edge("create_case", END)

    return workflow.compile()
```
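
The routing function `route_verdict` is referenced above but not defined. A minimal version that maps the reasoning agent's state onto the conditional-edge keys (a retry cap tracked elsewhere in the state is assumed, to avoid infinite cycles):

```python
def route_verdict(state: SOCState) -> str:
    """Map the reasoning agent's verdict onto the conditional-edge keys."""
    verdict = state.get("verdict")
    if verdict == "FALSE_POSITIVE":
        return "false_positive"
    if verdict == "TRUE_POSITIVE_AUTO":
        return "auto_remediate"
    if verdict == "TRUE_POSITIVE_ESCALATE":
        return "escalate"
    # No confident verdict yet: loop back for more evidence.
    return "retry"
```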

### Symbolic Verifier (Grounds LLM Output in Deterministic Rules)

Based on CloudAnoAgent (arXiv:2508.01844). Prevents LLM hallucination by cross-checking verdicts against deterministic rules:

```python
class SymbolicVerifier:
    """Deterministic rule checker to ground LLM reasoning."""

    RULES = {
        "CREDENTIAL_CHANGE": {
            "auto_remediate_conditions": [
                "target_user != source_user",   # Creating creds for someone else
                "mfa_not_authenticated",
                "source_ip_not_in_corporate_range",
            ],
            "false_positive_conditions": [
                "source_is_known_automation_role",
                "target_user == source_user AND mfa_authenticated",
            ],
        },
        "DEFENSE_EVASION": {
            "auto_remediate_conditions": [
                "event_name in ['StopLogging', 'DeleteTrail', 'UpdateTrail']",
                # ALWAYS a true positive: these should never happen in production
            ],
            "false_positive_conditions": [],   # Never FP
            "always_critical": True,
        },
        "GEO_ANOMALY": {
            "auto_remediate_conditions": [
                "distance_km > 500 AND time_since_last_event_hours < 2",
                # Impossible travel
            ],
            "false_positive_conditions": [
                "source_ip_is_known_vpn",
                "source_ip_is_aws_service",
            ],
        },
    }

    def _check_conditions(self, conditions: list, evidence: dict) -> bool:
        """Evaluate condition strings against pre-computed evidence flags.
        (Sketch: assumes the evidence-gathering stage has already resolved
        each condition string to a boolean under the same key.)"""
        return all(evidence.get(cond, False) for cond in conditions)

    def verify(self, workflow: str, evidence: dict, llm_verdict: str) -> dict:
        """Cross-check the LLM verdict against deterministic rules."""
        rules = self.RULES.get(workflow, {})

        conflicts = []

        # Check if the LLM says FP but the rules say it can't be
        if llm_verdict == "FALSE_POSITIVE":
            if rules.get("always_critical"):
                conflicts.append(
                    f"LLM classified as FP but {workflow} is ALWAYS critical"
                )

        # Check if the LLM says auto-remediate but the conditions aren't met
        if llm_verdict == "TRUE_POSITIVE_AUTO":
            if not self._check_conditions(
                rules.get("auto_remediate_conditions", []), evidence
            ):
                conflicts.append(
                    "Auto-remediation conditions not met: escalate instead"
                )

        return {
            "verified": len(conflicts) == 0,
            "conflicts": conflicts,
            "override_verdict": "TRUE_POSITIVE_ESCALATE" if conflicts else None,
        }
```

---

## 8. Layer 5: Threat Intelligence Enrichment & TTP Mapping

### RAG-Based CTI Enrichment

Based on the architecture from arXiv:2504.00428 (LLM-Assisted Proactive Threat Intelligence):

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

class CTIEnricher:
    """RAG-based threat intelligence enrichment."""

    def __init__(self):
        # Embedding model for CTI document retrieval. Chroma expects a
        # LangChain Embeddings object, so the sentence-transformers model
        # is wrapped in HuggingFaceEmbeddings.
        self.embedder = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-mpnet-base-v2"
        )

        # Vector store loaded with:
        self.feeds = {
            "mitre_attack": "MITRE ATT&CK Enterprise + Cloud matrix",
            "nvd_cve": "National Vulnerability Database CVE entries",
            "cisa_kev": "CISA Known Exploited Vulnerabilities",
            "abuse_ipdb": "AbuseIPDB IP reputation data",
            "aws_security_bulletins": "AWS security advisories",
        }

        self.vector_store = Chroma(
            collection_name="cti_knowledge",
            embedding_function=self.embedder,
        )

    def enrich(self, alert: dict, workflow: str) -> dict:
        """Enrich an alert with threat intelligence context."""

        # Build a query from the alert context
        query = (f"AWS CloudTrail {alert['eventName']} "
                 f"by {alert['userIdentity']['type']} "
                 f"workflow: {workflow}")

        # Retrieve relevant CTI documents
        docs = self.vector_store.similarity_search(query, k=5)

        # Map to MITRE ATT&CK
        ttps = self.map_to_attack(alert["eventName"], workflow)

        return {
            "mitre_ttps": ttps,
            "cti_context": [doc.page_content for doc in docs],
            # check_ip() and find_related_cves() wrap the AbuseIPDB/VirusTotal
            # and NVD feeds respectively (implementations omitted here)
            "ip_reputation": self.check_ip(alert.get("sourceIPAddress")),
            "related_cves": self.find_related_cves(alert),
        }

    def map_to_attack(self, event_name: str, workflow: str) -> list:
        """Map a CloudTrail event to MITRE ATT&CK techniques."""
        # See Section 12 for the complete mapping
        return CLOUDTRAIL_ATTACK_MAP.get(event_name, [])
```
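
The `CLOUDTRAIL_ATTACK_MAP` lookup that `map_to_attack()` reads is not shown above. A short excerpt, assembled from the Section 12 reference tables:

```python
# Excerpt of the lookup table behind map_to_attack(), built from the
# CloudTrail → MITRE ATT&CK reference in Section 12.
CLOUDTRAIL_ATTACK_MAP = {
    "ConsoleLogin":       ["T1078.004"],           # Valid Accounts: Cloud Accounts
    "CreateAccessKey":    ["T1098.001", "T1528"],  # Additional Cloud Credentials
    "CreateLoginProfile": ["T1098.001"],
    "CreateUser":         ["T1136.003"],           # Create Account: Cloud Account
    "StopLogging":        ["T1562.008"],           # Impair Defenses: Disable Cloud Logs
    "DeleteTrail":        ["T1562.008"],
    "GetSecretValue":     ["T1555"],               # Credentials from Password Stores
    "GetPasswordData":    ["T1552.001"],
    "AssumeRole":         ["T1548"],               # Abuse Elevation Control Mechanism
    "GetObject":          ["T1530"],               # Data from Cloud Storage
    "PutBucketPolicy":    ["T1537"],               # Transfer Data to Cloud Account
    "RunInstances":       ["T1496"],               # Resource Hijacking
}
```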

---

## 9. Layer 6: Verdict & Response (The Three-Way Decision)

### Decision Logic

```python
def make_verdict(
    llm_verdict: str,
    llm_confidence: float,
    symbolic_check: dict,
    severity: str,
) -> str:
    """
    Final verdict incorporating LLM reasoning + symbolic verification.

    Conservative by design:
    - Auto-remediate only when BOTH the LLM AND the symbolic verifier agree
    - Escalate if there is ANY disagreement
    - FP only when the LLM is confident AND there are no symbolic conflicts
    """

    # The symbolic verifier overrides the LLM
    if not symbolic_check["verified"]:
        if symbolic_check["override_verdict"]:
            return symbolic_check["override_verdict"]
        return "TRUE_POSITIVE_ESCALATE"

    # High confidence + verified = trust the LLM verdict
    if llm_verdict == "FALSE_POSITIVE" and llm_confidence > 0.85:
        return "FALSE_POSITIVE"

    if llm_verdict == "TRUE_POSITIVE_AUTO" and llm_confidence > 0.90:
        # Extra safety: CRITICAL severity always escalates
        if severity == "CRITICAL":
            return "TRUE_POSITIVE_ESCALATE"
        return "TRUE_POSITIVE_AUTO"

    # Default: escalate
    return "TRUE_POSITIVE_ESCALATE"
```

### Response Actions by Verdict

```python
class ResponseExecutor:
    """Execute automated responses for confirmed true positives."""

    async def execute(self, verdict: str, alert: dict, report: dict):
        if verdict == "FALSE_POSITIVE":
            await self.dismiss_and_learn(alert, report)

        elif verdict == "TRUE_POSITIVE_AUTO":
            await self.auto_remediate(alert, report)

        elif verdict == "TRUE_POSITIVE_ESCALATE":
            await self.escalate_to_human(alert, report)

    async def dismiss_and_learn(self, alert, report):
        """Update the baseline to prevent future FPs on similar events."""
        entity = alert["userIdentity"]["arn"]
        # Widen normal bounds for this entity's profile
        # (widen_bounds() is assumed as a helper on EntityProfile)
        profile = baseline_accumulator.entity_profiles[entity]
        profile.widen_bounds(alert, report["reasoning"])
        # Log the dismissal reason (audit trail)
        await audit_log.record("FP_DISMISSED", alert, report)

    async def auto_remediate(self, alert, report):
        """Execute safe, pre-approved remediation actions."""
        for action in report["recommended_actions"]:
            # Double-check the action is in the whitelist
            if action in SAFE_REMEDIATION_ACTIONS:
                await execute_aws_action(action, alert)
                await audit_log.record("AUTO_REMEDIATED", alert, action)
            else:
                await self.escalate_to_human(alert, report)
                return

    async def escalate_to_human(self, alert, report):
        """Create a case and alert a human analyst."""
        case = await thehive.create_case(
            title=f"[{report['severity']}] {report['workflow']}: "
                  f"{alert['eventName']} by {alert['userIdentity']['arn']}",
            description=report["reasoning"],
            severity=severity_to_number(report["severity"]),
            tags=report["mitre_ttps"],
        )
        # Attach all evidence
        for evidence in report["evidence"]:
            await thehive.add_observable(case.id, evidence)
        # Page the on-call analyst for CRITICAL
        if report["severity"] == "CRITICAL":
            await pagerduty.trigger(case)
```

---

## 10. Layer 7: Automated Remediation Actions

### Safe Remediation Playbooks

Actions the system can execute autonomously with high confidence:

| Workflow | Trigger | Auto-Remediation Action | AWS API Call | Rollback |
|----------|---------|-------------------------|--------------|----------|
| **Credential Compromise** | New access key by unauthorized user | Deactivate the new key | `iam:UpdateAccessKey(Status=Inactive)` | Re-enable key |
| **Credential Compromise** | Console login from impossible geo | Revoke active sessions | Inline deny policy on pre-revocation sessions (there is no direct `sts:RevokeSession` API) | Remove deny policy |
| **Defense Evasion** | CloudTrail logging disabled | Re-enable logging | `cloudtrail:StartLogging` | N/A |
| **Defense Evasion** | Trail deleted | Recreate trail from saved config | `cloudtrail:CreateTrail` | N/A |
| **Data Exfiltration** | S3 bucket policy opened to public | Restore previous bucket policy | `s3:PutBucketPolicy(saved_policy)` | N/A |
| **IAM Policy Mod** | Admin policy attached to user | Detach the policy | `iam:DetachUserPolicy` | Re-attach policy |
| **Network** | Security group opened to 0.0.0.0/0 | Revoke the ingress rule | `ec2:RevokeSecurityGroupIngress` | Re-add rule |
| **Geo Anomaly** | Impossible travel detected | Enforce MFA, revoke sessions | SCP + session revocation | Remove SCP |

### Guardrails for Auto-Remediation

```python
SAFE_REMEDIATION_ACTIONS = {
    # Credential actions (reversible, low blast radius)
    "deactivate_access_key",
    "revoke_session",
    "force_mfa",

    # Logging actions (restoring security posture)
    "enable_cloudtrail",
    "restore_trail_config",

    # Network actions (blocking unauthorized access)
    "revoke_security_group_rule",
    "restore_bucket_policy",

    # IAM actions (removing unauthorized permissions)
    "detach_overprivileged_policy",
}

# Actions that ALWAYS require human approval
NEVER_AUTO_REMEDIATE = {
    "terminate_instance",       # Could be a production workload
    "delete_iam_user",          # Destructive, hard to reverse
    "modify_vpc",               # Network-wide impact
    "modify_rds_instance",      # Data risk
    "anything_in_production",   # Production changes need human sign-off
}
```
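
A sketch of the dispatcher behind `execute_aws_action()` for two of the whitelisted actions. The boto3 calls (`iam.update_access_key`, `cloudtrail.start_logging`) are real APIs; the parameter extraction from the alert's CloudTrail fields is illustrative:

```python
import boto3

iam = boto3.client("iam")
cloudtrail = boto3.client("cloudtrail")

async def execute_aws_action(action: str, alert: dict):
    """Dispatch a whitelisted remediation (sketch covering two safe actions)."""
    if action == "deactivate_access_key":
        # The offending key's id/user come from CreateAccessKey responseElements.
        key = alert["responseElements"]["accessKey"]
        iam.update_access_key(UserName=key["userName"],
                              AccessKeyId=key["accessKeyId"],
                              Status="Inactive")   # reversible: set Active to roll back
    elif action == "enable_cloudtrail":
        # Re-enable the trail named in the StopLogging call's requestParameters.
        cloudtrail.start_logging(Name=alert["requestParameters"]["name"])
    else:
        raise ValueError(f"{action} is not implemented in this sketch")
```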

---

## 11. Storage Economics: Quantifying the Savings

### Comparison Model

Assumptions:
- AWS environment: 10,000 active entities (users, roles, services)
- CloudTrail volume: 500 million events/day (~500 GB/day uncompressed)
- Anomaly rate: 0.1% of events (500,000 anomalies/day)
- Retention: 1 year

| Component | Traditional SIEM | Agentic SOC |
|-----------|------------------|-------------|
| **Daily ingestion** | 500 GB | 0.5 GB (anomalies only) |
| **Annual storage** | 182 TB | 182 GB + ~10 GB models |
| **Storage cost** (S3 pricing) | ~$4,200/month | ~$4.50/month |
| **SIEM license** (Splunk-class) | ~$100K-500K/year | $0 (self-built) |
| **Compute (detection)** | Query over stored data | Streaming (real-time) |
| **Latency to detect** | Minutes to hours | Milliseconds to seconds |
| **LLM costs** (triage only anomalies) | N/A | ~$50-200/day\* |

\*LLM cost estimate: 500K anomalies/day × 1K tokens avg × $0.15/1M tokens (GPT-4o-mini) = ~$75/day
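
The footnote's arithmetic as a one-line sanity check (constants are the table's assumptions):

```python
anomalies_per_day = 500_000
tokens_per_triage = 1_000     # average prompt + completion tokens per anomaly
usd_per_1m_tokens = 0.15      # GPT-4o-mini-class pricing (assumption)

daily_llm_cost = anomalies_per_day * tokens_per_triage / 1e6 * usd_per_1m_tokens
print(f"~${daily_llm_cost:.0f}/day")   # ~$75/day
```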

### Total Cost of Ownership (Annual)

| | Traditional SIEM | Agentic SOC |
|---|---|---|
| Storage | $50,000 | $55 |
| SIEM License | $200,000 | $0 |
| Compute | $30,000 | $15,000 |
| LLM API | $0 | $25,000 |
| Analyst time (reduced) | $500,000 (5 FTE) | $200,000 (2 FTE) |
| **Total** | **~$780,000** | **~$240,000** |
| **Savings** | — | **~70%** |

---

## 12. CloudTrail → MITRE ATT&CK Mapping Reference

### Initial Access (TA0001)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `ConsoleLogin` (from unusual IP) | T1078.004 — Cloud Accounts | Valid account used from unexpected location |
| `ConsoleLogin` (errorCode=Failed) | T1110 — Brute Force | Multiple failed login attempts |
| `GetFederationToken` | T1078.004 | Federation token for unauthorized access |

### Persistence (TA0003)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `CreateAccessKey` | T1098.001 — Additional Cloud Credentials | Backdoor access key created |
| `CreateLoginProfile` | T1098.001 | Console access added to service account |
| `CreateUser` | T1136.003 — Cloud Account | New IAM user for persistence |
| `PutRolePolicy` (trust policy) | T1098.003 — Additional Cloud Roles | Cross-account trust modified |
| `CreateFunction` (Lambda) | T1525 — Implant Internal Image | Serverless backdoor |

### Privilege Escalation (TA0004)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `AttachUserPolicy` (AdminAccess) | T1078.004 — Cloud Accounts | Granting admin to non-admin user |
| `AssumeRole` (to admin role) | T1548 — Abuse Elevation Control | Assuming higher-privilege role |
| `PutUserPolicy` (iam:*) | T1078.004 | Granting IAM modification permissions |
| `UpdateAssumeRolePolicy` | T1548 | Modifying who can assume a role |
| `iam:PassRole` | T1548 | Passing admin role to service |

### Defense Evasion (TA0005)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `StopLogging` | T1562.008 — Disable Cloud Logs | **CRITICAL**: disabling the audit trail |
| `DeleteTrail` | T1562.008 | **CRITICAL**: deleting the audit trail |
| `UpdateTrail` (S3 bucket change) | T1562.008 | Redirecting logs to attacker bucket |
| `PutEventSelectors` (exclude events) | T1562.008 | Filtering out the attacker's events |
| `DisableAlarmActions` | T1562 | Disabling CloudWatch alarms |
| `DeleteFlowLogs` | T1562.008 | Removing network logging |

### Credential Access (TA0006)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `GetSecretValue` | T1555 — Credentials from Password Stores | Secrets Manager access |
| `GetParametersByPath` (/password*) | T1555 | SSM Parameter Store credentials |
| `GetPasswordData` | T1552.001 — Credentials In Files | EC2 Windows password retrieval |
| `CreateAccessKey` (for other user) | T1528 — Steal Application Access Token | Creating keys for another user |

### Discovery (TA0007)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `DescribeInstances` (broad) | T1580 — Cloud Infrastructure Discovery | Enumerating EC2 instances |
| `ListBuckets` + `GetBucketAcl` | T1580 | Enumerating S3 buckets and permissions |
| `ListUsers` + `ListRoles` | T1087.004 — Cloud Account Discovery | Enumerating IAM entities |
| `GetCallerIdentity` | T1087.004 | "Who am I" check (post-compromise) |
| `DescribeSecurityGroups` | T1580 | Network enumeration |

### Collection & Exfiltration (TA0009 / TA0010)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `GetObject` (high volume) | T1530 — Data from Cloud Storage Object | Mass S3 download |
| `CopyObject` (cross-account) | T1537 — Transfer to Cloud Account | Data moved to external account |
| `CreateSnapshot` + `ModifySnapshotAttribute` | T1537 | EBS snapshot shared externally |
| `PutBucketPolicy` (public access) | T1537 | S3 bucket opened for exfiltration |

### Impact (TA0040)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `TerminateInstances` | T1485 — Data Destruction | Destroying compute resources |
| `DeleteBucket` | T1485 | Destroying storage resources |
| `RunInstances` (crypto mining) | T1496 — Resource Hijacking | Unauthorized compute usage |
| `PutBucketEncryption` (attacker key) | T1486 — Data Encrypted for Impact | Ransomware via re-encryption |

---

## 13. Open-Source Building Blocks

### Recommended Stack

```
┌──────────────────────────────────────────────────────────────┐
│                       PRODUCTION STACK                       │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  INGESTION:                                                  │
│  ├── AWS CloudTrail → S3 → SQS → Consumer                    │
│  ├── awslabs/mcp CloudTrail MCP Server (official AWS)        │
│  └── Wazuh (SIEM - native CloudTrail module)                 │
│                                                              │
│  BASELINE & ANOMALY DETECTION:                               │
│  ├── PySAD (streaming anomaly detection library)             │
│  │     pip install pysad                                     │
│  │     Models: HalfSpaceTrees, xStream, LODA                 │
│  ├── River (online ML with drift detection)                  │
│  │     pip install river                                     │
│  │     Models: ADWIN, HalfSpaceTrees                         │
│  └── Custom: EMA profiles, Count-Min Sketch, MIDAS           │
│                                                              │
│  MULTI-AGENT ORCHESTRATION:                                  │
│  ├── LangGraph (stateful multi-agent pipelines) ⭐ #1        │
│  │     pip install langgraph                                 │
│  │     Features: cycles, human-in-loop, checkpointing        │
│  ├── CrewAI (role-based agents)                    #2        │
│  └── AutoGen (conversational agents)               #3        │
│                                                              │
│  LLM MODELS:                                                 │
│  ├── GPT-4o-mini / Claude Haiku (orchestration - cheap)      │
│  ├── GPT-4o / Claude Sonnet (reasoning - quality)            │
│  ├── Gemini 2.5 Flash (best cost/quality)                    │
│  ├── Llama 4 Maverick 17B (best open-source)                 │
│  └── HassanShehata/logem 0.6B (log parsing - local)          │
│                                                              │
│  EMBEDDINGS:                                                 │
│  ├── all-mpnet-base-v2 (CTI document retrieval)              │
│  ├── cisco-ai/SecureBERT2.0-base (security NER/embed)        │
│  └── Chroma / Milvus (vector store)                          │
│                                                              │
│  CLOUD SECURITY TOOLS (as agent tools):                      │
│  ├── Prowler + MCP Server (500+ AWS checks)                  │
│  │     pip install prowler-mcp                               │
│  ├── Steampipe (SQL over cloud APIs)                         │
│  │     steampipe plugin install aws                          │
│  └── AWS SDK (boto3 - remediation actions)                   │
│                                                              │
│  CASE MANAGEMENT & SOAR:                                     │
│  ├── TheHive (case management, evidence)                     │
│  ├── Shuffle SOAR (playbook automation)                      │
│  └── Custom LangGraph interrupt (human approval gate)        │
│                                                              │
│  CTI FEEDS:                                                  │
│  ├── MITRE ATT&CK (via taxii2 / stix2)                       │
│  ├── NVD/CVE (via nvdlib)                                    │
│  ├── CISA KEV (JSON feed)                                    │
│  ├── AbuseIPDB (API)                                         │
│  └── VirusTotal (API)                                        │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
1223
+
+ ### Key Libraries & Versions
+
+ ```bash
+ # Core pipeline
+ pip install langgraph langchain langchain-openai
+ pip install pysad river
+ pip install boto3 botocore
+
+ # Embeddings & RAG
+ pip install sentence-transformers chromadb
+
+ # CTI integration
+ pip install stix2 taxii2-client nvdlib
+
+ # Security tools
+ pip install prowler-mcp
+ pip install awslabs.cloudtrail-mcp-server
+
+ # Log parsing (optional)
+ pip install transformers torch  # for logem model
+
+ # Monitoring
+ pip install trackio
+ ```
+
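+ As a quick smoke test of the two detection libraries installed above, the sketch below streams synthetic feature vectors through PySAD's `HalfSpaceTrees` and watches the resulting score stream with River's `ADWIN`. The feature bounds, shift point, and parameters are placeholders, and `drift_detected` is the property name in recent River releases:
+
+ ```python
+ import numpy as np
+ from pysad.models import HalfSpaceTrees
+ from river import drift
+
+ # PySAD's HalfSpaceTrees requires explicit per-feature bounds
+ hst = HalfSpaceTrees(
+     feature_mins=np.zeros(4), feature_maxes=np.ones(4),
+     window_size=250, num_trees=25, max_depth=15,
+ )
+ adwin = drift.ADWIN()
+
+ for t in range(1000):
+     # Synthetic stream whose distribution shifts at t = 900
+     x = np.random.rand(4) * (0.2 if t < 900 else 1.0)
+     score = hst.fit_score_partial(x)  # update model, score this instance
+     adwin.update(float(score))        # watch the score stream for drift
+     if adwin.drift_detected:
+         print(f"distribution shift detected around t={t}")
+ ```
+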
+ ---
+
+ ## 14. Implementation Roadmap
+
+ ### Phase 1: Foundation (Weeks 1-2)
+ **Goal:** Ingest CloudTrail logs and build behavioral baselines
+
+ - [ ] Set up CloudTrail → S3 → SQS pipeline
+ - [ ] Implement feature extraction from CloudTrail JSON events
+ - [ ] Build per-entity statistical profiles (EMA + z-score; see the sketch after this phase)
+ - [ ] Implement Count-Min Sketch for burst detection
+ - [ ] Deploy Online Isolation Forest (PySAD) for structural anomaly detection
+ - [ ] Build composite scoring and threshold calibration
+ - [ ] Implement the "store anomaly, discard normal" decision logic
+ - [ ] Validate on simulated data: generate normal + attack patterns
+
+ **Deliverable:** Streaming baseline system that correctly separates normal from anomalous CloudTrail events with <0.1% FPR
+
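+ For the statistical-profile item above, a minimal per-entity EMA + z-score sketch; the `alpha`, warm-up length, and feature choice are illustrative defaults, not tuned values:
+
+ ```python
+ import math
+
+ class EmaProfile:
+     """Exponential moving average + variance for one entity/feature."""
+
+     def __init__(self, alpha: float = 0.05):
+         self.alpha = alpha
+         self.mu = 0.0   # EMA of the feature (e.g., API calls per minute)
+         self.var = 1.0  # EMA of squared deviation
+         self.n = 0
+
+     def update(self, x: float) -> float:
+         # Score against the profile as it was *before* this observation
+         z = (x - self.mu) / math.sqrt(self.var + 1e-9) if self.n > 30 else 0.0
+         d = x - self.mu
+         self.mu += self.alpha * d
+         self.var = (1 - self.alpha) * (self.var + self.alpha * d * d)
+         self.n += 1
+         return z
+
+ profile = EmaProfile()
+ for rate in [4, 5, 6, 5, 4] * 10 + [250]:  # steady traffic, then a burst
+     z = profile.update(rate)
+ print(f"z-score of the burst: {z:.1f}")    # large positive z -> anomalous
+ ```
+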
+ ### Phase 2: Multi-Agent Triage (Weeks 3-4)
+ **Goal:** Build the LLM-powered investigation pipeline
+
+ - [ ] Implement LangGraph state machine with 4 agent nodes (skeleton sketched after this phase)
+ - [ ] Define workflow classifications (8 CloudTrail attack patterns)
+ - [ ] Build evidence acquisition tools (CloudTrail query, IAM lookup, IP reputation)
+ - [ ] Implement symbolic verifier with deterministic rules
+ - [ ] Build reasoning agent with structured output schema
+ - [ ] Implement the three-way verdict logic
+ - [ ] Test with known attack patterns (use CloudAnoBench)
+
+ **Deliverable:** Multi-agent pipeline that correctly triages CloudTrail anomalies into FP/TP-Auto/TP-Escalate
+
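+ A skeleton of the LangGraph wiring for this phase, with stub node bodies; the state schema, node names, and routing labels are assumptions to be replaced by the real agents:
+
+ ```python
+ from typing import List, Literal, TypedDict
+
+ from langgraph.graph import END, StateGraph
+
+ class TriageState(TypedDict):
+     event: dict          # the anomalous CloudTrail event
+     evidence: List[str]  # tool outputs gathered during triage
+     verdict: str         # FALSE_POSITIVE / TRUE_POSITIVE_AUTO / ...
+
+ def classify(state: TriageState) -> dict:
+     return {}  # stub: map anomaly to one of the 8 attack patterns
+
+ def gather_evidence(state: TriageState) -> dict:
+     return {"evidence": []}  # stub: CloudTrail query, IAM, IP reputation
+
+ def reason(state: TriageState) -> dict:
+     return {"verdict": "TRUE_POSITIVE_ESCALATE"}  # stub: structured LLM call
+
+ def verify(state: TriageState) -> dict:
+     return {}  # stub: deterministic symbolic rules may downgrade verdict
+
+ def route(state: TriageState) -> Literal["dismiss", "remediate", "escalate"]:
+     return {"FALSE_POSITIVE": "dismiss",
+             "TRUE_POSITIVE_AUTO": "remediate"}.get(state["verdict"], "escalate")
+
+ graph = StateGraph(TriageState)
+ graph.add_node("classify", classify)
+ graph.add_node("evidence", gather_evidence)
+ graph.add_node("reason", reason)
+ graph.add_node("verify", verify)
+ graph.set_entry_point("classify")
+ graph.add_edge("classify", "evidence")
+ graph.add_edge("evidence", "reason")
+ graph.add_edge("reason", "verify")
+ # All three outcomes terminate here; real response nodes hang off each label
+ graph.add_conditional_edges("verify", route,
+                             {"dismiss": END, "remediate": END, "escalate": END})
+ pipeline = graph.compile()
+ # Usage: pipeline.invoke({"event": {...}, "evidence": [], "verdict": ""})
+ ```
+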
+ ### Phase 3: Enrichment & Intelligence (Weeks 5-6)
+ **Goal:** Add threat intelligence and MITRE ATT&CK mapping
+
+ - [ ] Load MITRE ATT&CK Cloud matrix into vector store
+ - [ ] Build CTI feed ingestion (NVD, CISA KEV, AbuseIPDB)
+ - [ ] Implement CloudTrail → ATT&CK TTP mapping (Section 12)
+ - [ ] Build RAG enrichment pipeline (see the Chroma sketch after this phase)
+ - [ ] Integrate with Prowler MCP for posture context
+ - [ ] Test enrichment quality against known CVE/attack scenarios
+
+ **Deliverable:** Alerts enriched with TTPs, CVE context, IP reputation, and severity scoring
+
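+ For the RAG enrichment item, a minimal Chroma sketch: two hand-written technique summaries stand in for the full ATT&CK Cloud matrix, and Chroma's default embedding function is used instead of all-mpnet-base-v2 for brevity:
+
+ ```python
+ import chromadb
+
+ client = chromadb.Client()  # in-memory; use a persistent client in practice
+ attack = client.create_collection("attack_cloud")
+
+ # Stand-in corpus; in the real pipeline, load the matrix via stix2/taxii2
+ attack.add(
+     ids=["T1530", "T1537"],
+     documents=[
+         "Data from Cloud Storage Object: adversaries access data "
+         "from improperly secured cloud storage.",
+         "Transfer Data to Cloud Account: adversaries exfiltrate data "
+         "by moving it to another cloud account they control.",
+     ],
+     metadatas=[{"tactic": "Collection"}, {"tactic": "Exfiltration"}],
+ )
+
+ hits = attack.query(
+     query_texts=["mass GetObject downloads from an unfamiliar IP"],
+     n_results=2,
+ )
+ print(hits["ids"][0])  # candidate technique IDs to attach to the alert
+ ```
+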
+ ### Phase 4: Automated Response (Weeks 7-8)
+ **Goal:** Close the loop with safe auto-remediation
+
+ - [ ] Implement safe remediation actions (credential, logging, network; sketched after this phase)
+ - [ ] Build guardrail framework (whitelist, blast radius check, rollback)
+ - [ ] Integrate with TheHive for case management (escalations)
+ - [ ] Build audit trail for all actions taken
+ - [ ] Implement feedback loop: FP dismissals widen baseline
+ - [ ] Deploy human-in-the-loop approval gate (LangGraph interrupt)
+ - [ ] Red team testing: simulate multi-stage attacks
+
+ **Deliverable:** End-to-end autonomous SOC for CloudTrail with safe auto-remediation
+
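+ One concrete shape for a guardrailed credential action: deactivate (never delete) an access key, refuse to touch allow-listed break-glass principals, and emit an audit record. The protected-user list, function name, and audit sink below are assumptions; the boto3 call itself is real and reversible:
+
+ ```python
+ import json
+ from datetime import datetime, timezone
+
+ import boto3
+
+ PROTECTED_USERS = {"break-glass-admin"}  # never auto-remediate these
+
+ def deactivate_access_key(user_name: str, access_key_id: str) -> dict:
+     if user_name in PROTECTED_USERS:
+         return {"action": "escalate", "reason": "protected principal"}
+
+     iam = boto3.client("iam")
+     # Reversible by design: Status can be flipped back to 'Active'
+     iam.update_access_key(
+         UserName=user_name, AccessKeyId=access_key_id, Status="Inactive",
+     )
+     audit = {
+         "action": "deactivate_access_key",
+         "user": user_name,
+         "key": access_key_id,
+         "at": datetime.now(timezone.utc).isoformat(),
+     }
+     print(json.dumps(audit))  # ship to the audit trail in practice
+     return {"action": "remediated", "audit": audit}
+ ```
+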
+ ### Phase 5: Expansion & Optimization (Ongoing)
+ **Goal:** Add more data sources, reduce false positives, increase automation
+
+ - [ ] Add VPC Flow Logs (network anomaly detection)
+ - [ ] Add GuardDuty findings (correlation)
+ - [ ] Add email/phishing logs (cross-domain correlation)
+ - [ ] Add endpoint logs (EDR integration)
+ - [ ] Fine-tune classification model on accumulated triage data (AACT approach)
+ - [ ] Implement ADWIN drift detection for baseline updates
+ - [ ] Build dashboards and reporting
+ - [ ] Measure and optimize: FPR, MTTR, auto-resolution rate
+
+ ---
+
+ ## 15. Research Papers & References
+
+ ### Core Architecture Papers
+
+ | Paper | arXiv | Year | Key Contribution |
+ |-------|-------|------|------------------|
+ | **CORTEX** — Collaborative LLM Agents for Alert Triage | 2510.00311 | 2025 | Multi-agent SOC architecture, F1=0.78 |
+ | **AACT** — Automated Alert Classification | 2505.09843 | 2025 | 61% alert reduction in production, behavioral profiling |
+ | **CloudAnoAgent** — Cloud Anomaly Detection | 2508.01844 | 2025 | Fast/slow detection + symbolic verifier |
+ | **CyberRAG** — Agentic RAG for Attack Classification | 2507.02424 | 2025 | 94.92% accuracy, specialist + RAG |
+ | **OpsAgent** — Self-Evolving Multi-Agent | 2510.24145 | 2025 | +46.63% on incident management |
+ | **ExCyTIn-Bench** — LLM Agent Evaluation | 2507.14201 | 2025 | Best models for security investigation |
+
+ ### Baseline & Anomaly Detection Papers
+
+ | Paper | arXiv | Year | Key Contribution |
+ |-------|-------|------|------------------|
+ | **DyMETER** — Dynamic Concept Adaptation | 2604.14726 | 2026 | AUCROC 0.906-0.991, concept drift handling |
+ | **Online-iForest** — Streaming Isolation Forest | 2505.09593 | 2025 | 5-8× faster than HST, AUC 0.998 |
+ | **MemStream** — Memory-Based Streaming Detection | 2106.03837 | 2022 | Fixed-size memory, AUCROC 0.988 |
+ | **MIDAS** — Count-Min Sketch for Edge Streams | 1911.04464 | 2020 | O(1) per event, 50KB memory |
+ | **LogBERT** — Self-Supervised Log Anomaly Detection | 2103.04475 | 2021 | Masked log key prediction, hypersphere loss |
+ | **LogLLM** — BERT+Llama Log Anomaly Detection | 2411.08561 | 2024 | F1=0.97, no log parser required |
+
+ ### Threat Intelligence & Enrichment
+
+ | Paper | arXiv | Year | Key Contribution |
+ |-------|-------|------|------------------|
+ | **LLM-Assisted Proactive CTI** | 2504.00428 | 2025 | RAG over CTI feeds, real-time enrichment |
+ | **IBM Cloud Telemetry** | 2411.09047 | 2024 | 10,000× compression ratio for detection |
+
+ ### Frameworks & Tools
+
+ | Tool | Source | Purpose |
+ |------|--------|---------|
+ | **LangGraph** | langchain-ai/langgraph | Multi-agent orchestration |
+ | **PySAD** | selimfirat/pysad | Streaming anomaly detection |
+ | **River** | online-ml/river | Online ML + drift detection |
+ | **Prowler MCP** | prowler-cloud/prowler | AWS security checks via LLM |
+ | **CloudTrail MCP** | awslabs/mcp | AWS CloudTrail LLM interface |
+ | **logem** | HassanShehata/logem | Log field extraction (0.6B) |
+ | **SecureBERT 2.0** | cisco-ai/SecureBERT2.0-base | Security embeddings |
+ | **Wazuh** | wazuh/wazuh | Open-source SIEM with CloudTrail support |
+ | **TheHive** | TheHive-Project/TheHive | Case management |
+ | **Shuffle SOAR** | Shuffle/Shuffle | Security orchestration |
+
+ ### HuggingFace Datasets for Development
+
+ | Dataset | HF ID / Source | Size | Use |
+ |---------|---------------|------|-----|
+ | **CloudAnoBench** | jayzou3773.github.io | 1,252 cases | Cloud anomaly detection eval |
+ | **ACSE-Eval** | ACSE-Eval/ACSE-Eval | 100 AWS scenarios | AWS threat modeling |
+ | **AIT Log Dataset** | Austrian Inst. Tech | 8 networks, 3 weeks | Multi-step attack simulation |
+ | **BGL/HDFS** | logpai/loghub | Millions of entries | Log anomaly detection baselines |
+ | **NSL-KDD** | rgaidot/nsl-kdd | 125K+ entries | Network intrusion detection |
+
+ ---
+
+ ## Appendix A: Quick-Start Prototype
+
+ A minimal end-to-end prototype you can run today:
+
+ ```python
+ """
+ Agentic SOC Quick-Start Prototype
+ Requires: pip install boto3 langgraph langchain-openai pysad river
+ (numpy is pulled in by pysad)
+ """
+
+ import json
+ from collections import defaultdict
+
+ import boto3
+ import numpy as np
+ from langchain_openai import ChatOpenAI
+ from pysad.models import HalfSpaceTrees
+ from river import drift  # ADWIN etc.; reserved for baseline drift checks
+
+ # ── Layer 1: Ingest CloudTrail ──────────────────────────────
+
+ def consume_cloudtrail_events(queue_url: str):
+     """Pull CloudTrail events from the SQS queue fed by S3 notifications."""
+     sqs = boto3.client('sqs')
+     while True:
+         response = sqs.receive_message(
+             QueueUrl=queue_url,
+             MaxNumberOfMessages=10,
+             WaitTimeSeconds=20,
+         )
+         for msg in response.get('Messages', []):
+             events = json.loads(msg['Body']).get('Records', [])
+             for event in events:
+                 yield event
+             sqs.delete_message(
+                 QueueUrl=queue_url,
+                 ReceiptHandle=msg['ReceiptHandle']
+             )
+
+
+ # ── Layer 2: Baseline Accumulator ───────────────────────────
+
+ class SimpleBaseline:
+     def __init__(self):
+         self.profiles = defaultdict(lambda: {
+             'count': 0, 'api_freq': defaultdict(int),
+             'mu': defaultdict(float), 'var': defaultdict(lambda: 1.0)
+         })
+         # PySAD's HalfSpaceTrees requires explicit feature bounds;
+         # these match the four features built in process() below.
+         self.model = HalfSpaceTrees(
+             feature_mins=np.array([0.0, 0.0, 0.0, 0.0]),
+             feature_maxes=np.array([999.0, 999.0, 23.0, 1.0]),
+             window_size=250, num_trees=25, max_depth=15,
+         )
+         self.anomaly_store = []
+
+     def process(self, event):
+         arn = event.get('userIdentity', {}).get('arn', 'unknown')
+         profile = self.profiles[arn]
+         profile['count'] += 1
+         profile['api_freq'][event['eventName']] += 1
+
+         features = np.array([
+             hash(event['eventName']) % 1000,
+             hash(event.get('sourceIPAddress', '')) % 1000,
+             int(event.get('eventTime', '2024-01-01T12:00:00Z')[11:13]),
+             1 if event.get('errorCode') else 0,
+         ], dtype=float)
+
+         score = self.model.fit_score_partial(features)
+
+         # Raw scores are not calibrated; 0.7 is a placeholder threshold
+         # to be tuned on a validation window.
+         if score > 0.7 and profile['count'] > 100:
+             self.anomaly_store.append(event)
+             return True, score   # ANOMALY — store
+         return False, score      # NORMAL — discard
+
+
+ # ── Layer 3: LLM Triage (simplified) ────────────────────────
+
+ llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
+
+ def triage_anomaly(event, score):
+     prompt = f"""You are a SOC analyst. Analyze this CloudTrail anomaly:
+
+ Event: {event['eventName']}
+ User: {event.get('userIdentity', {}).get('arn', 'unknown')}
+ Source IP: {event.get('sourceIPAddress', 'unknown')}
+ Region: {event.get('awsRegion', 'unknown')}
+ Error: {event.get('errorCode', 'none')}
+ Anomaly Score: {score:.2f}
+
+ Classify as: FALSE_POSITIVE, TRUE_POSITIVE_AUTO, or TRUE_POSITIVE_ESCALATE
+ Provide: verdict, confidence (0-1), reasoning, mitre_ttp, severity, recommended_action
+
+ Respond in JSON format."""
+
+     # Assumes the model returns bare JSON; use a JSON output parser or
+     # structured output in production.
+     response = llm.invoke(prompt)
+     return json.loads(response.content)
+
+
+ # ── Main Loop ───────────────────────────────────────────────
+
+ baseline = SimpleBaseline()
+
+ for event in consume_cloudtrail_events("YOUR_SQS_QUEUE_URL"):
+     is_anomaly, score = baseline.process(event)
+
+     if is_anomaly:
+         # Only anomalies reach the LLM — saves cost
+         report = triage_anomaly(event, score)
+
+         if report['verdict'] == 'FALSE_POSITIVE':
+             print(f"FP dismissed: {event['eventName']}")
+         elif report['verdict'] == 'TRUE_POSITIVE_AUTO':
+             print(f"AUTO-REMEDIATE: {report['recommended_action']}")
+         else:
+             print(f"ESCALATE: {report['reasoning']}")
+
+     # Normal events: already discarded by baseline.process()
+ ```
+
+ ---
+
+ ## Appendix B: Key Design Decisions & Rationale
+
+ | Decision | Choice | Rationale |
+ |----------|--------|-----------|
+ | Multi-agent vs single-agent | Multi-agent (4 stages) | CORTEX shows +12 F1 points, -10.7% FPR |
+ | Baseline storage | Model only, no raw logs | 10,000× storage reduction (IBM study) |
+ | Anomaly detection | 3-tier cascade (stats → sketch → forest) | Each catches different patterns; composite is robust |
+ | LLM for all events vs anomalies only | Anomalies only | 99.9% of events are normal — LLM on all would cost 1000× more |
+ | Auto-remediate threshold | Confidence > 0.9 + symbolic verify | Conservative by design; false auto-remediation is catastrophic |
+ | Drift detection | ADWIN per entity | Employees change roles; static baselines decay |
+ | Agent framework | LangGraph | Only framework with cycles + human-in-loop + checkpointing |
+ | Primary LLM | GPT-4o-mini (triage), GPT-4o (reasoning) | Cost/quality balance; replace with Llama 4 for on-prem |
+ | Vector store | Chroma | Simple to start; migrate to Milvus at scale |
+ | Case management | TheHive | Open-source, rich API, evidence management |
+
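+ The auto-remediation row above compresses into a single gate that every action must pass; a minimal sketch, with field names following the triage report schema used in Appendix A:
+
+ ```python
+ def may_auto_remediate(report: dict, symbolic_ok: bool) -> bool:
+     """LLM confidence alone is never sufficient; the verifier must agree."""
+     return (
+         report.get("verdict") == "TRUE_POSITIVE_AUTO"
+         and report.get("confidence", 0.0) > 0.9
+         and symbolic_ok  # e.g., action whitelisted, blast radius bounded
+     )
+ ```
+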
+ ---
+
+ *Document prepared: April 2026*
+ *Based on literature review of 15+ research papers (2020-2026) and survey of 20+ open-source tools*
+ *Architecture validated against: CORTEX (arXiv:2510.00311), AACT (arXiv:2505.09843), CloudAnoAgent (arXiv:2508.01844)*