# Agentic SOC: Autonomous Security Operations Center
## Research & Architecture Design Document

**Objective:** Design a fully autonomous Security Operations Center powered by LLM-based reasoning agents, starting with AWS CloudTrail log ingestion. The system builds behavioral baselines without storing raw logs, detects anomalies, enriches alerts with threat intelligence and TTPs, classifies true/false positives, and either auto-remediates or escalates to humans.

**Core Innovation:** Store the *model*, not the *data*. Normal logs update the baseline model and are discarded. Only anomaly logs are retained for investigation. This reduces storage costs by orders of magnitude compared to traditional SIEMs.

---

## Table of Contents

1. [Problem Statement & Vision](#1-problem-statement--vision)
2. [Why Traditional SIEMs Fail](#2-why-traditional-siems-fail)
3. [System Architecture Overview](#3-system-architecture-overview)
4. [Layer 1: CloudTrail Ingestion & Feature Extraction](#4-layer-1-cloudtrail-ingestion--feature-extraction)
5. [Layer 2: Baseline Accumulation (Store Model, Not Logs)](#5-layer-2-baseline-accumulation-store-model-not-logs)
6. [Layer 3: Anomaly Detection & Scoring](#6-layer-3-anomaly-detection--scoring)
7. [Layer 4: Multi-Agent Triage Pipeline](#7-layer-4-multi-agent-triage-pipeline)
8. [Layer 5: Threat Intelligence Enrichment & TTP Mapping](#8-layer-5-threat-intelligence-enrichment--ttp-mapping)
9. [Layer 6: Verdict & Response (The Three-Way Decision)](#9-layer-6-verdict--response-the-three-way-decision)
10. [Layer 7: Automated Remediation Actions](#10-layer-7-automated-remediation-actions)
11. [Storage Economics: Quantifying the Savings](#11-storage-economics-quantifying-the-savings)
12. [CloudTrail → MITRE ATT&CK Mapping Reference](#12-cloudtrail--mitre-attck-mapping-reference)
13. [Open-Source Building Blocks](#13-open-source-building-blocks)
14. [Implementation Roadmap](#14-implementation-roadmap)
15. [Research Papers & References](#15-research-papers--references)

---

## 1. Problem Statement & Vision

### The Speed Gap

Attackers using agentic AI systems can discover and exploit vulnerabilities at machine speed. A human SOC analyst processing 50-100 alerts/day cannot match an adversary generating thousands of attack variations per hour. The only defense that scales is an agentic defense: AI systems that detect, investigate, and respond at the same speed threats are delivered.

### The Vision: End-to-End Autonomous SOC

```
CloudTrail Event Stream (thousands/second)
                    │
                    ▼
┌───────────────────────────────────────────────┐
│ DETECT: Statistical baseline + ML scoring     │ ← No raw log storage
│ (milliseconds per event)                      │
└───────────────────┬───────────────────────────┘
                    │ anomaly detected
                    ▼
┌───────────────────────────────────────────────┐
│ INVESTIGATE: Multi-agent LLM reasoning        │ ← Store only anomaly logs
│ Enrich → Classify → Map TTPs → Verdict        │
│ (seconds per alert)                           │
└───────────────────┬───────────────────────────┘
                    │
         ┌──────────┼──────────┐
         ▼          ▼          ▼
    FALSE POS   AUTO-ACT    ESCALATE
    (dismiss)   (remediate) (human)
```

### Three Decision Outcomes

Every alert terminates in exactly one of three states:

| Outcome | Condition | Action |
|---------|-----------|--------|
| **False Positive** | Alert does not represent a real threat | Dismiss; update baseline to widen normal bounds |
| **True Positive — Auto-Remediate** | Real threat; known remediation within safe parameters | Execute automated response (revoke creds, isolate, block) |
| **True Positive — Escalate** | Real threat; unknown or risky remediation | Alert human analyst with full investigation report |

The system's value scales with the percentage of alerts that can be confidently resolved without human intervention.

---

## 2. Why Traditional SIEMs Fail

### The Storage Trap

Traditional SIEMs (Splunk, Elastic, QRadar, Sentinel) follow a **store-then-query** model:

```
All Logs → Index → Store (90-365 days) → Query for anomalies
```

**Problems:**
- **Cost:** Enterprise CloudTrail generates 100M-1B+ events/day. At $2-5/GB ingestion (Splunk pricing), costs reach $50K-500K+/month just for CloudTrail
- **Latency:** Detection queries run against stored data, adding minutes to hours of delay
- **Noise:** 99.9%+ of stored logs are normal activity that will never be queried
- **Context Window:** Analysts drown in data. A single investigation might require correlating events across millions of log entries

### The IBM Insight

IBM Cloud research (arXiv:2411.09047) demonstrated that, of 413 million raw telemetry rows collected over 4.5 months for a single system, only **39,000 rows of aggregated statistics** were needed for anomaly detection: a **10,000× compression ratio**. The raw data served no purpose beyond computing the statistics.

### Our Approach: Accumulate the Baseline, Discard the Logs

```
Event → Extract Features → Update Baseline Model → Discard Event
                                   │
                          Is this anomalous?
                          ├── No  → event discarded (baseline updated)
                          └── Yes → event STORED for investigation
```

**Storage model:** O(entities × model_size) instead of O(events × retention_period)

For a typical AWS environment with 10,000 entities (users, roles, services) and 1MB per entity model (see the back-of-envelope check below):
- Our approach: **~10 GB** (constant, regardless of time)
- Traditional SIEM: **10-100+ TB/year** (linear growth)
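
A quick sanity check of the two storage models, reusing the 500M events/day volume assumed in Section 11 at roughly 1 KB per event. This is a sketch of the arithmetic only; the constants are the stated assumptions:

```python
# Storage model comparison under the stated assumptions.
entities = 10_000             # users, roles, services
model_mb_per_entity = 1.0     # per-entity behavioral model
events_per_day = 500e6        # CloudTrail volume (Section 11 assumption)
bytes_per_event = 1_000       # ~1 KB per CloudTrail event

model_storage_gb = entities * model_mb_per_entity / 1_024
siem_tb_per_year = events_per_day * bytes_per_event * 365 / 1e12

print(f"baseline models: ~{model_storage_gb:.1f} GB, constant over time")
print(f"store-everything SIEM: ~{siem_tb_per_year:.0f} TB/year, growing linearly")
# → ~9.8 GB vs ~182 TB/year
```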

---

## 3. System Architecture Overview

```
═══════════════════════════════════════════════════════════════════════════
                         AGENTIC SOC ARCHITECTURE
═══════════════════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────────────────┐
│                         LAYER 1: DATA INGESTION                         │
│                                                                         │
│  AWS CloudTrail ──→ S3 Bucket ──→ SQS Queue ──→ Event Consumer          │
│       │                                              │                  │
│       │  (Future: VPC Flow Logs, GuardDuty,          │                  │
│       │   Email/Phishing Logs, Endpoint Logs)        │                  │
│       │                                              ▼                  │
│       └──────────────────────────────────────→ Feature Extractor        │
│                                                (logem 0.6B model        │
│                                                 + rule-based parser)    │
└────────────────────────────────────┬────────────────────────────────────┘
                                     │ feature_vector + raw_event
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                       LAYER 2: BASELINE ACCUMULATOR                     │
│                                                                         │
│  ┌──────────────────────┐  ┌───────────────────┐  ┌─────────────────┐   │
│  │ Per-Entity Profiles  │  │ Count-Min Sketch  │  │ Online iForest  │   │
│  │ (EMA μ/σ per feature)│  │ (frequency/burst) │  │ (structural)    │   │
│  │ ~1KB per entity      │  │ ~80KB total       │  │ ~2MB total      │   │
│  └──────────┬───────────┘  └─────────┬─────────┘  └────────┬────────┘   │
│             │                        │                     │            │
│             └───────────┬────────────┴─────────────────────┘            │
│                         │                                               │
│              Composite Anomaly Score                                    │
│                score > threshold?                                       │
│                │               │                                        │
│               NO              YES                                       │
│                │               │                                        │
│        Update baseline   Store anomaly log ──→ Anomaly Store            │
│        Discard raw log                                                  │
└────────────────────────────────────┬────────────────────────────────────┘
                                     │ anomaly event
                                     ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                   LAYER 3: MULTI-AGENT TRIAGE PIPELINE                  │
│                        (LangGraph Orchestration)                        │
│                                                                         │
│  ┌─────────────┐   ┌──────────────────┐   ┌───────────────────────┐     │
│  │ Orchestrator│──→│ Behavior Analysis│──→│ Evidence Acquisition  │     │
│  │ Agent       │   │ Agent            │   │ Agents (per-workflow) │     │
│  │             │   │                  │   │                       │     │
│  │ Routes alert│   │ Classifies into: │   │ Tools:                │     │
│  │ Controls    │   │ • CredChange     │   │ • queryCloudTrail()   │     │
│  │ flow        │   │ • IAMPolicyMod   │   │ • getIAMUser()        │     │
│  │ Consistency │   │ • GeoAnomaly     │   │ • lookupIP()          │     │
│  │ checks      │   │ • UnusualAPI     │   │ • getAssetRecord()    │     │
│  └─────────────┘   │ • DataExfil      │   │ • queryAthena()       │     │
│                    │ • PrivEsc        │   └───────────┬───────────┘     │
│                    │ • Recon          │               │                 │
│                    └──────────────────┘               ▼                 │
│                                        ┌───────────────────────┐        │
│                                        │ Symbolic Verifier     │        │
│                                        │ (deterministic rules  │        │
│                                        │  to ground LLM output)│        │
│                                        └───────────┬───────────┘        │
│                                                    ▼                    │
│                                        ┌───────────────────────┐        │
│                                        │ Reasoning & Synthesis │        │
│                                        │ Agent                 │        │
│                                        │                       │        │
│                                        │ + RAG CTI Enrichment  │        │
│                                        │ + MITRE ATT&CK Map    │        │
│                                        │ + Severity Scoring    │        │
│                                        │ → Structured Report   │        │
│                                        └───────────┬───────────┘        │
└────────────────────────────────────────────────────┬────────────────────┘
                                                     │
                          ┌──────────────────────────┼──────────────────────┐
                          ▼                          ▼                      ▼
                  ┌──────────────┐       ┌──────────────────┐   ┌──────────────────┐
                  │FALSE POSITIVE│       │TRUE POS: AUTO-ACT│   │TRUE POS: ESCALATE│
                  │              │       │                  │   │                  │
                  │Update        │       │Execute playbook: │   │Create case in    │
                  │baseline      │       │• Revoke creds    │   │TheHive           │
                  │Widen normal  │       │• Block IP        │   │Page analyst      │
                  │bounds        │       │• Isolate instance│   │Full report +     │
                  │Log dismissal │       │• Revert IAM      │   │evidence attached │
                  │reason        │       │                  │   │                  │
                  └──────────────┘       └──────────────────┘   └──────────────────┘
```

---

## 4. Layer 1: CloudTrail Ingestion & Feature Extraction

### CloudTrail Event Schema

Every AWS API call generates a CloudTrail event with this structure:

```json
{
  "eventVersion": "1.08",
  "userIdentity": {
    "type": "IAMUser | AssumedRole | Root | FederatedUser | AWSService",
    "principalId": "AIDACKCEVSQ6C2EXAMPLE",
    "arn": "arn:aws:iam::123456789012:user/alice",
    "accountId": "123456789012",
    "accessKeyId": "ASIAIOSFODNN7EXAMPLE",
    "userName": "alice",
    "sessionContext": {
      "mfaAuthenticated": "true",
      "creationDate": "2024-01-15T10:30:00Z"
    }
  },
  "eventTime": "2024-01-15T14:22:33Z",
  "eventSource": "iam.amazonaws.com",
  "eventName": "CreateAccessKey",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.50",
  "userAgent": "aws-cli/2.13.0 Python/3.11.4",
  "requestParameters": {
    "userName": "bob"
  },
  "responseElements": {
    "accessKey": {
      "accessKeyId": "AKIAIOSFODNN7EXAMPLE",
      "status": "Active",
      "userName": "bob"
    }
  },
  "errorCode": null,
  "errorMessage": null,
  "readOnly": false,
  "eventType": "AwsApiCall",
  "managementEvent": true,
  "recipientAccountId": "123456789012"
}
```

### Feature Extraction Pipeline

Transform each raw CloudTrail JSON event into a numerical feature vector for the baseline model:

```python
def extract_features(event: dict) -> dict:
    """Extract security-relevant features from a CloudTrail event."""

    identity = event.get("userIdentity", {})

    return {
        # Identity features
        "principal_hash": hash(identity.get("arn", "")),
        "identity_type": encode_category(identity.get("type")),
        "mfa_authenticated": 1 if identity.get("sessionContext", {})
                                          .get("mfaAuthenticated") == "true" else 0,

        # Action features
        "event_source_hash": hash(event.get("eventSource", "")),
        "event_name_hash": hash(event.get("eventName", "")),
        "is_write_event": 0 if event.get("readOnly") else 1,
        "is_management_event": 1 if event.get("managementEvent") else 0,
        "has_error": 1 if event.get("errorCode") else 0,
        "error_code_hash": hash(event.get("errorCode") or ""),  # errorCode may be null

        # Context features
        "hour_of_day": parse_hour(event["eventTime"]),
        "day_of_week": parse_dow(event["eventTime"]),
        "region_hash": hash(event.get("awsRegion", "")),
        "source_ip_hash": hash(event.get("sourceIPAddress", "")),
        "user_agent_hash": hash(event.get("userAgent", "")),

        # Behavioral features (computed from recent window)
        "api_calls_last_5min": count_recent(identity["arn"], minutes=5),
        "unique_services_last_hour": count_unique_services(identity["arn"], hours=1),
        "unique_regions_last_hour": count_unique_regions(identity["arn"], hours=1),
        "error_rate_last_hour": error_rate(identity["arn"], hours=1),
        "new_api_call": is_first_time_api(identity["arn"], event["eventName"]),
    }
```
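
The extractor assumes several helpers (`encode_category`, `parse_hour`, `count_recent`, and friends) that are not shown. A minimal sketch of plausible implementations follows; the module-level state and per-entity sliding windows are assumptions, and note that Python's built-in `hash()` is randomized per process, so a real deployment would substitute a stable hash:

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta, timezone

# Bounded per-entity metadata for the windowed counters; small tuples,
# not raw logs. (Sketch only; the remaining counters follow the same pattern.)
_recent = defaultdict(deque)   # arn -> deque of (ts, event_name, region, is_error)
_seen_apis = defaultdict(set)  # arn -> set of API names ever observed

def observe(arn: str, event_name: str, region: str, is_error: bool):
    """Call once per event to feed the sliding windows."""
    _recent[arn].append((datetime.now(timezone.utc), event_name, region, is_error))

def encode_category(value, buckets: int = 64) -> int:
    """Map a categorical string to a small integer bucket."""
    return hash(value) % buckets if value else 0

def _parse(event_time: str) -> datetime:
    return datetime.fromisoformat(event_time.replace("Z", "+00:00"))

def parse_hour(event_time: str) -> int:
    return _parse(event_time).hour

def parse_dow(event_time: str) -> int:
    return _parse(event_time).weekday()

def _window(arn: str, minutes: int) -> deque:
    """Evict entries older than the window, then return what remains."""
    cutoff = datetime.now(timezone.utc) - timedelta(minutes=minutes)
    dq = _recent[arn]
    while dq and dq[0][0] < cutoff:
        dq.popleft()
    return dq

def count_recent(arn: str, minutes: int) -> int:
    return len(_window(arn, minutes))

def count_unique_regions(arn: str, hours: int) -> int:
    return len({e[2] for e in _window(arn, hours * 60)})

def error_rate(arn: str, hours: int) -> float:
    dq = _window(arn, hours * 60)
    return sum(e[3] for e in dq) / len(dq) if dq else 0.0

def is_first_time_api(arn: str, event_name: str) -> int:
    first = event_name not in _seen_apis[arn]
    _seen_apis[arn].add(event_name)
    return int(first)
```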

### Using logem (0.6B) for Structured Extraction

For complex or non-standard log formats (future expansion beyond CloudTrail), use the fine-tuned `HassanShehata/logem` model:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# 0.6B params, 396MB quantized; runs on CPU
tokenizer = AutoTokenizer.from_pretrained("HassanShehata/logem")
model = AutoModelForCausalLM.from_pretrained("HassanShehata/logem")

# Achieves F1=0.833 on SIEM field extraction (beats Gemma 12B)
```

For CloudTrail specifically, rule-based JSON parsing is faster and deterministic. Reserve the LLM parser for unstructured logs (syslog, application logs, email headers) when the system expands.

---

## 5. Layer 2: Baseline Accumulation (Store Model, Not Logs)

This is the core innovation. Three complementary models operate in parallel, each maintaining a compact representation of "normal" behavior:

### Tier 1: Per-Entity Statistical Profiles (Fastest, Smallest)

For each entity (IAM user, role, service), maintain rolling statistics:

```python
from collections import defaultdict
import math

class EntityProfile:
    """Compact behavioral profile. ~1KB per entity. No raw log storage."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha                    # EMA decay rate
        # Per-feature exponential moving average
        self.mu = defaultdict(float)          # running mean
        self.var = defaultdict(lambda: 1.0)   # running variance
        # Categorical frequency distributions
        self.api_freq = {}                    # {event_name: count} (top-K only)
        self.region_freq = {}                 # {region: count}
        self.hour_dist = [0] * 24             # hourly activity distribution
        self.ip_set_size = 0                  # HyperLogLog cardinality estimate
        # Metadata
        self.event_count = 0
        self.last_seen = None
        self.first_seen = None

    def update(self, features: dict):
        """O(1) update. No raw data retained."""
        self.event_count += 1

        for key, value in features.items():
            if isinstance(value, (int, float)):
                # Exponentially weighted update of mean and variance
                # (an EMA analogue of Welford's online algorithm)
                old_mu = self.mu[key]
                self.mu[key] = self.alpha * value + (1 - self.alpha) * old_mu
                self.var[key] = (self.alpha * (value - self.mu[key])**2
                                 + (1 - self.alpha) * self.var[key])

    def anomaly_score(self, features: dict) -> float:
        """Z-score based anomaly scoring."""
        scores = []
        for key, value in features.items():
            if isinstance(value, (int, float)) and key in self.mu:
                sigma = math.sqrt(self.var[key] + 1e-8)
                z = abs(value - self.mu[key]) / sigma
                scores.append(z)
        return max(scores) if scores else 0.0

    def memory_bytes(self) -> int:
        """Total memory footprint of this profile."""
        return (len(self.mu) * 16             # 8 bytes key + 8 bytes float
                + len(self.var) * 16
                + len(self.api_freq) * 40
                + len(self.region_freq) * 40
                + 24 * 8                      # hour_dist
                + 64)                         # metadata
        # Typically ~500-2000 bytes per entity
```
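
A minimal usage sketch (names like `normal_feature_stream` and `new_event` are placeholders). The important detail, which `BaselineAccumulator` below also follows, is to score an event against the existing baseline *before* folding it in:

```python
profile = EntityProfile(alpha=0.01)

# Warm the profile on previously extracted feature dicts.
for features in normal_feature_stream:   # placeholder iterable of dicts
    profile.update(features)

# Score a new event against the learned baseline, then absorb it.
features = extract_features(new_event)   # new_event: a CloudTrail dict
z = profile.anomaly_score(features)
profile.update(features)

if z > 3.0:   # ~3 sigma under a rough Gaussian assumption
    print(f"anomalous (max z={z:.1f}); profile is ~{profile.memory_bytes()} B")
```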

### Tier 2: Count-Min Sketch (Frequency/Burst Detection)

Detect unusual frequencies of (entity, action) pairs without storing any raw events:

```python
import hashlib
import numpy as np

class CountMinSketch:
    """Fixed-size frequency tracker. 80KB regardless of stream length."""

    def __init__(self, depth=5, width=2048):
        self.depth = depth
        self.width = width
        self.table = np.zeros((depth, width), dtype=np.int64)
        self.hash_seeds = [i * 0x9e3779b9 for i in range(depth)]

    def _hash(self, key: str, seed: int) -> int:
        h = hashlib.md5(f"{seed}:{key}".encode()).hexdigest()
        return int(h, 16) % self.width

    def add(self, key: str, count: int = 1):
        for i in range(self.depth):
            j = self._hash(key, self.hash_seeds[i])
            self.table[i][j] += count

    def estimate(self, key: str) -> int:
        return min(
            self.table[i][self._hash(key, self.hash_seeds[i])]
            for i in range(self.depth)
        )

    def memory_bytes(self) -> int:
        return self.depth * self.width * 8   # 80KB for default params


class BurstDetector:
    """Detect unusual bursts using CMS + time windows."""

    def __init__(self):
        self.current_window = CountMinSketch()    # current time window
        self.baseline_window = CountMinSketch()   # historical baseline
        self.window_count = 0

    def process(self, entity: str, event_name: str) -> float:
        key = f"{entity}:{event_name}"
        self.current_window.add(key)

        current = self.current_window.estimate(key)
        baseline = max(self.baseline_window.estimate(key), 1)

        # Chi-squared style anomaly score (from MIDAS, AAAI 2020)
        expected = baseline * (1.0 / max(self.window_count, 1))
        score = (current - expected)**2 / (expected + 1e-8)

        return score

    def rotate_window(self):
        """Call periodically (e.g., every 5 minutes)."""
        # Merge current into baseline with decay
        self.baseline_window.table = (
            0.95 * self.baseline_window.table
            + 0.05 * self.current_window.table
        ).astype(np.int64)
        self.current_window = CountMinSketch()
        self.window_count += 1
```
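
`rotate_window()` has to be driven by a clock. A minimal single-process scheduling sketch using `threading.Timer`; a real deployment would rotate from the consumer loop or a scheduler instead, since the timer below is not synchronized with `process()`:

```python
import threading

burst_detector = BurstDetector()

def _rotate_every(seconds: float = 300.0):
    """Re-arm a timer that rotates the CMS window every 5 minutes."""
    burst_detector.rotate_window()
    threading.Timer(seconds, _rotate_every, args=(seconds,)).start()

_rotate_every()

# Per event, on the hot path:
score = burst_detector.process(
    "arn:aws:iam::123456789012:user/alice", "GetSecretValue"
)
```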

### Tier 3: Online Isolation Forest (Structural Anomaly Detection)

For detecting complex, multi-feature anomalies that simple statistics miss:

```python
# Using the PySAD library (pip install pysad)
import numpy as np
from pysad.models import HalfSpaceTrees

class StructuralAnomalyDetector:
    """Streaming half-space trees. ~2MB fixed memory. No raw data storage."""

    def __init__(self, n_features: int):
        # Half-Space Trees: 32 trees, depth 15, window 250. PySAD requires the
        # feature ranges up front; this assumes features are scaled into [0, 1]
        # before scoring.
        self.model = HalfSpaceTrees(
            feature_mins=np.zeros(n_features),
            feature_maxes=np.ones(n_features),
            num_trees=32,
            max_depth=15,
            window_size=250,
        )
        self.is_warm = False
        self.warmup_count = 0
        self.warmup_threshold = 500   # events before scoring is reliable

    def process(self, feature_vector) -> float:
        """Process a single event. Returns an anomaly score."""
        x = np.asarray(feature_vector, dtype=float)
        score = self.model.fit_score_partial(x)
        self.warmup_count += 1
        if self.warmup_count >= self.warmup_threshold:
            self.is_warm = True
        return score if self.is_warm else 0.0   # don't score during warmup
```

### Composite Scoring

```python
class BaselineAccumulator:
    """Orchestrates all three tiers. Decides: store or discard."""

    def __init__(self, anomaly_threshold=3.0):
        self.entity_profiles = {}   # arn -> EntityProfile
        self.burst_detector = BurstDetector()
        # 19 = number of features produced by extract_features()
        self.structural_detector = StructuralAnomalyDetector(n_features=19)
        self.threshold = anomaly_threshold

    def process_event(self, event: dict) -> tuple:
        """
        Returns: (is_anomaly: bool, scores: dict, raw_event_or_none)

        If normal:  returns (False, scores, None); the raw event can be discarded
        If anomaly: returns (True, scores, event); the raw event is retained
        """
        features = extract_features(event)
        entity_arn = event["userIdentity"]["arn"]

        # Get or create entity profile
        if entity_arn not in self.entity_profiles:
            self.entity_profiles[entity_arn] = EntityProfile()
        profile = self.entity_profiles[entity_arn]

        # Score across all three tiers
        stat_score = profile.anomaly_score(features)
        burst_score = self.burst_detector.process(
            entity_arn, event["eventName"]
        )
        structural_score = self.structural_detector.process(
            list(features.values())
        )

        composite = max(stat_score, burst_score, structural_score)

        # Always update the baseline (even for anomalies)
        profile.update(features)

        scores = {
            "statistical": stat_score,
            "burst": burst_score,
            "structural": structural_score,
            "composite": composite
        }

        if composite > self.threshold:
            return (True, scores, event)    # STORE anomaly log
        else:
            return (False, scores, None)    # DISCARD normal log

    def total_memory(self) -> str:
        entity_mem = sum(p.memory_bytes() for p in self.entity_profiles.values())
        sketch_mem = self.burst_detector.current_window.memory_bytes() * 2
        structural_mem = 2 * 1024 * 1024   # ~2MB for HST
        total = entity_mem + sketch_mem + structural_mem
        return f"{total / 1024 / 1024:.1f} MB for {len(self.entity_profiles)} entities"
```
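
The glue between Layer 1 and Layer 2 is a consumer loop. A minimal boto3 sketch, assuming S3 event notifications are wired to the SQS queue; the queue URL and anomaly bucket are placeholders, and CloudTrail's gzipped `{"Records": [...]}` log-file format is the documented delivery format:

```python
import boto3, gzip, json

sqs = boto3.client("sqs")
s3 = boto3.client("s3")
accumulator = BaselineAccumulator(anomaly_threshold=3.0)

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/cloudtrail-events"  # placeholder
ANOMALY_BUCKET = "soc-anomaly-store"  # placeholder

def consume_forever():
    while True:
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=10, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            note = json.loads(msg["Body"])          # S3 event notification
            for rec in note.get("Records", []):
                bucket = rec["s3"]["bucket"]["name"]
                key = rec["s3"]["object"]["key"]
                body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
                for event in json.loads(gzip.decompress(body))["Records"]:
                    is_anomaly, scores, raw = accumulator.process_event(event)
                    if is_anomaly:
                        # Only anomalies are ever persisted
                        s3.put_object(
                            Bucket=ANOMALY_BUCKET,
                            Key=f"anomalies/{event['eventID']}.json",
                            Body=json.dumps({"event": raw, "scores": scores}),
                        )
            sqs.delete_message(QueueUrl=QUEUE_URL,
                               ReceiptHandle=msg["ReceiptHandle"])
```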

### Concept Drift Handling

Normal behavior changes over time (employees change roles, new services get deployed). Use ADWIN (Adaptive Windowing) to detect drift and re-initialize:

```python
from river import drift

class DriftAwareBaseline(BaselineAccumulator):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.drift_detectors = {}   # per-entity ADWIN instances

    def process_event(self, event):
        result = super().process_event(event)
        entity = event["userIdentity"]["arn"]

        if entity not in self.drift_detectors:
            self.drift_detectors[entity] = drift.ADWIN(delta=0.002)

        self.drift_detectors[entity].update(result[1]["composite"])

        if self.drift_detectors[entity].drift_detected:
            # Behavior has fundamentally changed: reset the entity profile
            self.entity_profiles[entity] = EntityProfile()
            self.drift_detectors[entity] = drift.ADWIN(delta=0.002)
            # Log the drift event (this IS stored as an anomaly)
            return (True, {"drift": True}, event)

        return result
```

---

## 6. Layer 3: Anomaly Detection & Scoring

### Warm-Up Period

All models need a learning period before thresholds are reliable:

| Phase | Duration | Raw Log Storage | Behavior |
|-------|----------|-----------------|----------|
| **Cold Start** | First 24 hours per entity | YES (stored temporarily) | Build initial profile |
| **Warm-Up** | Hours 24-72 | Selective (high-score only) | Calibrate thresholds |
| **Operational** | Day 3+ | Anomalies only | Full pipeline active |

During cold start, raw logs are temporarily stored and replayed to build the initial baseline. After warm-up, the temporary logs are deleted.
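
A minimal sketch of the cold-start gate. The event-count proxy for the 24-hour window and the in-memory `temp_store` are assumptions; production would park the raw events in S3 or Redis instead:

```python
temp_store = {}   # arn -> list of raw events kept only during cold start

def process_with_cold_start(acc: BaselineAccumulator, event: dict,
                            min_events: int = 500):
    """Learn silently while an entity is new; score only once warmed up."""
    arn = event["userIdentity"]["arn"]
    profile = acc.entity_profiles.setdefault(arn, EntityProfile())

    if profile.event_count < min_events:               # still cold
        temp_store.setdefault(arn, []).append(event)   # temporary raw retention
        profile.update(extract_features(event))
        return (False, {"cold_start": True}, None)

    temp_store.pop(arn, None)   # warmed up: delete the temporary raw logs
    return acc.process_event(event)
```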

### Threshold Calibration

Use Gaussian tail probabilities to convert raw anomaly scores into p-values for consistent false-positive control:

```python
from scipy import stats
import numpy as np

class ThresholdCalibrator:
    """Adaptive thresholds based on score distributions."""

    def __init__(self, target_fpr=0.001):   # 0.1% false positive rate
        self.target_fpr = target_fpr
        self.score_buffer = []               # rolling window of recent scores
        self.buffer_size = 10000
        self.threshold = 3.0                 # initial Z-score threshold

    def update(self, score: float):
        self.score_buffer.append(score)
        if len(self.score_buffer) > self.buffer_size:
            self.score_buffer.pop(0)

        if len(self.score_buffer) >= 1000:
            # Fit a Gaussian to the score distribution
            mu = np.mean(self.score_buffer)
            sigma = np.std(self.score_buffer) + 1e-8
            # Set the threshold at the target FPR
            self.threshold = mu + sigma * stats.norm.ppf(1 - self.target_fpr)
```
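
A sketch of how the calibrator plugs into the accumulator so the store/discard cutoff tracks the live score distribution (one event of lag between scoring and threshold adaptation is accepted here):

```python
calibrator = ThresholdCalibrator(target_fpr=0.001)
acc = BaselineAccumulator()

def process_calibrated(event: dict):
    is_anomaly, scores, raw = acc.process_event(event)   # uses current threshold
    calibrator.update(scores["composite"])               # learn the distribution
    acc.threshold = calibrator.threshold                 # adapt for the next event
    return is_anomaly, scores, raw
```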

---

## 7. Layer 4: Multi-Agent Triage Pipeline

Based on the CORTEX architecture (arXiv:2510.00311), which achieved F1 = 0.78 and reduced false positives by 10.7 percentage points over single-agent approaches.

### Why Multi-Agent?

| Approach | F1 Score | FPR | Failure Mode |
|----------|----------|-----|--------------|
| Single LLM agent | 0.66 | 24.9% | Context cramming, hallucination |
| **Multi-agent (CORTEX)** | **0.78** | **14.2%** | N/A (divide-and-conquer eliminates it) |

### Agent Definitions
```python
from langgraph.graph import StateGraph, END
from typing import TypedDict, List

class SOCState(TypedDict):
    """State passed between agents in the triage pipeline."""
    alert: dict                      # Raw anomaly event + scores
    entity_profile: dict             # Historical profile summary
    workflow: str                    # Classified workflow type
    evidence: List[dict]             # Gathered evidence
    enrichment: dict                 # CTI, TTP, severity
    symbolic_check: dict             # Deterministic rule validation
    verdict: str                     # FP | TP_AUTO | TP_ESCALATE
    confidence: float                # 0.0 - 1.0
    reasoning: str                   # Natural language explanation
    recommended_actions: List[str]   # Specific remediation steps
    triage_report: dict              # Final structured report


# Agent 1: Orchestrator
ORCHESTRATOR_PROMPT = """You are the SOC Orchestrator Agent. Your role is to:
1. Validate the incoming alert has all required fields
2. Route to the Behavior Analysis Agent
3. Ensure all pipeline stages complete
4. Perform consistency checks on the final report
5. If any agent returns an error, retry or escalate

You do NOT make triage decisions. You manage the process."""


# Agent 2: Behavior Analysis
BEHAVIOR_ANALYST_PROMPT = """You are the Behavior Analysis Agent. Given a CloudTrail
anomaly event and entity profile, classify it into exactly ONE workflow:

- CREDENTIAL_CHANGE: CreateAccessKey, UpdateAccessKey, CreateLoginProfile,
  ChangePassword for another user
- IAM_POLICY_MOD: PutRolePolicy, AttachUserPolicy, CreatePolicy,
  PutGroupPolicy with overly permissive policies
- GEO_ANOMALY: API calls from IP/region never seen for this entity
- UNUSUAL_API: API call this entity has never made before, especially
  sensitive APIs (GetSecretValue, GetPasswordData, etc.)
- DATA_EXFIL: High-volume S3 GetObject, unusual data transfer patterns,
  copy to external accounts
- PRIVILEGE_ESCALATION: AssumeRole to higher-privilege role, iam:PassRole
  to sensitive service
- RECONNAISSANCE: Describe*, List*, Get* calls across multiple services
  in rapid succession
- DEFENSE_EVASION: StopLogging, DeleteTrail, DisableAlarmActions,
  PutBucketPolicy reducing restrictions

Output JSON: {"workflow": "...", "confidence": 0.0-1.0, "reasoning": "..."}"""


# Agent 3: Evidence Acquisition (per-workflow)
EVIDENCE_TOOLS = {
    "queryCloudTrailEvents": "Query recent CloudTrail events for an entity within a time range",
    "getIAMUser": "Get IAM user details including policies, groups, MFA status",
    "getIAMRole": "Get IAM role details including trust policy and permissions",
    "lookupIP": "Look up an IP address in threat intelligence databases (AbuseIPDB, VirusTotal)",
    "getEntityProfile": "Retrieve the baseline behavioral profile for an entity",
    "queryAthena": "Run SQL query against CloudTrail logs in Athena (anomaly store)",
    "getAssetRecord": "Get EC2 instance, Lambda function, or S3 bucket details",
    "getGuardDutyFindings": "Check if GuardDuty has related findings",
    "getSecurityHubFindings": "Check Security Hub for related compliance findings",
}


# Agent 4: Reasoning & Synthesis
REASONING_PROMPT = """You are the Reasoning & Synthesis Agent. Given:
- The classified workflow
- All gathered evidence
- Threat intelligence enrichment
- Symbolic verification results

You must produce a structured triage report with:

1. VERDICT: One of:
   - FALSE_POSITIVE: This is normal/expected behavior. Explain why.
   - TRUE_POSITIVE_AUTO: This is a real threat AND safe to auto-remediate.
     Specify exact remediation actions.
   - TRUE_POSITIVE_ESCALATE: This is a real threat BUT requires human judgment.
     Explain what is uncertain.

2. CONFIDENCE: 0.0-1.0 (must be >0.9 for AUTO remediation)

3. MITRE_TTPS: List of applicable MITRE ATT&CK technique IDs

4. SEVERITY: CRITICAL / HIGH / MEDIUM / LOW

5. EVIDENCE_SUMMARY: Key evidence points that support the verdict

6. REASONING_CHAIN: Step-by-step logic leading to the verdict

CRITICAL RULES:
- When in doubt, ESCALATE. Never auto-remediate with confidence < 0.9
- Always check if the action was performed by a known automation/service role
- Consider time of day, historical patterns, and business context
- A single unusual action is not necessarily malicious: look for chains"""
```

### LangGraph Pipeline

```python
def build_soc_pipeline():
    """Build the multi-agent SOC triage pipeline."""

    workflow = StateGraph(SOCState)

    # Add nodes
    workflow.add_node("orchestrator", orchestrator_agent)
    workflow.add_node("behavior_analysis", behavior_analysis_agent)
    workflow.add_node("evidence_gathering", evidence_gathering_agent)
    workflow.add_node("symbolic_verification", symbolic_verifier)
    workflow.add_node("reasoning", reasoning_agent)
    workflow.add_node("response_executor", response_executor)
    workflow.add_node("update_baseline", update_baseline_node)
    workflow.add_node("create_case", create_case_node)

    # Define edges
    workflow.set_entry_point("orchestrator")
    workflow.add_edge("orchestrator", "behavior_analysis")
    workflow.add_edge("behavior_analysis", "evidence_gathering")
    workflow.add_edge("evidence_gathering", "symbolic_verification")
    workflow.add_edge("symbolic_verification", "reasoning")

    # Conditional routing based on verdict
    workflow.add_conditional_edges(
        "reasoning",
        route_verdict,
        {
            "false_positive": "update_baseline",
            "auto_remediate": "response_executor",
            "escalate": "create_case",
            "retry": "evidence_gathering",   # Need more evidence
        }
    )

    workflow.add_edge("update_baseline", END)
    workflow.add_edge("response_executor", END)
    workflow.add_edge("create_case", END)

    return workflow.compile()
```
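
The routing function `route_verdict` is referenced above but not defined. A minimal version that maps the reasoning agent's state onto the conditional-edge keys (a retry cap tracked elsewhere in the state is assumed, to avoid infinite cycles):

```python
def route_verdict(state: SOCState) -> str:
    """Map the reasoning agent's verdict onto the conditional-edge keys."""
    verdict = state.get("verdict")
    if verdict == "FALSE_POSITIVE":
        return "false_positive"
    if verdict == "TRUE_POSITIVE_AUTO":
        return "auto_remediate"
    if verdict == "TRUE_POSITIVE_ESCALATE":
        return "escalate"
    # No confident verdict yet: loop back for more evidence.
    return "retry"
```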

### Symbolic Verifier (Grounds LLM Output in Deterministic Rules)

Based on CloudAnoAgent (arXiv:2508.01844). Prevents LLM hallucination by cross-checking verdicts against deterministic rules:

```python
class SymbolicVerifier:
    """Deterministic rule checker to ground LLM reasoning."""

    RULES = {
        "CREDENTIAL_CHANGE": {
            "auto_remediate_conditions": [
                "target_user != source_user",   # Creating creds for someone else
                "mfa_not_authenticated",
                "source_ip_not_in_corporate_range",
            ],
            "false_positive_conditions": [
                "source_is_known_automation_role",
                "target_user == source_user AND mfa_authenticated",
            ],
        },
        "DEFENSE_EVASION": {
            "auto_remediate_conditions": [
                "event_name in ['StopLogging', 'DeleteTrail', 'UpdateTrail']",
                # ALWAYS a true positive: these should never happen in production
            ],
            "false_positive_conditions": [],   # Never FP
            "always_critical": True,
        },
        "GEO_ANOMALY": {
            "auto_remediate_conditions": [
                "distance_km > 500 AND time_since_last_event_hours < 2",
                # Impossible travel
            ],
            "false_positive_conditions": [
                "source_ip_is_known_vpn",
                "source_ip_is_aws_service",
            ],
        },
    }

    def _check_conditions(self, conditions: list, evidence: dict) -> bool:
        """Evaluate condition strings against pre-computed evidence flags.
        (Sketch: assumes the evidence-gathering stage has already resolved
        each condition string to a boolean under the same key.)"""
        return all(evidence.get(cond, False) for cond in conditions)

    def verify(self, workflow: str, evidence: dict, llm_verdict: str) -> dict:
        """Cross-check the LLM verdict against deterministic rules."""
        rules = self.RULES.get(workflow, {})

        conflicts = []

        # Check if the LLM says FP but the rules say it can't be
        if llm_verdict == "FALSE_POSITIVE":
            if rules.get("always_critical"):
                conflicts.append(
                    f"LLM classified as FP but {workflow} is ALWAYS critical"
                )

        # Check if the LLM says auto-remediate but the conditions aren't met
        if llm_verdict == "TRUE_POSITIVE_AUTO":
            if not self._check_conditions(
                rules.get("auto_remediate_conditions", []), evidence
            ):
                conflicts.append(
                    "Auto-remediation conditions not met: escalate instead"
                )

        return {
            "verified": len(conflicts) == 0,
            "conflicts": conflicts,
            "override_verdict": "TRUE_POSITIVE_ESCALATE" if conflicts else None,
        }
```

---

## 8. Layer 5: Threat Intelligence Enrichment & TTP Mapping

### RAG-Based CTI Enrichment

Based on the architecture from arXiv:2504.00428 (LLM-Assisted Proactive Threat Intelligence):

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

class CTIEnricher:
    """RAG-based threat intelligence enrichment."""

    def __init__(self):
        # Embedding model for CTI document retrieval. Chroma expects a
        # LangChain Embeddings object, so the sentence-transformers model
        # is wrapped in HuggingFaceEmbeddings.
        self.embedder = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-mpnet-base-v2"
        )

        # Vector store loaded with:
        self.feeds = {
            "mitre_attack": "MITRE ATT&CK Enterprise + Cloud matrix",
            "nvd_cve": "National Vulnerability Database CVE entries",
            "cisa_kev": "CISA Known Exploited Vulnerabilities",
            "abuse_ipdb": "AbuseIPDB IP reputation data",
            "aws_security_bulletins": "AWS security advisories",
        }

        self.vector_store = Chroma(
            collection_name="cti_knowledge",
            embedding_function=self.embedder,
        )

    def enrich(self, alert: dict, workflow: str) -> dict:
        """Enrich an alert with threat intelligence context."""

        # Build a query from the alert context
        query = (f"AWS CloudTrail {alert['eventName']} "
                 f"by {alert['userIdentity']['type']} "
                 f"workflow: {workflow}")

        # Retrieve relevant CTI documents
        docs = self.vector_store.similarity_search(query, k=5)

        # Map to MITRE ATT&CK
        ttps = self.map_to_attack(alert["eventName"], workflow)

        return {
            "mitre_ttps": ttps,
            "cti_context": [doc.page_content for doc in docs],
            # check_ip() and find_related_cves() wrap the AbuseIPDB/VirusTotal
            # and NVD feeds respectively (implementations omitted here)
            "ip_reputation": self.check_ip(alert.get("sourceIPAddress")),
            "related_cves": self.find_related_cves(alert),
        }

    def map_to_attack(self, event_name: str, workflow: str) -> list:
        """Map a CloudTrail event to MITRE ATT&CK techniques."""
        # See Section 12 for the complete mapping
        return CLOUDTRAIL_ATTACK_MAP.get(event_name, [])
```
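
The `CLOUDTRAIL_ATTACK_MAP` lookup that `map_to_attack()` reads is not shown above. A short excerpt, assembled from the Section 12 reference tables:

```python
# Excerpt of the lookup table behind map_to_attack(), built from the
# CloudTrail → MITRE ATT&CK reference in Section 12.
CLOUDTRAIL_ATTACK_MAP = {
    "ConsoleLogin":       ["T1078.004"],           # Valid Accounts: Cloud Accounts
    "CreateAccessKey":    ["T1098.001", "T1528"],  # Additional Cloud Credentials
    "CreateLoginProfile": ["T1098.001"],
    "CreateUser":         ["T1136.003"],           # Create Account: Cloud Account
    "StopLogging":        ["T1562.008"],           # Impair Defenses: Disable Cloud Logs
    "DeleteTrail":        ["T1562.008"],
    "GetSecretValue":     ["T1555"],               # Credentials from Password Stores
    "GetPasswordData":    ["T1552.001"],
    "AssumeRole":         ["T1548"],               # Abuse Elevation Control Mechanism
    "GetObject":          ["T1530"],               # Data from Cloud Storage
    "PutBucketPolicy":    ["T1537"],               # Transfer Data to Cloud Account
    "RunInstances":       ["T1496"],               # Resource Hijacking
}
```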

---

## 9. Layer 6: Verdict & Response (The Three-Way Decision)

### Decision Logic

```python
def make_verdict(
    llm_verdict: str,
    llm_confidence: float,
    symbolic_check: dict,
    severity: str,
) -> str:
    """
    Final verdict incorporating LLM reasoning + symbolic verification.

    Conservative by design:
    - Auto-remediate only when BOTH the LLM AND the symbolic verifier agree
    - Escalate if there is ANY disagreement
    - FP only when the LLM is confident AND there are no symbolic conflicts
    """

    # The symbolic verifier overrides the LLM
    if not symbolic_check["verified"]:
        if symbolic_check["override_verdict"]:
            return symbolic_check["override_verdict"]
        return "TRUE_POSITIVE_ESCALATE"

    # High confidence + verified = trust the LLM verdict
    if llm_verdict == "FALSE_POSITIVE" and llm_confidence > 0.85:
        return "FALSE_POSITIVE"

    if llm_verdict == "TRUE_POSITIVE_AUTO" and llm_confidence > 0.90:
        # Extra safety: CRITICAL severity always escalates
        if severity == "CRITICAL":
            return "TRUE_POSITIVE_ESCALATE"
        return "TRUE_POSITIVE_AUTO"

    # Default: escalate
    return "TRUE_POSITIVE_ESCALATE"
```

### Response Actions by Verdict

```python
class ResponseExecutor:
    """Execute automated responses for confirmed true positives."""

    async def execute(self, verdict: str, alert: dict, report: dict):
        if verdict == "FALSE_POSITIVE":
            await self.dismiss_and_learn(alert, report)

        elif verdict == "TRUE_POSITIVE_AUTO":
            await self.auto_remediate(alert, report)

        elif verdict == "TRUE_POSITIVE_ESCALATE":
            await self.escalate_to_human(alert, report)

    async def dismiss_and_learn(self, alert, report):
        """Update the baseline to prevent future FPs on similar events."""
        entity = alert["userIdentity"]["arn"]
        # Widen normal bounds for this entity's profile
        # (widen_bounds() is assumed as a helper on EntityProfile)
        profile = baseline_accumulator.entity_profiles[entity]
        profile.widen_bounds(alert, report["reasoning"])
        # Log the dismissal reason (audit trail)
        await audit_log.record("FP_DISMISSED", alert, report)

    async def auto_remediate(self, alert, report):
        """Execute safe, pre-approved remediation actions."""
        for action in report["recommended_actions"]:
            # Double-check the action is in the whitelist
            if action in SAFE_REMEDIATION_ACTIONS:
                await execute_aws_action(action, alert)
                await audit_log.record("AUTO_REMEDIATED", alert, action)
            else:
                await self.escalate_to_human(alert, report)
                return

    async def escalate_to_human(self, alert, report):
        """Create a case and alert a human analyst."""
        case = await thehive.create_case(
            title=f"[{report['severity']}] {report['workflow']}: "
                  f"{alert['eventName']} by {alert['userIdentity']['arn']}",
            description=report["reasoning"],
            severity=severity_to_number(report["severity"]),
            tags=report["mitre_ttps"],
        )
        # Attach all evidence
        for evidence in report["evidence"]:
            await thehive.add_observable(case.id, evidence)
        # Page the on-call analyst for CRITICAL
        if report["severity"] == "CRITICAL":
            await pagerduty.trigger(case)
```

---

## 10. Layer 7: Automated Remediation Actions

### Safe Remediation Playbooks

Actions the system can execute autonomously with high confidence:

| Workflow | Trigger | Auto-Remediation Action | AWS API Call | Rollback |
|----------|---------|-------------------------|--------------|----------|
| **Credential Compromise** | New access key by unauthorized user | Deactivate the new key | `iam:UpdateAccessKey(Status=Inactive)` | Re-enable key |
| **Credential Compromise** | Console login from impossible geo | Revoke active sessions | Inline deny policy on pre-revocation sessions (there is no direct `sts:RevokeSession` API) | Remove deny policy |
| **Defense Evasion** | CloudTrail logging disabled | Re-enable logging | `cloudtrail:StartLogging` | N/A |
| **Defense Evasion** | Trail deleted | Recreate trail from saved config | `cloudtrail:CreateTrail` | N/A |
| **Data Exfiltration** | S3 bucket policy opened to public | Restore previous bucket policy | `s3:PutBucketPolicy(saved_policy)` | N/A |
| **IAM Policy Mod** | Admin policy attached to user | Detach the policy | `iam:DetachUserPolicy` | Re-attach policy |
| **Network** | Security group opened to 0.0.0.0/0 | Revoke the ingress rule | `ec2:RevokeSecurityGroupIngress` | Re-add rule |
| **Geo Anomaly** | Impossible travel detected | Enforce MFA, revoke sessions | SCP + session revocation | Remove SCP |

### Guardrails for Auto-Remediation

```python
SAFE_REMEDIATION_ACTIONS = {
    # Credential actions (reversible, low blast radius)
    "deactivate_access_key",
    "revoke_session",
    "force_mfa",

    # Logging actions (restoring security posture)
    "enable_cloudtrail",
    "restore_trail_config",

    # Network actions (blocking unauthorized access)
    "revoke_security_group_rule",
    "restore_bucket_policy",

    # IAM actions (removing unauthorized permissions)
    "detach_overprivileged_policy",
}

# Actions that ALWAYS require human approval
NEVER_AUTO_REMEDIATE = {
    "terminate_instance",       # Could be a production workload
    "delete_iam_user",          # Destructive, hard to reverse
    "modify_vpc",               # Network-wide impact
    "modify_rds_instance",      # Data risk
    "anything_in_production",   # Production changes need human sign-off
}
```
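
A sketch of the dispatcher behind `execute_aws_action()` for two of the whitelisted actions. The boto3 calls (`iam.update_access_key`, `cloudtrail.start_logging`) are real APIs; the parameter extraction from the alert's CloudTrail fields is illustrative:

```python
import boto3

iam = boto3.client("iam")
cloudtrail = boto3.client("cloudtrail")

async def execute_aws_action(action: str, alert: dict):
    """Dispatch a whitelisted remediation (sketch covering two safe actions)."""
    if action == "deactivate_access_key":
        # The offending key's id/user come from CreateAccessKey responseElements.
        key = alert["responseElements"]["accessKey"]
        iam.update_access_key(UserName=key["userName"],
                              AccessKeyId=key["accessKeyId"],
                              Status="Inactive")   # reversible: set Active to roll back
    elif action == "enable_cloudtrail":
        # Re-enable the trail named in the StopLogging call's requestParameters.
        cloudtrail.start_logging(Name=alert["requestParameters"]["name"])
    else:
        raise ValueError(f"{action} is not implemented in this sketch")
```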

---

## 11. Storage Economics: Quantifying the Savings

### Comparison Model

Assumptions:
- AWS environment: 10,000 active entities (users, roles, services)
- CloudTrail volume: 500 million events/day (~500 GB/day uncompressed)
- Anomaly rate: 0.1% of events (500,000 anomalies/day)
- Retention: 1 year

| Component | Traditional SIEM | Agentic SOC |
|-----------|------------------|-------------|
| **Daily ingestion** | 500 GB | 0.5 GB (anomalies only) |
| **Annual storage** | 182 TB | 182 GB + ~10 GB models |
| **Storage cost** (S3 pricing) | ~$4,200/month | ~$4.50/month |
| **SIEM license** (Splunk-class) | ~$100K-500K/year | $0 (self-built) |
| **Compute (detection)** | Query over stored data | Streaming (real-time) |
| **Latency to detect** | Minutes to hours | Milliseconds to seconds |
| **LLM costs** (triage only anomalies) | N/A | ~$50-200/day\* |

\*LLM cost estimate: 500K anomalies/day × 1K tokens avg × $0.15/1M tokens (GPT-4o-mini) = ~$75/day
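
The footnote's arithmetic as a one-line sanity check (constants are the table's assumptions):

```python
anomalies_per_day = 500_000
tokens_per_triage = 1_000     # average prompt + completion tokens per anomaly
usd_per_1m_tokens = 0.15      # GPT-4o-mini-class pricing (assumption)

daily_llm_cost = anomalies_per_day * tokens_per_triage / 1e6 * usd_per_1m_tokens
print(f"~${daily_llm_cost:.0f}/day")   # ~$75/day
```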

### Total Cost of Ownership (Annual)

| | Traditional SIEM | Agentic SOC |
|---|---|---|
| Storage | $50,000 | $55 |
| SIEM License | $200,000 | $0 |
| Compute | $30,000 | $15,000 |
| LLM API | $0 | $25,000 |
| Analyst time (reduced) | $500,000 (5 FTE) | $200,000 (2 FTE) |
| **Total** | **~$780,000** | **~$240,000** |
| **Savings** | — | **~70%** |

---

## 12. CloudTrail → MITRE ATT&CK Mapping Reference

### Initial Access (TA0001)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `ConsoleLogin` (from unusual IP) | T1078.004 — Cloud Accounts | Valid account used from unexpected location |
| `ConsoleLogin` (errorCode=Failed) | T1110 — Brute Force | Multiple failed login attempts |
| `GetFederationToken` | T1078.004 | Federation token for unauthorized access |

### Persistence (TA0003)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `CreateAccessKey` | T1098.001 — Additional Cloud Credentials | Backdoor access key created |
| `CreateLoginProfile` | T1098.001 | Console access added to service account |
| `CreateUser` | T1136.003 — Cloud Account | New IAM user for persistence |
| `PutRolePolicy` (trust policy) | T1098.003 — Additional Cloud Roles | Cross-account trust modified |
| `CreateFunction` (Lambda) | T1525 — Implant Internal Image | Serverless backdoor |

### Privilege Escalation (TA0004)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `AttachUserPolicy` (AdminAccess) | T1078.004 — Cloud Accounts | Granting admin to non-admin user |
| `AssumeRole` (to admin role) | T1548 — Abuse Elevation Control | Assuming higher-privilege role |
| `PutUserPolicy` (iam:*) | T1078.004 | Granting IAM modification permissions |
| `UpdateAssumeRolePolicy` | T1548 | Modifying who can assume a role |
| `iam:PassRole` | T1548 | Passing admin role to service |

### Defense Evasion (TA0005)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `StopLogging` | T1562.008 — Disable Cloud Logs | **CRITICAL**: disabling the audit trail |
| `DeleteTrail` | T1562.008 | **CRITICAL**: deleting the audit trail |
| `UpdateTrail` (S3 bucket change) | T1562.008 | Redirecting logs to attacker bucket |
| `PutEventSelectors` (exclude events) | T1562.008 | Filtering out the attacker's events |
| `DisableAlarmActions` | T1562 | Disabling CloudWatch alarms |
| `DeleteFlowLogs` | T1562.008 | Removing network logging |

### Credential Access (TA0006)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `GetSecretValue` | T1555 — Credentials from Password Stores | Secrets Manager access |
| `GetParametersByPath` (/password*) | T1555 | SSM Parameter Store credentials |
| `GetPasswordData` | T1552.001 — Credentials In Files | EC2 Windows password retrieval |
| `CreateAccessKey` (for other user) | T1528 — Steal Application Access Token | Creating keys for another user |

### Discovery (TA0007)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `DescribeInstances` (broad) | T1580 — Cloud Infrastructure Discovery | Enumerating EC2 instances |
| `ListBuckets` + `GetBucketAcl` | T1580 | Enumerating S3 buckets and permissions |
| `ListUsers` + `ListRoles` | T1087.004 — Cloud Account Discovery | Enumerating IAM entities |
| `GetCallerIdentity` | T1087.004 | "Who am I" check (post-compromise) |
| `DescribeSecurityGroups` | T1580 | Network enumeration |

### Collection & Exfiltration (TA0009 / TA0010)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `GetObject` (high volume) | T1530 — Data from Cloud Storage Object | Mass S3 download |
| `CopyObject` (cross-account) | T1537 — Transfer to Cloud Account | Data moved to external account |
| `CreateSnapshot` + `ModifySnapshotAttribute` | T1537 | EBS snapshot shared externally |
| `PutBucketPolicy` (public access) | T1537 | S3 bucket opened for exfiltration |

### Impact (TA0040)

| CloudTrail Event | ATT&CK Technique | Description |
|------------------|------------------|-------------|
| `TerminateInstances` | T1485 — Data Destruction | Destroying compute resources |
| `DeleteBucket` | T1485 | Destroying storage resources |
| `RunInstances` (crypto mining) | T1496 — Resource Hijacking | Unauthorized compute usage |
| `PutBucketEncryption` (attacker key) | T1486 — Data Encrypted for Impact | Ransomware via re-encryption |

---

## 13. Open-Source Building Blocks

### Recommended Stack

```
┌──────────────────────────────────────────────────────────────┐
│                       PRODUCTION STACK                       │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  INGESTION:                                                  │
│  ├── AWS CloudTrail → S3 → SQS → Consumer                    │
│  ├── awslabs/mcp CloudTrail MCP Server (official AWS)        │
│  └── Wazuh (SIEM - native CloudTrail module)                 │
│                                                              │
│  BASELINE & ANOMALY DETECTION:                               │
│  ├── PySAD (streaming anomaly detection library)             │
│  │     pip install pysad                                     │
│  │     Models: HalfSpaceTrees, xStream, LODA                 │
│  ├── River (online ML with drift detection)                  │
│  │     pip install river                                     │
│  │     Models: ADWIN, HalfSpaceTrees                         │
│  └── Custom: EMA profiles, Count-Min Sketch, MIDAS           │
│                                                              │
│  MULTI-AGENT ORCHESTRATION:                                  │
│  ├── LangGraph (stateful multi-agent pipelines) ⭐ #1        │
│  │     pip install langgraph                                 │
│  │     Features: cycles, human-in-loop, checkpointing        │
│  ├── CrewAI (role-based agents)                    #2        │
│  └── AutoGen (conversational agents)               #3        │
│                                                              │
│  LLM MODELS:                                                 │
│  ├── GPT-4o-mini / Claude Haiku (orchestration - cheap)      │
│  ├── GPT-4o / Claude Sonnet (reasoning - quality)            │
│  ├── Gemini 2.5 Flash (best cost/quality)                    │
│  ├── Llama 4 Maverick 17B (best open-source)                 │
│  └── HassanShehata/logem 0.6B (log parsing - local)          │
│                                                              │
│  EMBEDDINGS:                                                 │
│  ├── all-mpnet-base-v2 (CTI document retrieval)              │
│  ├── cisco-ai/SecureBERT2.0-base (security NER/embed)        │
│  └── Chroma / Milvus (vector store)                          │
│                                                              │
│  CLOUD SECURITY TOOLS (as agent tools):                      │
│  ├── Prowler + MCP Server (500+ AWS checks)                  │
│  │     pip install prowler-mcp                               │
│  ├── Steampipe (SQL over cloud APIs)                         │
│  │     steampipe plugin install aws                          │
│  └── AWS SDK (boto3 - remediation actions)                   │
│                                                              │
│  CASE MANAGEMENT & SOAR:                                     │
│  ├── TheHive (case management, evidence)                     │
│  ├── Shuffle SOAR (playbook automation)                      │
│  └── Custom LangGraph interrupt (human approval gate)        │
│                                                              │
│  CTI FEEDS:                                                  │
│  ├── MITRE ATT&CK (via taxii2 / stix2)                       │
│  ├── NVD/CVE (via nvdlib)                                    │
│  ├── CISA KEV (JSON feed)                                    │
│  ├── AbuseIPDB (API)                                         │
│  └── VirusTotal (API)                                        │
│                                                              │
└──────────────────────────────────────────────────────────────┘
```
1223
+
+ ### Key Libraries & Versions
+
+ ```bash
+ # Core pipeline
+ pip install langgraph langchain langchain-openai
+ pip install pysad river
+ pip install boto3 botocore
+
+ # Embeddings & RAG
+ pip install sentence-transformers chromadb
+
+ # CTI integration
+ pip install stix2 taxii2-client nvdlib
+
+ # Security tools
+ pip install prowler-mcp
+ pip install awslabs.cloudtrail-mcp-server
+
+ # Log parsing (optional)
+ pip install transformers torch  # for logem model
+
+ # Monitoring
+ pip install trackio
+ ```
+
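+ As a quick smoke test of the two detection libraries installed above, the sketch below streams synthetic feature vectors through PySAD's `HalfSpaceTrees` and watches the resulting score stream with River's `ADWIN`. The feature bounds, shift point, and parameters are placeholders, and `drift_detected` is the property name in recent River releases:
+
+ ```python
+ import numpy as np
+ from pysad.models import HalfSpaceTrees
+ from river import drift
+
+ # PySAD's HalfSpaceTrees requires explicit per-feature bounds
+ hst = HalfSpaceTrees(
+     feature_mins=np.zeros(4), feature_maxes=np.ones(4),
+     window_size=250, num_trees=25, max_depth=15,
+ )
+ adwin = drift.ADWIN()
+
+ for t in range(1000):
+     # Synthetic stream whose distribution shifts at t = 900
+     x = np.random.rand(4) * (0.2 if t < 900 else 1.0)
+     score = hst.fit_score_partial(x)  # update model, score this instance
+     adwin.update(float(score))        # watch the score stream for drift
+     if adwin.drift_detected:
+         print(f"distribution shift detected around t={t}")
+ ```
+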
+ ---
+
+ ## 14. Implementation Roadmap
+
+ ### Phase 1: Foundation (Weeks 1-2)
+ **Goal:** Ingest CloudTrail logs and build behavioral baselines
+
+ - [ ] Set up CloudTrail → S3 → SQS pipeline
+ - [ ] Implement feature extraction from CloudTrail JSON events
+ - [ ] Build per-entity statistical profiles (EMA + z-score; see the sketch after this phase)
+ - [ ] Implement Count-Min Sketch for burst detection
+ - [ ] Deploy Online Isolation Forest (PySAD) for structural anomaly detection
+ - [ ] Build composite scoring and threshold calibration
+ - [ ] Implement the "store anomaly, discard normal" decision logic
+ - [ ] Validate on simulated data: generate normal + attack patterns
+
+ **Deliverable:** Streaming baseline system that correctly separates normal from anomalous CloudTrail events with <0.1% FPR
+
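+ For the statistical-profile item above, a minimal per-entity EMA + z-score sketch; the `alpha`, warm-up length, and feature choice are illustrative defaults, not tuned values:
+
+ ```python
+ import math
+
+ class EmaProfile:
+     """Exponential moving average + variance for one entity/feature."""
+
+     def __init__(self, alpha: float = 0.05):
+         self.alpha = alpha
+         self.mu = 0.0   # EMA of the feature (e.g., API calls per minute)
+         self.var = 1.0  # EMA of squared deviation
+         self.n = 0
+
+     def update(self, x: float) -> float:
+         # Score against the profile as it was *before* this observation
+         z = (x - self.mu) / math.sqrt(self.var + 1e-9) if self.n > 30 else 0.0
+         d = x - self.mu
+         self.mu += self.alpha * d
+         self.var = (1 - self.alpha) * (self.var + self.alpha * d * d)
+         self.n += 1
+         return z
+
+ profile = EmaProfile()
+ for rate in [4, 5, 6, 5, 4] * 10 + [250]:  # steady traffic, then a burst
+     z = profile.update(rate)
+ print(f"z-score of the burst: {z:.1f}")    # large positive z -> anomalous
+ ```
+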
+ ### Phase 2: Multi-Agent Triage (Weeks 3-4)
+ **Goal:** Build the LLM-powered investigation pipeline
+
+ - [ ] Implement LangGraph state machine with 4 agent nodes (skeleton sketched after this phase)
+ - [ ] Define workflow classifications (8 CloudTrail attack patterns)
+ - [ ] Build evidence acquisition tools (CloudTrail query, IAM lookup, IP reputation)
+ - [ ] Implement symbolic verifier with deterministic rules
+ - [ ] Build reasoning agent with structured output schema
+ - [ ] Implement the three-way verdict logic
+ - [ ] Test with known attack patterns (use CloudAnoBench)
+
+ **Deliverable:** Multi-agent pipeline that correctly triages CloudTrail anomalies into FP/TP-Auto/TP-Escalate
+
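+ A skeleton of the LangGraph wiring for this phase, with stub node bodies; the state schema, node names, and routing labels are assumptions to be replaced by the real agents:
+
+ ```python
+ from typing import List, Literal, TypedDict
+
+ from langgraph.graph import END, StateGraph
+
+ class TriageState(TypedDict):
+     event: dict          # the anomalous CloudTrail event
+     evidence: List[str]  # tool outputs gathered during triage
+     verdict: str         # FALSE_POSITIVE / TRUE_POSITIVE_AUTO / ...
+
+ def classify(state: TriageState) -> dict:
+     return {}  # stub: map anomaly to one of the 8 attack patterns
+
+ def gather_evidence(state: TriageState) -> dict:
+     return {"evidence": []}  # stub: CloudTrail query, IAM, IP reputation
+
+ def reason(state: TriageState) -> dict:
+     return {"verdict": "TRUE_POSITIVE_ESCALATE"}  # stub: structured LLM call
+
+ def verify(state: TriageState) -> dict:
+     return {}  # stub: deterministic symbolic rules may downgrade verdict
+
+ def route(state: TriageState) -> Literal["dismiss", "remediate", "escalate"]:
+     return {"FALSE_POSITIVE": "dismiss",
+             "TRUE_POSITIVE_AUTO": "remediate"}.get(state["verdict"], "escalate")
+
+ graph = StateGraph(TriageState)
+ graph.add_node("classify", classify)
+ graph.add_node("evidence", gather_evidence)
+ graph.add_node("reason", reason)
+ graph.add_node("verify", verify)
+ graph.set_entry_point("classify")
+ graph.add_edge("classify", "evidence")
+ graph.add_edge("evidence", "reason")
+ graph.add_edge("reason", "verify")
+ # All three outcomes terminate here; real response nodes hang off each label
+ graph.add_conditional_edges("verify", route,
+                             {"dismiss": END, "remediate": END, "escalate": END})
+ pipeline = graph.compile()
+ # Usage: pipeline.invoke({"event": {...}, "evidence": [], "verdict": ""})
+ ```
+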
+ ### Phase 3: Enrichment & Intelligence (Weeks 5-6)
+ **Goal:** Add threat intelligence and MITRE ATT&CK mapping
+
+ - [ ] Load MITRE ATT&CK Cloud matrix into vector store
+ - [ ] Build CTI feed ingestion (NVD, CISA KEV, AbuseIPDB)
+ - [ ] Implement CloudTrail → ATT&CK TTP mapping (Section 12)
+ - [ ] Build RAG enrichment pipeline (see the Chroma sketch after this phase)
+ - [ ] Integrate with Prowler MCP for posture context
+ - [ ] Test enrichment quality against known CVE/attack scenarios
+
+ **Deliverable:** Alerts enriched with TTPs, CVE context, IP reputation, and severity scoring
+
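+ For the RAG enrichment item, a minimal Chroma sketch: two hand-written technique summaries stand in for the full ATT&CK Cloud matrix, and Chroma's default embedding function is used instead of all-mpnet-base-v2 for brevity:
+
+ ```python
+ import chromadb
+
+ client = chromadb.Client()  # in-memory; use a persistent client in practice
+ attack = client.create_collection("attack_cloud")
+
+ # Stand-in corpus; in the real pipeline, load the matrix via stix2/taxii2
+ attack.add(
+     ids=["T1530", "T1537"],
+     documents=[
+         "Data from Cloud Storage Object: adversaries access data "
+         "from improperly secured cloud storage.",
+         "Transfer Data to Cloud Account: adversaries exfiltrate data "
+         "by moving it to another cloud account they control.",
+     ],
+     metadatas=[{"tactic": "Collection"}, {"tactic": "Exfiltration"}],
+ )
+
+ hits = attack.query(
+     query_texts=["mass GetObject downloads from an unfamiliar IP"],
+     n_results=2,
+ )
+ print(hits["ids"][0])  # candidate technique IDs to attach to the alert
+ ```
+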
+ ### Phase 4: Automated Response (Weeks 7-8)
+ **Goal:** Close the loop with safe auto-remediation
+
+ - [ ] Implement safe remediation actions (credential, logging, network; sketched after this phase)
+ - [ ] Build guardrail framework (whitelist, blast radius check, rollback)
+ - [ ] Integrate with TheHive for case management (escalations)
+ - [ ] Build audit trail for all actions taken
+ - [ ] Implement feedback loop: FP dismissals widen baseline
+ - [ ] Deploy human-in-the-loop approval gate (LangGraph interrupt)
+ - [ ] Red team testing: simulate multi-stage attacks
+
+ **Deliverable:** End-to-end autonomous SOC for CloudTrail with safe auto-remediation
+
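+ One concrete shape for a guardrailed credential action: deactivate (never delete) an access key, refuse to touch allow-listed break-glass principals, and emit an audit record. The protected-user list, function name, and audit sink below are assumptions; the boto3 call itself is real and reversible:
+
+ ```python
+ import json
+ from datetime import datetime, timezone
+
+ import boto3
+
+ PROTECTED_USERS = {"break-glass-admin"}  # never auto-remediate these
+
+ def deactivate_access_key(user_name: str, access_key_id: str) -> dict:
+     if user_name in PROTECTED_USERS:
+         return {"action": "escalate", "reason": "protected principal"}
+
+     iam = boto3.client("iam")
+     # Reversible by design: Status can be flipped back to 'Active'
+     iam.update_access_key(
+         UserName=user_name, AccessKeyId=access_key_id, Status="Inactive",
+     )
+     audit = {
+         "action": "deactivate_access_key",
+         "user": user_name,
+         "key": access_key_id,
+         "at": datetime.now(timezone.utc).isoformat(),
+     }
+     print(json.dumps(audit))  # ship to the audit trail in practice
+     return {"action": "remediated", "audit": audit}
+ ```
+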
+ ### Phase 5: Expansion & Optimization (Ongoing)
+ **Goal:** Add more data sources, reduce false positives, increase automation
+
+ - [ ] Add VPC Flow Logs (network anomaly detection)
+ - [ ] Add GuardDuty findings (correlation)
+ - [ ] Add email/phishing logs (cross-domain correlation)
+ - [ ] Add endpoint logs (EDR integration)
+ - [ ] Fine-tune classification model on accumulated triage data (AACT approach)
+ - [ ] Implement ADWIN drift detection for baseline updates
+ - [ ] Build dashboards and reporting
+ - [ ] Measure and optimize: FPR, MTTR, auto-resolution rate
+
+ ---
+
+ ## 15. Research Papers & References
+
+ ### Core Architecture Papers
+
+ | Paper | arXiv | Year | Key Contribution |
+ |-------|-------|------|------------------|
+ | **CORTEX** — Collaborative LLM Agents for Alert Triage | 2510.00311 | 2025 | Multi-agent SOC architecture, F1=0.78 |
+ | **AACT** — Automated Alert Classification | 2505.09843 | 2025 | 61% alert reduction in production, behavioral profiling |
+ | **CloudAnoAgent** — Cloud Anomaly Detection | 2508.01844 | 2025 | Fast/slow detection + symbolic verifier |
+ | **CyberRAG** — Agentic RAG for Attack Classification | 2507.02424 | 2025 | 94.92% accuracy, specialist + RAG |
+ | **OpsAgent** — Self-Evolving Multi-Agent | 2510.24145 | 2025 | +46.63% on incident management |
+ | **ExCyTIn-Bench** — LLM Agent Evaluation | 2507.14201 | 2025 | Best models for security investigation |
+
+ ### Baseline & Anomaly Detection Papers
+
+ | Paper | arXiv | Year | Key Contribution |
+ |-------|-------|------|------------------|
+ | **DyMETER** — Dynamic Concept Adaptation | 2604.14726 | 2026 | AUCROC 0.906-0.991, concept drift handling |
+ | **Online-iForest** — Streaming Isolation Forest | 2505.09593 | 2025 | 5-8× faster than HST, AUC 0.998 |
+ | **MemStream** — Memory-Based Streaming Detection | 2106.03837 | 2022 | Fixed-size memory, AUCROC 0.988 |
+ | **MIDAS** — Count-Min Sketch for Edge Streams | 1911.04464 | 2020 | O(1) per event, 50KB memory |
+ | **LogBERT** — Self-Supervised Log Anomaly Detection | 2103.04475 | 2021 | Masked log key prediction, hypersphere loss |
+ | **LogLLM** — BERT+Llama Log Anomaly Detection | 2411.08561 | 2024 | F1=0.97, no log parser required |
+
+ ### Threat Intelligence & Enrichment
+
+ | Paper | arXiv | Year | Key Contribution |
+ |-------|-------|------|------------------|
+ | **LLM-Assisted Proactive CTI** | 2504.00428 | 2025 | RAG over CTI feeds, real-time enrichment |
+ | **IBM Cloud Telemetry** | 2411.09047 | 2024 | 10,000× compression ratio for detection |
+
+ ### Frameworks & Tools
+
+ | Tool | Source | Purpose |
+ |------|--------|---------|
+ | **LangGraph** | langchain-ai/langgraph | Multi-agent orchestration |
+ | **PySAD** | selimfirat/pysad | Streaming anomaly detection |
+ | **River** | online-ml/river | Online ML + drift detection |
+ | **Prowler MCP** | prowler-cloud/prowler | AWS security checks via LLM |
+ | **CloudTrail MCP** | awslabs/mcp | AWS CloudTrail LLM interface |
+ | **logem** | HassanShehata/logem | Log field extraction (0.6B) |
+ | **SecureBERT 2.0** | cisco-ai/SecureBERT2.0-base | Security embeddings |
+ | **Wazuh** | wazuh/wazuh | Open-source SIEM with CloudTrail support |
+ | **TheHive** | TheHive-Project/TheHive | Case management |
+ | **Shuffle SOAR** | Shuffle/Shuffle | Security orchestration |
+
+ ### HuggingFace Datasets for Development
+
+ | Dataset | HF ID / Source | Size | Use |
+ |---------|---------------|------|-----|
+ | **CloudAnoBench** | jayzou3773.github.io | 1,252 cases | Cloud anomaly detection eval |
+ | **ACSE-Eval** | ACSE-Eval/ACSE-Eval | 100 AWS scenarios | AWS threat modeling |
+ | **AIT Log Dataset** | Austrian Inst. Tech | 8 networks, 3 weeks | Multi-step attack simulation |
+ | **BGL/HDFS** | logpai/loghub | Millions of entries | Log anomaly detection baselines |
+ | **NSL-KDD** | rgaidot/nsl-kdd | 125K+ entries | Network intrusion detection |
+
+ ---
+
+ ## Appendix A: Quick-Start Prototype
+
+ A minimal end-to-end prototype you can run today:
+
+ ```python
+ """
+ Agentic SOC Quick-Start Prototype
+ Requires: pip install boto3 langgraph langchain-openai pysad river
+ (numpy is pulled in by pysad)
+ """
+
+ import json
+ from collections import defaultdict
+
+ import boto3
+ import numpy as np
+ from langchain_openai import ChatOpenAI
+ from pysad.models import HalfSpaceTrees
+ from river import drift  # ADWIN etc.; reserved for baseline drift checks
+
+ # ── Layer 1: Ingest CloudTrail ──────────────────────────────
+
+ def consume_cloudtrail_events(queue_url: str):
+     """Pull CloudTrail events from the SQS queue fed by S3 notifications."""
+     sqs = boto3.client('sqs')
+     while True:
+         response = sqs.receive_message(
+             QueueUrl=queue_url,
+             MaxNumberOfMessages=10,
+             WaitTimeSeconds=20,
+         )
+         for msg in response.get('Messages', []):
+             events = json.loads(msg['Body']).get('Records', [])
+             for event in events:
+                 yield event
+             sqs.delete_message(
+                 QueueUrl=queue_url,
+                 ReceiptHandle=msg['ReceiptHandle']
+             )
+
+
+ # ── Layer 2: Baseline Accumulator ───────────────────────────
+
+ class SimpleBaseline:
+     def __init__(self):
+         self.profiles = defaultdict(lambda: {
+             'count': 0, 'api_freq': defaultdict(int),
+             'mu': defaultdict(float), 'var': defaultdict(lambda: 1.0)
+         })
+         # PySAD's HalfSpaceTrees requires explicit feature bounds;
+         # these match the four features built in process() below.
+         self.model = HalfSpaceTrees(
+             feature_mins=np.array([0.0, 0.0, 0.0, 0.0]),
+             feature_maxes=np.array([999.0, 999.0, 23.0, 1.0]),
+             window_size=250, num_trees=25, max_depth=15,
+         )
+         self.anomaly_store = []
+
+     def process(self, event):
+         arn = event.get('userIdentity', {}).get('arn', 'unknown')
+         profile = self.profiles[arn]
+         profile['count'] += 1
+         profile['api_freq'][event['eventName']] += 1
+
+         features = np.array([
+             hash(event['eventName']) % 1000,
+             hash(event.get('sourceIPAddress', '')) % 1000,
+             int(event.get('eventTime', '2024-01-01T12:00:00Z')[11:13]),
+             1 if event.get('errorCode') else 0,
+         ], dtype=float)
+
+         score = self.model.fit_score_partial(features)
+
+         # Raw scores are not calibrated; 0.7 is a placeholder threshold
+         # to be tuned on a validation window.
+         if score > 0.7 and profile['count'] > 100:
+             self.anomaly_store.append(event)
+             return True, score   # ANOMALY — store
+         return False, score      # NORMAL — discard
+
+
+ # ── Layer 3: LLM Triage (simplified) ────────────────────────
+
+ llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
+
+ def triage_anomaly(event, score):
+     prompt = f"""You are a SOC analyst. Analyze this CloudTrail anomaly:
+
+ Event: {event['eventName']}
+ User: {event.get('userIdentity', {}).get('arn', 'unknown')}
+ Source IP: {event.get('sourceIPAddress', 'unknown')}
+ Region: {event.get('awsRegion', 'unknown')}
+ Error: {event.get('errorCode', 'none')}
+ Anomaly Score: {score:.2f}
+
+ Classify as: FALSE_POSITIVE, TRUE_POSITIVE_AUTO, or TRUE_POSITIVE_ESCALATE
+ Provide: verdict, confidence (0-1), reasoning, mitre_ttp, severity, recommended_action
+
+ Respond in JSON format."""
+
+     # Assumes the model returns bare JSON; use a JSON output parser or
+     # structured output in production.
+     response = llm.invoke(prompt)
+     return json.loads(response.content)
+
+
+ # ── Main Loop ───────────────────────────────────────────────
+
+ baseline = SimpleBaseline()
+
+ for event in consume_cloudtrail_events("YOUR_SQS_QUEUE_URL"):
+     is_anomaly, score = baseline.process(event)
+
+     if is_anomaly:
+         # Only anomalies reach the LLM — saves cost
+         report = triage_anomaly(event, score)
+
+         if report['verdict'] == 'FALSE_POSITIVE':
+             print(f"FP dismissed: {event['eventName']}")
+         elif report['verdict'] == 'TRUE_POSITIVE_AUTO':
+             print(f"AUTO-REMEDIATE: {report['recommended_action']}")
+         else:
+             print(f"ESCALATE: {report['reasoning']}")
+
+     # Normal events: already discarded by baseline.process()
+ ```
+
+ ---
+
+ ## Appendix B: Key Design Decisions & Rationale
+
+ | Decision | Choice | Rationale |
+ |----------|--------|-----------|
+ | Multi-agent vs single-agent | Multi-agent (4 stages) | CORTEX shows +12 F1 points, -10.7% FPR |
+ | Baseline storage | Model only, no raw logs | 10,000× storage reduction (IBM study) |
+ | Anomaly detection | 3-tier cascade (stats → sketch → forest) | Each catches different patterns; composite is robust |
+ | LLM for all events vs anomalies only | Anomalies only | 99.9% of events are normal — LLM on all would cost 1000× more |
+ | Auto-remediate threshold | Confidence > 0.9 + symbolic verify | Conservative by design; false auto-remediation is catastrophic |
+ | Drift detection | ADWIN per entity | Employees change roles; static baselines decay |
+ | Agent framework | LangGraph | Only framework with cycles + human-in-loop + checkpointing |
+ | Primary LLM | GPT-4o-mini (triage), GPT-4o (reasoning) | Cost/quality balance; replace with Llama 4 for on-prem |
+ | Vector store | Chroma | Simple to start; migrate to Milvus at scale |
+ | Case management | TheHive | Open-source, rich API, evidence management |
+
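+ The auto-remediation row above compresses into a single gate that every action must pass; a minimal sketch, with field names following the triage report schema used in Appendix A:
+
+ ```python
+ def may_auto_remediate(report: dict, symbolic_ok: bool) -> bool:
+     """LLM confidence alone is never sufficient; the verifier must agree."""
+     return (
+         report.get("verdict") == "TRUE_POSITIVE_AUTO"
+         and report.get("confidence", 0.0) > 0.9
+         and symbolic_ok  # e.g., action whitelisted, blast radius bounded
+     )
+ ```
+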
+ ---
+
+ *Document prepared: April 2026*
+ *Based on literature review of 15+ research papers (2020-2026) and survey of 20+ open-source tools*
+ *Architecture validated against: CORTEX (arXiv:2510.00311), AACT (arXiv:2505.09843), CloudAnoAgent (arXiv:2508.01844)*