Spaces:

lablab-ai-amd-developer-hackathon
/

Threat_Hunter

Running

App Files Files Community

Threat_Hunter / docs /data_contracts.md

EricChen2005

Deploy ThreatHunter - AMD MI300X + Qwen2.5-32B

c8d30bc 1 day ago

preview code

raw

history blame contribute delete

18.6 kB

	# ThreatHunter JSON 資料契約

	> 定義每個 Agent 的輸入輸出 JSON 格式。
	> 來源：FINAL_PLAN.md §六
	> 版本：v3.5（新增 Intel Fusion evidence gating）


	---

	## Scout → Analyst

	```json
	{
	"scan_id": "scan_20260401_001",
	"timestamp": "2026-04-01T10:00:00Z",
	"tech_stack": ["django 4.2", "redis 7.0"],
	"vulnerabilities": [
	{
	"cve_id": "CVE-2024-XXXX",
	"package": "django",
	"cvss_score": 7.5,
	"severity": "HIGH",
	"description": "...",
	"is_new": true
	}
	],
	"summary": {
	"total": 8,
	"new_since_last_scan": 2,
	"critical": 1,
	"high": 3,
	"medium": 3,
	"low": 1
	},
	"_degraded": false,
	"_error": null
	}
	```

	欄位說明：

	\| 欄位 \| 類型 \| 必填 \| 說明 \|
	\|---\|---\|---\|---\|
	\| `scan_id` \| string \| ✅ \| 掃描唯一識別符 \|
	\| `timestamp` \| string \| ✅ \| ISO 8601 時間戳 \|
	\| `tech_stack` \| string[] \| ✅ \| 技術堆疊清單 \|
	\| `vulnerabilities` \| array \| ✅ \| 漏洞清單 \|
	\| `vulnerabilities[].cve_id` \| string \| ✅ \| CVE 編號 \|
	\| `vulnerabilities[].package` \| string \| ✅ \| 套件名稱 \|
	\| `vulnerabilities[].cvss_score` \| number \| ✅ \| CVSS 分數（0-10） \|
	\| `vulnerabilities[].severity` \| string \| ✅ \| CRITICAL/HIGH/MEDIUM/LOW \|
	\| `vulnerabilities[].description` \| string \| ✅ \| 漏洞描述 \|
	\| `vulnerabilities[].is_new` \| bool \| ✅ \| 是否為新發現 \|
	\| `summary.total` \| number \| ✅ \| 漏洞總數 \|
	\| `summary.new_since_last_scan` \| number \| ✅ \| 新增漏洞數 \|
	\| `summary.critical` \| number \| ✅ \| CRITICAL 數量 \|
	\| `summary.high` \| number \| ✅ \| HIGH 數量 \|
	\| `_degraded` \| bool \| ❌ \| 是否為降級輸出 \|
	\| `_error` \| string/null \| ❌ \| 錯誤訊息（降級時） \|

	---

	## Analyst → Advisor

	```json
	{
	"scan_id": "scan_20260401_001",
	"risk_score": 85,
	"risk_trend": "+7",
	"analysis": [
	{
	"cve_id": "CVE-2024-XXXX",
	"original_cvss": 6.5,
	"adjusted_risk": "CRITICAL",
	"in_cisa_kev": true,
	"exploit_available": true,
	"chain_risk": {
	"is_chain": true,
	"chain_with": ["CVE-2024-YYYY"],
	"chain_description": "SSRF → Redis → RCE",
	"confidence": "HIGH"
	},
	"reasoning": "In CISA KEV + public exploit + chains with Redis"
	}
	],
	"_degraded": false,
	"_error": null
	}
	```

	欄位說明：

	\| 欄位 \| 類型 \| 必填 \| 說明 \|
	\|---\|---\|---\|---\|
	\| `scan_id` \| string \| ✅ \| 掃描唯一識別符 \|
	\| `risk_score` \| number \| ✅ \| 風險分數（0-100） \|
	\| `risk_trend` \| string \| ✅ \| 風險趨勢（"+7", "-3", "+0"） \|
	\| `analysis` \| array \| ✅ \| 分析結果清單 \|
	\| `analysis[].cve_id` \| string \| ✅ \| CVE 編號 \|
	\| `analysis[].original_cvss` \| number \| ✅ \| 原始 CVSS 分數 \|
	\| `analysis[].adjusted_risk` \| string \| ✅ \| 調整後風險等級 \|
	\| `analysis[].in_cisa_kev` \| bool \| ✅ \| 是否在 CISA KEV 中 \|
	\| `analysis[].exploit_available` \| bool \| ✅ \| 是否有公開 Exploit \|
	\| `analysis[].chain_risk` \| object \| ✅ \| 連鎖風險 \|
	\| `chain_risk.is_chain` \| bool \| ✅ \| 是否形成連鎖 \|
	\| `chain_risk.chain_with` \| string[] \| ✅ \| 連鎖的 CVE 清單 \|
	\| `chain_risk.chain_description` \| string \| ✅ \| 連鎖描述 \|
	\| `chain_risk.confidence` \| string \| ✅ \| HIGH/MEDIUM/NEEDS_VERIFICATION \|
	\| `analysis[].reasoning` \| string \| ✅ \| 推理依據 \|

	---

	## Critic 辯論結果

	```json
	{
	"debate_rounds": 2,
	"challenges": ["Redis 暴露前提未驗證"],
	"scorecard": {
	"evidence": 0.85,
	"chain_completeness": 0.80,
	"critique_quality": 0.75,
	"defense_quality": 0.70,
	"calibration": 0.90
	},
	"weighted_score": 80.5,
	"verdict": "MAINTAIN",
	"reasoning": "Evidence is strong, chain analysis is well-supported.",
	"generated_at": "2026-04-01T10:05:00Z",
	"_harness_skipped": false
	}
	```

	欄位說明：

	\| 欄位 \| 類型 \| 必填 \| 說明 \|
	\|---\|---\|---\|---\|
	\| `debate_rounds` \| number \| ✅ \| 辯論輪數 \|
	\| `challenges` \| string[] \| ✅ \| 挑戰清單 \|
	\| `scorecard` \| object \| ✅ \| 5 維評分卡 \|
	\| `scorecard.evidence` \| number \| ✅ \| 證據支持度（0-1） \|
	\| `scorecard.chain_completeness` \| number \| ✅ \| 路徑完整性（0-1） \|
	\| `scorecard.critique_quality` \| number \| ✅ \| 反駁品質（0-1） \|
	\| `scorecard.defense_quality` \| number \| ✅ \| 回應品質（0-1） \|
	\| `scorecard.calibration` \| number \| ✅ \| 信心校準（0-1） \|
	\| `weighted_score` \| number \| ✅ \| 加權總分（0-100） \|
	\| `verdict` \| string \| ✅ \| MAINTAIN/DOWNGRADE/SKIPPED \|
	\| `reasoning` \| string \| ✅ \| 裁決理由 \|
	\| `generated_at` \| string \| ✅ \| ISO 8601 時間戳 \|

	裁決規則：
	- weighted_score ≥ 50 → MAINTAIN
	- weighted_score < 50 → DOWNGRADE
	- 有 CVE 在 CISA KEV 中 → 禁止 DOWNGRADE

	---

	## Advisor → UI

	```json
	{
	"executive_summary": "1 actively exploited chain. Risk increased.",
	"actions": {
	"urgent": [
	{
	"cve_id": "CVE-2024-XXXX",
	"package": "django",
	"severity": "CRITICAL",
	"action": "Update Django to latest patched version.",
	"command": "pip install --upgrade django",
	"reason": "In CISA KEV with public exploit.",
	"is_repeated": false
	}
	],
	"important": [
	{
	"cve_id": "CVE-2024-YYYY",
	"package": "redis",
	"severity": "HIGH",
	"action": "Update Redis and verify network exposure.",
	"reason": "Part of attack chain."
	}
	],
	"resolved": []
	},
	"risk_score": 85,
	"risk_trend": "+7",
	"scan_count": 2,
	"generated_at": "2026-04-01T10:06:00Z"
	}
	```

	欄位說明：

	\| 欄位 \| 類型 \| 必填 \| 說明 \|
	\|---\|---\|---\|---\|
	\| `executive_summary` \| string \| ✅ \| 一句話摘要 \|
	\| `actions` \| object \| ✅ \| 行動清單 \|
	\| `actions.urgent` \| array \| ✅ \| 緊急行動 \|
	\| `actions.urgent[].cve_id` \| string \| ✅ \| CVE 編號 \|
	\| `actions.urgent[].package` \| string \| ✅ \| 套件名稱 \|
	\| `actions.urgent[].severity` \| string \| ✅ \| CRITICAL/HIGH \|
	\| `actions.urgent[].action` \| string \| ✅ \| 修補說明 \|
	\| `actions.urgent[].command` \| string \| ✅ \| 具體指令 \|
	\| `actions.urgent[].reason` \| string \| ✅ \| 為何標記為 URGENT \|
	\| `actions.urgent[].is_repeated` \| bool \| ✅ \| 是否重複未修補 \|
	\| `actions.important` \| array \| ✅ \| 重要行動 \|
	\| `actions.resolved` \| array \| ✅ \| 已修補項目 \|
	\| `risk_score` \| number \| ✅ \| 風險分數（0-100） \|
	\| `risk_trend` \| string \| ✅ \| 風險趨勢 \|
	\| `scan_count` \| number \| ✅ \| 掃描次數 \|
	\| `generated_at` \| string \| ✅ \| ISO 8601 時間戳 \|

	---

	## Pipeline Meta（最終輸出附加欄位）

	```json
	{
	"pipeline_meta": {
	"pipeline_version": "3.0",
	"tech_stack": "Django 4.2, Redis 7.0",
	"stages_completed": 4,
	"stages_detail": {
	"scout": {
	"status": "SUCCESS",
	"vuln_count": 9,
	"duration_ms": 1200
	},
	"analyst": {
	"status": "SUCCESS",
	"risk_score": 85,
	"duration_ms": 800
	},
	"critic": {
	"status": "SUCCESS",
	"verdict": "MAINTAIN",
	"score": 80.5,
	"duration_ms": 600
	},
	"advisor": {
	"status": "SUCCESS",
	"urgent_count": 2,
	"duration_ms": 500
	}
	},
	"enable_critic": false,
	"critic_verdict": "SKIPPED",
	"critic_score": 0,
	"duration_seconds": 3.1,
	"degradation": {
	"level": 1,
	"label": "⚡ 全速運行",
	"degraded_components": [],
	"timestamp": "2026-04-01T10:06:00Z"
	},
	"generated_at": "2026-04-01T10:06:00Z"
	}
	}
	```

	欄位說明：

	\| 欄位 \| 類型 \| 必填 \| 說明 \|
	\|---\|---\|---\|---\|
	\| `pipeline_version` \| string \| ✅ \| Pipeline 版本（當前 3.0） \|
	\| `tech_stack` \| string \| ✅ \| 使用者輸入的技術堆疊 \|
	\| `stages_completed` \| number \| ✅ \| 完成的 Stage 數量 \|
	\| `stages_detail` \| object \| ✅ \| 每個 Stage 的詳細資訊 \|
	\| `stages_detail.{stage}.status` \| string \| ✅ \| SUCCESS/DEGRADED \|
	\| `enable_critic` \| bool \| ✅ \| Critic 是否啟用 \|
	\| `critic_verdict` \| string \| ✅ \| MAINTAIN/DOWNGRADE/SKIPPED \|
	\| `critic_score` \| number \| ✅ \| Critic 加權分數 \|
	\| `duration_seconds` \| number \| ✅ \| 總執行時間（秒） \|
	\| `degradation` \| object \| ✅ \| 降級狀態 \|
	\| `degradation.level` \| number \| ✅ \| 1-5（1=全速，5=最低生存） \|
	\| `degradation.label` \| string \| ✅ \| UI 顯示文字 \|
	\| `degradation.degraded_components` \| string[] \| ✅ \| 降級元件清單 \|

	降級層級定義：

	\| 層級 \| 標籤 \| 觸發條件 \|
	\|---\|---\|---\|
	\| 1 \| ⚡ 全速運行 \| 所有元件正常 \|
	\| 2 \| ⚠️ LLM 降級 \| vLLM → OpenRouter → OpenAI \|
	\| 3 \| ⚠️ API 降級 \| NVD/OTX → 離線快取 \|
	\| 4 \| 🔶 Agent 降級 \| Analyst/Critic 跳過 \|
	\| 5 \| 🔶 最低生存模式 \| 使用上次掃描結果 \|

	---

	## v3.1 新增：L0 淨化報告（input_sanitizer → Pipeline）

	```json
	{
	"safe": true,
	"input_type": "source_code",
	"truncated": false,
	"input_hash": "a3f8b1c2d4e5f6a7",
	"blocked_reason": "",
	"l0_findings": [
	{
	"pattern": "hardcoded_secret",
	"description": "硬編碼憑證（Credential Exposure 風險）",
	"line_no": 42,
	"severity": "WARNING"
	}
	],
	"l0_warning_count": 1
	}
	```

	`input_type` 枚舉：`"package_list"` / `"source_code"` / `"config_file"` / `"sql_review"` / `"blocked"`
	`safe=false` 時 Pipeline 直接返回錯誤，不進入任何 Agent

	---

	## v3.2 新增：SQL Syntax Review（孤立 `.sql` corpus → Pipeline/UI）

	```json
	{
	"sql_syntax_review": {
	"review_status": "completed",
	"review_type": "sql_syntax_review",
	"requires_application_context": true,
	"verified_application_vulnerability": false,
	"code_scan_excluded": true,
	"package_scan_allowed": false,
	"patterns": [
	{
	"syntax_pattern": "union_select",
	"line_no": 12,
	"category": "sql_injection_payload",
	"snippet": "UNION SELECT",
	"risk_note": "UNION-based SQL injection payload. Requires application input-flow context.",
	"requires_application_context": true
	}
	],
	"summary": {
	"total": 1,
	"patterns_detected": 1,
	"syntax_patterns": ["union_select"]
	}
	}
	}
	```

	契約限制：

	- `sql_syntax_review` 只表示 SQL corpus 內含危險語法或測試 payload。
	- `verified_application_vulnerability` 必須維持 `false`，直到同時有 application source/sink evidence。
	- SQL review 不得產生 package CVE/GHSA scan。
	- SQL review 不納入一般 source-code CWE recall 分母。

	---

	## v3.1 新增：Orchestrator 任務計畫（orchestrator → main.py）

	```json
	{
	"path": "B",
	"parallel_layer1": ["security_guard", "intel_fusion"],
	"agents_to_run": ["security_guard", "intel_fusion", "scout", "analyst", "debate", "advisor"],
	"shortcuts": [],
	"feedback_loop_count": 0,
	"l0_input_type": "source_code"
	}
	```

	`path` 枚舉：`"A"` 套件 / `"B"` 完整程式碼 / `"C"` 配置文件 / `"D"` 回饋補充

	---

	## v3.1 新增：Security Guard 輸出（security_guard → scout）

	```json
	{
	"extraction_status": "success",
	"functions": [
	{"name": "execute_query", "args": ["sql"], "line_no": 15, "suspicious": true, "reason": "字串拼接"}
	],
	"imports": [{"module": "os", "alias": null, "line_no": 1}],
	"patterns": [{"type": "sql_concat", "pattern": "f\"SELECT...\"", "line_no": 23, "severity": "HIGH"}],
	"hardcoded": [{"type": "api_key", "key_name": "AWS_SECRET", "line_no": 8}],
	"stats": {"total_lines": 150, "functions_found": 12, "patterns_found": 3}
	}
	```

	---

	## v3.1 新增：Intel Fusion 輸出（intel_fusion → scout）

	```json
	{
	"fusion_results": [
	{
	"cve_id": "CVE-2024-42005",
	"composite_score": 9.1,
	"confidence": "HIGH",
	"dimensions": {
	"nvd_cvss": 9.8, "epss_score": 0.97,
	"in_kev": true, "ghsa_hits": 3,
	"attack_techniques": 2, "otx_pulse_count": 5
	},
	"weights_used": {"nvd": 0.20, "epss": 0.00, "kev": 0.55, "ghsa": 0.10, "attack": 0.10, "otx": 0.05},
	"kev_shortcut": true,
	"cve_year": 2024
	}
	],
	"api_health": {"epss": "ok", "ghsa": "ok", "otx": "degraded"},
	"degraded": false
	}
	```

	動態加權規則：
	- `in_kev=true` → `epss_weight=0.00`，`kev_weight` 增至 0.55
	- `cve_year < 2020` → `epss_weight=0.10`
	- `otx_fail_rate > 0.5` → `otx_weight=0.01`

	---

	## v3.1 更新：Pipeline Meta 完整欄位

	```json
	{
	"pipeline_meta": {
	"pipeline_version": "3.1",
	"tech_stack": "Django 4.2, Redis 7.0",
	"stages_completed": 7,
	"stages_detail": {
	"orchestrator": {"status": "SUCCESS", "scan_path": "B", "l0_input_type": "source_code"},
	"security_guard": {"status": "SUCCESS", "functions_found": 12},
	"intel_fusion": {"status": "SUCCESS", "cves_scored": 2},
	"scout": {"status": "SUCCESS", "vuln_count": 2},
	"analyst": {"status": "SUCCESS", "risk_score": 85},
	"critic": {"status": "SUCCESS", "verdict": "MAINTAIN"},
	"advisor": {"status": "SUCCESS", "urgent_count": 1}
	},
	"enable_critic": true,
	"critic_verdict": "MAINTAIN",
	"critic_score": 80.5,
	"duration_seconds": 45.2,
	"degradation": {"level": 1, "label": "FULL_SPEED"},
	"generated_at": "2026-04-10T00:02:00Z",
	"l0_report": {"safe": true, "input_type": "source_code", "l0_warning_count": 0}
	}
	}
	```

	\| v3.1 新增欄位 \| 說明 \|
	\|---\|---\|
	\| `stages_detail.orchestrator` \| Orchestrator 路徑 + L0 類型 \|
	\| `stages_detail.security_guard` \| Security Guard 提取統計 \|
	\| `stages_detail.intel_fusion` \| Intel Fusion 計分統計 \|
	\| `stages_completed` \| 現在包含 orchestrator + 4 主 Stage（≥ 5）\|
	\| `l0_report` \| L0 淨化報告摘要 \|

	---

	## v3.3 更新：DIM11 Benchmark Context 與 Critic Recall Challenge

	此契約只在 `main.py` 偵測到 DIM11 fixture exact match 時啟用；一般使用者掃描不應產生 expected baseline，也不應套用 expected-CWE recall 判罰。

	```json
	{
	"benchmark_context": {
	"benchmark": "dim11_redteam",
	"fixture": "c_cpp_multivuln_01.c",
	"expected_cwe_categories": ["CWE-120", "CWE-134"],
	"observed_cwe_categories": ["CWE-120", "CWE-134"],
	"min_category_recall": 0.7,
	"route_correct": true,
	"expected_path": "B",
	"actual_path": "B",
	"external_pollution_count": 0,
	"activation": "exact_fixture_match"
	},
	"critic_recall_challenge": {
	"fixture": "c_cpp_multivuln_01.c",
	"expected_cwe_categories": ["CWE-120", "CWE-134"],
	"observed_cwe_categories": ["CWE-120", "CWE-134"],
	"missing_cwe_categories": [],
	"category_recall": 1.0,
	"min_category_recall": 0.7,
	"route_correct": true,
	"external_pollution_count": 0,
	"verdict": "PASS",
	"failed_reasons": []
	}
	}
	```

	\| 欄位 \| 說明 \|
	\|---\|---\|
	\| `benchmark_context` \| Pipeline 在 Analyst → Critic 之間注入的 benchmark baseline，只允許 fixture exact match。 \|
	\| `critic_recall_challenge` \| Critic deterministic challenge 結果；若 FAIL，Critic 會輸出 `verdict=DOWNGRADE` 與 `needs_rescan=true`。 \|
	\| `external_pollution_count` \| source code 沒有第三方 package evidence 卻出現外部 CVE/GHSA finding 時的污染計數。 \|
	\| `activation` \| 啟用條件；目前只允許 `exact_fixture_match`，避免一般掃描被 DIM11 catalog 誤判。 \|

	---

	## v3.4 更新：CWE Registry Canonical Fields

	`code_patterns_summary[]` 的 CWE mapping 以 `tools/cwe_registry.py` 為後端唯一來源。Security Guard 保留原始 `pattern_type`，Pipeline/Advisor/UI 使用 registry 產生 canonical 欄位，避免 `main.py`、Advisor、UI 各自維護不同 CWE 對照表。

	```json
	{
	"code_patterns_summary": [
	{
	"finding_id": "CODE-001",
	"type": "code_pattern",
	"pattern_type": "PROTOTYPE_POLLUTION",
	"cwe_id": "CWE-1321",
	"canonical_cwe_id": "CWE-1321",
	"weakness_family": "prototype_pollution",
	"evidence_type": "code_scan",
	"owasp_category": "A03:2021-Injection",
	"severity": "CRITICAL",
	"cwe_reference": {
	"id": "CWE-1321",
	"name": "Prototype Pollution",
	"source": "MITRE CWE v4.14",
	"representative_cves": []
	}
	}
	]
	}
	```

	\| 欄位 \| 說明 \|
	\|---\|---\|
	\| `canonical_cwe_id` \| registry 正規化後的 CWE ID；目前與 `cwe_id` 一致，保留給後續 taxonomy migration。 \|
	\| `weakness_family` \| 給 UI 與 recall 統計使用的穩定弱點家族名稱。 \|
	\| `evidence_type` \| 目前 code finding 固定為 `code_scan`；不得用它表示 package CVE。 \|
	\| `cwe_reference` \| 從 `tools.cwe_registry.build_cwe_reference()` 產生，優先引用 MITRE CWE 離線資料庫。 \|

	---

	## v3.5 更新：Intel Fusion Evidence Gating

	Intel Fusion 的 CVE/GHSA 輸出必須區分「本次掃描直接觀察到的 package advisory」與「用來佐證 CWE 風險的代表性 CVE」。Representative CVE 只能出現在 explanation / CWE support context，不得進入 `vulnerability_detail` 或 Advisor `actions.urgent/actions.important`。

	```json
	{
	"fusion_results": [
	{
	"cve_id": "CVE-2024-42005",
	"evidence_type": "direct_cve",
	"not_directly_observed": false,
	"must_not_enter_package_actions": false,
	"source_package_evidence": ["django"],
	"composite_score": 9.1
	},
	{
	"cve_id": "CVE-2024-9999",
	"evidence_type": "representative_cve",
	"supports_cwe": ["CWE-89"],
	"not_directly_observed": true,
	"must_not_enter_package_actions": true,
	"evidence_note": "Representative CVE for a Security Guard CWE finding; not a directly observed package CVE."
	}
	],
	"evidence_contract": {
	"direct_cve_count": 1,
	"cwe_support_count": 1,
	"package_target_count": 1,
	"representative_cve_count": 1
	}
	}
	```

	\| 欄位 \| 說明 \|
	\|---\|---\|
	\| `evidence_type=direct_cve` \| Scout/OSV/NVD/GHSA 直接從 package evidence 觀察到，可進 package action list。 \|
	\| `evidence_type=package_cve` \| Intel Fusion 從 package target 補充的外部 CVE，可進 package vulnerability flow。 \|
	\| `evidence_type=representative_cve` \| CWE support evidence，只能解釋 CWE 風險，不得當成本專案直接受影響 CVE。 \|
	\| `not_directly_observed` \| `true` 表示本次掃描沒有直接 package evidence。 \|
	\| `must_not_enter_package_actions` \| Advisor/Scout/main.py 必須把此類項目擋在 `actions` 與 `vulnerability_detail` 外。 \|
	\| `representative_cve_evidence` \| final output 中保存 representative CVE 的獨立欄位，供 UI/Thinking Path 顯示佐證。 \|