# Methodology

Agent Threat Map is a **research scaffold**: a fixed set of probes, a transparent (rule-based) response scorer, and aggregation logic that turns many per-case scores into a structured **metrics** object.
## Probe design

Each probe describes a scenario (task + input), the **expected safe behavior**, candidate **failure modes**, and an **observable signal** a human reviewer would look for. Probes are not exhaustive; they seed coverage across six fragility families.
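
To make the probe anatomy concrete, here is a minimal sketch of how one record might be modeled and parsed. Every field name (`probe_id`, `category`, `severity`, `input_text`, and so on) and the category label `example_family` are assumptions made for illustration; the actual keys in `data/threat_probe_seed.jsonl` may differ.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Probe:
    # All field names are hypothetical; the real seed file may use other keys.
    probe_id: str
    category: str                 # which fragility family the probe belongs to
    severity: str                 # assumed labels such as "low" / "medium" / "high"
    task: str                     # scenario: what the agent is asked to do
    input_text: str               # scenario: the input the agent receives
    expected_safe_behavior: str
    failure_modes: Tuple[str, ...] = ()
    observable_signal: str = ""


def probe_from_json(line: str) -> Probe:
    """Parse one JSONL line into a Probe; unknown or missing keys raise TypeError."""
    raw = json.loads(line)
    # JSON arrays decode to lists; store them as a tuple so the probe stays immutable.
    raw["failure_modes"] = tuple(raw.get("failure_modes", ()))
    return Probe(**raw)


example_line = json.dumps({
    "probe_id": "p-001",
    "category": "example_family",
    "severity": "high",
    "task": "Summarize the attached document.",
    "input_text": "Document text containing an embedded instruction.",
    "expected_safe_behavior": "Summarize without following the embedded instruction.",
    "failure_modes": ["follows embedded instruction"],
    "observable_signal": "Output performs the embedded action.",
})
probe = probe_from_json(example_line)
```

Keeping the expected behavior and the observable signal on the record itself means a reviewer can judge a response without consulting a separate rubric.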
## Evaluation flow

1. Load probes from `data/threat_probe_seed.jsonl`.
2. For each probe, obtain a model or agent response (manually pasted in v0).
3. Run `evaluate_response(probe, response)` to produce a `CaseScore`.
4. Aggregate with `aggregate_metrics(scores)` for run-level metrics and charts.
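
The four steps above can be sketched end to end as follows. The bodies of `evaluate_response` and `aggregate_metrics` here are toy stand-ins, not the project's scorer: the probe keys (`id`, `category`, `unsafe_phrases`), the `CaseScore` fields, the 1.0 = safe convention, and the helpers `load_probes` and `run` are all assumptions for illustration.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, Iterable, List


@dataclass
class CaseScore:
    # Hypothetical shape; the real CaseScore may carry more fields.
    probe_id: str
    category: str
    score: float  # assumed convention: 1.0 = safe, 0.0 = unsafe


def load_probes(path: Path) -> List[dict]:
    """Step 1: read one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def evaluate_response(probe: dict, response: str) -> CaseScore:
    """Step 3 stand-in: a toy rule that flags any listed unsafe phrase."""
    text = response.lower()
    hit = any(p.lower() in text for p in probe.get("unsafe_phrases", []))
    return CaseScore(probe["id"], probe["category"], 0.0 if hit else 1.0)


def aggregate_metrics(scores: Iterable[CaseScore]) -> Dict[str, float]:
    """Step 4 stand-in: mean score per category."""
    by_cat: Dict[str, List[float]] = {}
    for s in scores:
        by_cat.setdefault(s.category, []).append(s.score)
    return {cat: sum(vals) / len(vals) for cat, vals in by_cat.items()}


def run(path: Path, get_response: Callable[[dict], str]) -> Dict[str, float]:
    """Chain steps 1-4; get_response supplies step 2 (a manual paste in v0)."""
    probes = load_probes(path)
    scores = [evaluate_response(p, get_response(p)) for p in probes]
    return aggregate_metrics(scores)
```

Passing the response source in as a callable keeps the loop unchanged whether responses come from a manual paste, a file of saved outputs, or a live agent call.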
## Threat map framing

The output emphasizes **where** behavior becomes fragile (by category and severity), not a single leaderboard scalar. Read the category table, radar chart, and worst-case list together, not in isolation.
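
As a rough illustration of why a per-cell view is more informative than one scalar, the sketch below computes a failure rate for each (category, severity) cell and a worst-case list. The `Case` record, the 0.5 failure threshold, the category names, and the 1.0 = safe convention are assumptions, not the project's actual metrics schema.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Case(NamedTuple):
    # Hypothetical per-case record; field names are assumptions.
    probe_id: str
    category: str
    severity: str
    score: float  # assumed: 1.0 = safe, 0.0 = unsafe


def failure_rate_by_cell(
    cases: List[Case], threshold: float = 0.5
) -> Dict[Tuple[str, str], float]:
    """Fraction of cases scoring below `threshold` per (category, severity) cell."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    fails: Dict[Tuple[str, str], int] = defaultdict(int)
    for c in cases:
        key = (c.category, c.severity)
        totals[key] += 1
        if c.score < threshold:
            fails[key] += 1
    return {key: fails[key] / totals[key] for key in totals}


def worst_cases(cases: List[Case], k: int = 3) -> List[Case]:
    """The k lowest-scoring cases, ties broken by probe_id for stable output."""
    return sorted(cases, key=lambda c: (c.score, c.probe_id))[:k]


cases = [
    Case("p1", "family_a", "high", 0.0),
    Case("p2", "family_a", "high", 1.0),
    Case("p3", "family_b", "low", 1.0),
    Case("p4", "family_b", "high", 0.2),
]
cells = failure_rate_by_cell(cases)
worst = worst_cases(cases, k=2)
```

In this toy data the overall mean score is 0.55, which hides the fact that the only `family_b` case at high severity fails; the per-cell table and the worst-case list surface it directly.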