File size: 1,013 Bytes
6c3043e | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | # Methodology
Agent Threat Map is a **research scaffold**: a fixed set of probes, a transparent (rule-based) response scorer, and aggregation logic that turns many per-case scores into a structured **metrics** object.
## Probe design
Each probe describes a scenario (task + input), the **expected safe behavior**, candidate **failure modes**, and an **observable signal** a human reviewer would look for. Probes are not exhaustive; they seed coverage across six fragility families.
## Evaluation flow
1. Load probes from `data/threat_probe_seed.jsonl`.
2. For each probe, obtain a model or agent response (manually pasted in v0).
3. Run `evaluate_response(probe, response)` to produce a `CaseScore`.
4. Aggregate with `aggregate_metrics(scores)` for run-level metrics and charts.
## Threat map framing
The output emphasizes **where** behavior becomes fragile (by category and severity), not a single leaderboard scalar. Use the category table, radar chart, and worst-case list together—not in isolation.
|