Methodology
Agent Threat Map is a research scaffold: a fixed set of probes, a transparent (rule-based) response scorer, and aggregation logic that turns many per-case scores into a structured metrics object.
Probe design
Each probe describes a scenario (task + input), the expected safe behavior, candidate failure modes, and an observable signal a human reviewer would look for. Probes are not exhaustive; they seed coverage across six fragility families.
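A probe record could be modeled as a small dataclass and parsed from one JSONL line each. This is a sketch only. The field names (`probe_id`, `category`, `severity`, `task`, `input_text`, `expected_safe_behavior`, `failure_modes`, `observable_signal`) and the helper `load_probes` are hypothetical, and the real schema of `data/threat_probe_seed.jsonl` may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Probe:
    """One probe record. Field names are hypothetical, not the project's schema."""

    probe_id: str
    category: str                 # fragility family (family names are assumed)
    severity: str                 # e.g. "low" / "medium" / "high" (assumed scale)
    task: str                     # what the agent is asked to do
    input_text: str               # scenario input handed to the agent
    expected_safe_behavior: str   # what a safe response looks like
    failure_modes: list[str] = field(default_factory=list)  # candidate failure modes
    observable_signal: str = ""   # what a human reviewer would look for


def load_probes(lines):
    """Parse JSONL text lines into Probe objects, skipping blank lines."""
    return [Probe(**json.loads(line)) for line in lines if line.strip()]
```

In practice `lines` would be an open file handle, e.g. `load_probes(open("data/threat_probe_seed.jsonl"))`, since iterating a text file yields one line at a time.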
Evaluation flow
- Load probes from `data/threat_probe_seed.jsonl`.
- For each probe, obtain a model or agent response (manually pasted in v0).
- Run `evaluate_response(probe, response)` to produce a `CaseScore`.
- Aggregate with `aggregate_metrics(scores)` for run-level metrics and charts.
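The scoring and aggregation steps above could look roughly like the following. Only the names `evaluate_response(probe, response)`, `CaseScore`, and `aggregate_metrics(scores)` come from this page; everything inside them is an assumption for illustration. In particular, the probe is taken to be a dict with a hypothetical `rules` field mapping each failure mode to trigger phrases, and the metrics dict shape is invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CaseScore:
    """Per-case result. Fields are hypothetical, not the project's CaseScore."""

    probe_id: str
    category: str
    severity: str
    triggered: list[str]   # failure modes whose rule fired
    passed: bool           # True when no failure-mode rule fired


def evaluate_response(probe: dict, response: str) -> CaseScore:
    """Rule-based sketch: a failure mode fires when any of its trigger
    phrases appears (case-insensitively) in the response.
    probe["rules"] (failure mode -> phrases) is an assumed field."""
    text = response.lower()
    triggered = [
        mode
        for mode, phrases in probe.get("rules", {}).items()
        if any(p.lower() in text for p in phrases)
    ]
    return CaseScore(probe["probe_id"], probe["category"], probe["severity"],
                     triggered, passed=not triggered)


def aggregate_metrics(scores: list[CaseScore]) -> dict:
    """Roll per-case scores up into run-level metrics (hypothetical shape)."""
    by_category = defaultdict(lambda: {"cases": 0, "failures": 0})
    for s in scores:
        bucket = by_category[s.category]
        bucket["cases"] += 1
        bucket["failures"] += 0 if s.passed else 1
    total = len(scores)
    return {
        "cases": total,
        "pass_rate": sum(s.passed for s in scores) / total if total else 0.0,
        "failure_rate_by_category": {
            cat: b["failures"] / b["cases"] for cat, b in by_category.items()
        },
    }
```

Keeping the rules as plain phrase lists is what makes a scorer like this transparent: a reviewer can see exactly which phrase caused a case to fail. The trade-off is brittleness to paraphrase, which is one reason the observable signal is still meant for human review.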
Threat map framing
The output emphasizes where behavior becomes fragile (by category and severity), not a single leaderboard scalar. Read the category table, radar chart, and worst-case list together rather than in isolation.
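As a sketch of how two of these views can come from the same per-case results without collapsing them into one number, the helpers below build a failure count per (category, severity) cell and a severity-ordered worst-case list. The case dict keys, the severity ordering in `SEVERITY_RANK`, and both function names are hypothetical; the radar chart is not covered here.

```python
from collections import Counter

# Assumed ordering of severity labels; the project's actual scale may differ.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def category_severity_table(cases):
    """Count failed cases per (category, severity) cell.

    Each case is a dict with hypothetical keys: probe_id, category,
    severity, passed. Cells with no failures read as 0 from the Counter.
    """
    return Counter((c["category"], c["severity"]) for c in cases if not c["passed"])


def worst_cases(cases, k=5):
    """Return up to k failed cases, most severe first (ties broken by probe_id)."""
    failed = [c for c in cases if not c["passed"]]
    failed.sort(key=lambda c: (-SEVERITY_RANK.get(c["severity"], 0), c["probe_id"]))
    return failed[:k]
```

Unknown severity labels rank lowest rather than raising, so a mislabeled probe still appears in the table but does not crowd the top of the worst-case list.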