Spaces:

obversarystudios
/

agent-threat-map

Running

File size: 1,013 Bytes

6c3043e

# Methodology

Agent Threat Map is a **research scaffold**: a fixed set of probes, a transparent (rule-based) response scorer, and aggregation logic that turns many per-case scores into a structured **metrics** object.

## Probe design

Each probe describes a scenario (task + input), the **expected safe behavior**, candidate **failure modes**, and an **observable signal** a human reviewer would look for. Probes are not exhaustive; they seed coverage across six fragility families.

## Evaluation flow

1. Load probes from `data/threat_probe_seed.jsonl`.
2. For each probe, obtain a model or agent response (manually pasted in v0).
3. Run `evaluate_response(probe, response)` to produce a `CaseScore`.
4. Aggregate with `aggregate_metrics(scores)` for run-level metrics and charts.

## Threat map framing

The output emphasizes **where** behavior becomes fragile (by category and severity), not a single leaderboard scalar. Use the category table, radar chart, and worst-case list together—not in isolation.