Spaces:

obversarystudios
/

agent-threat-map

Running

Threat-map metrics + observable geometry (embed/cluster/MI)

6c3043e verified 1 day ago

1.01 kB

	# Methodology

	Agent Threat Map is a research scaffold: a fixed set of probes, a transparent (rule-based) response scorer, and aggregation logic that turns many per-case scores into a structured metrics object.

	## Probe design

	Each probe describes a scenario (task + input), the expected safe behavior, candidate failure modes, and an observable signal a human reviewer would look for. Probes are not exhaustive; they seed coverage across six fragility families.

	## Evaluation flow

	1. Load probes from `data/threat_probe_seed.jsonl`.
	2. For each probe, obtain a model or agent response (manually pasted in v0).
	3. Run `evaluate_response(probe, response)` to produce a `CaseScore`.
	4. Aggregate with `aggregate_metrics(scores)` for run-level metrics and charts.

	## Threat map framing

	The output emphasizes where behavior becomes fragile (by category and severity), not a single leaderboard scalar. Use the category table, radar chart, and worst-case list together—not in isolation.