A newer version of the Gradio SDK is available: 6.14.0
title: Agent Threat Map Observatory
emoji: 🧭
colorFrom: gray
colorTo: purple
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: false
license: mit
short_description: Threat-map benchmark with metrics and geometry
Agent Threat Map
Agent Threat Map is a research benchmark and observability scaffold for mapping fragile behavior in model-agent systems.
Instead of asking only whether a model answered correctly, this project asks where and how a model breaks under agent-like pressure.
The benchmark focuses on:
- Prompt injection
- Tool-output injection
- Retrieval poisoning
- Memory poisoning
- Secret exfiltration
- Unauthorized action
The goal is to make hidden model-agent failure modes visible, structured, and easier to compare across runs.
What this is
- A JSONL probe set (
data/threat_probe_seed.jsonl). - A rule-based evaluator and expanded run metrics (distributions, severity weighting, composite indices, failure-mode histograms, per-category rollups).
- Observable geometry (same family as
failure-geometry-demo): TF-IDF + SVD embeddings of scored cases → KMeans → mutual information vs category, severity, and pass/fail (agent_threat_map/observability.py). Results appear asobservabilityin exported JSON and in the Gradio Observable geometry tab. - A Gradio Space UI (
app.py) for scoring pasted responses and exporting JSON reports.
What this is not
This is not a production vulnerability scanner, security certification system, or complete red-team suite. See docs/limitations.md.
Dataset schema (JSONL)
Each line is one probe:
| Field | Description |
|---|---|
id |
Stable identifier |
category |
One of the six families above |
fragility_area |
Finer-grained label |
severity |
low / medium / high / critical |
task |
What the agent is asked to do |
input |
User/tool/context text (may contain adversarial content) |
expected_behavior |
What safe handling looks like |
failure_modes |
Strings describing ways the run can go wrong |
observable_signal |
What a reviewer should look for |
safe_response_pattern |
Optional hints for benign completions |
Regenerate the seed file (60 probes) with:
python3 scripts/generate_threat_seed.py
Run locally
From this directory (agent-threat-map/):
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 examples/run_local_eval.py
python3 app.py
pip install -r requirements.txt installs scikit-learn (needed for observability / geometry and for examples/run_local_eval.py). examples/run_local_eval.py writes reports/sample_report.json using a canned safe-ish response over all probes.
If pip install fails or .venv/bin/python is missing, remove the broken env (rm -rf .venv), ensure PyPI is reachable (DNS/network), recreate the venv, and run pip install -r requirements.txt again. Do not commit .venv/ (it is gitignored).
Hugging Face Space
The YAML front matter above is what Hugging Face reads when this README lives at the Space repo root. Deploy by copying this folder to the Space (or hf upload — see scripts/push_spaces.sh in the parent monorepo).
- Runtime: Python 3.10+ supported; Python 3.13 needs
audioop-lts(already listed inrequirements.txt). - No API keys required for the threat-map UI (manual paste only).
Metrics overview
Run-level metrics are documented in docs/scoring.md. Highlights:
- Distribution: mean / median / P90 / max risk; weighted risk stats.
- Severity-aware: severity-weighted pass rate; high-stakes failure rate.
- Signals: boundary-language rate; safe vs unsafe signal totals / ratio.
- Composites: resilience index, exposure index, fragility spread (risk std dev).
- Slices: by category, by severity tier, failure-mode histogram, worst cases.
- Observable geometry:
MI(cluster, category),MI(cluster, severity),MI(cluster, pass_fail)plus 2-D scatter coordinates per case (needs ≥5 cases by default).
Related Spaces
- failure-geometry-demo — CARB failure geometry with sklearn baselines (no API key).
- carb-observability-space — same observability shape via HF Inference API (
HF_TOKENsecret required). - obversarystudios.org — research narrative.
What you should do on your machine
- Git: Commit and push
agent-threat-map/from your monorepo; merge any remote drift on GitHub first. - Hub: Create Space
obversarystudios/agent-threat-map(or your namespace) if it does not exist, then runbash scripts/push_spaces.shfrom the repo root (afterhf auth login). - Smoke-test: After
pip install -r requirements.txt, runpython3 examples/run_local_eval.pyand confirmreports/sample_report.jsoncontains"observability".
License
See LICENSE.