---
title: Agent Threat Map Observatory
emoji: 🧭
colorFrom: gray
colorTo: purple
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: false
license: mit
short_description: Threat-map benchmark with metrics and geometry
---

# Agent Threat Map

Agent Threat Map is a research benchmark and observability scaffold for mapping fragile behavior in model-agent systems. Instead of asking only whether a model answered correctly, this project asks **where and how** a model breaks under agent-like pressure.

The benchmark focuses on:

- Prompt injection
- Tool-output injection
- Retrieval poisoning
- Memory poisoning
- Secret exfiltration
- Unauthorized action

The goal is to make hidden model-agent failure modes **visible**, **structured**, and easier to compare across runs.

## What this is

- A JSONL probe set (`data/threat_probe_seed.jsonl`).
- A rule-based evaluator and **expanded run metrics** (distributions, severity weighting, composite indices, failure-mode histograms, per-category rollups).
- **Observable geometry** (same *family* as `failure-geometry-demo`): TF-IDF + SVD embeddings of scored cases → KMeans → mutual information vs category, severity, and pass/fail (`agent_threat_map/observability.py`). Results appear as `observability` in exported JSON and in the Gradio **Observable geometry** tab.
- A Gradio Space UI (`app.py`) for scoring pasted responses and exporting JSON reports.

## What this is not

This is **not** a production vulnerability scanner, security certification system, or complete red-team suite. See [docs/limitations.md](docs/limitations.md).
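For intuition, the observable-geometry step described above can be sketched in a few lines of scikit-learn. This is only an illustration of the pipeline's shape (TF-IDF → truncated SVD → KMeans → mutual information); the actual logic lives in `agent_threat_map/observability.py`, and the case texts and labels below are made up for the example:

```python
# Illustrative sketch of the geometry pipeline, NOT the project's code.
# Real implementation: agent_threat_map/observability.py
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans
from sklearn.metrics import mutual_info_score

# Hypothetical scored cases: (response text, category label).
cases = [
    ("I will not reveal the API key.", "secret_exfiltration"),
    ("Here is the key you asked for.", "secret_exfiltration"),
    ("Ignoring the injected instruction.", "prompt_injection"),
    ("Sure, deleting all files now.", "unauthorized_action"),
    ("The retrieved doc looks poisoned; flagging it.", "retrieval_poisoning"),
    ("Stored note accepted without checks.", "memory_poisoning"),
]
texts = [text for text, _ in cases]
labels = [label for _, label in cases]

X = TfidfVectorizer().fit_transform(texts)  # sparse TF-IDF matrix
coords = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords)

# How much does cluster membership tell us about the category?
print("MI(cluster, category):", mutual_info_score(labels, clusters))
```

The 2-D `coords` are what the Gradio tab can scatter-plot, and the MI scores summarize how well the learned clusters line up with category, severity, or pass/fail labels.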
## Dataset schema (JSONL)

Each line is one probe:

| Field | Description |
| --- | --- |
| `id` | Stable identifier |
| `category` | One of the six families above |
| `fragility_area` | Finer-grained label |
| `severity` | `low` / `medium` / `high` / `critical` |
| `task` | What the agent is asked to do |
| `input` | User/tool/context text (may contain adversarial content) |
| `expected_behavior` | What safe handling looks like |
| `failure_modes` | Strings describing ways the run can go wrong |
| `observable_signal` | What a reviewer should look for |
| `safe_response_pattern` | Optional hints for benign completions |

Regenerate the seed file (60 probes) with:

```bash
python3 scripts/generate_threat_seed.py
```

## Run locally

From this directory (`agent-threat-map/`):

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 examples/run_local_eval.py
python3 app.py
```

`pip install -r requirements.txt` installs **`scikit-learn`** (needed for `observability` / geometry and for `examples/run_local_eval.py`). `examples/run_local_eval.py` writes `reports/sample_report.json` using a canned safe-ish response over all probes.

If **`pip install` fails** or **`.venv/bin/python` is missing**, remove the broken env (`rm -rf .venv`), ensure **PyPI is reachable** (DNS/network), recreate the venv, and run `pip install -r requirements.txt` again. Do not commit `.venv/` (it is gitignored).

## Hugging Face Space

The YAML **front matter above** is what Hugging Face reads when this README lives at the **Space repo root**. Deploy by copying this folder to the Space (or `hf upload` — see `scripts/push_spaces.sh` in the parent monorepo).

- **Runtime:** Python 3.10+ supported; **Python 3.13** needs `audioop-lts` (already listed in `requirements.txt`).
- **No API keys** required for the threat-map UI (manual paste only).

## Metrics overview

Run-level metrics are documented in [docs/scoring.md](docs/scoring.md).
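As one worked example of these metrics, a severity-weighted pass rate can be computed by weighting each probe by its severity tier. The weights below are illustrative assumptions for this sketch, not the values defined in `docs/scoring.md`:

```python
# Illustrative severity-weighted pass rate. The weights here are
# ASSUMED for the example; see docs/scoring.md for the real definitions.
SEVERITY_WEIGHTS = {"low": 1.0, "medium": 2.0, "high": 3.0, "critical": 4.0}

def severity_weighted_pass_rate(results):
    """results: list of (severity, passed) tuples for scored probes."""
    total = sum(SEVERITY_WEIGHTS[sev] for sev, _ in results)
    passed = sum(SEVERITY_WEIGHTS[sev] for sev, ok in results if ok)
    return passed / total if total else 0.0

results = [("low", True), ("high", False), ("critical", True), ("medium", True)]
# Passing weight 1 + 4 + 2 = 7 out of a total 10 -> 0.7
print(severity_weighted_pass_rate(results))
```

The effect is that a failed `critical` probe drags the rate down far more than a failed `low` probe, which is the point of severity-aware scoring.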
Highlights:

- **Distribution:** mean / median / P90 / max risk; weighted risk stats.
- **Severity-aware:** severity-weighted pass rate; high-stakes failure rate.
- **Signals:** boundary-language rate; safe vs unsafe signal totals / ratio.
- **Composites:** resilience index, exposure index, fragility spread (risk std dev).
- **Slices:** by category, by severity tier, failure-mode histogram, worst cases.
- **Observable geometry:** `MI(cluster, category)`, `MI(cluster, severity)`, `MI(cluster, pass_fail)` plus 2-D scatter coordinates per case (needs ≥5 cases by default).

## Related Spaces

- **[failure-geometry-demo](https://huggingface.co/spaces/obversarystudios/failure-geometry-demo)** — CARB failure geometry with sklearn baselines (no API key).
- **[carb-observability-space](https://huggingface.co/spaces/obversarystudios/carb-observability-space)** — same observability shape via HF Inference API (`HF_TOKEN` secret required).
- **[obversarystudios.org](https://obversarystudios.org)** — research narrative.

## What you should do on your machine

1. **Git:** Commit and push `agent-threat-map/` from your monorepo; merge any remote drift on GitHub first.
2. **Hub:** Create Space `obversarystudios/agent-threat-map` (or your namespace) if it does not exist, then run `bash scripts/push_spaces.sh` from the repo root (after `hf auth login`).
3. **Smoke-test:** After `pip install -r requirements.txt`, run `python3 examples/run_local_eval.py` and confirm `reports/sample_report.json` contains `"observability"`.

## License

See [LICENSE](LICENSE).