| --- |
| title: Agent Threat Map Observatory |
| emoji: π§ |
| colorFrom: gray |
| colorTo: purple |
| sdk: gradio |
| sdk_version: 5.50.0 |
| app_file: app.py |
| pinned: false |
| license: mit |
| short_description: Threat-map benchmark with metrics and geometry |
| --- |
| |
| # Agent Threat Map |
|
|
| Agent Threat Map is a research benchmark and observability scaffold for mapping fragile behavior in model-agent systems. |
|
|
| Instead of asking only whether a model answered correctly, this project asks **where and how** a model breaks under agent-like pressure. |
|
|
| The benchmark focuses on: |
|
|
| - Prompt injection |
| - Tool-output injection |
| - Retrieval poisoning |
| - Memory poisoning |
| - Secret exfiltration |
| - Unauthorized action |
|
|
| The goal is to make hidden model-agent failure modes **visible**, **structured**, and easier to compare across runs. |
|
|
| ## What this is |
|
|
| - A JSONL probe set (`data/threat_probe_seed.jsonl`). |
| - A rule-based evaluator and **expanded run metrics** (distributions, severity weighting, composite indices, failure-mode histograms, per-category rollups). |
| - **Observable geometry** (same *family* as `failure-geometry-demo`): TF-IDF + SVD embeddings of scored cases β KMeans β mutual information vs category, severity, and pass/fail (`agent_threat_map/observability.py`). Results appear as `observability` in exported JSON and in the Gradio **Observable geometry** tab. |
| - A Gradio Space UI (`app.py`) for scoring pasted responses and exporting JSON reports. |
|
|
| ## What this is not |
|
|
| This is **not** a production vulnerability scanner, security certification system, or complete red-team suite. See [docs/limitations.md](docs/limitations.md). |
|
|
| ## Dataset schema (JSONL) |
|
|
| Each line is one probe: |
|
|
| | Field | Description | |
| | --- | --- | |
| | `id` | Stable identifier | |
| | `category` | One of the six families above | |
| | `fragility_area` | Finer-grained label | |
| | `severity` | `low` / `medium` / `high` / `critical` | |
| | `task` | What the agent is asked to do | |
| | `input` | User/tool/context text (may contain adversarial content) | |
| | `expected_behavior` | What safe handling looks like | |
| | `failure_modes` | Strings describing ways the run can go wrong | |
| | `observable_signal` | What a reviewer should look for | |
| | `safe_response_pattern` | Optional hints for benign completions | |
|
|
| Regenerate the seed file (60 probes) with: |
|
|
| ```bash |
| python3 scripts/generate_threat_seed.py |
| ``` |
|
|
| ## Run locally |
|
|
| From this directory (`agent-threat-map/`): |
|
|
| ```bash |
| python3 -m venv .venv |
| source .venv/bin/activate |
| pip install -r requirements.txt |
| python3 examples/run_local_eval.py |
| python3 app.py |
| ``` |
|
|
| `pip install -r requirements.txt` installs **`scikit-learn`** (needed for `observability` / geometry and for `examples/run_local_eval.py`). `examples/run_local_eval.py` writes `reports/sample_report.json` using a canned safe-ish response over all probes. |
|
|
| If **`pip install` fails** or **`.venv/bin/python` is missing**, remove the broken env (`rm -rf .venv`), ensure **PyPI is reachable** (DNS/network), recreate the venv, and run `pip install -r requirements.txt` again. Do not commit `.venv/` (it is gitignored). |
|
|
| ## Hugging Face Space |
|
|
| The YAML **front matter above** is what Hugging Face reads when this README lives at the **Space repo root**. Deploy by copying this folder to the Space (or `hf upload` β see `scripts/push_spaces.sh` in the parent monorepo). |
|
|
| - **Runtime:** Python 3.10+ supported; **Python 3.13** needs `audioop-lts` (already listed in `requirements.txt`). |
| - **No API keys** required for the threat-map UI (manual paste only). |
|
|
| ## Metrics overview |
|
|
| Run-level metrics are documented in [docs/scoring.md](docs/scoring.md). Highlights: |
|
|
| - **Distribution:** mean / median / P90 / max risk; weighted risk stats. |
| - **Severity-aware:** severity-weighted pass rate; high-stakes failure rate. |
| - **Signals:** boundary-language rate; safe vs unsafe signal totals / ratio. |
| - **Composites:** resilience index, exposure index, fragility spread (risk std dev). |
| - **Slices:** by category, by severity tier, failure-mode histogram, worst cases. |
| - **Observable geometry:** `MI(cluster, category)`, `MI(cluster, severity)`, `MI(cluster, pass_fail)` plus 2-D scatter coordinates per case (needs β₯5 cases by default). |
|
|
| ## Related Spaces |
|
|
| - **[failure-geometry-demo](https://huggingface.co/spaces/obversarystudios/failure-geometry-demo)** β CARB failure geometry with sklearn baselines (no API key). |
| - **[carb-observability-space](https://huggingface.co/spaces/obversarystudios/carb-observability-space)** β same observability shape via HF Inference API (`HF_TOKEN` secret required). |
| - **[obversarystudios.org](https://obversarystudios.org)** β research narrative. |
|
|
| ## What you should do on your machine |
|
|
| 1. **Git:** Commit and push `agent-threat-map/` from your monorepo; merge any remote drift on GitHub first. |
| 2. **Hub:** Create Space `obversarystudios/agent-threat-map` (or your namespace) if it does not exist, then run `bash scripts/push_spaces.sh` from the repo root (after `hf auth login`). |
| 3. **Smoke-test:** After `pip install -r requirements.txt`, run `python3 examples/run_local_eval.py` and confirm `reports/sample_report.json` contains `"observability"`. |
|
|
| ## License |
|
|
| See [LICENSE](LICENSE). |
|
|