File size: 5,148 Bytes
5c5457c 6c3043e 5c5457c 6c3043e 5c5457c 6c3043e 5c5457c 6c3043e | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 | ---
title: Agent Threat Map Observatory
emoji: 🧭
colorFrom: gray
colorTo: purple
sdk: gradio
sdk_version: 5.50.0
app_file: app.py
pinned: false
license: mit
short_description: Threat-map benchmark with metrics and geometry
---
# Agent Threat Map
Agent Threat Map is a research benchmark and observability scaffold for mapping fragile behavior in model-agent systems.
Instead of asking only whether a model answered correctly, this project asks **where and how** a model breaks under agent-like pressure.
The benchmark focuses on:
- Prompt injection
- Tool-output injection
- Retrieval poisoning
- Memory poisoning
- Secret exfiltration
- Unauthorized action
The goal is to make hidden model-agent failure modes **visible**, **structured**, and easier to compare across runs.
## What this is
- A JSONL probe set (`data/threat_probe_seed.jsonl`).
- A rule-based evaluator and **expanded run metrics** (distributions, severity weighting, composite indices, failure-mode histograms, per-category rollups).
- **Observable geometry** (same *family* as `failure-geometry-demo`): TF-IDF + SVD embeddings of scored cases → KMeans → mutual information vs category, severity, and pass/fail (`agent_threat_map/observability.py`). Results appear as `observability` in exported JSON and in the Gradio **Observable geometry** tab.
- A Gradio Space UI (`app.py`) for scoring pasted responses and exporting JSON reports.
## What this is not
This is **not** a production vulnerability scanner, security certification system, or complete red-team suite. See [docs/limitations.md](docs/limitations.md).
## Dataset schema (JSONL)
Each line is one probe:
| Field | Description |
| --- | --- |
| `id` | Stable identifier |
| `category` | One of the six families above |
| `fragility_area` | Finer-grained label |
| `severity` | `low` / `medium` / `high` / `critical` |
| `task` | What the agent is asked to do |
| `input` | User/tool/context text (may contain adversarial content) |
| `expected_behavior` | What safe handling looks like |
| `failure_modes` | Strings describing ways the run can go wrong |
| `observable_signal` | What a reviewer should look for |
| `safe_response_pattern` | Optional hints for benign completions |
Regenerate the seed file (60 probes) with:
```bash
python3 scripts/generate_threat_seed.py
```
## Run locally
From this directory (`agent-threat-map/`):
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python3 examples/run_local_eval.py
python3 app.py
```
`pip install -r requirements.txt` installs **`scikit-learn`** (needed for `observability` / geometry and for `examples/run_local_eval.py`). `examples/run_local_eval.py` writes `reports/sample_report.json` using a canned safe-ish response over all probes.
If **`pip install` fails** or **`.venv/bin/python` is missing**, remove the broken env (`rm -rf .venv`), ensure **PyPI is reachable** (DNS/network), recreate the venv, and run `pip install -r requirements.txt` again. Do not commit `.venv/` (it is gitignored).
## Hugging Face Space
The YAML **front matter above** is what Hugging Face reads when this README lives at the **Space repo root**. Deploy by copying this folder to the Space (or `hf upload` — see `scripts/push_spaces.sh` in the parent monorepo).
- **Runtime:** Python 3.10+ supported; **Python 3.13** needs `audioop-lts` (already listed in `requirements.txt`).
- **No API keys** required for the threat-map UI (manual paste only).
## Metrics overview
Run-level metrics are documented in [docs/scoring.md](docs/scoring.md). Highlights:
- **Distribution:** mean / median / P90 / max risk; weighted risk stats.
- **Severity-aware:** severity-weighted pass rate; high-stakes failure rate.
- **Signals:** boundary-language rate; safe vs unsafe signal totals / ratio.
- **Composites:** resilience index, exposure index, fragility spread (risk std dev).
- **Slices:** by category, by severity tier, failure-mode histogram, worst cases.
- **Observable geometry:** `MI(cluster, category)`, `MI(cluster, severity)`, `MI(cluster, pass_fail)` plus 2-D scatter coordinates per case (needs ≥5 cases by default).
## Related Spaces
- **[failure-geometry-demo](https://huggingface.co/spaces/obversarystudios/failure-geometry-demo)** — CARB failure geometry with sklearn baselines (no API key).
- **[carb-observability-space](https://huggingface.co/spaces/obversarystudios/carb-observability-space)** — same observability shape via HF Inference API (`HF_TOKEN` secret required).
- **[obversarystudios.org](https://obversarystudios.org)** — research narrative.
## What you should do on your machine
1. **Git:** Commit and push `agent-threat-map/` from your monorepo; merge any remote drift on GitHub first.
2. **Hub:** Create Space `obversarystudios/agent-threat-map` (or your namespace) if it does not exist, then run `bash scripts/push_spaces.sh` from the repo root (after `hf auth login`).
3. **Smoke-test:** After `pip install -r requirements.txt`, run `python3 examples/run_local_eval.py` and confirm `reports/sample_report.json` contains `"observability"`.
## License
See [LICENSE](LICENSE).
|