Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -29,6 +29,7 @@ datasets:
|
|
| 29 |
- mindbomber/aana-external-agent-trace-noisy-evidence
|
| 30 |
- mindbomber/aana-head-to-head-permissive-vs-aana
|
| 31 |
- mindbomber/aana-head-to-head-single-classifier-vs-aana
|
|
|
|
| 32 |
metrics:
|
| 33 |
- accuracy
|
| 34 |
- f_beta
|
|
@@ -624,6 +625,33 @@ reads from unsafe actions on this external trace transform. AANA keeps the same
|
|
| 624 |
category, authorization state, evidence refs, risk domain, and hard blockers
|
| 625 |
from the pre-tool-call contract.
|
| 626 |
|
| 627 |
### PIIMB: Presidio + AANA
|
| 628 |
|
| 629 |
Official PIIMB submission:
|
|
|
|
| 29 |
- mindbomber/aana-external-agent-trace-noisy-evidence
|
| 30 |
- mindbomber/aana-head-to-head-permissive-vs-aana
|
| 31 |
- mindbomber/aana-head-to-head-single-classifier-vs-aana
|
| 32 |
+
- mindbomber/aana-head-to-head-prompt-policy-vs-aana
|
| 33 |
metrics:
|
| 34 |
- accuracy
|
| 35 |
- f_beta
|
|
|
|
| 625 |
category, authorization state, evidence refs, risk domain, and hard blockers
|
| 626 |
from the pre-tool-call contract.
|
| 627 |
|
| 628 |
+
### Head-to-Head: Prompt-Only Policy Guardrail vs AANA
|
| 629 |
+
|
| 630 |
+
Public validation artifact:
|
| 631 |
+
https://huggingface.co/datasets/mindbomber/aana-head-to-head-prompt-policy-vs-aana
|
| 632 |
+
|
| 633 |
+
Source dataset:
|
| 634 |
+
https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
|
| 635 |
+
|
| 636 |
+
Rows:
|
| 637 |
+
`360` external trace rows with moderate noisy-evidence stressors
|
| 638 |
+
|
| 639 |
+
Status:
|
| 640 |
+
head-to-head architecture diagnostic, policy-derived labels, not an official
|
| 641 |
+
leaderboard
|
| 642 |
+
|
| 643 |
+
| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| Prompt-only policy guardrail | `81.67%` | `96.67%` | `74.36%` | `66.67%` | `3.33%` | `60` | `6` |
| AANA schema gate | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `0.00%` | `26` | `0` |
|
| 647 |
+
|
| 648 |
+
The prompt-only policy guardrail is a flattened-text baseline over candidate
|
| 649 |
+
action, user intent, policy text, proposed arguments, and evidence summaries.
|
| 650 |
+
It performs better than an accept-all agent and the transferred single
|
| 651 |
+
classifier, but still accepts `6` unsafe rows and blocks `60` safe rows. AANA
|
| 652 |
+
improves unsafe recall, block precision, and safe allow in this run by using the
|
| 653 |
+
typed contract and hard-blocker route surface.
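
The reported rates can be cross-checked against the false-positive and false-negative counts. The sketch below is not part of the AANA tooling; it assumes "block" is the positive class and that the `360` rows split into `180` unsafe and `180` safe rows, a split inferred from the reported percentages rather than stated in the README. The names `Confusion`, `gate_metrics`, and `pct` are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Gate outcomes, treating "block" as the positive class."""
    tp: int  # unsafe rows blocked
    fp: int  # safe rows blocked (false positives)
    tn: int  # safe rows allowed
    fn: int  # unsafe rows accepted (false negatives)


def gate_metrics(c: Confusion) -> dict[str, float]:
    """Recompute the table's rate columns from raw counts."""
    unsafe = c.tp + c.fn
    safe = c.tn + c.fp
    total = unsafe + safe
    return {
        "accuracy": (c.tp + c.tn) / total,
        "unsafe_recall": c.tp / unsafe,
        "block_precision": c.tp / (c.tp + c.fp),
        "safe_allow": c.tn / safe,
        "unsafe_accept": c.fn / unsafe,
    }


def pct(x: float) -> str:
    """Format a fraction the way the table does, e.g. 0.5 -> '50.00%'."""
    return f"{100 * x:.2f}%"


# Assumed split: 180 unsafe + 180 safe rows (inferred, not stated in the README).
PROMPT_ONLY = Confusion(tp=174, fp=60, tn=120, fn=6)
AANA_GATE = Confusion(tp=180, fp=26, tn=154, fn=0)

for name, counts in (("prompt-only", PROMPT_ONLY), ("aana", AANA_GATE)):
    print(name, {k: pct(v) for k, v in gate_metrics(counts).items()})
```

Under that assumed split, every rate in the table follows from the two error columns alone, which makes the table easy to re-verify if the row counts change.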
|
| 654 |
+
|
| 655 |
### PIIMB: Presidio + AANA
|
| 656 |
|
| 657 |
Official PIIMB submission:
|