mindbomber/aana
@@ -29,6 +29,7 @@ datasets:
 - mindbomber/aana-external-agent-trace-noisy-evidence
 - mindbomber/aana-head-to-head-permissive-vs-aana
 - mindbomber/aana-head-to-head-single-classifier-vs-aana
+- mindbomber/aana-head-to-head-prompt-policy-vs-aana
 metrics:
 - accuracy
 - f_beta
@@ -624,6 +625,33 @@ reads from unsafe actions on this external trace transform. AANA keeps the same
 category, authorization state, evidence refs, risk domain, and hard blockers
 from the pre-tool-call contract.
+### Head-to-Head: Prompt-Only Policy Guardrail vs AANA
+
+Public validation artifact:
+https://huggingface.co/datasets/mindbomber/aana-head-to-head-prompt-policy-vs-aana
+
+Source dataset:
+https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
+
+Rows:
+`360` external trace rows with moderate noisy-evidence stressors
+
+Status:
+head-to-head architecture diagnostic, policy-derived labels, not an official
+leaderboard
+
+| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
+| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+| Prompt-only policy guardrail | `81.67%` | `96.67%` | `74.36%` | `66.67%` | `3.33%` | `60` | `6` |
+| AANA schema gate | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `0.00%` | `26` | `0` |
+
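The percentages in the table follow from the false-positive and false-negative counts. The sketch below recomputes them, assuming "unsafe" is the positive class and that the 360 rows split evenly into 180 unsafe and 180 safe. That split is inferred from the table (6 false negatives at a 3.33% unsafe-accept rate), not stated in the source. `Confusion`, `metrics`, and `as_percent` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # unsafe rows that were blocked
    fp: int  # safe rows that were blocked (false positives)
    tn: int  # safe rows that were allowed
    fn: int  # unsafe rows that were accepted (false negatives)


def metrics(c: Confusion) -> dict:
    """Derive the table's rate columns from raw confusion counts."""
    total = c.tp + c.fp + c.tn + c.fn
    unsafe = c.tp + c.fn
    safe = c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "unsafe_recall": c.tp / unsafe,
        "block_precision": c.tp / (c.tp + c.fp),
        "safe_allow": c.tn / safe,
        "unsafe_accept": c.fn / unsafe,
    }


def as_percent(x: float) -> str:
    return f"{100 * x:.2f}%"


# Assumed 180 unsafe / 180 safe rows, inferred from the reported rates.
PROMPT_ONLY = Confusion(tp=174, fp=60, tn=120, fn=6)
AANA = Confusion(tp=180, fp=26, tn=154, fn=0)

if __name__ == "__main__":
    for name, c in [("Prompt-only policy guardrail", PROMPT_ONLY),
                    ("AANA schema gate", AANA)]:
        row = {k: as_percent(v) for k, v in metrics(c).items()}
        print(name, row)
```

Under these assumed counts the computed rates match every percentage reported in the table, which is a useful consistency check on the row.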
+The prompt-only policy guardrail is a flattened-text baseline over candidate
+action, user intent, policy text, proposed arguments, and evidence summaries.
+It outperforms an accept-all agent and the transferred single classifier, but
+still accepts `6` unsafe rows and blocks `60` safe rows. AANA improves unsafe
+recall, block precision, and safe allow in this run (`0` unsafe accepts, `26`
+false blocks) by using the typed contract and hard-blocker route surface.
+
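To make the contrast with a flattened-text baseline concrete, here is a minimal sketch of a gate that reads typed fields rather than one concatenated string. It is not AANA's implementation. The field names mirror the contract fields listed above (category, authorization state, evidence refs, risk domain, hard blockers), but the class `ToolCallContract`, the function `schema_gate`, the value `"authorized"`, and the three rules are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToolCallContract:
    # Hypothetical typed view of a pre-tool-call contract.
    category: str
    authorization_state: str
    risk_domain: str
    evidence_refs: Tuple[str, ...] = ()
    hard_blockers: Tuple[str, ...] = ()


def schema_gate(contract: ToolCallContract) -> str:
    """Return "block" or "allow" from typed fields (illustrative rules only)."""
    # A hard blocker routes straight to "block", whatever the free text says.
    if contract.hard_blockers:
        return "block"
    # Assumed rule: anything not explicitly authorized is blocked.
    if contract.authorization_state != "authorized":
        return "block"
    # Assumed rule: an action with no supporting evidence refs is blocked.
    if not contract.evidence_refs:
        return "block"
    return "allow"


if __name__ == "__main__":
    ok = ToolCallContract("read", "authorized", "low", evidence_refs=("ev-1",))
    bad = ToolCallContract("write", "authorized", "high",
                           evidence_refs=("ev-2",),
                           hard_blockers=("missing_consent",))
    print(schema_gate(ok), schema_gate(bad))
```

The design point is that a hard blocker is a separate field checked first, so noisy evidence text cannot override it; a prompt-only guardrail has to recover the same signal from flattened prose.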
 ### PIIMB: Presidio + AANA
 Official PIIMB submission: