mindbomber committed on
Commit b5bfae0 · verified · 1 Parent(s): ad49845

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +28 -0
README.md CHANGED
@@ -29,6 +29,7 @@ datasets:
29
  - mindbomber/aana-external-agent-trace-noisy-evidence
30
  - mindbomber/aana-head-to-head-permissive-vs-aana
31
  - mindbomber/aana-head-to-head-single-classifier-vs-aana
 
32
  metrics:
33
  - accuracy
34
  - f_beta
@@ -624,6 +625,33 @@ reads from unsafe actions on this external trace transform. AANA keeps the same
624
  category, authorization state, evidence refs, risk domain, and hard blockers
625
  from the pre-tool-call contract.
626
 
627
  ### PIIMB: Presidio + AANA
628
 
629
  Official PIIMB submission:
 
29
  - mindbomber/aana-external-agent-trace-noisy-evidence
30
  - mindbomber/aana-head-to-head-permissive-vs-aana
31
  - mindbomber/aana-head-to-head-single-classifier-vs-aana
32
+ - mindbomber/aana-head-to-head-prompt-policy-vs-aana
33
  metrics:
34
  - accuracy
35
  - f_beta
 
625
  category, authorization state, evidence refs, risk domain, and hard blockers
626
  from the pre-tool-call contract.
627
 
628
+ ### Head-to-Head: Prompt-Only Policy Guardrail vs AANA
629
+
630
+ Public validation artifact:
631
+ https://huggingface.co/datasets/mindbomber/aana-head-to-head-prompt-policy-vs-aana
632
+
633
+ Source dataset:
634
+ https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
635
+
636
+ Rows:
637
+ `360` external trace rows with moderate noisy-evidence stressors
638
+
639
+ Status:
640
+ head-to-head architecture diagnostic, policy-derived labels, not an official
641
+ leaderboard
642
+
643
+ | Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
644
+ | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
645
+ | Prompt-only policy guardrail | `81.67%` | `96.67%` | `74.36%` | `66.67%` | `3.33%` | `60` | `6` |
646
+ | AANA schema gate | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `0.00%` | `26` | `0` |
647
+
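The percentages in the table can be re-derived from the reported false-positive and false-negative counts. The following is a minimal sketch, not the evaluation code used for the artifact: it assumes the `360` rows split into 180 unsafe and 180 safe rows, a split that is not stated in the README but is the one consistent with every reported figure. The `Confusion` class and `as_percent` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion counts for a block/allow gate (hypothetical helper)."""
    unsafe_total: int      # rows labeled unsafe (assumed 180)
    safe_total: int        # rows labeled safe (assumed 180)
    false_positives: int   # safe rows that were blocked
    false_negatives: int   # unsafe rows that were accepted

    def metrics(self) -> dict[str, float]:
        tp = self.unsafe_total - self.false_negatives  # unsafe rows blocked
        tn = self.safe_total - self.false_positives    # safe rows allowed
        total = self.unsafe_total + self.safe_total
        return {
            "accuracy": (tp + tn) / total,
            "unsafe_recall": tp / self.unsafe_total,
            "block_precision": tp / (tp + self.false_positives),
            "safe_allow": tn / self.safe_total,
            "unsafe_accept": self.false_negatives / self.unsafe_total,
        }


def as_percent(value: float) -> str:
    """Format a ratio the way the table does, e.g. 0.8167 -> '81.67%'."""
    return f"{100 * value:.2f}%"


# Counts taken from the table; the 180/180 split is an assumption.
prompt_only = Confusion(unsafe_total=180, safe_total=180,
                        false_positives=60, false_negatives=6)
aana = Confusion(unsafe_total=180, safe_total=180,
                 false_positives=26, false_negatives=0)

for name, run in (("prompt-only", prompt_only), ("AANA", aana)):
    print(name, {k: as_percent(v) for k, v in run.metrics().items()})
```

Under that assumed split, the prompt-only guardrail blocks 174 of 180 unsafe rows and allows 120 of 180 safe rows, while the AANA schema gate blocks all 180 unsafe rows and allows 154 of 180 safe rows.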
648
+ The prompt-only policy guardrail is a flattened-text baseline over candidate
649
+ action, user intent, policy text, proposed arguments, and evidence summaries.
650
+ It performs better than an accept-all agent and the transferred single
651
+ classifier, but still misses unsafe rows and over-blocks many safe rows. AANA
652
+ improves unsafe recall, block precision, and safe allow in this run by using the
653
+ typed contract and hard-blocker route surface.
654
+
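The difference between the two architectures can be illustrated with a small sketch. This is not the AANA implementation or the baseline used in the run: `PreToolCallContract`, `flatten_for_prompt`, `schema_gate`, the `"authorized"` state value, and the `"allow"`/`"block"` routes are all hypothetical, and only the contract field names follow the README's list (category, authorization state, evidence refs, risk domain, hard blockers).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PreToolCallContract:
    """Hypothetical typed contract; field names follow the README's list."""
    category: str
    authorization_state: str
    evidence_refs: tuple[str, ...] = ()
    risk_domain: str = "unknown"
    hard_blockers: tuple[str, ...] = ()


def flatten_for_prompt(candidate_action: str, user_intent: str,
                       policy_text: str, proposed_arguments: dict,
                       evidence_summaries: list[str]) -> str:
    """Prompt-only baseline input: every field collapsed into one string.

    A downstream text classifier or prompt sees no types, only prose.
    """
    return "\n".join([
        f"Action: {candidate_action}",
        f"Intent: {user_intent}",
        f"Policy: {policy_text}",
        f"Arguments: {json.dumps(proposed_arguments, sort_keys=True)}",
        "Evidence: " + " | ".join(evidence_summaries),
    ])


def schema_gate(contract: PreToolCallContract) -> str:
    """Typed gate: structured fields decide the route (assumed rules)."""
    if contract.hard_blockers:
        return "block"  # any hard blocker short-circuits the decision
    if contract.authorization_state != "authorized":
        return "block"
    if not contract.evidence_refs:
        return "block"  # no evidence attached to the call
    return "allow"


contract = PreToolCallContract(
    category="write",
    authorization_state="authorized",
    evidence_refs=("ev-1",),
    risk_domain="finance",
    hard_blockers=("missing_consent",),
)
print(schema_gate(contract))
```

In this sketch the hard blocker decides the route regardless of how the surrounding text is phrased, whereas the flattened string leaves that signal for a text-level policy to rediscover.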
655
  ### PIIMB: Presidio + AANA
656
 
657
  Official PIIMB submission: