mindbomber committed on
Commit 7223da7 · verified · 1 Parent(s): b5bfae0

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +30 -0
README.md CHANGED
@@ -30,6 +30,7 @@ datasets:
  - mindbomber/aana-head-to-head-permissive-vs-aana
  - mindbomber/aana-head-to-head-single-classifier-vs-aana
  - mindbomber/aana-head-to-head-prompt-policy-vs-aana
+ - mindbomber/aana-head-to-head-llm-judge-vs-aana
  metrics:
  - accuracy
  - f_beta
@@ -652,6 +653,35 @@ classifier, but still misses unsafe rows and over-blocks many safe rows. AANA
  improves unsafe recall, block precision, and safe allow in this run by using the
  typed contract and hard-blocker route surface.

+ ### Head-to-Head: LLM-as-Judge Safety Checker vs AANA
+
+ Public validation artifact:
+ https://huggingface.co/datasets/mindbomber/aana-head-to-head-llm-judge-vs-aana
+
+ Source dataset:
+ https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
+
+ Rows:
+ `360` external trace rows with moderate noisy-evidence stressors
+
+ LLM judge:
+ `gpt-4o-mini`
+
+ Status:
+ head-to-head architecture diagnostic, policy-derived labels, not an official
+ leaderboard
+
+ | Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
+ | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
+ | LLM-as-judge safety checker | `73.33%` | `100.00%` | `65.22%` | `46.67%` | `0.00%` | `96` | `0` |
+ | AANA schema gate | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `0.00%` | `26` | `0` |
+
+ The live LLM-as-judge baseline is conservative: it blocks all unsafe rows, but
+ also blocks many safe identity lookup and authenticated/private-read calls when
+ the evidence is noisy or flattened. AANA preserves the same unsafe recall while
+ allowing substantially more safe calls by using explicit tool category,
+ authorization state, evidence refs, schema validation, and hard blockers.
+
  ### PIIMB: Presidio + AANA

  Official PIIMB submission:
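The percentages in the added head-to-head table can be cross-checked against plain confusion-matrix counts. The sketch below is an illustration, not part of the commit: it assumes the `360` rows split into 180 unsafe and 180 safe rows (inferred from the reported block precision and false-positive counts, not stated in the README), treats "blocked" as the positive class, and assumes the README's columns follow the standard definitions (safe allow as true-negative rate, unsafe accept as false-negative rate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion counts where the positive class is "the gate blocked the call"."""

    tp: int  # unsafe rows that were blocked
    fn: int  # unsafe rows that were allowed (false negatives)
    fp: int  # safe rows that were blocked (false positives)
    tn: int  # safe rows that were allowed

    def metrics(self) -> dict:
        unsafe = self.tp + self.fn
        safe = self.fp + self.tn
        blocked = self.tp + self.fp
        total = unsafe + safe
        return {
            "accuracy": (self.tp + self.tn) / total,
            "unsafe_recall": self.tp / unsafe,
            "block_precision": self.tp / blocked if blocked else 0.0,
            "safe_allow": self.tn / safe,  # assumed: true-negative rate
            "unsafe_accept": self.fn / unsafe,  # assumed: false-negative rate
        }


# ASSUMPTION: 180 unsafe and 180 safe rows, inferred from the table's
# percentages; the README only states the total of 360 rows.
UNSAFE_ROWS = 180
SAFE_ROWS = 180

# False-positive and false-negative counts are taken from the table.
llm_judge = Confusion(tp=UNSAFE_ROWS, fn=0, fp=96, tn=SAFE_ROWS - 96)
aana_gate = Confusion(tp=UNSAFE_ROWS, fn=0, fp=26, tn=SAFE_ROWS - 26)

for name, counts in [("LLM-as-judge", llm_judge), ("AANA schema gate", aana_gate)]:
    formatted = {key: f"{value:.2%}" for key, value in counts.metrics().items()}
    print(name, formatted)
```

Under that assumed split, every formatted value reproduces the matching table cell; for example, the LLM judge's block precision is 180 / (180 + 96), which rounds to `65.22%`, and its safe allow is 84 / 180, which rounds to `46.67%`.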