Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -30,6 +30,7 @@ datasets:
|
|
| 30 |
- mindbomber/aana-head-to-head-permissive-vs-aana
|
| 31 |
- mindbomber/aana-head-to-head-single-classifier-vs-aana
|
| 32 |
- mindbomber/aana-head-to-head-prompt-policy-vs-aana
|
|
|
|
| 33 |
metrics:
|
| 34 |
- accuracy
|
| 35 |
- f_beta
|
|
@@ -652,6 +653,35 @@ classifier, but still misses unsafe rows and over-blocks many safe rows. AANA
|
|
| 652 |
improves unsafe recall, block precision, and safe allow in this run by using the
|
| 653 |
typed contract and hard-blocker route surface.
|
| 654 |
|
| 655 |
### PIIMB: Presidio + AANA
|
| 656 |
|
| 657 |
Official PIIMB submission:
|
|
|
|
| 30 |
- mindbomber/aana-head-to-head-permissive-vs-aana
|
| 31 |
- mindbomber/aana-head-to-head-single-classifier-vs-aana
|
| 32 |
- mindbomber/aana-head-to-head-prompt-policy-vs-aana
|
| 33 |
+
- mindbomber/aana-head-to-head-llm-judge-vs-aana
|
| 34 |
metrics:
|
| 35 |
- accuracy
|
| 36 |
- f_beta
|
|
|
|
| 653 |
improves unsafe recall, block precision, and safe allow in this run by using the
|
| 654 |
typed contract and hard-blocker route surface.
|
| 655 |
|
| 656 |
+
### Head-to-Head: LLM-as-Judge Safety Checker vs AANA
|
| 657 |
+
|
| 658 |
+
Public validation artifact:
|
| 659 |
+
https://huggingface.co/datasets/mindbomber/aana-head-to-head-llm-judge-vs-aana
|
| 660 |
+
|
| 661 |
+
Source dataset:
|
| 662 |
+
https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
|
| 663 |
+
|
| 664 |
+
Rows:
|
| 665 |
+
`360` external trace rows with moderate noisy-evidence stressors
|
| 666 |
+
|
| 667 |
+
LLM judge:
|
| 668 |
+
`gpt-4o-mini`
|
| 669 |
+
|
| 670 |
+
Status:
|
| 671 |
+
head-to-head architecture diagnostic, policy-derived labels, not an official
|
| 672 |
+
leaderboard
|
| 673 |
+
|
| 674 |
+
| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
|
| 675 |
+
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
|
| 676 |
+
| LLM-as-judge safety checker | `73.33%` | `100.00%` | `65.22%` | `46.67%` | `0.00%` | `96` | `0` |
|
| 677 |
+
| AANA schema gate | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `0.00%` | `26` | `0` |
|
| 678 |
+
|
| 679 |
+
The live LLM-as-judge baseline is conservative: it blocks all unsafe rows, but
|
| 680 |
+
also blocks many safe identity lookup and authenticated/private-read calls when
|
| 681 |
+
the evidence is noisy or flattened. AANA preserves the same unsafe recall while
|
| 682 |
+
allowing substantially more safe calls (`85.56%` vs `46.67%` safe allow, `26` vs `96` false positives) by using explicit tool category,
|
| 683 |
+
authorization state, evidence refs, schema validation, and hard blockers.
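The percentages in the table above can be cross-checked from the false-positive and false-negative counts alone. The sketch below is illustrative and not part of the AANA codebase: the names `Confusion`, `gate_metrics`, and `from_errors` are hypothetical, and the 180 unsafe / 180 safe split of the `360` rows is an assumption inferred from the reported safe-allow rates, not a figure stated in the README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion counts where a "positive" means the gate blocked the call."""
    tp: int  # unsafe rows that were blocked
    fp: int  # safe rows that were blocked (false positives)
    tn: int  # safe rows that were allowed
    fn: int  # unsafe rows that were allowed (false negatives)


def gate_metrics(c: Confusion) -> dict[str, float]:
    """Return the table's metrics as percentages."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": 100 * (c.tp + c.tn) / total,
        "unsafe_recall": 100 * c.tp / (c.tp + c.fn),
        "block_precision": 100 * c.tp / (c.tp + c.fp),
        "safe_allow": 100 * c.tn / (c.tn + c.fp),
        "unsafe_accept": 100 * c.fn / (c.tp + c.fn),
    }


# ASSUMPTION: 180 unsafe + 180 safe rows (inferred, not stated in the README).
UNSAFE_ROWS, SAFE_ROWS = 180, 180


def from_errors(fp: int, fn: int) -> Confusion:
    """Build confusion counts from the reported error counts and the assumed split."""
    return Confusion(tp=UNSAFE_ROWS - fn, fp=fp, tn=SAFE_ROWS - fp, fn=fn)


llm_judge = gate_metrics(from_errors(fp=96, fn=0))
aana_gate = gate_metrics(from_errors(fp=26, fn=0))

for name, metrics in (("LLM-as-judge", llm_judge), ("AANA schema gate", aana_gate)):
    print(name, {key: f"{value:.2f}%" for key, value in metrics.items()})
```

Under that assumed split, the recomputed accuracy, block precision, and safe allow agree with the reported values for both rows, which suggests the table is internally consistent.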
|
| 684 |
+
|
| 685 |
### PIIMB: Presidio + AANA
|
| 686 |
|
| 687 |
Official PIIMB submission:
|