mindbomber committed on
Commit 29fd6c0 · verified · 1 Parent(s): d4daa1f

Upload README.md with huggingface_hub

  1. README.md +34 -0
README.md CHANGED
@@ -32,6 +32,7 @@ datasets:
32
  - mindbomber/aana-head-to-head-prompt-policy-vs-aana
33
  - mindbomber/aana-head-to-head-llm-judge-vs-aana
34
  - mindbomber/aana-head-to-head-contract-no-recovery-vs-aana
 
35
  metrics:
36
  - accuracy
37
  - f_beta
@@ -711,6 +712,39 @@ support them, preserves true missing-authorization stressors, and corrects the
711
  runtime route before final gating. The recovery pass does not read expected
712
  labels, but the trace features are produced by the included transform scripts.
713
 
714
  ### PIIMB: Presidio + AANA
715
 
716
  Official PIIMB submission:
 
32
  - mindbomber/aana-head-to-head-prompt-policy-vs-aana
33
  - mindbomber/aana-head-to-head-llm-judge-vs-aana
34
  - mindbomber/aana-head-to-head-contract-no-recovery-vs-aana
35
+ - mindbomber/aana-external-validity-hermes-head-to-head
36
  metrics:
37
  - accuracy
38
  - f_beta
 
712
  runtime route before final gating. The recovery pass does not read expected
713
  labels, but the trace features are produced by the included transform scripts.
714
 
715
+ ### External Validity: Hermes Function-Calling Head-to-Head
716
+
717
+ Public validation artifact:
718
+ https://huggingface.co/datasets/mindbomber/aana-external-validity-hermes-head-to-head
719
+
720
+ Second source dataset:
721
+ https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1
722
+
723
+ Rows:
724
+ `360` transformed Hermes function-calling rows with moderate noisy-evidence
725
+ stressors
726
+
727
+ Status:
728
+ second-source architecture diagnostic, policy-derived labels, not an official
729
+ leaderboard
730
+
731
+ | Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
732
+ | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
733
+ | Permissive agent | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `100.00%` | `0` | `180` |
734
+ | Single classifier | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `0.00%` | `180` | `0` |
735
+ | Prompt-only policy guardrail | `93.06%` | `97.22%` | `89.74%` | `88.89%` | `2.78%` | `20` | `5` |
736
+ | LLM-as-judge safety checker | `85.28%` | `99.44%` | `77.49%` | `71.11%` | `0.56%` | `52` | `1` |
737
+ | Structured contract gate without recovery | `92.22%` | `100.00%` | `86.54%` | `84.44%` | `0.00%` | `28` | `0` |
738
+ | AANA with evidence recovery | `100.00%` | `100.00%` | `100.00%` | `100.00%` | `0.00%` | `0` | `0` |
739
+
740
+ This run improves source diversity by using an independent function-calling
741
+ dataset with different domains, schemas, and conversation format. It does not
742
+ provide human-reviewed safety labels: labels and counterfactual
743
+ missing-authorization rows are generated by the included transform scripts. The
744
+ main replicated pattern is that AANA's evidence-recovery loop preserves unsafe
745
+ recall while recovering safe allow better than flat classifiers, prompt-only
746
+ guards, LLM judges, or a static contract gate.
747
+
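The percentage columns in the table added above can be re-derived from the false-positive and false-negative counts alone. The sketch below is a minimal illustration, not the repository's evaluation script: it assumes the `360` rows split into 180 safe and 180 unsafe rows (inferred from the permissive-agent and single-classifier rows), and the names `Counts`, `pct`, and `metrics` are hypothetical.

```python
from dataclasses import dataclass

# Assumed class balance, inferred from the table: the permissive agent has
# 180 false negatives and the single classifier has 180 false positives.
N_SAFE = 180
N_UNSAFE = 180


@dataclass(frozen=True)
class Counts:
    false_positives: int  # safe rows that were blocked
    false_negatives: int  # unsafe rows that were accepted


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to two decimals; 0.0 when the denominator is zero
    (an assumption that mirrors the table's 0.00% for an agent that never blocks)."""
    return round(100.0 * numerator / denominator, 2) if denominator else 0.0


def metrics(c: Counts, n_safe: int = N_SAFE, n_unsafe: int = N_UNSAFE) -> dict:
    blocked_unsafe = n_unsafe - c.false_negatives  # true positives
    allowed_safe = n_safe - c.false_positives      # true negatives
    total_blocked = blocked_unsafe + c.false_positives
    return {
        "accuracy": pct(blocked_unsafe + allowed_safe, n_safe + n_unsafe),
        "unsafe_recall": pct(blocked_unsafe, n_unsafe),
        "block_precision": pct(blocked_unsafe, total_blocked),
        "safe_allow": pct(allowed_safe, n_safe),
        "unsafe_accept": pct(c.false_negatives, n_unsafe),
    }


if __name__ == "__main__":
    # Counts copied from the table rows.
    rows = {
        "Prompt-only policy guardrail": Counts(20, 5),
        "LLM-as-judge safety checker": Counts(52, 1),
        "Structured contract gate without recovery": Counts(28, 0),
    }
    for name, counts in rows.items():
        print(name, metrics(counts))
```

Under these assumptions the computed values agree with the reported rows, which suggests the table is internally consistent; it says nothing about how the policy-derived labels themselves were produced.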
748
  ### PIIMB: Presidio + AANA
749
 
750
  Official PIIMB submission: