Upload README.md with huggingface_hub
README.md
CHANGED
|
@@ -32,6 +32,7 @@ datasets:
|
|
| 32 |
- mindbomber/aana-head-to-head-prompt-policy-vs-aana
|
| 33 |
- mindbomber/aana-head-to-head-llm-judge-vs-aana
|
| 34 |
- mindbomber/aana-head-to-head-contract-no-recovery-vs-aana
|
| 35 |
metrics:
|
| 36 |
- accuracy
|
| 37 |
- f_beta
|
|
@@ -711,6 +712,39 @@ support them, preserves true missing-authorization stressors, and corrects the
|
|
| 711 |
runtime route before final gating. The recovery pass does not read expected
|
| 712 |
labels, but the trace features are produced by the included transform scripts.
|
| 713 |
|
| 714 |
### PIIMB: Presidio + AANA
|
| 715 |
|
| 716 |
Official PIIMB submission:
|
|
|
|
| 32 |
- mindbomber/aana-head-to-head-prompt-policy-vs-aana
|
| 33 |
- mindbomber/aana-head-to-head-llm-judge-vs-aana
|
| 34 |
- mindbomber/aana-head-to-head-contract-no-recovery-vs-aana
|
| 35 |
+
- mindbomber/aana-external-validity-hermes-head-to-head
|
| 36 |
metrics:
|
| 37 |
- accuracy
|
| 38 |
- f_beta
|
|
|
|
| 712 |
runtime route before final gating. The recovery pass does not read expected
|
| 713 |
labels, but the trace features are produced by the included transform scripts.
|
| 714 |
|
| 715 |
+
### External Validity: Hermes Function-Calling Head-to-Head
|
| 716 |
+
|
| 717 |
+
Public validation artifact:
|
| 718 |
+
https://huggingface.co/datasets/mindbomber/aana-external-validity-hermes-head-to-head
|
| 719 |
+
|
| 720 |
+
Second source dataset:
|
| 721 |
+
https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1
|
| 722 |
+
|
| 723 |
+
Rows:
|
| 724 |
+
`360` transformed Hermes function-calling rows with moderate noisy-evidence
|
| 725 |
+
stressors
|
| 726 |
+
|
| 727 |
+
Status:
|
| 728 |
+
second-source architecture diagnostic, policy-derived labels, not an official
|
| 729 |
+
leaderboard
|
| 730 |
+
|
| 731 |
+
| Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
|
| 732 |
+
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
|
| 733 |
+
| Permissive agent | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `100.00%` | `0` | `180` |
|
| 734 |
+
| Single classifier | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `0.00%` | `180` | `0` |
|
| 735 |
+
| Prompt-only policy guardrail | `93.06%` | `97.22%` | `89.74%` | `88.89%` | `2.78%` | `20` | `5` |
|
| 736 |
+
| LLM-as-judge safety checker | `85.28%` | `99.44%` | `77.49%` | `71.11%` | `0.56%` | `52` | `1` |
|
| 737 |
+
| Structured contract gate without recovery | `92.22%` | `100.00%` | `86.54%` | `84.44%` | `0.00%` | `28` | `0` |
|
| 738 |
+
| AANA with evidence recovery | `100.00%` | `100.00%` | `100.00%` | `100.00%` | `0.00%` | `0` | `0` |
|
| 739 |
+
|
| 740 |
+
This run improves source diversity by using an independent function-calling
|
| 741 |
+
dataset with different domains, schemas, and conversation format. It does not
|
| 742 |
+
provide human-reviewed safety labels: labels and counterfactual
|
| 743 |
+
missing-authorization rows are generated by the included transform scripts. The
|
| 744 |
+
main replicated pattern is that AANA's evidence-recovery loop preserves unsafe
|
| 745 |
+
recall while restoring a higher safe-allow rate than flat classifiers, prompt-only
|
| 746 |
+
guards, LLM judges, or a static contract gate.
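
The percentage columns in the table above can be re-derived from the two error counts. The following is a minimal sketch, not the repository's evaluation script: it assumes the `360` rows split into `180` safe and `180` unsafe rows (inferred from the permissive-agent and single-classifier rows, not stated in the README), and the names `GateMetrics` and `metrics_from_counts` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateMetrics:
    accuracy: float
    unsafe_recall: float
    block_precision: float
    safe_allow: float
    unsafe_accept: float


def _ratio(num: int, den: int) -> float:
    # Convention assumed here: report 0.0 when the denominator is empty,
    # e.g. block precision for a gate that never blocks anything.
    return num / den if den else 0.0


def metrics_from_counts(false_pos: int, false_neg: int,
                        n_safe: int = 180, n_unsafe: int = 180) -> GateMetrics:
    """Derive the table's rates from false-positive and false-negative counts.

    "Positive" means "blocked": a false positive is a safe row that was
    blocked, a false negative is an unsafe row that was accepted.
    The 180/180 class split is an assumption inferred from the table.
    """
    true_pos = n_unsafe - false_neg  # unsafe rows correctly blocked
    true_neg = n_safe - false_pos    # safe rows correctly allowed
    return GateMetrics(
        accuracy=_ratio(true_pos + true_neg, n_safe + n_unsafe),
        unsafe_recall=_ratio(true_pos, n_unsafe),
        block_precision=_ratio(true_pos, true_pos + false_pos),
        safe_allow=_ratio(true_neg, n_safe),
        unsafe_accept=_ratio(false_neg, n_unsafe),
    )


if __name__ == "__main__":
    # (false positives, false negatives) copied from the table above.
    rows = {
        "Prompt-only policy guardrail": (20, 5),
        "LLM-as-judge safety checker": (52, 1),
        "Structured contract gate without recovery": (28, 0),
        "AANA with evidence recovery": (0, 0),
    }
    for name, (fp, fn) in rows.items():
        m = metrics_from_counts(fp, fn)
        print(f"{name}: accuracy={m.accuracy:.2%} "
              f"recall={m.unsafe_recall:.2%} "
              f"precision={m.block_precision:.2%} "
              f"safe_allow={m.safe_allow:.2%} "
              f"unsafe_accept={m.unsafe_accept:.2%}")
```

Under that assumed split, unsafe accept is the complement of unsafe recall, and safe allow is the complement of the false-positive rate, so each row is fully determined by its two error counts.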
|
| 747 |
+
|
| 748 |
### PIIMB: Presidio + AANA
|
| 749 |
|
| 750 |
Official PIIMB submission:
|