mindbomber committed on
Commit 2646360 · verified · 1 Parent(s): 29fd6c0

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +36 -0
README.md CHANGED
@@ -63,6 +63,42 @@ S = (f_theta, E_phi, R, Pi_psi, G)
63
  The goal is not to claim perfect alignment. The goal is to make deployment-time
64
  correctability, evidence, gating, and auditability explicit.
65
 
66
+ ## Head-to-Head Finding
67
+
68
+ Across two public agent/tool-call sources, the strongest repeated signal is:
69
+
70
+ > AANA improves agent action reliability by combining structured pre-tool-call
71
+ > contracts, verifier gates, and evidence-recovery loops. In these diagnostics,
72
+ > AANA preserves unsafe-action recall while recovering more safe actions than
73
+ > single classifiers, prompt-only guards, LLM judges, or static contract gates;
74
+ > the permissive agent allows every safe action but blocks no unsafe ones.
75
+
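To make the quoted claim concrete, here is a minimal sketch of one plausible reading of "contract gate" versus "gate with evidence recovery". Every name in it (`REQUIRED_EVIDENCE`, `static_gate`, `gate_with_recovery`, `fetch_evidence`, and the evidence keys) is hypothetical and is not taken from the AANA codebase. It assumes that a static gate fails closed when evidence is missing, and that a recovery loop tries to obtain the missing evidence before deciding.

```python
from typing import Callable, Optional

# Hypothetical contract: evidence keys that must be present and True
# before a tool call is allowed. Not taken from the AANA codebase.
REQUIRED_EVIDENCE = {"user_authorized", "target_verified"}


def static_gate(action: dict) -> str:
    """Contract gate without recovery: fail closed on missing evidence."""
    evidence = action.get("evidence", {})
    for key in REQUIRED_EVIDENCE:
        if evidence.get(key) is not True:
            return "block"  # evidence is missing or contradicts the contract
    return "allow"


def gate_with_recovery(
    action: dict,
    fetch_evidence: Callable[[dict, str], Optional[bool]],
) -> str:
    """Try to recover missing evidence, then apply the same contract."""
    evidence = dict(action.get("evidence", {}))
    for key in sorted(REQUIRED_EVIDENCE - set(evidence)):
        value = fetch_evidence(action, key)  # hypothetical lookup hook
        if value is not None:
            evidence[key] = value
    return static_gate({**action, "evidence": evidence})


# A safe action that arrives with incomplete evidence.
safe_action = {"tool": "send_email", "evidence": {"user_authorized": True}}
# An unsafe action whose evidence contradicts the contract.
unsafe_action = {"tool": "delete_repo", "evidence": {"user_authorized": False}}


def lookup(action: dict, key: str) -> Optional[bool]:
    # Stand-in for a real evidence source; always confirms the missing key.
    return True


print(static_gate(safe_action), gate_with_recovery(safe_action, lookup))
print(static_gate(unsafe_action), gate_with_recovery(unsafe_action, lookup))
```

Under these assumptions the static gate blocks the safe action (a false positive) while the recovery variant allows it, and both variants block the unsafe action because recovery never overrides evidence that is already present.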
76
+ Summary:
77
+
78
+ | Source | Architecture | Accuracy | Unsafe recall | Safe allow | FP | FN |
79
+ | --- | --- | ---: | ---: | ---: | ---: | ---: |
80
+ | Qwen traces | Permissive agent | `50.00%` | `0.00%` | `100.00%` | `0` | `180` |
81
+ | Qwen traces | Single classifier | `50.00%` | `100.00%` | `0.00%` | `180` | `0` |
82
+ | Qwen traces | Prompt-only guardrail | `81.67%` | `96.67%` | `66.67%` | `60` | `6` |
83
+ | Qwen traces | LLM-as-judge | `73.33%` | `100.00%` | `46.67%` | `96` | `0` |
84
+ | Qwen traces | Contract gate, no recovery | `92.78%` | `100.00%` | `85.56%` | `26` | `0` |
85
+ | Qwen traces | AANA with recovery | `100.00%` | `100.00%` | `100.00%` | `0` | `0` |
86
+ | Hermes traces | Permissive agent | `50.00%` | `0.00%` | `100.00%` | `0` | `180` |
87
+ | Hermes traces | Single classifier | `50.00%` | `100.00%` | `0.00%` | `180` | `0` |
88
+ | Hermes traces | Prompt-only guardrail | `93.06%` | `97.22%` | `88.89%` | `20` | `5` |
89
+ | Hermes traces | LLM-as-judge | `85.28%` | `99.44%` | `71.11%` | `52` | `1` |
90
+ | Hermes traces | Contract gate, no recovery | `92.22%` | `100.00%` | `84.44%` | `28` | `0` |
91
+ | Hermes traces | AANA with recovery | `100.00%` | `100.00%` | `100.00%` | `0` | `0` |
92
+
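The percentage columns can be recomputed from the FP and FN counts. The sketch below assumes each source contains 180 unsafe and 180 safe actions, which is inferred from the table (the permissive agent has FN = 180 at 0% unsafe recall, and the single classifier has FP = 180 at 0% safe allow) rather than stated in the README. It also assumes FP means a safe action that was blocked and FN means an unsafe action that was allowed. The names `GateCounts` and `summarize` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCounts:
    """Confusion counts for one architecture on one trace source."""
    unsafe_total: int  # actions labeled unsafe (should be blocked)
    safe_total: int    # actions labeled safe (should be allowed)
    fp: int            # safe actions that were blocked
    fn: int            # unsafe actions that were allowed


def summarize(c: GateCounts) -> dict:
    """Return accuracy, unsafe recall, and safe-allow rate as percentages."""
    blocked_unsafe = c.unsafe_total - c.fn
    allowed_safe = c.safe_total - c.fp
    total = c.unsafe_total + c.safe_total
    return {
        "accuracy": round(100 * (blocked_unsafe + allowed_safe) / total, 2),
        "unsafe_recall": round(100 * blocked_unsafe / c.unsafe_total, 2),
        "safe_allow": round(100 * allowed_safe / c.safe_total, 2),
    }


# Qwen traces, prompt-only guardrail row: FP = 60, FN = 6.
print(summarize(GateCounts(unsafe_total=180, safe_total=180, fp=60, fn=6)))
```

With the assumed totals, this reproduces the `81.67%`, `96.67%`, and `66.67%` reported for that row; the other rows can be checked the same way.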
93
+ Evidence tiers matter. PIIMB is an official external benchmark submission.
94
+ The Qwen and Hermes head-to-heads use public datasets with reproducible
95
+ transforms and policy-derived labels, not human-reviewed safety labels. Local
96
+ blind action-gate runs are useful development ablations but provide weaker
97
+ evidence of external validity.
98
+
99
+ Public summary:
100
+ https://mindbomber.github.io/Alignment-Aware-Neural-Architecture--AANA-/aana-head-to-head-findings.md
101
+
102
  ## Try AANA
103
 
104
  Use the public Hugging Face Space as the quickest way to try the AANA gate with