Upload README.md with huggingface_hub
README.md
CHANGED
@@ -63,6 +63,42 @@ S = (f_theta, E_phi, R, Pi_psi, G)
 The goal is not to claim perfect alignment. The goal is to make deployment-time
 correctability, evidence, gating, and auditability explicit.
 
+## Head-to-Head Finding
+
+Across two public agent/tool-call sources, the strongest repeated signal is:
+
+> AANA improves agent action reliability by combining structured pre-tool-call
+> contracts, verifier gates, and evidence-recovery loops. In these diagnostics,
+> AANA preserves unsafe-action recall while recovering more safe actions than
+> permissive agents, single classifiers, prompt-only guards, LLM judges, or
+> static contract gates.
+
+Summary:
+
+| Source | Architecture | Accuracy | Unsafe recall | Safe allow | FP | FN |
+| --- | --- | ---: | ---: | ---: | ---: | ---: |
+| Qwen traces | Permissive agent | `50.00%` | `0.00%` | `100.00%` | `0` | `180` |
+| Qwen traces | Single classifier | `50.00%` | `100.00%` | `0.00%` | `180` | `0` |
+| Qwen traces | Prompt-only guardrail | `81.67%` | `96.67%` | `66.67%` | `60` | `6` |
+| Qwen traces | LLM-as-judge | `73.33%` | `100.00%` | `46.67%` | `96` | `0` |
+| Qwen traces | Contract gate, no recovery | `92.78%` | `100.00%` | `85.56%` | `26` | `0` |
+| Qwen traces | AANA with recovery | `100.00%` | `100.00%` | `100.00%` | `0` | `0` |
+| Hermes traces | Permissive agent | `50.00%` | `0.00%` | `100.00%` | `0` | `180` |
+| Hermes traces | Single classifier | `50.00%` | `100.00%` | `0.00%` | `180` | `0` |
+| Hermes traces | Prompt-only guardrail | `93.06%` | `97.22%` | `88.89%` | `20` | `5` |
+| Hermes traces | LLM-as-judge | `85.28%` | `99.44%` | `71.11%` | `52` | `1` |
+| Hermes traces | Contract gate, no recovery | `92.22%` | `100.00%` | `84.44%` | `28` | `0` |
+| Hermes traces | AANA with recovery | `100.00%` | `100.00%` | `100.00%` | `0` | `0` |
+
+Evidence tiers matter. PIIMB is an official external benchmark submission.
+The Qwen and Hermes head-to-heads use public datasets with reproducible
+transforms and policy-derived labels, not human-reviewed safety labels. Local
+blind action-gate runs are useful development ablations but weaker external
+validity evidence.
+
+Public summary:
+https://mindbomber.github.io/Alignment-Aware-Neural-Architecture--AANA-/aana-head-to-head-findings.md
+
 ## Try AANA
 
 Use the public Hugging Face Space as the quickest way to try the AANA gate with
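The percentage columns in the summary table added by this hunk can be recomputed from the FP and FN counts. The sketch below is not part of the README or the AANA codebase; `gate_metrics` is a hypothetical helper. It assumes each source contains 180 unsafe and 180 safe traces (inferred from the permissive-agent row with FN = 180 and the single-classifier row with FP = 180, not stated in the diff), that FP means a safe action that was blocked, and that FN means an unsafe action that was allowed.

```python
def gate_metrics(fp, fn, n_unsafe=180, n_safe=180):
    """Recompute the table's percentage columns from FP/FN counts.

    Assumptions (inferred from the table, not stated by the source):
      - fp: safe actions that the gate blocked
      - fn: unsafe actions that the gate allowed
      - 180 unsafe and 180 safe traces per source
    """
    caught_unsafe = n_unsafe - fn   # unsafe actions correctly blocked
    allowed_safe = n_safe - fp      # safe actions correctly allowed
    total = n_unsafe + n_safe
    return {
        "accuracy": round(100 * (caught_unsafe + allowed_safe) / total, 2),
        "unsafe_recall": round(100 * caught_unsafe / n_unsafe, 2),
        "safe_allow": round(100 * allowed_safe / n_safe, 2),
    }


# Qwen traces, prompt-only guardrail row: FP = 60, FN = 6
print(gate_metrics(fp=60, fn=6))
```

Under these assumptions the call reproduces the Qwen prompt-only guardrail row (81.67% accuracy, 96.67% unsafe recall, 66.67% safe allow), and the other rows can be checked the same way.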