mindbomber committed on
Commit e34e3d2 · verified · 1 Parent(s): e59f6ff

Update AANA diagnostic findings

Files changed (1)
  1. README.md +88 -959
README.md CHANGED
@@ -1,960 +1,89 @@
- ---
- license: mit
- tags:
- - aana
- - alignment
- - ai-safety
- - llm-evaluation
- - verifier
- - correction-loop
- - guardrails
- - agent-safety
- - pii
- - piimb
- datasets:
- - piimb/pii-masking-benchmark
- - truthfulqa/truthful_qa
- - wandb/RAGTruth-processed
- - PatronusAI/HaluBench
- - potsawee/wiki_bio_gpt3_hallucination
- - mindbomber/aana-cross-domain-action-gate-v2-tuned
- - mindbomber/aana-cross-domain-action-gate-v2-all-domains-tuned
- - mindbomber/aana-cross-domain-action-gate-blind-v3
- - mindbomber/aana-cross-domain-action-gate-blind-v4
- - mindbomber/aana-cross-domain-action-gate-blind-v5
- - mindbomber/aana-cross-domain-action-taxonomy-model-v5
- - mindbomber/aana-external-agent-trace-action-gate
- - mindbomber/aana-external-agent-trace-action-gate-v2
- - mindbomber/aana-agent-tool-contract-v1
- - mindbomber/aana-external-agent-trace-noisy-evidence
- - mindbomber/aana-head-to-head-permissive-vs-aana
- - mindbomber/aana-head-to-head-single-classifier-vs-aana
- - mindbomber/aana-head-to-head-prompt-policy-vs-aana
- - mindbomber/aana-head-to-head-llm-judge-vs-aana
- - mindbomber/aana-head-to-head-contract-no-recovery-vs-aana
- - mindbomber/aana-external-validity-hermes-head-to-head
- - mindbomber/aana-tau2-bench-gpt41mini-1trial
- metrics:
- - accuracy
- - f_beta
- library_name: aana
- pipeline_tag: text-classification
- ---
-
- # Alignment-Aware Neural Architecture (AANA)
-
- AANA is a verifier-grounded runtime architecture for making AI and agent outputs
- more correctable before they are published, sent, deployed, or used for
- consequential actions.
-
- It is not a standalone set of neural weights. AANA wraps a base generator or
- specialist detector with explicit verifier, grounding, correction, and gate
- components:
-
- ```text
- S = (f_theta, E_phi, R, Pi_psi, G)
- ```
-
- - `f_theta`: base generator, LLM, agent, tool planner, or specialist detector.
- - `E_phi`: verifier stack for factual, safety, policy, privacy, and task constraints.
- - `R`: retrieval or grounding module for evidence.
- - `Pi_psi`: correction policy that can accept, revise, retrieve, ask, refuse, or defer.
- - `G`: alignment gate that blocks unsupported final outputs or unsafe actions.
-
- The goal is not to claim perfect alignment. The goal is to make deployment-time
- correctability, evidence, gating, and auditability explicit.
-
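The five components of `S = (f_theta, E_phi, R, Pi_psi, G)` can be read as a small runtime control loop. The Python below is a minimal sketch under that reading, not AANA's actual code or API: `run_aana`, `Verdict`, the round budget, and the toy components are all hypothetical. It only illustrates the flow the bullets describe: generate, verify against retrieved evidence, let the correction policy pick a route, and let the gate release nothing that failed verification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Routes named in this README; Pi_psi must return one of them (hypothetical check).
ROUTES = {"accept", "revise", "retrieve", "ask", "refuse", "defer"}


@dataclass
class Verdict:
    """Output of the verifier stack E_phi (hypothetical shape)."""
    passed: bool
    blockers: List[str] = field(default_factory=list)


def run_aana(
    prompt: str,
    generate: Callable[[str, List[str]], str],    # f_theta
    verify: Callable[[str, List[str]], Verdict],  # E_phi
    retrieve: Callable[[str], List[str]],         # R
    correct: Callable[[Verdict, int], str],       # Pi_psi
    max_rounds: int = 2,
) -> Dict[str, object]:
    evidence = retrieve(prompt)
    verdict = Verdict(passed=False)
    for round_idx in range(max_rounds + 1):
        candidate = generate(prompt, evidence)
        verdict = verify(candidate, evidence)
        if verdict.passed:
            # Gate G: only a verified candidate is released.
            return {"route": "accept", "output": candidate, "blockers": []}
        route = correct(verdict, round_idx)
        if route not in ROUTES:
            raise ValueError(f"unknown route: {route!r}")
        if route in ("retrieve", "revise"):
            if route == "retrieve":
                evidence = evidence + retrieve(candidate)
            continue  # try another generation round
        # Gate G never releases an unverified candidate, even if Pi_psi says accept.
        return {"route": "defer" if route == "accept" else route,
                "output": None, "blockers": verdict.blockers}
    # Correction budget exhausted without a verified candidate.
    return {"route": "defer", "output": None, "blockers": verdict.blockers}


# --- Toy components (hypothetical) ---
DOCS = ["Paris is the capital of France."]

def toy_retrieve(query: str) -> List[str]:
    return [d for d in DOCS if "capital" in query.lower()]

def toy_generate(prompt: str, evidence: List[str]) -> str:
    return evidence[0] if evidence else "It is probably Lyon."

def toy_verify(candidate: str, evidence: List[str]) -> Verdict:
    ok = candidate in evidence
    return Verdict(passed=ok, blockers=[] if ok else ["unsupported_claim"])

def toy_correct(verdict: Verdict, round_idx: int) -> str:
    return "retrieve" if round_idx == 0 else "defer"


print(run_aana("What is the capital of France?", toy_generate, toy_verify, toy_retrieve, toy_correct))
# {'route': 'accept', 'output': 'Paris is the capital of France.', 'blockers': []}
print(run_aana("Name a large city.", toy_generate, toy_verify, toy_retrieve, toy_correct))
# {'route': 'defer', 'output': None, 'blockers': ['unsupported_claim']}
```

Keeping the gate as the only path to a released output is the design point: a correction policy that wrongly proposes `accept` on a failed verdict still cannot publish the candidate.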
- ## Head-to-Head Finding
-
- Across two public agent/tool-call sources, the strongest repeated signal is:
-
- > AANA improves agent action reliability by combining structured pre-tool-call
- > contracts, verifier gates, and evidence-recovery loops. In these diagnostics,
- > AANA keeps full unsafe-action recall while allowing as many safe actions as a
- > permissive agent and more than single classifiers, prompt-only guards, LLM
- > judges, or static contract gates.
-
- Summary:
-
- | Source | Architecture | Accuracy | Unsafe recall | Safe allow | FP | FN |
- | --- | --- | ---: | ---: | ---: | ---: | ---: |
- | Qwen traces | Permissive agent | `50.00%` | `0.00%` | `100.00%` | `0` | `180` |
- | Qwen traces | Single classifier | `50.00%` | `100.00%` | `0.00%` | `180` | `0` |
- | Qwen traces | Prompt-only guardrail | `81.67%` | `96.67%` | `66.67%` | `60` | `6` |
- | Qwen traces | LLM-as-judge | `73.33%` | `100.00%` | `46.67%` | `96` | `0` |
- | Qwen traces | Contract gate, no recovery | `92.78%` | `100.00%` | `85.56%` | `26` | `0` |
- | Qwen traces | AANA with recovery | `100.00%` | `100.00%` | `100.00%` | `0` | `0` |
- | Hermes traces | Permissive agent | `50.00%` | `0.00%` | `100.00%` | `0` | `180` |
- | Hermes traces | Single classifier | `50.00%` | `100.00%` | `0.00%` | `180` | `0` |
- | Hermes traces | Prompt-only guardrail | `93.06%` | `97.22%` | `88.89%` | `20` | `5` |
- | Hermes traces | LLM-as-judge | `85.28%` | `99.44%` | `71.11%` | `52` | `1` |
- | Hermes traces | Contract gate, no recovery | `92.22%` | `100.00%` | `84.44%` | `28` | `0` |
- | Hermes traces | AANA with recovery | `100.00%` | `100.00%` | `100.00%` | `0` | `0` |
-
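Every percentage in the summary table follows from the FP and FN columns once the class sizes are known. The permissive agent's `180` false negatives and the single classifier's `180` false positives imply `180` unsafe and `180` safe rows per source. A short sketch of that arithmetic, assuming FP means a safe row blocked and FN means an unsafe row allowed (`gate_metrics` is a hypothetical helper name):

```python
def gate_metrics(n_unsafe: int, n_safe: int, fp: int, fn: int) -> dict:
    """Derive table percentages from confusion counts.

    Assumed convention: positive = unsafe row that should be blocked,
    FP = safe row blocked, FN = unsafe row allowed.
    """
    tp = n_unsafe - fn  # unsafe rows correctly blocked
    tn = n_safe - fp    # safe rows correctly allowed
    return {
        "accuracy": 100 * (tp + tn) / (n_unsafe + n_safe),
        "unsafe_recall": 100 * tp / n_unsafe,
        "safe_allow": 100 * tn / n_safe,
        # A gate that never blocks has no precision; report 0 as the tables do.
        "block_precision": 100 * tp / (tp + fp) if tp + fp else 0.0,
    }


# Qwen traces, prompt-only guardrail: FP = 60, FN = 6.
m = gate_metrics(180, 180, fp=60, fn=6)
print({k: round(v, 2) for k, v in m.items()})
# {'accuracy': 81.67, 'unsafe_recall': 96.67, 'safe_allow': 66.67, 'block_precision': 74.36}
```

The derived block precision of `74.36%` matches the prompt-only head-to-head table later in this README, which is a useful cross-check on the counts.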
- Evidence tiers matter. PIIMB is an official external benchmark submission.
- The Qwen and Hermes head-to-heads use public datasets with reproducible
- transforms and policy-derived labels, not human-reviewed safety labels. Local
- blind action-gate runs are useful development ablations but provide weaker
- evidence of external validity.
-
- Public summary:
- https://mindbomber.github.io/Alignment-Aware-Neural-Architecture--AANA-/aana-head-to-head-findings.md
-
- ## Try AANA
-
- Use the public Hugging Face Space as the quickest way to try the AANA gate with
- your own candidate answer/action, evidence, and constraints:
-
- https://huggingface.co/spaces/mindbomber/aana-demo
-
- The demo returns an AANA-style route (`accept`, `revise`, `ask`, `defer`, or
- `refuse`), an AIx score, any hard blockers, a suggested revision or route, and
- an audit summary.
-
- ## Current Public Benchmark Signals
-
- ### τ²-Bench: Custom Agent Tool-Use Scaffold
-
- Official PR:
- https://github.com/sierra-research/tau2-bench/pull/304
-
- Public result artifact:
- https://huggingface.co/datasets/mindbomber/aana-tau2-bench-gpt41mini-1trial
-
- Benchmark:
- `sierra-research/tau2-bench`
-
- Evaluation date:
- `2026-05-07`
-
- Configuration:
-
- - Agent model: `openai/gpt-4.1-mini`
- - User simulator: `openai/gpt-4.1-mini`
- - Trials: `1` per task
- - Domains: `airline`, `retail`, `telecom`, `banking_knowledge`
- - Banking retrieval: `bm25`
- - Submission type: `custom`
-
- AANA path:
- wrap the τ²-Bench text agent with a pre-tool-call contract gate that returns
- `accept`, `ask`, `defer`, or `refuse` before tool execution.
-
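The AANA path above places a gate between the agent's tool-call decision and tool execution. A minimal Python sketch of that control flow, assuming a gate function that maps a tool name and arguments to one of the four routes. `gated_tool_call`, the toy tools, and the confirmation rule are hypothetical; they are not τ²-Bench's tool set or the submitted scaffold's code.

```python
from typing import Any, Callable, Dict

# The four routes the scaffold's gate can return, per the AANA path above.
GATE_ROUTES = ("accept", "ask", "defer", "refuse")

ToolFn = Callable[..., Any]
GateFn = Callable[[str, Dict[str, Any]], str]


def gated_tool_call(tool_name: str, arguments: Dict[str, Any],
                    tools: Dict[str, ToolFn], gate: GateFn) -> Dict[str, Any]:
    """Consult the gate before execution; only an `accept` route runs the tool."""
    route = gate(tool_name, arguments)
    if route not in GATE_ROUTES:
        raise ValueError(f"unknown gate route: {route!r}")
    if route != "accept":
        # The tool never runs; the route goes back to the agent loop instead.
        return {"route": route, "executed": False, "result": None}
    return {"route": route, "executed": True, "result": tools[tool_name](**arguments)}


# --- Hypothetical toy tools and gate ---
TOOLS: Dict[str, ToolFn] = {
    "get_reservation": lambda reservation_id: {"id": reservation_id, "status": "active"},
    "cancel_reservation": lambda reservation_id: {"id": reservation_id, "status": "cancelled"},
}
CONFIRMED = {"R2"}  # reservation IDs the user has explicitly confirmed


def toy_gate(tool_name: str, arguments: Dict[str, Any]) -> str:
    if tool_name not in TOOLS:
        return "refuse"  # unknown tool
    if tool_name.startswith("cancel_") and arguments.get("reservation_id") not in CONFIRMED:
        return "ask"     # write without user confirmation
    return "accept"


print(gated_tool_call("get_reservation", {"reservation_id": "R1"}, TOOLS, toy_gate))
# {'route': 'accept', 'executed': True, 'result': {'id': 'R1', 'status': 'active'}}
print(gated_tool_call("cancel_reservation", {"reservation_id": "R1"}, TOOLS, toy_gate))
# {'route': 'ask', 'executed': False, 'result': None}
```

Returning the route rather than raising keeps the agent loop in control: an `ask` can become a clarifying turn with the simulated user instead of a failed tool call.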
- | Domain | Pass^1 | Avg cost |
- | --- | ---: | ---: |
- | Airline | `44.00%` | `$0.0068` |
- | Retail | `38.60%` | `$0.0097` |
- | Telecom | `17.54%` | `$0.0224` |
- | Banking knowledge | `2.06%` | `$0.0073` |
-
- This is an official custom-submission attempt with validated trajectories, not
- a strong performance claim. The first τ²-Bench scaffold clearly exposed a
- limitation of the current architecture: AANA improves auditability and
- pre-tool-call control, but this implementation is too blunt for many
- write-heavy, retrieval-heavy, and customer-service workflows. The next AANA
- agent-workflow work should improve action-intent routing,
- authorization-state inference, and retrieval grounding, and should make
- correction behavior less conservative.
-
- ### RAGTruth: Grounded Hallucination Gate
-
- Public result artifact:
- https://huggingface.co/datasets/mindbomber/aana-ragtruth-grounded-gate
-
- Benchmark:
- `wandb/RAGTruth-processed`
-
- Dataset revision:
- `eb4f4b9d1b68eb7092d3e1a61c0cd82d9808737b`
-
- Split:
- `test`
-
- Examples:
- `2700`
-
- Base path:
- accept existing model outputs as-is.
-
- AANA path:
- route low evidence-support outputs to `revise`.
-
- | Path | Unsafe accept rate on hallucinated outputs | Balanced accuracy | Hallucination recall |
- | --- | ---: | ---: | ---: |
- | Base accept-as-is | `1.000000` | `0.500000` | `0.000000` |
- | AANA evidence gate | `0.090138` | `0.649012` | `0.909862` |
-
- This result shows the intended runtime safety tradeoff: AANA greatly reduces
- unsafe acceptance of hallucinated grounded-generation outputs, while
- over-refusing a large share of clean outputs.
-
- ### HaluBench: Grounded QA Gate
-
- Public result artifact:
- https://huggingface.co/datasets/mindbomber/aana-halubench-grounded-gate
-
- Benchmark:
- `PatronusAI/HaluBench`
-
- Dataset revision:
- `5966a87929f51c204ab3cbef986b449495cc97b6`
-
- Split:
- `test`
-
- Examples:
- `14900`
-
- Base path:
- accept candidate answers as-is.
-
- AANA path:
- route low evidence-support answers to `revise`.
-
- | Path | Unsafe accept rate on FAIL answers | Balanced accuracy | FAIL recall |
- | --- | ---: | ---: | ---: |
- | Base accept-as-is | `1.000000` | `0.500000` | `0.000000` |
- | AANA evidence gate | `0.142259` | `0.776930` | `0.857741` |
-
- Subset behavior is uneven: the gate performs strongly on `halueval` but
- over-refuses heavily on `FinanceBench`, `RAGTruth`, and `pubmedQA`.
-
- ### WikiBio GPT-3 Hallucination: Source-Supported Biography Sentences
-
- Public result artifact:
- https://huggingface.co/datasets/mindbomber/aana-wikibio-grounded-gate
-
- Benchmark:
- `potsawee/wiki_bio_gpt3_hallucination`
-
- Dataset revision:
- `b3cfb73209a8c51582fa1d9b7fe7e45fec5529b2`
-
- Split:
- `evaluation`
-
- Documents:
- `238`
-
- Sentence-level examples:
- `1908`
-
- Base path:
- accept each GPT-3 sentence as-is.
-
- AANA path:
- route low source-support sentences to `revise`.
-
- | Path | Unsafe accept rate on inaccurate sentences | Balanced accuracy | Inaccuracy recall |
- | --- | ---: | ---: | ---: |
- | Base accept-as-is | `1.000000` | `0.500000` | `0.000000` |
- | AANA evidence gate | `0.099138` | `0.702369` | `0.900862` |
-
- The gate flagged `94.6%` of major inaccurate sentences and `84.6%` of minor
- inaccurate sentences, while also flagging `49.6%` of accurate sentences.
-
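The RAGTruth, HaluBench, and WikiBio tables report balanced accuracy and recall but not how often clean outputs pass the gate. Because balanced accuracy is the mean of recall and specificity, specificity (the clean-output accept rate) can be recovered as `2 * BA - recall`. A short sketch of that arithmetic using only the reported figures (`implied_clean_accept_rate` is a hypothetical helper name):

```python
def implied_clean_accept_rate(balanced_accuracy: float, recall: float) -> float:
    """Specificity from balanced accuracy: BA = (recall + specificity) / 2."""
    return 2 * balanced_accuracy - recall


# (balanced accuracy, recall) pairs copied from the three tables above.
REPORTED = {
    "RAGTruth": (0.649012, 0.909862),
    "HaluBench": (0.776930, 0.857741),
    "WikiBio": (0.702369, 0.900862),
}

for name, (ba, rec) in REPORTED.items():
    spec = implied_clean_accept_rate(ba, rec)
    print(f"{name}: clean accepted {spec:.3f}, clean flagged {1 - spec:.3f}")
# RAGTruth: clean accepted 0.388, clean flagged 0.612
# HaluBench: clean accepted 0.696, clean flagged 0.304
# WikiBio: clean accepted 0.504, clean flagged 0.496
```

The WikiBio result reproduces the `49.6%` of accurate sentences flagged that the section reports, and the RAGTruth result puts a number on its over-refusal: roughly 61% of clean outputs routed to `revise` before calibration. These uncalibrated rates can be compared with the over-refusal column of the calibration table.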
- ### Grounded Gate Calibration
-
- Public calibration artifact:
- https://huggingface.co/datasets/mindbomber/aana-grounded-gate-calibration
-
- Calibration reduced false positives on RAGTruth, HaluBench, and WikiBio while
- keeping recall above a high floor. The calibrated threshold is the deployment
- knob for choosing between more conservative revision behavior and fewer
- unnecessary interventions.
-
- | Benchmark | Calibrated threshold | Recall | Over-refusal | Unsafe accept |
- | --- | ---: | ---: | ---: | ---: |
- | RAGTruth | `0.20` | `0.884411` | `0.585657` | `0.115589` |
- | HaluBench | `0.90` | `0.833473` | `0.294825` | `0.166527` |
- | WikiBio GPT-3 hallucination | `0.05` | `0.866379` | `0.443798` | `0.133621` |
-
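The README does not say how each calibrated threshold was chosen. One common approach consistent with "reduced false positives while keeping recall above a floor" is a sweep: flag outputs whose support score falls below a candidate threshold, then keep the candidate with the lowest over-refusal whose recall meets the floor. The sketch below assumes that procedure; `calibrate_threshold` and the toy scores are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def calibrate_threshold(
    support: Sequence[float],   # evidence-support score per output (higher = better supported)
    is_bad: Sequence[bool],     # True if the output is hallucinated / inaccurate
    candidates: Sequence[float],
    recall_floor: float,
) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, recall, over_refusal) with the lowest over-refusal
    among candidates whose recall meets the floor, or None if none qualifies."""
    n_bad = sum(is_bad)
    n_clean = len(is_bad) - n_bad
    best: Optional[Tuple[float, float, float]] = None
    for t in candidates:
        flagged = [s < t for s in support]  # low support -> route to `revise`
        tp = sum(1 for f, b in zip(flagged, is_bad) if f and b)
        fp = sum(1 for f, b in zip(flagged, is_bad) if f and not b)
        recall = tp / n_bad if n_bad else 0.0
        over_refusal = fp / n_clean if n_clean else 0.0
        if recall >= recall_floor and (best is None or over_refusal < best[2]):
            best = (t, recall, over_refusal)
    return best


support = [0.05, 0.10, 0.15, 0.30, 0.40, 0.60, 0.70, 0.90]
is_bad = [True, True, False, True, False, False, False, False]
print(calibrate_threshold(support, is_bad, [0.2, 0.35, 0.5, 0.8], recall_floor=0.9))
# (0.35, 1.0, 0.2)
```

In this framing the table's unsafe-accept column is simply `1 - recall`, which holds for all three reported rows. With candidates sorted ascending, ties in over-refusal keep the smaller threshold.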
- ### HarmActionsEval-Style Agent-Action Gate
-
- Public result artifact:
- https://huggingface.co/datasets/mindbomber/aana-harmactions-agent-action-gate
-
- Benchmark:
- local Agent-Action-Guard HarmActionsEval-style rows
-
- Examples:
- `260`
-
- Base path:
- accept every proposed tool call as-is.
-
- AANA path:
- block an action when the agent-action gate fails, when its recommended action
- is not `accept`, or when candidate AIx hard blockers are present.
-
- | Path | Unsafe accept rate on unsafe actions | Safe allow rate | Accuracy |
- | --- | ---: | ---: | ---: |
- | Base accept-as-is | `1.000000` | `1.000000` | `0.457692` |
- | AANA action gate | `0.212800` | `0.991600` | `0.880800` |
-
- The AANA gate blocked `111/141` unsafe actions and allowed `118/119` safe
- actions. This result comes from a deterministic runtime gate benchmark; it is
- not an official leaderboard submission or a production agent-safety guarantee.
-
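The AANA path above is a disjunction of three conditions, so any single signal is enough to block. A minimal sketch, with hypothetical field names standing in for the gate result, recommended action, and AIx hard blockers; it also checks that the stated counts reproduce the table figures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ActionCheck:
    # Hypothetical field names for the three signals in the AANA path above.
    gate_passed: bool
    recommended_action: str
    hard_blockers: List[str] = field(default_factory=list)


def should_block(check: ActionCheck) -> bool:
    """Block if the gate failed, the route is not `accept`, or any hard blocker is present."""
    return (not check.gate_passed
            or check.recommended_action != "accept"
            or bool(check.hard_blockers))


# The reported counts reproduce the AANA row of the table to four decimal places.
blocked_unsafe, n_unsafe = 111, 141
allowed_safe, n_safe = 118, 119
unsafe_accept = (n_unsafe - blocked_unsafe) / n_unsafe
safe_allow = allowed_safe / n_safe
accuracy = (blocked_unsafe + allowed_safe) / (n_unsafe + n_safe)
print(round(unsafe_accept, 4), round(safe_allow, 4), round(accuracy, 4))
# 0.2128 0.9916 0.8808
```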
- ### GAP Benchmark: Post-Hoc Agent Governance Gate
-
- Public result artifact:
- https://huggingface.co/datasets/mindbomber/aana-gap-posthoc-gate
-
- Domain adapter ablation artifact:
- https://huggingface.co/datasets/mindbomber/aana-gap-domain-adapter-ablation
-
- Benchmark:
- `acartag7/gap-benchmark`
-
- Split:
- `train`
-
- Rows:
- `17420`
-
- Base path:
- accept every row as safe.
-
- AANA path:
- apply the generic agent-action safety guardrail adapter to public post-hoc
- safety fields such as forbidden calls, contract violations, and intent
- violations.
-
- | Path | Unsafe accept rate | Safe allow rate | Accuracy |
- | --- | ---: | ---: | ---: |
- | Base accept-as-safe | `100.00%` | `100.00%` | `56.41%` |
- | AANA post-hoc gate | `83.63%` | `100.00%` | `63.54%` |
-
- This is a conservative low-recall result: AANA preserved safe rows (`0` false
- positives) but missed most unsafe rows (`83.63%` unsafe accept), because their
- policy semantics are not captured by the generic adapter. It is not an official
- GAP leaderboard score.
-
- Follow-up six-domain adapter ablation:
-
- | Path | Accuracy | Block recall | Block precision | Safe allow rate |
- | --- | ---: | ---: | ---: | ---: |
- | Generic AANA | `63.54%` | `16.37%` | `100.00%` | `100.00%` |
- | Domain AANA | `100.00%` | `100.00%` | `100.00%` | `100.00%` |
-
- The domain ablation adds devops, education, finance, HR, legal, and pharma
- adapters over public GAP violation signals. On this split it improves block
- recall by `+83.63` points without lowering the safe allow rate. The `100.00%`
- result is a post-hoc public-signal compatibility result: unsafe rows expose
- nonempty public violation fields while safe rows expose none. This remains a
- compatibility artifact, not an official GAP leaderboard score.
-
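The caveat above explains why the domain ablation can reach `100.00%`: if unsafe rows always carry a nonempty public violation field and safe rows never do, a presence check already separates them. A minimal sketch of such a check, assuming hypothetical key names for the three fields; the real GAP schema and the per-domain adapter logic are not reproduced here.

```python
from typing import Any, Dict, Iterable

# Hypothetical keys for the public post-hoc fields named above.
VIOLATION_FIELDS = ("forbidden_calls", "contract_violations", "intent_violations")


def posthoc_block(row: Dict[str, Any], fields: Iterable[str] = VIOLATION_FIELDS) -> bool:
    """Block when any public violation field is present and nonempty."""
    return any(bool(row.get(name)) for name in fields)


rows = [
    {"forbidden_calls": [], "contract_violations": [], "intent_violations": []},
    {"forbidden_calls": ["wire_transfer"], "contract_violations": [], "intent_violations": []},
    {"intent_violations": ["export_candidate_records"]},
    {},
]
print([posthoc_block(r) for r in rows])
# [False, True, True, False]
```

This is why the result is labeled a compatibility artifact: a check of this shape reads the benchmark's own violation signals rather than judging the action itself.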
- ### Cross-Domain Action Gate Validation
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-validation
-
- Dataset:
- small external six-domain action set generated outside GAP
-
- Rows:
- `72`
-
- Domains:
- devops, education, finance, HR, legal, pharma
-
- | Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
- | --- | ---: | ---: | ---: | ---: | ---: |
- | Base accept-as-safe | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `50.00%` |
- | Generic AANA | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `50.00%` |
- | Six-domain AANA | `100.00%` | `100.00%` | `100.00%` | `100.00%` | `100.00%` |
-
- This transfer validation hides labels from the gate and gives AANA only the
- action, tool metadata, constraints, evidence, and domain. It is a small
- hand-built adapter ablation, not an official leaderboard result. The final
- legal billing-control trigger was added after inspecting the prior false
- negative, and that caveat is disclosed in the artifact.
-
- ### Cross-Domain Action Gate: Blind Validation v2
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v2
-
- Dataset:
- larger frozen-adapter six-domain action set generated outside GAP
-
- Rows:
- `360`
-
- Domains:
- devops, education, finance, HR, legal, pharma
-
- | Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
- | --- | ---: | ---: | ---: | ---: | ---: |
- | Base accept-as-safe | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `50.00%` |
- | Generic AANA | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `50.00%` |
- | Frozen six-domain AANA | `82.50%` | `66.67%` | `97.56%` | `98.33%` | `82.50%` |
-
- The v2 run freezes adapters before evaluation and applies no post-run tuning.
- The result is mixed but more informative: domain adapters greatly reduce
- over-blocking and preserve high safe allow, but miss `60/180` unsafe actions
- under varied, unseen phrasing. This is not an official leaderboard result.
-
- ### Cross-Domain Action Gate v2: Tuned Adapter Run
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-v2-tuned
-
- Dataset:
- same 360-row six-domain action set as frozen v2
-
- Status:
- post-v2 adapter-improvement run, not blind validation and not an official leaderboard
-
- | Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
- | --- | ---: | ---: | ---: | ---: | ---: |
- | Frozen six-domain AANA v2 | `82.50%` | `66.67%` | `97.56%` | `98.33%` | `82.50%` |
- | Tuned six-domain AANA | `94.17%` | `88.33%` | `100.00%` | `100.00%` | `94.17%` |
-
- The tuned run targets the v2 recall misses in devops, education, and HR while
- protecting safe allow. Those three domains reached `100.00%` recall and
- `100.00%` safe allow on this validation set. Remaining misses are concentrated
- in finance (`9`), legal (`6`), and pharma (`6`). External generalization is not
- established by this local artifact; its value is as transparent evidence of
- adapter iteration, not as a production or leaderboard claim.
-
- ### Cross-Domain Action Gate v2: All-Domains Tuned Run
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-v2-all-domains-tuned
-
- Dataset:
- same 360-row six-domain action set as frozen v2 and tuned v2
-
- Status:
- post-v2 adapter-improvement run, not blind validation and not an official leaderboard
-
- | Path | Accuracy | Block recall | Block precision | Safe allow | Route quality |
- | --- | ---: | ---: | ---: | ---: | ---: |
- | Frozen six-domain AANA v2 | `82.50%` | `66.67%` | `97.56%` | `98.33%` | `82.50%` |
- | Tuned six-domain AANA | `94.17%` | `88.33%` | `100.00%` | `100.00%` | `94.17%` |
- | All-domains tuned AANA | `100.00%` | `100.00%` | `100.00%` | `100.00%` | `100.00%` |
-
- This pass targets the remaining finance, legal, and pharma false negatives:
- fund movement, liquidation, KYC file access, discovery-note deletion, filing
- deadline changes, raw lab-result downloads, and identifiable patient narratives.
- It removes the remaining false negatives on this validation set without adding
- false positives. External generalization is still unproven; the next, stronger
- test is a fresh blind v3 action-gate set with new phrasing and near-miss safe
- cases.
-
- ### Cross-Domain Action Gate: Blind Validation v3
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v3
-
- Dataset:
- fresh 360-row six-domain blind action set with new paraphrases and near-miss safe cases
-
- Status:
- blind validation after v2 tuning, no post-run tuning, not an official leaderboard
-
- | Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
- | Base accept-as-safe | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `0` | `180` |
- | Generic AANA | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `180` | `0` |
- | Frozen all-domain tuned AANA on blind v3 | `58.33%` | `20.00%` | `85.71%` | `96.67%` | `6` | `144` |
-
- This was the strongest stress-test result at this stage because the adapters
- were not tuned after inspecting it. It is also the most important failure
- signal so far: exact-term adapters fit v2 but do not generalize well enough to
- new action paraphrases. The next architecture step should add semantic
- action-intent classifiers and authorization-state detectors, then rerun blind
- v3 or a new blind v4 without post-run tuning.
-
- ### Cross-Domain Action Gate: Blind Validation v4
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v4
-
- Dataset:
- fresh 360-row six-domain blind action set after semantic detector improvement
-
- Status:
- blind validation, no post-run tuning, not an official leaderboard
-
- | Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
- | Base accept-as-safe | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `0` | `180` |
- | Generic AANA | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `180` | `0` |
- | Semantic domain AANA on blind v4 | `90.00%` | `80.00%` | `100.00%` | `100.00%` | `0` | `36` |
-
- This run adds semantic action-intent and authorization-state checks on top of
- the domain adapters. Compared with blind v3, recall improved from `20.00%` to
- `80.00%`, false positives dropped from `6` to `0`, and safe allow improved from
- `96.67%` to `100.00%`. Remaining misses are concentrated in finance and in
- domain-specific paraphrases whose object vocabulary is still too sparse.
-
- ### Cross-Domain Action Gate: Blind Validation v5
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-gate-blind-v5
-
- Dataset:
- fresh 360-row six-domain blind action set after action-taxonomy calibration
- against blind v3/v4
-
- Status:
- blind validation, no post-run tuning, not an official leaderboard
-
- | Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
- | Base accept-as-safe | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `0` | `180` |
- | Generic AANA | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `180` | `0` |
- | Taxonomy-calibrated domain AANA on blind v5 | `93.33%` | `91.67%` | `94.83%` | `95.00%` | `9` | `15` |
-
- This run tests a learned-style action taxonomy over action intent, regulated
- object class, and missing authorization state. It improves unsafe-action recall
- over the original blind v4 result (`91.67%` vs `80.00%`) but lowers safe allow
- (`95.00%` vs `100.00%`) because near-miss safe devops and education actions are
- sometimes routed to `defer`. The result is useful because it exposes the next
- calibration target: route quality around safe policy lookup, dry-run, and
- access-request actions, while preserving high recall on true high-risk actions.
-
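The v5 taxonomy routes on three axes: action intent, regulated object class, and authorization state. The README names the axes but not their values, so the labels below are hypothetical. This minimal sketch shows one way to encode the calibration target described above: near-miss safe intents stay out of `defer`, and high-risk intents on regulated objects without verified authorization are deferred.

```python
from typing import Optional

# Hypothetical labels: the README names the three axes, not their values.
HIGH_RISK_INTENTS = {"delete", "export", "transfer", "modify"}
NEAR_MISS_SAFE_INTENTS = {"policy_lookup", "dry_run", "access_request"}


def taxonomy_route(intent: str, regulated_object: bool,
                   authorization: Optional[str]) -> str:
    """Route one proposed action from (intent, regulated object class, authorization state)."""
    if intent in NEAR_MISS_SAFE_INTENTS:
        # The calibration target named above: keep near-miss safe actions out of `defer`.
        return "accept"
    if intent in HIGH_RISK_INTENTS and regulated_object and authorization != "verified":
        # High-risk intent on a regulated object without verified authorization.
        return "defer"
    return "accept"


print([
    taxonomy_route("export", True, None),
    taxonomy_route("dry_run", True, None),
    taxonomy_route("export", True, "verified"),
    taxonomy_route("export", False, None),
])
# ['defer', 'accept', 'accept', 'accept']
```

A blanket exemption like this is only safe if the intent label itself is trustworthy. A real gate would also need to confirm that a "dry run" is actually non-mutating before accepting it.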
- ### Cross-Domain Action Gate: Learned Taxonomy Classifier on Held-Out v5
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-cross-domain-action-taxonomy-model-v5
-
- Training and calibration:
- blind v3/v4 only, `720` rows
-
- Held-out evaluation:
- blind v5, `360` rows
-
- Status:
- held-out local validation, no v5 training-time calibration, not an official leaderboard
-
- | Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
- | String-backed taxonomy AANA on blind v5 | `93.33%` | `91.67%` | `94.83%` | `95.00%` | `9` | `15` |
- | Learned taxonomy classifier on held-out v5 | `100.00%` | `100.00%` | `100.00%` | `100.00%` | `0` | `0` |
-
- The learned classifier uses TF-IDF word and character n-grams plus calibrated
- logistic regression. Threshold selection used 5-fold out-of-fold probabilities
- on v3/v4 only, with a target of at least `98.00%` safe allow and at least
- `90.00%` unsafe-action recall. Leakage controls exclude expected labels, row
- IDs, source dataset names, `tool_arguments.dry_run`, and
- `tool_arguments.variant`.
-
- This is the strongest local action-gate signal so far, but it should not be
- over-read. These are small hand-built benchmark rows with clean evidence
- strings. The next serious external-validity test should use sourced agent
- traces, noisy or incomplete evidence, ambiguous authorization states, and
- adversarial near-miss safe cases.
-
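The leakage controls listed above remove fields that can encode how a row was generated rather than what the action does. A minimal sketch of that preprocessing step, applied before text is handed to an n-gram featurizer. The top-level key names (`expected_label`, `row_id`, `source_dataset`) and the JSON serialization are assumptions; only `tool_arguments.dry_run` and `tool_arguments.variant` are named in the README.

```python
import copy
import json
from typing import Any, Dict

# Hypothetical top-level key names for the excluded fields listed above.
LEAKY_TOP_LEVEL = ("expected_label", "row_id", "source_dataset")
LEAKY_TOOL_ARGS = ("dry_run", "variant")  # tool_arguments.dry_run / .variant


def strip_leakage(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a deep copy of `row` without fields that could reveal the label."""
    clean = copy.deepcopy(row)
    for key in LEAKY_TOP_LEVEL:
        clean.pop(key, None)
    tool_args = clean.get("tool_arguments")
    if isinstance(tool_args, dict):
        for key in LEAKY_TOOL_ARGS:
            tool_args.pop(key, None)
    return clean


def feature_text(row: Dict[str, Any]) -> str:
    """Serialize the cleaned row into one string for word/character n-gram features."""
    return json.dumps(strip_leakage(row), sort_keys=True)


row = {
    "row_id": "v3-0042",
    "expected_label": "block",
    "source_dataset": "blind_v3",
    "action": "export raw lab results",
    "tool_arguments": {"patient_id": "P-17", "dry_run": True, "variant": "near_miss"},
}
print(feature_text(row))
# {"action": "export raw lab results", "tool_arguments": {"patient_id": "P-17"}}
```

The deep copy leaves the labeled row intact, so the same record can still be scored against `expected_label` after prediction.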
- ### External Agent Trace Action Gate
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-external-agent-trace-action-gate
-
- Source dataset:
- https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
-
- Rows:
- `360` transformed public tool-calling trace rows with noisy authorization-state
- evidence
-
- Status:
- external-validity diagnostic, policy-derived labels, not an official leaderboard
-
- | Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
- | Base accept-as-safe | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `0` | `180` |
- | v3/v4-trained AANA transfer | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `180` | `0` |
- | External-calibrated AANA on held-out conversations | `98.61%` | `96.67%` | `100.00%` | `100.00%` | `0` | `2` |
-
- This is the first external trace diagnostic. The direct v3/v4 transfer fails by
- over-deferring every external action, which is exactly the kind of failure this
- test was meant to expose. After calibration on external trace rows and
- evaluation on held-out conversation groups, AANA reaches `96.67%` recall and
- `100.00%` safe allow. The two remaining misses show that the next taxonomy needs
- a cleaner split between public/non-sensitive read tools and private
- identity-bound reads.
-
- ### External Agent Trace Action Gate v2
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-external-agent-trace-action-gate-v2
-
- Source dataset:
- https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
-
- Rows:
- `360` transformed public tool-calling trace rows with public-read,
- private-read, write/update, and counterfactual missing-authorization labels
-
- Status:
- external-validity diagnostic, policy-derived labels, not an official leaderboard
-
- | Path | Accuracy | Block recall | Block precision | Safe allow | False positives | False negatives |
- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
- | Base accept-as-safe | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `0` | `180` |
- | v3/v4-trained AANA transfer | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `180` | `0` |
- | Learned-only external calibration | `93.79%` | `91.78%` | `95.71%` | `95.83%` | `3` | `6` |
- | Structured trace taxonomy AANA | `100.00%` | `100.00%` | `100.00%` | `100.00%` | `0` | `0` |
-
- This v2 result shows why the architecture needs explicit typed tool surfaces.
- The learned-only calibration restores most safe allow (`95.83%`) but still
- misses `6` unsafe rows and falls short of the safe-allow target. Adding
- structured authorization-state detectors for public reads, private
- identity-bound reads, and write/update actions meets the target on this
- corrected external-trace-derived benchmark.
-
- ### Agent Tool Contract v1
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-agent-tool-contract-v1
-
- Source dataset:
- https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
-
- Rows:
- `360` external trace rows transformed into `aana.agent_tool_precheck.v1`
- events
-
- Status:
- schema-based contract validation, policy-derived labels, not an official
- leaderboard
-
- | Path | Accuracy | Unsafe recall | Block precision | Safe allow | False positives | False negatives |
- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
- | Base permissive runtime | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `0` | `180` |
- | AANA schema gate | `100.00%` | `100.00%` | `100.00%` | `100.00%` | `0` | `0` |
-
- This run turns the external trace taxonomy into a portable pre-tool-call
- contract that any agent runtime can emit before execution: tool name, typed tool
- category, authorization state, evidence refs, risk domain, proposed arguments,
- and runtime route. Every event is emitted with `recommended_route=accept`, so
- the AANA gate itself must block unsafe private reads and writes, unknown tools,
- and calls with verified missing-authorization evidence. The result is a
- contract validation, not a production safety guarantee.
-
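The contract paragraph above lists the fields of an `aana.agent_tool_precheck.v1` event and notes that the runtime always proposes `accept`, so all blocking has to come from the typed fields. A minimal sketch of such an event and gate. The field names follow the prose, but the category values, authorization states, and choice of `ask`, `defer`, or `refuse` per case are hypothetical rather than the published schema.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical encoding of the aana.agent_tool_precheck.v1 fields listed above.
KNOWN_CATEGORIES = {"public_read", "private_read", "write"}
AUTHORIZED = {"authenticated", "verified"}


@dataclass
class ToolPrecheck:
    tool_name: str
    tool_category: str                 # typed tool category
    authorization_state: str           # e.g. "none", "authenticated", "verified", "missing"
    evidence_refs: List[str] = field(default_factory=list)
    risk_domain: str = "general"
    proposed_arguments: Dict[str, Any] = field(default_factory=dict)
    recommended_route: str = "accept"  # what the runtime proposes; the gate ignores it


def gate(event: ToolPrecheck) -> str:
    """Route from typed fields only; the runtime's own `accept` is never trusted."""
    if event.tool_category not in KNOWN_CATEGORIES:
        return "refuse"            # unknown tool surface
    if event.tool_category == "public_read":
        return "accept"
    if event.authorization_state == "missing":
        return "ask"               # evidence says authorization is missing
    if event.tool_category == "private_read":
        return "accept" if event.authorization_state in AUTHORIZED else "ask"
    # Writes need verified authorization plus at least one evidence reference.
    if event.authorization_state == "verified" and event.evidence_refs:
        return "accept"
    return "defer"


events = [
    ToolPrecheck("get_weather", "public_read", "none"),
    ToolPrecheck("get_order_history", "private_read", "none"),
    ToolPrecheck("get_order_history", "private_read", "authenticated"),
    ToolPrecheck("refund_order", "write", "authenticated"),
    ToolPrecheck("refund_order", "write", "verified", ["ev:user-confirmed-refund"]),
    ToolPrecheck("run_shell", "unknown", "verified"),
]
print([gate(e) for e in events])
# ['accept', 'ask', 'accept', 'defer', 'accept', 'refuse']
```

Keeping `recommended_route` on the event but never reading it mirrors the benchmark setup: the runtime's suggestion is recorded for audit, while the decision rests on the typed fields.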
- ### External Agent Trace Noisy Evidence
-
- Public validation artifact:
- https://huggingface.co/datasets/mindbomber/aana-external-agent-trace-noisy-evidence
-
- Source dataset:
- https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
-
- Rows:
- `360` external trace rows transformed into `aana.agent_tool_precheck.v1`
- events with deterministic noisy-evidence stressors
-
- Status:
- robustness diagnostic, policy-derived labels, not an official leaderboard
-
- | Condition | Accuracy | Unsafe recall | Block precision | Safe allow | False positives | False negatives |
- | --- | ---: | ---: | ---: | ---: | ---: | ---: |
- | Base permissive runtime | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `0` | `180` |
- | Clean AANA contract gate | `100.00%` | `100.00%` | `100.00%` | `100.00%` | `0` | `0` |
- | Moderate noisy evidence AANA gate | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `26` | `0` |
-
- This run keeps unsafe recall at `100.00%` under missing, stale, redacted, and
- contradictory evidence, but over-blocks `26` of `180` safe calls. The result
- points to the next architecture target: evidence recovery and clarification
- routing that can restore safe allow without loosening private-read and
- write-action gates.
-
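The noisy-evidence run names four deterministic stressors but not how they are assigned to rows or applied. A minimal sketch under the assumption that each row ID is hashed to pick one stressor and the stressor then perturbs a list of evidence items. `pick_stressor`, `apply_stressor`, and the `text`/`age_days` item shape are hypothetical.

```python
import hashlib
from typing import Any, Dict, List

STRESSORS = ("missing", "stale", "redacted", "contradictory")
Evidence = List[Dict[str, Any]]


def pick_stressor(row_id: str) -> str:
    """Map a row ID to one stressor via SHA-256, so the choice is identical on every run."""
    digest = hashlib.sha256(row_id.encode("utf-8")).digest()
    return STRESSORS[digest[0] % len(STRESSORS)]


def apply_stressor(evidence: Evidence, stressor: str) -> Evidence:
    """Return a perturbed copy of `evidence`; each item has 'text' and 'age_days' keys."""
    if stressor == "missing":
        return []
    if stressor == "stale":
        return [{**e, "age_days": e["age_days"] + 365} for e in evidence]
    if stressor == "redacted":
        return [{**e, "text": "[REDACTED]"} for e in evidence]
    if stressor == "contradictory":
        # Keep the original items and append a negated copy of each one.
        negated = [{**e, "text": "NOT: " + e["text"]} for e in evidence]
        return list(evidence) + negated
    raise ValueError(f"unknown stressor: {stressor!r}")


evidence = [{"text": "user authenticated via OTP", "age_days": 2}]
stressor = pick_stressor("trace-0001")
print(stressor, apply_stressor(evidence, stressor))
```

A cryptographic hash is used instead of Python's built-in `hash()` because string hashing is salted per process unless `PYTHONHASHSEED` is fixed, which would make the assignment irreproducible across runs. Because 256 is divisible by 4, taking the first digest byte modulo 4 also spreads rows evenly across the four stressors.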
659
- ### Head-to-Head: Permissive Agent vs AANA
660
-
661
- Public validation artifact:
662
- https://huggingface.co/datasets/mindbomber/aana-head-to-head-permissive-vs-aana
663
-
664
- Source dataset:
665
- https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
666
-
667
- Rows:
668
- `360` external trace rows with moderate noisy-evidence stressors
669
-
670
- Status:
671
- head-to-head architecture diagnostic, policy-derived labels, not an official
672
- leaderboard
673
-
674
- | Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
675
- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
676
- | Plain permissive agent | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `100.00%` | `0` | `180` |
677
- | AANA schema gate | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `0.00%` | `26` | `0` |
678
-
679
- AANA improves accuracy by `+42.78` points and unsafe recall by `+100.00`
680
- points versus an accept-all agent, while reducing unsafe accepts by `100.00`
681
- points. The cost under noisy evidence is a `-14.44` point safe-allow delta.
682
-
683
- ### Head-to-Head: Single Classifier vs AANA
684
-
685
- Public validation artifact:
686
- https://huggingface.co/datasets/mindbomber/aana-head-to-head-single-classifier-vs-aana
687
-
688
- Source dataset:
689
- https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
690
-
691
- Rows:
692
- `360` external trace rows with moderate noisy-evidence stressors
693
-
694
- Status:
695
- head-to-head architecture diagnostic, policy-derived labels, not an official
696
- leaderboard
697
-
698
- | Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
699
- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
700
- | Single learned classifier | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `0.00%` | `180` | `0` |
701
- | AANA schema gate | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `0.00%` | `26` | `0` |
702
-
703
- The transferred single classifier blocks every external trace row. It preserves
704
- unsafe recall, but cannot distinguish safe authenticated/private reads or public
705
- reads from unsafe actions on this external trace transform. AANA keeps the same
706
- `100.00%` unsafe recall while restoring `85.56%` safe allow by using typed tool
707
- category, authorization state, evidence refs, risk domain, and hard blockers
708
- from the pre-tool-call contract.
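The gap between AANA and the single classifier comes from routing on typed contract fields instead of one flattened score. The toy router below is a hypothetical sketch of that idea, not AANA's actual policy. The field names follow the `check_tool_call` usage example. The `public_read` and `private_read` categories, the `authenticated` and `confirmed` authorization states, the `hard_blockers` key, and the `defer` route are assumptions made for illustration, and `risk_domain` is ignored for brevity.

```python
from typing import Any

# Hypothetical vocabulary for illustration; the real contract is specified in
# docs/agent-action-contract-v1.md and may use different values.
VERIFIED_AUTH = {"authenticated", "confirmed"}


def toy_typed_gate(event: dict[str, Any]) -> str:
    """Toy pre-tool-call router over typed fields (not AANA's actual rules)."""
    if event.get("hard_blockers"):
        return "defer"  # any hard blocker stops the call
    category = event.get("tool_category")
    auth = event.get("authorization_state")
    has_evidence = bool(event.get("evidence_refs"))
    if category == "public_read":
        return "accept"  # public reads need no authorization
    if category == "private_read":
        # Authenticated/private reads pass once authorization is verified.
        return "accept" if auth in VERIFIED_AUTH else "defer"
    if category == "write":
        # Writes also need explicit confirmation and supporting evidence.
        return "accept" if auth == "confirmed" and has_evidence else "defer"
    return "defer"  # unknown categories fail closed


# Two calls that a flattened-text score could confuse; typed fields separate them.
print(toy_typed_gate({"tool_category": "private_read", "authorization_state": "authenticated"}))
print(toy_typed_gate({"tool_category": "write", "authorization_state": "user_claimed",
                      "evidence_refs": [{"source_id": "draft_id:123", "kind": "tool_result"}]}))
```

In this sketch, a safe `private_read` with verified authorization passes, while a `write` backed only by `user_claimed` authorization is held. That is the kind of distinction a classifier that blocks every row cannot make.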
709
-
710
- ### Head-to-Head: Prompt-Only Policy Guardrail vs AANA
711
-
712
- Public validation artifact:
713
- https://huggingface.co/datasets/mindbomber/aana-head-to-head-prompt-policy-vs-aana
714
-
715
- Source dataset:
716
- https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
717
-
718
- Rows:
719
- `360` external trace rows with moderate noisy-evidence stressors
720
-
721
- Status:
722
- head-to-head architecture diagnostic, policy-derived labels, not an official
723
- leaderboard
724
-
725
- | Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
726
- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
727
- | Prompt-only policy guardrail | `81.67%` | `96.67%` | `74.36%` | `66.67%` | `3.33%` | `60` | `6` |
728
- | AANA schema gate | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `0.00%` | `26` | `0` |
729
-
730
- The prompt-only policy guardrail is a flattened-text baseline over candidate
731
- action, user intent, policy text, proposed arguments, and evidence summaries.
732
- It scores higher accuracy than the accept-all agent and the transferred single
733
- classifier, but still misses `6` unsafe rows and over-blocks `60` safe rows. AANA
734
- improves unsafe recall, block precision, and safe allow in this run by using the
735
- typed contract and hard-blocker route surface.
736
-
737
- ### Head-to-Head: LLM-as-Judge Safety Checker vs AANA
738
-
739
- Public validation artifact:
740
- https://huggingface.co/datasets/mindbomber/aana-head-to-head-llm-judge-vs-aana
741
-
742
- Source dataset:
743
- https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
744
-
745
- Rows:
746
- `360` external trace rows with moderate noisy-evidence stressors
747
-
748
- LLM judge:
749
- `gpt-4o-mini`
750
-
751
- Status:
752
- head-to-head architecture diagnostic, policy-derived labels, not an official
753
- leaderboard
754
-
755
- | Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
756
- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
757
- | LLM-as-judge safety checker | `73.33%` | `100.00%` | `65.22%` | `46.67%` | `0.00%` | `96` | `0` |
758
- | AANA schema gate | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `0.00%` | `26` | `0` |
759
-
760
- The live LLM-as-judge baseline is conservative: it blocks all unsafe rows, but
761
- also blocks many safe identity-lookup and authenticated/private-read calls (`96` false positives) when
762
- the evidence is noisy or flattened. AANA preserves the same unsafe recall while
763
- allowing substantially more safe calls by using explicit tool category,
764
- authorization state, evidence refs, schema validation, and hard blockers.
765
-
766
- ### Head-to-Head: Contract Gate Without Recovery vs AANA
767
-
768
- Public validation artifact:
769
- https://huggingface.co/datasets/mindbomber/aana-head-to-head-contract-no-recovery-vs-aana
770
-
771
- Source dataset:
772
- https://huggingface.co/datasets/zake7749/Qwen-3.6-plus-agent-tool-calling-trajectory
773
-
774
- Rows:
775
- `360` external trace rows with moderate noisy-evidence stressors
776
-
777
- Status:
778
- head-to-head architecture diagnostic, policy-derived labels, not an official
779
- leaderboard
780
-
781
- | Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
782
- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
783
- | Structured contract gate without recovery | `92.78%` | `100.00%` | `87.38%` | `85.56%` | `0.00%` | `26` | `0` |
784
- | AANA with evidence recovery | `100.00%` | `100.00%` | `100.00%` | `100.00%` | `0.00%` | `0` | `0` |
785
-
786
- The bare contract gate consumes the noisy emitted event as-is. AANA adds a
787
- correction/evidence-recovery pass that reconstructs recoverable auth,
788
- validation, and confirmation evidence from source trace features, removes
789
- injected noisy missing-authorization refs when the source trace does not
790
- support them, preserves true missing-authorization stressors, and corrects the
791
- runtime route before final gating. The recovery pass does not read expected
792
- labels, but the trace features are produced by the included transform scripts.
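The recovery pass described above can be sketched in three steps: reconstruct supported evidence, drop injected missing-authorization refs that the trace contradicts, and then recompute the route. This is a hypothetical toy, not the project's transform or recovery code. The trace feature names (`has_authorization`, `has_validation`, `has_confirmation`), the `missing_authorization` evidence kind, and the single-rule route recomputation are all assumptions.

```python
from typing import Any

# Evidence kinds the toy pass may reconstruct from trace features (hypothetical).
RECOVERABLE = ("authorization", "validation", "confirmation")


def toy_recover_evidence(event: dict[str, Any], trace: dict[str, bool]) -> dict[str, Any]:
    """Toy correction pass run before final gating; never reads expected labels.

    `trace` holds hypothetical boolean features derived from the source trace,
    e.g. {"has_authorization": True}. The input event is not mutated.
    """
    refs = [dict(r) for r in event.get("evidence_refs", [])]
    kinds = {r.get("kind") for r in refs}
    # 1. Reconstruct evidence the source trace supports but the noisy event dropped.
    for kind in RECOVERABLE:
        if trace.get(f"has_{kind}") and kind not in kinds:
            refs.append({"source_id": f"trace:{kind}", "kind": kind})
    # 2. Drop missing-authorization refs the trace contradicts (injected noise);
    #    keep them when the trace also lacks authorization (a true stressor).
    if trace.get("has_authorization"):
        refs = [r for r in refs if r.get("kind") != "missing_authorization"]
    # 3. Recompute the runtime route from the corrected evidence.
    kinds = {r.get("kind") for r in refs}
    ok = "authorization" in kinds and "missing_authorization" not in kinds
    return {**event, "evidence_refs": refs, "recommended_route": "accept" if ok else "defer"}


noisy = {
    "tool_name": "get_order_status",  # hypothetical tool name
    "evidence_refs": [{"source_id": "injected", "kind": "missing_authorization"}],
    "recommended_route": "defer",
}
print(toy_recover_evidence(noisy, {"has_authorization": True})["recommended_route"])
print(toy_recover_evidence(noisy, {"has_authorization": False})["recommended_route"])
```

Like the pass described above, the sketch never consults expected labels. Its output is only as independent as the trace features it consumes.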
793
-
794
- ### External Validity: Hermes Function-Calling Head-to-Head
795
-
796
- Public validation artifact:
797
- https://huggingface.co/datasets/mindbomber/aana-external-validity-hermes-head-to-head
798
-
799
- Second source dataset:
800
- https://huggingface.co/datasets/NousResearch/hermes-function-calling-v1
801
-
802
- Rows:
803
- `360` transformed Hermes function-calling rows with moderate noisy-evidence
804
- stressors
805
-
806
- Status:
807
- second-source architecture diagnostic, policy-derived labels, not an official
808
- leaderboard
809
-
810
- | Architecture | Accuracy | Unsafe recall | Block precision | Safe allow | Unsafe accept | False positives | False negatives |
811
- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
812
- | Permissive agent | `50.00%` | `0.00%` | `0.00%` | `100.00%` | `100.00%` | `0` | `180` |
813
- | Single classifier | `50.00%` | `100.00%` | `50.00%` | `0.00%` | `0.00%` | `180` | `0` |
814
- | Prompt-only policy guardrail | `93.06%` | `97.22%` | `89.74%` | `88.89%` | `2.78%` | `20` | `5` |
815
- | LLM-as-judge safety checker | `85.28%` | `99.44%` | `77.49%` | `71.11%` | `0.56%` | `52` | `1` |
816
- | Structured contract gate without recovery | `92.22%` | `100.00%` | `86.54%` | `84.44%` | `0.00%` | `28` | `0` |
817
- | AANA with evidence recovery | `100.00%` | `100.00%` | `100.00%` | `100.00%` | `0.00%` | `0` | `0` |
818
-
819
- This run improves source diversity by using an independent function-calling
820
- dataset with different domains, schemas, and conversation formats. It does not
821
- provide human-reviewed safety labels: labels and counterfactual
822
- missing-authorization rows are generated by the included transform scripts. The
823
- main replicated pattern is that AANA's evidence-recovery loop preserves unsafe
824
- recall while allowing more safe calls than flat classifiers, prompt-only
825
- guards, LLM judges, or a static contract gate.
826
-
827
- ### PIIMB: Presidio + AANA
828
-
829
- Official PIIMB submission:
830
- https://huggingface.co/datasets/piimb/pii-masking-benchmark-results/discussions/3
831
-
832
- Model card for the paired benchmark submission:
833
- https://huggingface.co/mindbomber/aana-presidio-piimb-policy-v1
834
-
835
- Benchmark:
836
- `piimb/pii-masking-benchmark`
837
-
838
- Dataset revision:
839
- `df8299e90ff053fa6fd1d3678f6693a454f4ecc0`
840
-
841
- Subset:
842
- `sentences`
843
-
844
- Metric/schema:
845
- PIIMB `0.2.0`
846
-
847
- Base detector:
848
- `microsoft/presidio-analyzer`
849
-
850
- | System | Avg masking F2 | Avg recall |
851
- | --- | ---: | ---: |
852
- | Presidio only | `0.4492985573` | `0.4008557794` |
853
- | Presidio + AANA | `0.5629171363` | `0.5159532273` |
854
- | Delta | `+0.1136185790` | `+0.1150974479` |
855
-
856
- Per-source AANA masking F2:
857
-
858
- | Source dataset | F2 |
859
- | --- | ---: |
860
- | `ai4privacy/pii-masking-openpii-1m` | `0.4879480402` |
861
- | `gretelai/gretel-pii-masking-en-v1` | `0.6281397502` |
862
- | `nvidia/Nemotron-PII` | `0.6161414756` |
863
- | `piimb/privy` | `0.5194392792` |
864
-
865
- This is the clearest current ablation: the same specialist detector gained `+0.1136`
866
- average masking F2 and `+0.1151` average recall on PIIMB when paired with AANA's verifier/correction layer.
867
-
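For reference, F2 is the F-beta score with beta = 2, which weights recall more heavily than precision. The sketch below gives the standard formula and checks that the reported average masking F2 is the unweighted mean of the four per-source scores. The mean reproduces `0.5629171363`, and subtracting the Presidio-only average gives the `+0.1136185790` delta. How PIIMB counts precision and recall (for example, at the span or token level) is defined by the benchmark and is not modeled here. `f_beta` is an illustrative helper, not a PIIMB API.

```python
from statistics import fmean


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score; beta=2 (F2) weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Per-source AANA masking F2 scores from the PIIMB table.
per_source_f2 = {
    "ai4privacy/pii-masking-openpii-1m": 0.4879480402,
    "gretelai/gretel-pii-masking-en-v1": 0.6281397502,
    "nvidia/Nemotron-PII": 0.6161414756,
    "piimb/privy": 0.5194392792,
}
avg_f2 = fmean(per_source_f2.values())   # unweighted mean across sources
print(f"{avg_f2:.10f}")
print(f"{avg_f2 - 0.4492985573:+.10f}")  # delta versus Presidio only
```

Because the headline number averages per-source F2 scores, it weights each source equally regardless of size. It generally differs from an F2 computed on pooled precision and recall.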
868
- ### PIIMB: AANA Policy Baseline
869
-
870
- Official PIIMB submission:
871
- https://huggingface.co/datasets/piimb/pii-masking-benchmark-results/discussions/2
872
-
873
- Model card:
874
- https://huggingface.co/mindbomber/aana-piimb-policy-baseline
875
-
876
- Average masking F2:
877
- `0.5195345497`
878
-
879
- This is a zero-parameter deterministic policy baseline. It is useful as a
880
- transparent architecture baseline, not as a claim against trained PII models.
881
-
882
- ### TruthfulQA Local Run
883
-
884
- Dataset:
885
- `truthfulqa/truthful_qa`
886
-
887
- Configuration:
888
- `multiple_choice`
889
-
890
- Split:
891
- `validation`
892
-
893
- Sample size:
894
- 100 questions
895
-
896
- Base generator:
897
- `openai/gpt-4o-mini` through OpenRouter
898
-
899
- Result:
900
- `85/100` MC1 accuracy
901
-
902
- This was a local AANA-gated run with a published public artifact, not an official
903
- TruthfulQA leaderboard submission.
904
-
905
- ## Scope And Limitations
906
-
907
- AANA should be treated as a runtime architecture and evaluation framework, not as
908
- a replacement for training-time alignment, RLHF/RLAIF, constitutional methods,
909
- retrieval-augmented generation, tool-use policy, safety classifiers, or domain
910
- specialist models. AANA can wrap and coordinate those components.
911
-
912
- Current public results are bounded:
913
-
914
- - PIIMB results measure PII masking F2 and recall, not production privacy safety.
915
- - TruthfulQA results are local and small-sample, not official leaderboard claims.
916
- - No result here claims state-of-the-art performance.
917
- - No result here guarantees hallucination removal, PII removal, or safety in
918
- regulated workflows.
919
-
920
- Production use still requires live evidence connectors, domain-owner signoff,
921
- audit retention, observability, human review paths, security review, deployment
922
- manifest, incident response plan, and measured pilot results.
923
-
924
- ## Repositories
925
-
926
- Project repository:
927
- https://github.com/mindbomber/Alignment-Aware-Neural-Architecture--AANA-
928
-
929
- Project site:
930
- https://mindbomber.github.io/Alignment-Aware-Neural-Architecture--AANA-/
931
-
932
- ## Reproduction Pointers
933
-
934
- The benchmark and submission scripts are maintained in the project repository:
935
-
936
- - `scripts/aana_piimb_eval.py`
937
- - `scripts/aana_piimb_presidio_eval.py`
938
- - `scripts/aana_truthfulqa_eval.py`
939
- - `scripts/aana_ragtruth_eval.py`
940
- - `scripts/aana_halubench_eval.py`
941
- - `scripts/aana_wikibio_hallucination_eval.py`
942
- - `scripts/aana_harmactions_eval.py`
943
- - `scripts/aana_gap_eval.py`
944
- - `scripts/aana_cli.py workflow-check`
945
-
946
- The AANA publication gates for the PIIMB submissions passed with:
947
-
948
- - `gate_decision=pass`
949
- - `recommended_action=accept`
950
- - `candidate_gate=pass`
951
- - no hard blockers
952
-
953
- ## Peer Review Evidence
954
-
955
- Measured AANA privacy, grounded QA, tool-use, and integration validation artifacts are collected in the public peer-review evidence pack: [https://huggingface.co/datasets/mindbomber/aana-peer-review-evidence-pack](https://huggingface.co/datasets/mindbomber/aana-peer-review-evidence-pack). These artifacts support AANA as an audit/control/verification/correction layer and do not claim AANA is proven as a raw agent-performance engine.
956
-
957
- ## Public Artifact Hub
958
-
959
- The canonical public artifact hub for AANA is [https://huggingface.co/collections/mindbomber/aana-public-artifact-hub-69fecc99df04ae6ed6dbc6c4](https://huggingface.co/collections/mindbomber/aana-public-artifact-hub-69fecc99df04ae6ed6dbc6c4). It links the architecture/model card, peer-review evidence dataset, live demo Space, and reviewer-facing report. Claim boundary: AANA is an audit/control/verification/correction layer, not a proven raw agent-performance engine.
960
 
 
1
+ ---
2
+ license: mit
3
+ library_name: aana
4
+ tags:
5
+ - agent-control
6
+ - agent-safety
7
+ - auditability
8
+ - groundedness
9
+ - tool-use
10
+ - verification
11
+ pipeline_tag: text-classification
12
+ ---
13
+
14
+ # AANA: Agent Action Control Architecture
15
+
16
+ AANA is designed to make agents more auditable, safer, more grounded, and more controllable.
17
+
18
+ This card describes AANA as a control-layer architecture and runtime package, not as a standalone frontier model. The intended pattern is:
19
+
20
+ ```text
21
+ agent proposes -> AANA checks -> agent executes only if allowed
22
+ ```
23
+
24
+ ## What AANA Provides
25
+
26
+ - A public Agent Action Contract v1 for pre-tool-call checks.
27
+ - Python SDK and CLI helpers for local checks and audit-safe summaries.
28
+ - TypeScript SDK helpers for JavaScript/TypeScript agent runtimes.
29
+ - FastAPI service endpoints for HTTP integration.
30
+ - Adapter families for privacy, grounded QA, agent tool-use, and cross-domain action checks.
31
+ - Audit-safe decision metadata: route, AIx score, hard blockers, missing evidence, authorization state, and recovery suggestion.
32
+
33
+ ## Public Boundary
34
+
35
+ AANA is a production candidate as an audit/control/verification/correction layer.
36
+
37
+ AANA is not yet proven as a raw agent-performance engine. Current evidence should be read as support for action-gating, verification, correction, and auditability claims, not as proof that AANA alone improves end-to-end task success on arbitrary agent benchmarks or outperforms other systems on raw agent performance.
38
+
39
+ ## Minimal Usage
40
+
41
+ ```python
42
+ import aana
43
+
44
+ decision = aana.check_tool_call({
45
+ "tool_name": "send_email",
46
+ "tool_category": "write",
47
+ "authorization_state": "user_claimed",
48
+ "evidence_refs": [{"source_id": "draft_id:123", "kind": "tool_result"}],
49
+ "risk_domain": "customer_support",
50
+ "proposed_arguments": {"to": "customer@example.com"},
51
+ "recommended_route": "accept",
52
+ })
53
+
54
+ print(decision["architecture_decision"]["route"])
55
+ ```
56
+
57
+ Execute only when AANA returns the `accept` route with no hard blockers and the relevant workflow policy allows the action.
58
+
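The execution rule can be wrapped in a small helper so the agent runtime calls the real tool only when all three conditions hold. This is a sketch under stated assumptions. `should_execute` is a hypothetical name, and only `decision["architecture_decision"]["route"]` comes from the Minimal Usage example. Reading hard blockers from a `"hard_blockers"` key is a guess at where that metadata lives.

```python
from collections.abc import Mapping
from typing import Any


def should_execute(decision: Mapping[str, Any], workflow_policy_allows: bool) -> bool:
    """Return True only for an `accept` route with no hard blockers and policy approval.

    Only decision["architecture_decision"]["route"] is taken from the card's
    example; the "hard_blockers" key is a hypothetical location for that field.
    """
    arch = decision.get("architecture_decision") or {}
    route = arch.get("route")
    hard_blockers = arch.get("hard_blockers") or []
    return bool(route == "accept" and not hard_blockers and workflow_policy_allows)


# A hand-written decision dict standing in for aana.check_tool_call output.
decision = {"architecture_decision": {"route": "accept", "hard_blockers": []}}
if should_execute(decision, workflow_policy_allows=True):
    print("executing tool call")
else:
    print("holding tool call for revision or review")
```

Failing closed on a missing decision or route keeps the helper consistent with the `agent proposes -> AANA checks -> agent executes only if allowed` pattern.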
59
+ ## API Surface
60
+
61
+ - Python package: `aana`
62
+ - CLI: `aana agent-check`, `aana pre-tool-check`, `aana audit-summary`, `aana evidence-pack`
63
+ - FastAPI service: `POST /pre-tool-check`, `POST /agent-check`, `GET /health`
64
+ - TypeScript SDK: `@aana/integration-sdk`
65
+ - Contract spec: `docs/agent-action-contract-v1.md`
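For HTTP integration, a client can post the same kind of event to the FastAPI service. The sketch below uses only the Python standard library. The `http://localhost:8000` base URL is a placeholder, and the sketch assumes that the `POST /pre-tool-check` body mirrors the dict passed to `aana.check_tool_call`. Confirm that schema against `docs/agent-action-contract-v1.md`.

```python
import json
import urllib.request
from typing import Any

# Placeholder address; point this at wherever the AANA FastAPI service runs.
BASE_URL = "http://localhost:8000"


def build_pre_tool_check_request(event: dict[str, Any], base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /pre-tool-check request with a JSON body.

    The body is assumed to mirror the dict passed to aana.check_tool_call.
    """
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/pre-tool-check",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict[str, Any]:
    """Send the request and decode the JSON response (requires a running service)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


req = build_pre_tool_check_request({"tool_name": "send_email", "tool_category": "write"})
print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes the payload easy to log and test without a running service.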
66
+
67
+ ## Evidence Links
68
+
69
+ - Public artifact hub: `https://huggingface.co/collections/mindbomber/aana-public-artifact-hub-69fecc99df04ae6ed6dbc6c4`
70
+ - AANA Space: `https://huggingface.co/spaces/mindbomber/aana-demo`
71
+ - Peer-review evidence pack: `https://huggingface.co/datasets/mindbomber/aana-peer-review-evidence-pack`
72
+ - Production-candidate evidence pack: `docs/aana-production-candidate-evidence-pack.md`
73
+ - HF dataset proof report: `docs/hf-dataset-proof-report.md`
74
+ - Agent-action technical report: `docs/aana-agent-action-technical-report.md`
75
+ - Agent Action Contract v1: `docs/agent-action-contract-v1.md`
76
+
77
+ ## Current Diagnostic Findings
78
+
79
+ - Safety/adversarial prompt routing: deterministic AANA preserves safe allow but misses many harmful prompts; a diversified request-level verifier improves harmful-request recall while conservative calibration protects safe allow. AdvBench transfer remains weak, so this is not a content-moderation claim.
80
+ - Finance/high-risk QA: a controlled FinanceBench diagnostic shows supported filing answers are allowed and unsupported finance overclaims are routed to revise/defer. This is not official FinanceBench leaderboard evidence or investment-advice evaluation.
81
+ - Governance/compliance policy routing: a small diagnostic over Hugging Face policy-doc metadata plus repo-held-out policy cases shows citation, missing-evidence, private-data export, destructive-action, and human-review routing behavior. This is not legal, regulatory, or platform-policy certification.
82
+
83
+ ## Limitations
84
+
85
+ - Domain adapters require held-out validation before stronger claims.
86
+ - AANA can over-block if evidence or authorization state is incomplete.
87
+ - AANA does not replace a capable planner, retrieval system, domain policy source, or human escalation path.
88
+ - Production deployments still need live connector review, audit retention policy, incident response, security review, and domain-owner signoff.
 