Peer review request: challenge AANA route correctness, evidence handling, and generalization

#1
by mindbomber - opened

We are asking reviewers to challenge AANA as an agent audit/control/verification/correction layer.

Public artifact hub:
https://huggingface.co/collections/mindbomber/aana-public-artifact-hub-69fecc99df04ae6ed6dbc6c4

Peer-review evidence pack:
https://huggingface.co/datasets/mindbomber/aana-peer-review-evidence-pack

Claim boundary:
AANA is an architecture for making agents more auditable, safer, more grounded, and more controllable. It is not yet proven as a raw agent-performance engine, and these artifacts should not be treated as production certification or official leaderboard proof.

Please challenge the package on these points:

  1. Route correctness
  • Are accept/ask/defer/refuse/revise decisions correct for the included examples?
  • Which cases should be relabeled?
  • Are any routes too permissive or too conservative?
  2. False positives
  • Where does AANA block, revise, ask, or defer when it should accept?
  • Are the current safe-allow metrics hiding overblocking in realistic workflows?
  3. Evidence handling
  • Are evidence_refs sufficient for the decision?
  • Are missing, stale, contradictory, or low-trust evidence cases handled correctly?
  • Are redaction and audit-safety assumptions strong enough?
  4. Authorization-state assumptions
  • Are none/user_claimed/authenticated/validated/confirmed states defined clearly enough?
  • Where does AANA assume authorization that should require stronger proof?
  • Are public reads separated cleanly from private identity-bound reads?
  5. Generalization beyond examples
  • Do the adapter families generalize beyond the packaged examples?
  • Which held-out datasets or benchmark-maintainer protocols should be used next?
  • What failure cases would make AANA unsuitable as a general agent control layer?
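To make the route and authorization-state vocabulary above concrete, here is a minimal sketch in Python. The enum names, the trust ordering, and the `route_private_read` policy are illustrative assumptions, not AANA's actual implementation:

```python
from enum import Enum

class Route(Enum):
    # The five routes referenced in the reviewer questions above.
    ACCEPT = "accept"
    ASK = "ask"
    DEFER = "defer"
    REFUSE = "refuse"
    REVISE = "revise"

# Assumed trust ordering, weakest to strongest (hypothetical; AANA may
# define a different lattice).
AUTH_LEVELS = ["none", "user_claimed", "authenticated", "validated", "confirmed"]

def meets_auth(state: str, required: str) -> bool:
    """True if the action's authorization state is at least `required`."""
    return AUTH_LEVELS.index(state) >= AUTH_LEVELS.index(required)

def route_private_read(auth_state: str) -> Route:
    # Example policy only: a private identity-bound read requires at least
    # `authenticated`; weaker states escalate to a clarifying question.
    return Route.ACCEPT if meets_auth(auth_state, "authenticated") else Route.ASK

print(route_private_read("user_claimed").value)  # ask
print(route_private_read("validated").value)     # accept
```

A useful reviewer exercise is to pick a packaged example, decide which minimum level the action should require, and check whether the packaged decision agrees.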

Known current limitations:

  • One packaged privacy case misses a street-address category while still routing to revise.
  • Some labels are policy-derived rather than independent human-review or benchmark-maintainer labels.
  • Integration latency is measured; broader adapter-eval latency is not yet measured in the public pack.
  • Public demos intentionally disable real sends, deletes, purchases, exports, fund transfers, production deploys, and account changes.

Useful reviewer entry points:

  • Manifest: `data/aana_peer_review_package_manifest.json` in the evidence-pack dataset.
  • Report: `reports/aana_peer_review_report.md` in the evidence-pack dataset.
  • Reproduction: `python scripts/reproduce.py --pack-dir .`

Updated peer-review bundle on 2026-05-09.

What changed:

  • Model card refreshed with the public claim boundary and reviewer questions.
  • Peer-review evidence pack rebuilt and uploaded with the same reviewer questions in README and report.
  • Space demo README refreshed so the live "Try AANA" endpoint asks reviewers for the same critique.

Please challenge AANA on these specific points:

  1. Are routes correct? If not, share the event, AANA decision, and expected route.
  2. Are false positives acceptable? Which safe answers or tool calls are over-blocked?
  3. Is evidence handling sufficient? Look for missing, stale, contradictory, untrusted, or over-redacted evidence refs.
  4. Does this generalize beyond examples? Suggest external traces, domains, adapters, or benchmark protocols that would make the evidence stronger.

Added a short technical report for peer review:

AANA: A Pre-Action Control Layer For Auditable AI Agents
https://github.com/mindbomber/Alignment-Aware-Neural-Architecture--AANA-/blob/master/docs/aana-pre-action-control-layer-technical-report.md

It summarizes:

  • problem
  • architecture
  • Agent Action Contract v1
  • experiments
  • failures
  • limitations
  • reproduction commands

Reviewer questions remain the same: Are routes correct? Are false positives acceptable? Is evidence handling sufficient? Does this generalize beyond examples?

Seeking technical review, not hype.

AANA is being shared as a pre-action control layer for AI agents:

agent proposes -> AANA checks evidence/auth/risk -> tool executes only if route == accept
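The gate in that flow can be sketched as follows. `aana_check`, `Decision`, and the placeholder policy are hypothetical stand-ins for illustration; the key property shown is that only an explicit `accept` route ever reaches the tool (reviewer question 5 below asks whether any integration can violate this):

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    route: str                       # "accept" | "ask" | "defer" | "refuse" | "revise"
    evidence_refs: list = field(default_factory=list)  # references justifying the route

def aana_check(action: dict) -> Decision:
    # Placeholder policy, not AANA's real logic: accept only low-risk
    # actions that carry evidence references; otherwise defer.
    if action.get("risk") == "low" and action.get("evidence_refs"):
        return Decision("accept", action["evidence_refs"])
    return Decision("defer")

def execute(action: dict, tool) -> str:
    decision = aana_check(action)
    if decision.route == "accept":
        return tool(action)          # the only path that runs the tool
    return f"blocked: route={decision.route}"

print(execute({"risk": "low", "evidence_refs": ["doc:1"]}, lambda a: "sent"))  # sent
print(execute({"risk": "high"}, lambda a: "sent"))  # blocked: route=defer
```

An integration that calls the tool before `aana_check`, or treats `revise` as implicitly executable, would break this guarantee, which is exactly the failure mode reviewers are asked to look for.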

The current claim is narrow: AANA is an audit/control/verification/correction layer around existing agents, not a proven raw agent-performance engine.

I added a review outreach guide with channel-specific posts and claim boundaries:
https://github.com/mindbomber/Alignment-Aware-Neural-Architecture--AANA-/blob/master/docs/review-outreach-posting-guide.md

Please challenge:

  1. Are routes correct?
  2. Are false positives acceptable?
  3. Is evidence handling sufficient?
  4. Does this generalize beyond examples?
  5. Can any non-accept route still execute in an integration?
  6. Which benchmark or trace set would make this evidence more convincing?
