aana-demo / README.md
mindbomber's picture
Sharpen reviewer guidance for AANA demo
d97269a verified

A newer version of the Gradio SDK is available: 6.14.0

Upgrade
metadata
title: AANA Demo
sdk: gradio
app_file: app.py
license: mit
python_version: '3.11'
short_description: Try AANA's pre-action agent control layer.
pinned: false

Try AANA In 2 Minutes

This local Hugging Face Space artifact is the repo-owned source for the public "try AANA" demo. It accepts the frozen Agent Action Contract v1 fields and returns the same route decision used by the Python SDK, FastAPI service, and MCP tool.

AANA is a pre-action control layer for AI agents: agents propose actions, AANA checks evidence/auth/risk, and tools execute only when the route is accept.

Core runtime pattern:

agent proposes -> AANA checks -> tool executes only if route == accept

What this demonstrates: an agent proposes a tool call. AANA checks evidence/auth/risk. The tool only executes if the route is accept.

How to test it: pick an example, click Check With AANA, then inspect the route and executor proof.

Reviewer checklist:

  • accept allows execution
  • ask, defer, and refuse block execution
  • missing auth/evidence becomes a blocker
  • audit-safe event is emitted
  • a bad runtime recommendation can be overridden

Contrast: a plain permissive agent would execute the proposed tool call. AANA blocks unless the contract is satisfied.

This is the difference reviewers should inspect: AANA turns pre-tool safety into a typed contract, route table, hard execution rule, and audit-safe decision event instead of relying only on prompts, classifiers, LLM judges, or framework-specific middleware.

Frozen required fields:

  • tool_name
  • tool_category
  • authorization_state
  • evidence_refs
  • risk_domain
  • proposed_arguments
  • recommended_route

The Space calls aana.check_tool_call with these fields and displays:

  • route: accept, ask, defer, or refuse
  • AIx score
  • hard blockers
  • missing evidence
  • authorization state
  • recovery guidance
  • audit-safe log event
  • blocked-tool non-execution proof from a synthetic executor

The synthetic executor is intentionally safe: it records that it would have run only when AANA returns accept. It cannot send, delete, purchase, deploy, export, or access private data.

Public evidence links:

Peer-review request:

  • Are routes correct?
  • Are false positives acceptable?
  • Is evidence handling sufficient?
  • Does this generalize beyond examples?

Please post critique in the Space discussion: https://huggingface.co/spaces/mindbomber/aana-demo/discussions/1

Current diagnostic boundary: safety/adversarial prompt routing is useful but incomplete, FinanceBench-style QA evidence routing is controlled and not an official leaderboard claim, and governance/compliance routing is diagnostic rather than legal, regulatory, or platform-policy certification.

Integration validation v1 is now included in the evidence pack: held-out tool-call cases validate route parity, blocked-tool non-execution, decision-shape parity, audit completeness, and schema behavior across CLI, SDK, FastAPI, MCP, and middleware surfaces.