---
license: apache-2.0
language:
- en
tags:
- security
- ai-agents
- mcp
- nanomind
- opena2a
- threat-detection
- onnx
- text-classification
datasets:
- opena2a/nanomind-training
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: nanomind-security-classifier
  results:
  - task:
      type: text-classification
      name: AI Agent Threat Classification
    dataset:
      type: opena2a/nanomind-training
      name: NanoMind Security Corpus sft-v10
    metrics:
    - name: Eval Accuracy
      type: accuracy
      value: 0.9845
    - name: Macro F1
      type: f1
      value: 0.9778
---

# NanoMind Security Classifier v0.5.0

A fast, lightweight threat classifier purpose-built for AI agent security scanning. It classifies SKILL.md files, MCP server configurations, SOUL.md governance docs, and agent tool descriptions into 10 security categories in under 1 ms. Part of the [OpenA2A](https://github.com/opena2a-org) security ecosystem.

## What This Model Does

NanoMind analyzes the text content of AI agent configurations and detects security threats:

```
Input:  MCP server config with hidden data forwarding endpoint
Output: exfiltration (confidence: 0.97)

Input:  Normal SOUL.md governance policy
Output: benign (confidence: 0.99)
```

It runs at the scanning layer of [HackMyAgent](https://github.com/opena2a-org/hackmyagent) and the [OpenA2A CLI](https://github.com/opena2a-org/opena2a), classifying every piece of agent content before it reaches production.
## Key Metrics

| Metric | Value |
|--------|-------|
| **Eval accuracy** | **98.45%** |
| **Macro F1** | **0.9778** |
| **False positives** | **0** on 33 benign Unicode inputs |
| **Inference latency** | **< 1 ms** (p99 on CPU) |
| **Model size** | **8.3 MB** (ONNX + weights + tokenizer) |
| Training samples | 3,168 |
| Eval samples | 194 |
| Training corpus | sft-v10 |

## Threat Taxonomy (10 classes)

| Class | Description |
|-------|-------------|
| `exfiltration` | Data forwarding to unauthorized external endpoints |
| `injection` | Instruction override, jailbreak, prompt injection |
| `privilege_escalation` | Unauthorized access elevation |
| `persistence` | Permanent unauthorized state manipulation |
| `credential_abuse` | Credential harvesting, phishing, token theft |
| `lateral_movement` | Remote config/instruction fetching, C2 patterns |
| `social_engineering` | Urgency, authority, or pressure manipulation |
| `policy_violation` | Governance bypass, boundary violations |
| `steganography` | Unicode-based attacks (zero-width chars, homoglyphs, bidi overrides) |
| `benign` | Normal, safe agent behavior |

## Architecture

| Parameter | Value |
|-----------|-------|
| Type | Mamba SSM (Selective State Space Model) |
| Architecture | TME (Ternary Mamba Encoder) |
| Blocks | 8 MambaBlocks with gated projection |
| Dimensions | d_model=128, d_inner=256, d_state=64 |
| Vocabulary | 6,000 tokens (word-level) |
| Parameters | 2,089,482 |
| Inference | ONNX Runtime (cross-platform) or MLX (Apple Silicon) |

The model processes text through: Embedding -> 8x MambaBlock (in_proj -> SiLU gate -> dt_proj -> out_proj + LayerNorm residual) -> mean pooling -> LayerNorm -> linear classifier -> softmax.
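The pipeline above can be sketched at the level of tensor shapes. This is an illustrative NumPy mock with random weights: the selective-scan state update (`dt_proj`, `d_state`) is elided, and none of the weights or helper names correspond to the released checkpoint.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_inner, n_classes, seq_len, vocab_size = 128, 256, 10, 128, 6000

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def silu(x):
    return x / (1.0 + np.exp(-x))

embedding = rng.normal(size=(vocab_size, d_model)) * 0.02

def mamba_block(x):
    # in_proj -> SiLU gate -> out_proj, wrapped in a LayerNorm residual
    # (random weights; the selective scan of the real block is omitted)
    W_in = rng.normal(size=(d_model, 2 * d_inner)) * 0.02
    W_out = rng.normal(size=(d_inner, d_model)) * 0.02
    h, gate = np.split(layer_norm(x) @ W_in, 2, axis=-1)
    return x + (h * silu(gate)) @ W_out

ids = rng.integers(0, vocab_size, size=seq_len)
x = embedding[ids]                   # (128, 128): seq_len x d_model
for _ in range(8):                   # 8 MambaBlocks
    x = mamba_block(x)
pooled = layer_norm(x.mean(axis=0))  # mean pooling over tokens -> (128,)
logits = pooled @ (rng.normal(size=(d_model, n_classes)) * 0.02)
print(logits.shape)                  # (10,) -- one logit per threat class
```

Shapes aside, the only structural point this demonstrates is the gated-projection block with a residual connection; the real model's selective scan sits between the gate and the output projection.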
## Quick Start

### Via HackMyAgent (recommended)

```bash
npm install -g hackmyagent

# Scan an AI agent project for threats
hackmyagent scan ./my-agent --deep
```

### Via OpenA2A CLI

```bash
npx opena2a scan ./my-agent
```

### Direct ONNX Inference (Python)

```python
import json

import numpy as np
import onnxruntime as ort

# Load model and vocabulary
session = ort.InferenceSession("nanomind-tme.onnx")
with open("tokenizer.json") as f:
    vocab = json.load(f)

# Tokenize: lowercase word-level, unknown -> 1, pad with 0 to length 128
text = "your agent config text here"
tokens = text.lower().split()
ids = [vocab.get(t, 1) for t in tokens[:128]]
ids += [0] * (128 - len(ids))
input_ids = np.array([ids], dtype=np.int64)

# Predict
logits = session.run(None, {"input_ids": input_ids})[0][0]
classes = ["exfiltration", "injection", "privilege_escalation", "persistence",
           "credential_abuse", "lateral_movement", "social_engineering",
           "policy_violation", "benign", "steganography"]
probs = np.exp(logits - logits.max())  # numerically stable softmax
probs /= probs.sum()
pred = int(np.argmax(logits))
print(f"{classes[pred]} (confidence: {probs[pred]:.3f})")
```

### Direct ONNX Inference (Node.js)

```javascript
const ort = require("onnxruntime-node");
const vocab = require("./tokenizer.json");

const classes = ["exfiltration", "injection", "privilege_escalation",
  "persistence", "credential_abuse", "lateral_movement",
  "social_engineering", "policy_violation", "benign", "steganography"];

// Create the session once and reuse it across calls
const sessionPromise = ort.InferenceSession.create("nanomind-tme.onnx");

async function classify(text) {
  const session = await sessionPromise;
  // Tokenize: lowercase word-level, unknown -> 1, pad with 0 to length 128
  const tokens = text.toLowerCase().split(" ");
  const ids = tokens.slice(0, 128).map(t => vocab[t] || 1);
  while (ids.length < 128) ids.push(0);
  const input = new ort.Tensor("int64", BigInt64Array.from(ids.map(BigInt)), [1, 128]);
  const result = await session.run({ input_ids: input });
  const logits = Array.from(result.logits.data);
  const maxIdx = logits.indexOf(Math.max(...logits));
  return { class: classes[maxIdx], logits };
}
```

## Training

### Data Sources

| Source | Samples | Description |
|--------|---------|-------------|
| [OASB](https://oasb.org) | ~400 | Open Agent Security Benchmark attack/benign corpus |
| [DVAA](https://github.com/opena2a-org/damn-vulnerable-ai-agent) | ~200 | Deliberately vulnerable agent scenarios |
| [AgentPwn](https://agentpwn.com) | ~100 | Real honeypot-captured attack payloads |
| Synthetic | ~1,500 | Generated SKILL.md, MCP config, SOUL.md samples |
| Stego corpus | ~550 | Zero-width, homoglyph, bidi, tag character attacks |
| FP-reduction | 106 | Targeted benign samples for false positive elimination |

### Training Process

- **Hardware:** Apple M4 Max, 32 GB, MLX GPU acceleration
- **Framework:** [MLX](https://github.com/ml-explore/mlx) (Apple Silicon native)
- **Strategy:** Fine-tuned from v0.4.0 weights with a lower learning rate (0.0005)
- **Schedule:** Cosine decay with linear warmup (5 epochs)
- **Regularization:** Dropout 0.1, early stopping (patience=60)

### Corpus Evolution

| Version | Samples | Classes | Key Change |
|---------|---------|---------|------------|
| sft-v4 | 1,028 | 9 | Initial release |
| sft-v5 | ~1,100 | 9 | Added OASB data |
| sft-v8 | 4,500 | 9 | Multi-source, balanced |
| sft-v9 | 3,566 | 10 | Added steganography class |
| **sft-v10** | **3,566** | **10** | **FP reduction: +106 targeted benign** |

## Changelog

### v0.5.0 (2026-04-09)

FP reduction: 7 false positives eliminated via targeted benign training data (base64, emoji, Cyrillic, Arabic, governance, error messages, security tools). Fine-tuned from v0.4.0.

### v0.4.0 (2026-04-07)

Added steganography as the 10th attack class. Trained on the sft-v9 corpus with 370+ steganographic attack samples and 370+ benign Unicode samples.

### v0.3.0 (2026-04-01)

Added ONNX export with external data format for efficient deployment.

### v0.2.0 (2026-03-20)

Upgraded from MLP to the Mamba TME architecture. 97.01% accuracy.
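The training schedule described above (cosine decay with linear warmup) can be sketched as a pure function of the step index. The base learning rate (0.0005) comes from the training process notes; `warmup_steps` and `total_steps` are placeholder values, not taken from the actual run.

```python
import math

def lr_at(step, base_lr=5e-4, warmup_steps=100, total_steps=1000):
    """Cosine decay with linear warmup (illustrative step counts)."""
    if step < warmup_steps:
        # Linear ramp from ~0 up to base_lr over the warmup phase
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr toward 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))    # 5e-06  -- start of warmup
print(lr_at(100))  # 0.0005 -- peak at end of warmup, then decays
```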
## File Manifest

| File | Size | Description |
|------|------|-------------|
| `nanomind-tme.onnx` | 140 KB | ONNX model graph |
| `nanomind-tme.onnx.data` | 8.0 MB | External weight data |
| `tokenizer.json` | 165 KB | Word-level vocabulary (6,000 tokens) |
| `nanomind-tme-classifier.npz` | 8.0 MB | Best checkpoint (MLX/NumPy weights) |

## Limitations

- **Small eval set:** 194 samples. Per-class metrics may be noisy for classes with < 15 support.
- **Word-level tokenizer:** Cannot detect character-level steganographic attacks (e.g., single Cyrillic homoglyphs embedded in Latin words). Relies on contextual patterns instead.
- **Base64 sensitivity:** Long base64 strings can look like encoded/hidden content. v0.5.0 added targeted training data, but novel base64 patterns may still trigger false positives.
- **English-centric vocabulary:** The vocabulary is trained primarily on English text. Non-English package descriptions rely on Unicode pattern recognition rather than semantic understanding.
- **No adversarial robustness testing:** Not tested against adversarial examples designed to evade detection.

## Responsible Use

This model is designed to **assist** security review, not replace it. All findings should be verified by a human before action is taken. The model may produce false positives on legitimate content that uses security-related terminology in defensive contexts.

Do not use this model to:

- Block packages or agents without human review
- Make automated access control decisions
- Replace security audits or penetration testing

## License

Apache-2.0. Free for commercial and non-commercial use.
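As noted under Limitations, the word-level tokenizer cannot see character-level steganography: a single homoglyph pushes an otherwise-known word out of vocabulary, so the model receives `<unk>` rather than the substituted character. The demonstration below uses a toy vocabulary as a stand-in for the real `tokenizer.json`.

```python
# Toy stand-in vocabulary; the real tokenizer.json maps ~6,000 words.
vocab = {"<pad>": 0, "<unk>": 1, "fetch": 2, "config": 3, "from": 4, "server": 5}

def tokenize(text, max_len=128):
    # Lowercase word-level tokenization, unknown -> 1, pad with 0
    ids = [vocab.get(t, 1) for t in text.lower().split()[:max_len]]
    return ids + [0] * (max_len - len(ids))

clean = "fetch config from server"
spoofed = "fetch \u0441onfig from server"  # U+0441 CYRILLIC SMALL LETTER ES, looks like 'c'

print(tokenize(clean)[:4])    # [2, 3, 4, 5]
print(tokenize(spoofed)[:4])  # [2, 1, 4, 5] -- the homoglyph word maps to <unk>
```

This is why the model relies on contextual patterns (and the dedicated `steganography` class) rather than recognizing the substituted character itself.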
## Citation

```bibtex
@software{nanomind,
  title   = {NanoMind Security Classifier},
  author  = {OpenA2A},
  url     = {https://github.com/opena2a-org/nanomind},
  version = {0.5.0},
  year    = {2026}
}
```

## Links

- [NanoMind GitHub](https://github.com/opena2a-org/nanomind) -- Model code, specifications, documentation
- [HackMyAgent](https://github.com/opena2a-org/hackmyagent) -- Primary consumer (AI agent security scanner)
- [OpenA2A](https://github.com/opena2a-org/opena2a) -- CLI toolkit for AI agent security
- [OASB](https://oasb.org) -- Open Agent Security Benchmark
- [DVAA](https://github.com/opena2a-org/damn-vulnerable-ai-agent) -- Training data source (vulnerable agent scenarios)