---
license: apache-2.0
language:
- en
tags:
- security
- ai-agents
- mcp
- nanomind
- opena2a
- threat-detection
- onnx
- text-classification
datasets:
- opena2a/nanomind-training
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: nanomind-security-classifier
  results:
  - task:
      type: text-classification
      name: AI Agent Threat Classification
    dataset:
      type: opena2a/nanomind-training
      name: NanoMind Security Corpus sft-v10
    metrics:
    - name: Eval Accuracy
      type: accuracy
      value: 0.9845
    - name: Macro F1
      type: f1
      value: 0.9778
---

# NanoMind Security Classifier v0.5.0

A fast, lightweight threat classifier purpose-built for AI agent security scanning. It classifies SKILL.md files, MCP server configurations, SOUL.md governance docs, and agent tool descriptions into 10 security categories in under 1 ms. Part of the [OpenA2A](https://github.com/opena2a-org) security ecosystem.

## What This Model Does

NanoMind analyzes the text content of AI agent configurations and detects security threats:

```
Input:  MCP server config with hidden data forwarding endpoint
Output: exfiltration (confidence: 0.97)

Input:  Normal SOUL.md governance policy
Output: benign (confidence: 0.99)
```

It runs at the scanning layer of [HackMyAgent](https://github.com/opena2a-org/hackmyagent) and the [OpenA2A CLI](https://github.com/opena2a-org/opena2a), classifying every piece of agent content before it reaches production.
## Key Metrics

| Metric | Value |
|--------|-------|
| **Eval accuracy** | **98.45%** |
| **Macro F1** | **0.9778** |
| **False positives** | **0** on 33 benign Unicode inputs |
| **Inference latency** | **< 1 ms** (p99 on CPU) |
| **Model size** | **8.3 MB** (ONNX + weights + tokenizer) |
| Training samples | 3,168 |
| Eval samples | 194 |
| Training corpus | sft-v10 |

## Threat Taxonomy (10 classes)

| Class | Description |
|-------|-------------|
| `exfiltration` | Data forwarding to unauthorized external endpoints |
| `injection` | Instruction override, jailbreak, prompt injection |
| `privilege_escalation` | Unauthorized access elevation |
| `persistence` | Permanent unauthorized state manipulation |
| `credential_abuse` | Credential harvesting, phishing, token theft |
| `lateral_movement` | Remote config/instruction fetching, C2 patterns |
| `social_engineering` | Urgency, authority, or pressure manipulation |
| `policy_violation` | Governance bypass, boundary violations |
| `steganography` | Unicode-based attacks (zero-width chars, homoglyphs, bidi overrides) |
| `benign` | Normal, safe agent behavior |

## Architecture

| Parameter | Value |
|-----------|-------|
| Type | Mamba SSM (Selective State Space Model) |
| Architecture | TME (Ternary Mamba Encoder) |
| Blocks | 8 MambaBlocks with gated projection |
| Dimensions | d_model=128, d_inner=256, d_state=64 |
| Vocabulary | 6,000 tokens (word-level) |
| Parameters | 2,089,482 |
| Inference | ONNX Runtime (cross-platform) or MLX (Apple Silicon) |

The model processes text through: Embedding -> 8x MambaBlock (in_proj -> SiLU gate -> dt_proj -> out_proj + LayerNorm residual) -> mean pooling -> LayerNorm -> linear classifier -> softmax.
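The pipeline above can be sketched at the level of tensor shapes. This is an illustrative NumPy mock with random weights: the selective-scan state update (`dt_proj`, `d_state`) is elided, and none of the weights or helper names correspond to the released checkpoint.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_inner, n_classes, seq_len, vocab_size = 128, 256, 10, 128, 6000

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def silu(x):
    return x / (1.0 + np.exp(-x))

embedding = rng.normal(size=(vocab_size, d_model)) * 0.02

def mamba_block(x):
    # in_proj -> SiLU gate -> out_proj, wrapped in a LayerNorm residual
    # (random weights; the selective scan of the real block is omitted)
    W_in = rng.normal(size=(d_model, 2 * d_inner)) * 0.02
    W_out = rng.normal(size=(d_inner, d_model)) * 0.02
    h, gate = np.split(layer_norm(x) @ W_in, 2, axis=-1)
    return x + (h * silu(gate)) @ W_out

ids = rng.integers(0, vocab_size, size=seq_len)
x = embedding[ids]                   # (128, 128): seq_len x d_model
for _ in range(8):                   # 8 MambaBlocks
    x = mamba_block(x)
pooled = layer_norm(x.mean(axis=0))  # mean pooling over tokens -> (128,)
logits = pooled @ (rng.normal(size=(d_model, n_classes)) * 0.02)
print(logits.shape)                  # (10,) -- one logit per threat class
```

Shapes aside, the only structural point this demonstrates is the gated-projection block with a residual connection; the real model's selective scan sits between the gate and the output projection.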
## Quick Start

### Via HackMyAgent (recommended)

```bash
npm install -g hackmyagent

# Scan an AI agent project for threats
hackmyagent scan ./my-agent --deep
```

### Via OpenA2A CLI

```bash
npx opena2a scan ./my-agent
```

### Direct ONNX Inference (Python)

```python
import json

import numpy as np
import onnxruntime as ort

# Load model and vocabulary
session = ort.InferenceSession("nanomind-tme.onnx")
with open("tokenizer.json") as f:
    vocab = json.load(f)

# Tokenize: lowercase word-level, unknown -> 1, pad with 0 to length 128
text = "your agent config text here"
tokens = text.lower().split()
ids = [vocab.get(t, 1) for t in tokens[:128]]
ids += [0] * (128 - len(ids))
input_ids = np.array([ids], dtype=np.int64)

# Predict
logits = session.run(None, {"input_ids": input_ids})[0][0]
classes = ["exfiltration", "injection", "privilege_escalation", "persistence",
           "credential_abuse", "lateral_movement", "social_engineering",
           "policy_violation", "benign", "steganography"]
probs = np.exp(logits - logits.max())  # numerically stable softmax
probs /= probs.sum()
pred = int(np.argmax(logits))
print(f"{classes[pred]} (confidence: {probs[pred]:.3f})")
```

### Direct ONNX Inference (Node.js)

```javascript
const ort = require("onnxruntime-node");
const vocab = require("./tokenizer.json");

const classes = ["exfiltration", "injection", "privilege_escalation",
  "persistence", "credential_abuse", "lateral_movement",
  "social_engineering", "policy_violation", "benign", "steganography"];

// Create the session once and reuse it across calls
const sessionPromise = ort.InferenceSession.create("nanomind-tme.onnx");

async function classify(text) {
  const session = await sessionPromise;
  // Tokenize: lowercase word-level, unknown -> 1, pad with 0 to length 128
  const tokens = text.toLowerCase().split(" ");
  const ids = tokens.slice(0, 128).map(t => vocab[t] || 1);
  while (ids.length < 128) ids.push(0);
  const input = new ort.Tensor("int64", BigInt64Array.from(ids.map(BigInt)), [1, 128]);
  const result = await session.run({ input_ids: input });
  const logits = Array.from(result.logits.data);
  const maxIdx = logits.indexOf(Math.max(...logits));
  return { class: classes[maxIdx], logits };
}
```

## Training

### Data Sources

| Source | Samples | Description |
|--------|---------|-------------|
| [OASB](https://oasb.org) | ~400 | Open Agent Security Benchmark attack/benign corpus |
| [DVAA](https://github.com/opena2a-org/damn-vulnerable-ai-agent) | ~200 | Deliberately vulnerable agent scenarios |
| [AgentPwn](https://agentpwn.com) | ~100 | Real honeypot-captured attack payloads |
| Synthetic | ~1,500 | Generated SKILL.md, MCP config, SOUL.md samples |
| Stego corpus | ~550 | Zero-width, homoglyph, bidi, tag character attacks |
| FP-reduction | 106 | Targeted benign samples for false positive elimination |

### Training Process

- **Hardware:** Apple M4 Max, 32 GB, MLX GPU acceleration
- **Framework:** [MLX](https://github.com/ml-explore/mlx) (Apple Silicon native)
- **Strategy:** Fine-tuned from v0.4.0 weights with a lower learning rate (0.0005)
- **Schedule:** Cosine decay with linear warmup (5 epochs)
- **Regularization:** Dropout 0.1, early stopping (patience=60)

### Corpus Evolution

| Version | Samples | Classes | Key Change |
|---------|---------|---------|------------|
| sft-v4 | 1,028 | 9 | Initial release |
| sft-v5 | ~1,100 | 9 | Added OASB data |
| sft-v8 | 4,500 | 9 | Multi-source, balanced |
| sft-v9 | 3,566 | 10 | Added steganography class |
| **sft-v10** | **3,566** | **10** | **FP reduction: +106 targeted benign** |

## Changelog

### v0.5.0 (2026-04-09)

FP reduction: 7 false positives eliminated via targeted benign training data (base64, emoji, Cyrillic, Arabic, governance, error messages, security tools). Fine-tuned from v0.4.0.

### v0.4.0 (2026-04-07)

Added steganography as the 10th attack class. Trained on the sft-v9 corpus with 370+ steganographic attack samples and 370+ benign Unicode samples.

### v0.3.0 (2026-04-01)

Added ONNX export with external data format for efficient deployment.

### v0.2.0 (2026-03-20)

Upgraded from MLP to the Mamba TME architecture. 97.01% accuracy.
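The training schedule described above (cosine decay with linear warmup) can be sketched as a pure function of the step index. The base learning rate (0.0005) comes from the training process notes; `warmup_steps` and `total_steps` are placeholder values, not taken from the actual run.

```python
import math

def lr_at(step, base_lr=5e-4, warmup_steps=100, total_steps=1000):
    """Cosine decay with linear warmup (illustrative step counts)."""
    if step < warmup_steps:
        # Linear ramp from ~0 up to base_lr over the warmup phase
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr toward 0 over the remaining steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))    # 5e-06  -- start of warmup
print(lr_at(100))  # 0.0005 -- peak at end of warmup, then decays
```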
## File Manifest

| File | Size | Description |
|------|------|-------------|
| `nanomind-tme.onnx` | 140 KB | ONNX model graph |
| `nanomind-tme.onnx.data` | 8.0 MB | External weight data |
| `tokenizer.json` | 165 KB | Word-level vocabulary (6,000 tokens) |
| `nanomind-tme-classifier.npz` | 8.0 MB | Best checkpoint (MLX/NumPy weights) |

## Limitations

- **Small eval set:** 194 samples. Per-class metrics may be noisy for classes with < 15 support.
- **Word-level tokenizer:** Cannot detect character-level steganographic attacks (e.g., single Cyrillic homoglyphs embedded in Latin words). Relies on contextual patterns instead.
- **Base64 sensitivity:** Long base64 strings can look like encoded/hidden content. v0.5.0 added targeted training data, but novel base64 patterns may still trigger false positives.
- **English-centric vocabulary:** The vocabulary is trained primarily on English text. Non-English package descriptions rely on Unicode pattern recognition rather than semantic understanding.
- **No adversarial robustness testing:** Not tested against adversarial examples designed to evade detection.

## Responsible Use

This model is designed to **assist** security review, not replace it. All findings should be verified by a human before action is taken. The model may produce false positives on legitimate content that uses security-related terminology in defensive contexts.

Do not use this model to:

- Block packages or agents without human review
- Make automated access control decisions
- Replace security audits or penetration testing

## License

Apache-2.0. Free for commercial and non-commercial use.
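As noted under Limitations, the word-level tokenizer cannot see character-level steganography: a single homoglyph pushes an otherwise-known word out of vocabulary, so the model receives `<unk>` rather than the substituted character. The demonstration below uses a toy vocabulary as a stand-in for the real `tokenizer.json`.

```python
# Toy stand-in vocabulary; the real tokenizer.json maps ~6,000 words.
vocab = {"<pad>": 0, "<unk>": 1, "fetch": 2, "config": 3, "from": 4, "server": 5}

def tokenize(text, max_len=128):
    # Lowercase word-level tokenization, unknown -> 1, pad with 0
    ids = [vocab.get(t, 1) for t in text.lower().split()[:max_len]]
    return ids + [0] * (max_len - len(ids))

clean = "fetch config from server"
spoofed = "fetch \u0441onfig from server"  # U+0441 CYRILLIC SMALL LETTER ES, looks like 'c'

print(tokenize(clean)[:4])    # [2, 3, 4, 5]
print(tokenize(spoofed)[:4])  # [2, 1, 4, 5] -- the homoglyph word maps to <unk>
```

This is why the model relies on contextual patterns (and the dedicated `steganography` class) rather than recognizing the substituted character itself.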
## Citation

```bibtex
@software{nanomind,
  title   = {NanoMind Security Classifier},
  author  = {OpenA2A},
  url     = {https://github.com/opena2a-org/nanomind},
  version = {0.5.0},
  year    = {2026}
}
```

## Links

- [NanoMind GitHub](https://github.com/opena2a-org/nanomind) -- Model code, specifications, documentation
- [HackMyAgent](https://github.com/opena2a-org/hackmyagent) -- Primary consumer (AI agent security scanner)
- [OpenA2A](https://github.com/opena2a-org/opena2a) -- CLI toolkit for AI agent security
- [OASB](https://oasb.org) -- Open Agent Security Benchmark
- [DVAA](https://github.com/opena2a-org/damn-vulnerable-ai-agent) -- Training data source (vulnerable agent scenarios)