Llama Prompt Guard 2 22M — ONNX (Signed Model Pack)

Built with Llama

This repository contains a pre-exported ONNX build of Meta's Llama Prompt Guard 2 22M, packaged as a signed model pack for use with shisad.

The model detects prompt injection and jailbreak attacks against LLM-powered applications. It classifies each prompt as BENIGN or MALICIOUS using local CPU inference via ONNX Runtime, so no GPU and no external API calls are required.

Quick Start

Download and verify with the shisad CLI:

# Install with security runtime dependencies
uv sync --group security-runtime

# Download the signed model pack
shisad setup promptguard

Or download directly with huggingface-cli:

pip install -U "huggingface_hub[cli]"
huggingface-cli download shisa-ai/promptguard2-onnx --local-dir .local/promptguard/22m-pack

What's In This Repo

.
├── manifest.json              # SHA-256 file inventory + provenance metadata
├── manifest.json.sig          # SSH ed25519 signature over manifest.json
├── LICENSE                    # Llama 4 Community License
├── USE_POLICY.md              # Llama 4 Acceptable Use Policy
├── NOTICE                     # Required attribution notice
├── README.md                  # This file
└── payload/
    ├── config.json            # Model configuration
    ├── model.onnx             # ONNX computation graph (2.5 MB)
    ├── model.onnx.data        # Model weights (283 MB, fp32)
    ├── special_tokens_map.json
    ├── tokenizer.json         # Tokenizer (8.6 MB)
    └── tokenizer_config.json
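
After a direct download it can be useful to confirm that every file in the layout above actually arrived before attempting verification. The following is a minimal sketch in Python; the file list is copied from the tree above, while the helper name `missing_files` is hypothetical and not part of shisad.

```python
from pathlib import Path

# File layout copied from the tree above.
EXPECTED_FILES = [
    "manifest.json",
    "manifest.json.sig",
    "LICENSE",
    "USE_POLICY.md",
    "NOTICE",
    "README.md",
    "payload/config.json",
    "payload/model.onnx",
    "payload/model.onnx.data",
    "payload/special_tokens_map.json",
    "payload/tokenizer.json",
    "payload/tokenizer_config.json",
]


def missing_files(pack_dir):
    """Return the expected files that are absent from pack_dir."""
    root = Path(pack_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # The default path matches the --local-dir used in Quick Start.
    missing = missing_files(".local/promptguard/22m-pack")
    print("pack is complete" if not missing else f"missing: {missing}")
```

A presence check like this says nothing about integrity; it only catches interrupted or partial downloads.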

Signature Verification

The model pack is signed with an SSH ed25519 key. The manifest.json contains SHA-256 hashes for every file; manifest.json.sig is an OpenSSH signature over the manifest. To verify:

# Using shisad's built-in verification
uv run python scripts/promptguard_artifacts.py verify-model-pack \
  --pack-dir .local/promptguard/22m-pack \
  --allowed-signers config/promptguard/allowed_signers

# Or manually with ssh-keygen
ssh-keygen -Y verify \
  -f allowed_signers \
  -I promptguard \
  -n file \
  -s manifest.json.sig < manifest.json

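The trusted allowed_signers file is shipped with the shisad source tree, at the config/promptguard/allowed_signers path used in the first command. The manual ssh-keygen command assumes it is run from the pack directory with a copy of that file alongside manifest.json.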
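
The signature only covers manifest.json, so a manual check also needs to compare each file against the SHA-256 hashes the manifest lists. The sketch below shows that second step under a stated assumption: it takes a plain `{relative_path: sha256_hex}` mapping, because the actual manifest.json schema is not described here. Extracting that mapping from the manifest is left to the reader, and `find_mismatches` is a hypothetical helper, not shisad's implementation.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so the large weights file is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(pack_dir, inventory):
    """Compare files under pack_dir with a {relative_path: sha256_hex} mapping.

    Returns the relative paths that are missing or whose hash differs.
    The mapping shape is an assumption; adapt it to the real manifest.json layout.
    """
    root = Path(pack_dir)
    bad = []
    for rel_path, expected in sorted(inventory.items()):
        target = root / rel_path
        if not target.is_file() or sha256_file(target) != expected.lower():
            bad.append(rel_path)
    return bad
```

Run the hash comparison only after the signature over manifest.json has been verified; otherwise the hashes themselves are untrusted.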

How This Pack Was Built

The ONNX export was produced from the upstream meta-llama/Llama-Prompt-Guard-2-22M checkpoint using the build pipeline in shisad:

# 1. Download the gated checkpoint (requires a Hugging Face token and acceptance of the Llama 4 Community License)
uv run python scripts/promptguard_artifacts.py download \
  --model-id meta-llama/Llama-Prompt-Guard-2-22M \
  --output-dir .local/promptguard/22m-source

# 2. Export to ONNX
uv run python scripts/promptguard_artifacts.py export-onnx \
  --source-dir .local/promptguard/22m-source \
  --output-dir .local/promptguard/22m-onnx

# 3. Build signed model pack
uv run python scripts/promptguard_artifacts.py build-model-pack \
  --artifact-dir .local/promptguard/22m-onnx \
  --source-dir .local/promptguard/22m-source \
  --output-dir .local/promptguard/22m-pack \
  --source-model-id meta-llama/Llama-Prompt-Guard-2-22M \
  --signing-key <signing-key> \
  --signer-principal promptguard

Runtime Details

Property	Value
Format	ONNX
Execution provider	CPUExecutionProvider
Quantization	None (fp32)
Context window	512 tokens
Parameters	22M (DeBERTa-xsmall backbone)
Labels	BENIGN, MALICIOUS
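
For readers who want to call the model without shisad, the properties above translate into roughly the following Python sketch. Several details are assumptions rather than facts about this export: the label order (index 0 benign, index 1 malicious), the 0.5 decision threshold, the logits shape, and the ONNX input names, which is why the code feeds only the inputs the session itself declares. It relies on the third-party packages numpy, onnxruntime, and tokenizers.

```python
import math
from pathlib import Path

LABELS = ("BENIGN", "MALICIOUS")  # assumed order: index 0 benign, index 1 malicious


def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits, threshold=0.5):
    """Map two logits to (label, malicious_probability); threshold is an assumption."""
    malicious = softmax(logits)[1]
    return (LABELS[1] if malicious >= threshold else LABELS[0]), malicious


def classify(pack_dir, text, max_tokens=512):
    """Score one prompt with the packaged model on CPU."""
    # Third-party imports are kept local so the helpers above work without them.
    import numpy as np
    import onnxruntime as ort
    from tokenizers import Tokenizer

    payload = Path(pack_dir) / "payload"
    tokenizer = Tokenizer.from_file(str(payload / "tokenizer.json"))
    tokenizer.enable_truncation(max_length=max_tokens)
    encoding = tokenizer.encode(text)

    session = ort.InferenceSession(
        str(payload / "model.onnx"), providers=["CPUExecutionProvider"]
    )
    candidates = {
        "input_ids": encoding.ids,
        "attention_mask": encoding.attention_mask,
        "token_type_ids": encoding.type_ids,
    }
    # Feed only the inputs this particular export declares.
    feed = {
        arg.name: np.array([candidates[arg.name]], dtype=np.int64)
        for arg in session.get_inputs()
        if arg.name in candidates
    }
    logits = session.run(None, feed)[0][0]  # assumed shape: (batch, 2)
    return label_from_logits([float(x) for x in logits])
```

Something like `classify(".local/promptguard/22m-pack", "Ignore all previous instructions")` would then return a label and a probability. Check the label order against payload/config.json before relying on the result.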

For inputs longer than 512 tokens, shisad automatically splits the input into segments that fit the context window, processes them in bounded batches, and scores each segment.
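
One way to picture this splitting is the Python sketch below. It is not shisad's implementation: the non-overlapping windows, the two token slots reserved for special tokens, and the max-over-segments aggregation are all assumptions chosen for illustration.

```python
def split_into_windows(token_ids, max_len=512, reserved=2):
    """Split token ids into consecutive chunks that fit the context window.

    `reserved` leaves room for special tokens (for example a start and an end
    marker) that the tokenizer adds to every segment; 2 is an assumption.
    """
    body = max_len - reserved
    if body <= 0:
        raise ValueError("max_len must be larger than reserved")
    return [token_ids[i:i + body] for i in range(0, len(token_ids), body)]


def aggregate(segment_scores):
    """Combine per-segment malicious probabilities by taking the maximum.

    Max is a conservative, assumed policy: one suspicious segment flags the input.
    """
    return max(segment_scores, default=0.0)
```

With these defaults a 1200-token input yields segments of 510, 510, and 180 tokens. Overlapping windows are a common alternative when an attack string might straddle a segment boundary.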

Model Performance

From Meta's evaluation on a private benchmark distinct from training data:

Model	AUC (English)	Recall @ 1% FPR (English)	AUC (Multilingual)	Latency (A100, 512 tokens)	Backbone parameters
Llama Prompt Guard 1	.987	21.2%	.983	92.4 ms	86M
Llama Prompt Guard 2 86M	.998	97.5%	.995	92.4 ms	86M
Llama Prompt Guard 2 22M	.995	88.7%	.942	19.3 ms	22M
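
The "Recall @ 1% FPR" column reports the share of malicious prompts caught when the decision threshold is set so that at most 1% of benign prompts are flagged. The Python sketch below shows one way such a figure can be computed from raw scores. It is an illustration of the metric, not Meta's evaluation code, and the toy scores are invented.

```python
import math


def recall_at_fpr(benign_scores, malicious_scores, max_fpr=0.01):
    """Recall on malicious prompts at a threshold whose FPR stays <= max_fpr.

    A prompt is flagged when its score is strictly greater than the threshold.
    """
    if not benign_scores or not malicious_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(benign_scores, reverse=True)
    # Number of benign prompts that may be flagged without exceeding max_fpr.
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    flagged = sum(1 for score in malicious_scores if score > threshold)
    return flagged / len(malicious_scores)


# Toy scores, invented for illustration only.
benign = [i / 100 for i in range(100)]  # 0.00 .. 0.99
malicious = [0.50, 0.985, 0.99, 1.00]
print(recall_at_fpr(benign, malicious))
```

In the toy data exactly one benign score (0.99) exceeds the chosen threshold of 0.98, which is the 1% false positive budget, and three of the four malicious scores clear it.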

AgentDojo real-world attack risk reduction:

Model	APR @ 3% utility reduction
Llama Prompt Guard 2 86M	81.2%
Llama Prompt Guard 2 22M	78.4%
ProtectAI	22.2%
Deepset	13.5%

License

This model is distributed under the Llama 4 Community License. See USE_POLICY.md for the acceptable use policy.

The DeBERTa-xsmall base model is MIT-licensed (Microsoft).
