HRM-Text Prompt Injection Detector

Parameters: 46,206,722
Architecture: HRM-Text (classification port) | d=768, 12 heads × 64, H layers=3, L layers=3, cycles=2 H × 3 L
Context window: 2,048 tokens (NTK-scaled RoPE)
Training data: Bordair/bordair-multimodal (503K samples, balanced 1:1 between safe and injection)

Evaluation on a stratified 10% holdout set:

Metric      Value
Accuracy    0.9893
Precision   0.9934
Recall      0.9838
F1          0.9886
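F1 is the harmonic mean of precision and recall, so you can recompute it from the reported values as a quick consistency check. The helper below is a minimal sketch that derives all four metrics from confusion-matrix counts. The name `binary_metrics` is hypothetical (it is not from the training script), and the sketch assumes the injection class is the positive label.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 for a binary classifier (positive = injection, assumed)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Recompute F1 from the reported precision and recall
p, r = 0.9934, 0.9838
f1 = 2 * p * r / (p + r)
print(f"{f1:.4f}")  # 0.9886, matching the table
```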

Architecture

HRM-Text, a text adaptation of the Hierarchical Reasoning Model (arXiv:2506.21734), with a classification head. The model runs a recurrent cascade of two transformer modules, a high-level H module and a low-level L module, which exchange hidden states across cycles:

  • L module (3 layers, low-level): updates at every step and captures fine-grained token patterns
  • H module (3 layers, high-level): updates once per cycle, integrating the L module's latest state
  • Recurrence: 3 L-steps per H-cycle × 2 H-cycles = 6 L-module passes and 2 H-module updates
  • Classification: last-token pooling + LayerNorm + Linear(2)
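The cycle structure and classification head described above can be sketched in plain Python. This is an illustrative schedule, not the training script's code. The function names are hypothetical and the states are scalar placeholders for hidden-state tensors. The update rule is taken from the original HRM paper rather than confirmed for HRM-Text: the L module reads its own state, the current H state and the input, and the H module reads the final L state of each cycle. The sketch also assumes the head reads the final H states. LayerNorm is shown without learned scale and shift for brevity.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # placeholder for a hidden-state tensor


def hrm_recurrence(
    x: S,
    z_h: S,
    z_l: S,
    l_step: Callable[[S, S, S], S],
    h_step: Callable[[S, S], S],
    h_cycles: int = 2,
    l_cycles: int = 3,
) -> Tuple[S, S, List[str]]:
    """Run h_cycles cycles of (l_cycles L-updates, then one H-update)."""
    trace: List[str] = []
    for _ in range(h_cycles):
        for _ in range(l_cycles):
            z_l = l_step(z_l, z_h, x)  # L: own state + current H state + input (assumed rule)
            trace.append("L")
        z_h = h_step(z_h, z_l)  # H: updated once per cycle from the latest L state
        trace.append("H")
    return z_h, z_l, trace


def classify_last_token(
    hidden: Sequence[Sequence[float]],  # final H states, one vector per position (assumed)
    mask: Sequence[int],  # 1 = real token, 0 = padding
    weight: Sequence[Sequence[float]],  # Linear(d -> 2) weights, shape [2][d]
    bias: Sequence[float],  # shape [2]
    eps: float = 1e-5,
) -> int:
    """Last-token pooling + LayerNorm (no affine) + Linear(2), then argmax."""
    last = max(i for i, m in enumerate(mask) if m)  # last non-padding position
    h = hidden[last]
    mean = sum(h) / len(h)
    var = sum((v - mean) ** 2 for v in h) / len(h)
    normed = [(v - mean) / math.sqrt(var + eps) for v in h]
    logits = [sum(w * v for w, v in zip(row, normed)) + b for row, b in zip(weight, bias)]
    return max(range(len(logits)), key=lambda i: logits[i])  # 0 = safe, 1 = injection


# Toy usage: scalar "states" make the schedule easy to trace by hand
z_h, z_l, trace = hrm_recurrence(
    1.0, 0.0, 0.0,
    l_step=lambda zl, zh, x: zl + zh + x,
    h_step=lambda zh, zl: zh + zl,
)
print(trace)  # ['L', 'L', 'L', 'H', 'L', 'L', 'L', 'H']
print(z_h, z_l)  # 18.0 15.0
```

The trace shows the 2×3 schedule directly: six L passes interleaved with two H updates. Only the H module's output would feed the classification head.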

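The byte-level tokenizer (vocabulary size 256) maps text to its raw UTF-8 bytes, so any Unicode input is representable with no out-of-vocabulary tokens. Because one token is one byte, the 2,048-token context holds 2,048 bytes, and non-ASCII characters use 2 to 4 bytes each. RoPE is configured with NTK-aware scaling (θ=10000.0, factor=1.0). A factor of 1.0 leaves the base unchanged, so this is equivalent to standard RoPE over the 2,048-token context.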
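Below is a minimal sketch of both pieces. It assumes the training script follows the common conventions: token IDs are simply the UTF-8 bytes, and NTK-aware scaling enlarges the RoPE base to base · factor^(d/(d−2)) for head dimension d (64 here). The function names are hypothetical. With factor=1.0 the inverse frequencies come out identical to standard RoPE.

```python
from typing import List


def byte_tokenize(text: str, max_length: int = 2048) -> List[int]:
    """Byte-level tokenization: each UTF-8 byte (0-255) is one token ID."""
    return list(text.encode("utf-8", errors="replace")[:max_length])


def ntk_rope_inv_freq(head_dim: int = 64, base: float = 10000.0, factor: float = 1.0) -> List[float]:
    """RoPE inverse frequencies with NTK-aware base scaling (one per dimension pair)."""
    scaled_base = base * factor ** (head_dim / (head_dim - 2))
    return [scaled_base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


print(byte_tokenize("hé"))  # [104, 195, 169]: 'é' takes two bytes

# With factor=1.0 the result matches plain RoPE frequencies exactly
standard = [10000.0 ** (-2 * i / 64) for i in range(32)]
print(ntk_rope_inv_freq(factor=1.0) == standard)  # True
```

Factors above 1.0 raise the base, which lowers every frequency after the first and stretches the positions RoPE can distinguish. That is why a factor of 1.0 amounts to no scaling.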

Usage

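import torch
from train_hrm_text_pi import HrmTextClassifier  # model class from the training script

# Configuration of the released checkpoint: d=768 (12 heads x 64), 3 H + 3 L layers
model = HrmTextClassifier(
    hidden_size=768,
    num_heads=12,
    head_dim=64,
    n_layers_H=3,
    n_layers_L=3,
)
state_dict = torch.load("pytorch_model.bin", map_location="cpu", weights_only=True)
# If only a .safetensors file is available, load it with safetensors.torch.load_file(path) instead
# Strip the DDP "module." prefix if the checkpoint was saved from a wrapped model
state_dict = {k.removeprefix("module."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval()

MAX_CONTEXT = 2048  # trained context window, in bytes (1 byte = 1 token)

@torch.no_grad()
def detect(text: str, max_length: int = MAX_CONTEXT) -> int:
    """Return 1 if `text` is classified as prompt injection, else 0."""
    # Byte-level tokenization: token IDs are the raw UTF-8 bytes (0-255)
    byte_ids = list(text.encode("utf-8", errors="replace")[:max_length])
    if not byte_ids:
        raise ValueError("text must be non-empty")
    input_ids = torch.tensor([byte_ids], dtype=torch.long)
    attention_mask = torch.ones_like(input_ids)
    logits = model.inference(input_ids, attention_mask)
    return logits.argmax(-1).item()  # 0 = safe, 1 = injection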
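The model's context window is 2,048 bytes, so longer inputs must be truncated, and an injection placed past the cut-off is never seen. One hedged workaround, not part of the released code, is to classify overlapping byte windows and flag the input if any window is flagged. In the sketch below, `sliding_windows`, `detect_long` and `toy_classify` are hypothetical names. `classify` stands for any function that runs the model on a window of raw bytes, for example a variant of the detection function above that builds `input_ids` from bytes instead of a string.

```python
from typing import Callable, List


def sliding_windows(data: bytes, window: int = 2048, stride: int = 1024) -> List[bytes]:
    """Split data into overlapping windows of at most `window` bytes covering all of it."""
    if not 0 < stride <= window:
        raise ValueError("stride must be in (0, window]")
    if len(data) <= window:
        return [data]
    starts = list(range(0, len(data) - window + 1, stride))
    if starts[-1] + window < len(data):
        starts.append(len(data) - window)  # final window ends exactly at the tail
    return [data[s:s + window] for s in starts]


def detect_long(
    text: str,
    classify: Callable[[bytes], int],  # hypothetical: runs the model on raw bytes, 1 = injection
    window: int = 2048,
    stride: int = 1024,
) -> int:
    """Return 1 if any window is classified as injection, else 0."""
    data = text.encode("utf-8", errors="replace")
    return int(any(classify(chunk) == 1 for chunk in sliding_windows(data, window, stride)))


def toy_classify(chunk: bytes) -> int:
    """Stand-in classifier for illustration: flags a fixed keyword."""
    return int(b"IGNORE PREVIOUS" in chunk)


print(detect_long("a" * 5000 + " IGNORE PREVIOUS instructions", toy_classify))  # 1
```

Windows are sliced on bytes, which matches the byte-level tokenizer, so cutting through a multi-byte character is harmless. The "any window" rule favours recall: each extra window is another chance for a false positive, so precision on long inputs may fall below the holdout figures.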

Weights: Safetensors, F32
Model repository: av-codes/prompt-injection-hrm-text