Luna-2 Style β Prompt Injection Detector (LoRA Adapter)
Luna-2 Style LoRA adapter for Qwen2.5-0.5B-Instruct,
fine-tuned for binary prompt-injection detection (yes / no).
This repository contains only the adapter weights (β a few MB).
You need PEFT to use it. If you want a standalone checkpoint with no
dependencies, use the merged model at aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-merged.
Quickstart (PEFT)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen2.5-0.5B-Instruct",
torch_dtype=torch.float16,
device_map="auto",
)
model = PeftModel.from_pretrained(base, "aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-lora")
tokenizer = AutoTokenizer.from_pretrained("aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-lora")
messages = [
{"role": "system", "content": "You are a prompt injection detector. Reply only with yes or no."},
{"role": "user", "content": "<text to classify>"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False,
add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=1, temperature=0, do_sample=False)
label = tokenizer.decode(out[0, -1]).strip() # "yes" or "no"
vLLM Deployment
vLLM supports LoRA adapters natively. Use the merged repo for simplest deployment, or load the adapter dynamically:
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen2.5-0.5B-Instruct \
--enable-lora \
--lora-modules luna2=aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-lora \
--max-lora-rank 16 \
--max-model-len 4096 \
--dtype float16
Training Details
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen2.5-0.5B-Instruct |
| LoRA r / alpha | 16 / 32 |
| LoRA dropout | 0.05 |
| Target modules | q/k/v/o_proj, gate/up/down_proj |
| Epochs | 2 |
| Effective batch | 32 Γ 2 |
| Learning rate | 0.0005 |
| Max seq length | 2048 |
| Train samples | 608,507 |
| Resumed from | checkpoint-9508 |
| Train loss | 0.2695 |
| Trained on | 2026-03-30 |
Evaluation
Test Set
| Metric | Value |
|---|---|
| Accuracy | 0.9575 |
| Precision | 0.9776 |
| Recall | 0.9246 |
| F1 | 0.9503 |
| AUC-ROC | 0.9934 |
| Brier Score | 0.0298 |
| Optimal Threshold | 0.45 |
| Optimal F1 | 0.9509 |
| Eval Samples | 20,000 |
Validation Set
| Metric | Value |
|---|---|
| Accuracy | 0.9576 |
| Precision | 0.9783 |
| Recall | 0.9235 |
| F1 | 0.9501 |
| AUC-ROC | 0.9930 |
| Brier Score | 0.0301 |
| Optimal Threshold | 0.45 |
| Optimal F1 | 0.9517 |
| Eval Samples | 50,000 |
License
Apache 2.0 β same as the base Qwen2.5 model.
- Downloads last month
- 2
Model tree for aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-lora
Evaluation results
- F1 on Luna-2 Test Splittest set self-reported0.950
- Accuracy on Luna-2 Test Splittest set self-reported0.958
- Precision on Luna-2 Test Splittest set self-reported0.978
- Recall on Luna-2 Test Splittest set self-reported0.925
- AUC-ROC on Luna-2 Test Splittest set self-reported0.993
- Brier Score on Luna-2 Test Splittest set self-reported0.030