Luna-2 Style — Prompt Injection Detector (LoRA Adapter)

Luna-2 Style LoRA adapter for Qwen2.5-0.5B-Instruct, fine-tuned for binary prompt-injection detection (yes / no).

This repository contains only the adapter weights (≈ a few MB). You need PEFT to use it. If you want a standalone checkpoint with no dependencies, use the merged model at aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-merged.

Quickstart (PEFT)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-lora")
tokenizer = AutoTokenizer.from_pretrained("aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-lora")

messages = [
    {"role": "system", "content": "You are a prompt injection detector. Reply only with yes or no."},
    {"role": "user",   "content": "<text to classify>"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False,
                                     add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
out    = model.generate(**inputs, max_new_tokens=1, temperature=0, do_sample=False)
label  = tokenizer.decode(out[0, -1]).strip()  # "yes" or "no"

vLLM Deployment

vLLM supports LoRA adapters natively. Use the merged repo for simplest deployment, or load the adapter dynamically:

python -m vllm.entrypoints.openai.api_server \
    --model Qwen/Qwen2.5-0.5B-Instruct \
    --enable-lora \
    --lora-modules luna2=aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-lora \
    --max-lora-rank 16 \
    --max-model-len 4096 \
    --dtype float16

Training Details

Parameter	Value
Base model	`Qwen/Qwen2.5-0.5B-Instruct`
LoRA r / alpha	16 / 32
LoRA dropout	0.05
Target modules	q/k/v/o_proj, gate/up/down_proj
Epochs	2
Effective batch	32 × 2
Learning rate	0.0005
Max seq length	2048
Train samples	608,507
Resumed from	checkpoint-9508
Train loss	0.2695
Trained on	2026-03-30

Evaluation

Test Set

Metric	Value
Accuracy	0.9575
Precision	0.9776
Recall	0.9246
F1	0.9503
AUC-ROC	0.9934
Brier Score	0.0298
Optimal Threshold	0.45
Optimal F1	0.9509
Eval Samples	20,000

Validation Set

Metric	Value
Accuracy	0.9576
Precision	0.9783
Recall	0.9235
F1	0.9501
AUC-ROC	0.9930
Brier Score	0.0301
Optimal Threshold	0.45
Optimal F1	0.9517
Eval Samples	50,000

License

Apache 2.0 — same as the base Qwen2.5 model.

Downloads last month: 2

Model tree for aditya02acharya/luna2-qwen2.5-0.5b-prompt-injection-lora

Base model

Qwen/Qwen2.5-0.5B

Finetuned

Qwen/Qwen2.5-0.5B-Instruct

Adapter

(503)

this model

Evaluation results

F1 on Luna-2 Test Split
test set self-reported

0.950
Accuracy on Luna-2 Test Split
test set self-reported

0.958
Precision on Luna-2 Test Split
test set self-reported

0.978
Recall on Luna-2 Test Split
test set self-reported

0.925
AUC-ROC on Luna-2 Test Split
test set self-reported

0.993
Brier Score on Luna-2 Test Split
test set self-reported

0.030