🧠 Hallucination Detector Agent: GRPO Fine-tuned Qwen3-0.6B
A GRPO-fine-tuned LoRA adapter for hallucination detection, classification, and correction.
Quick Start
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load base model + LoRA adapter
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", torch_dtype="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "williyam/hallucination-detector-agent-qwen3-0.6b")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B", trust_remote_code=True)
messages = [
    {"role": "system", "content": "You are a hallucination detector. Given SOURCE and PASSAGE, output JSON with action_type, hallucination_detected, hallucination_type, hallucinated_span, corrected_text, reasoning."},
    {"role": "user", "content": "SOURCE: The Eiffel Tower is in Paris.\nPASSAGE: The Eiffel Tower is in London."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
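The system prompt above asks for JSON, but a small model may wrap the object in extra text. A minimal sketch for pulling the first JSON object out of the decoded string, assuming the reply contains one; the helper names `parse_detector_output` and `missing_keys` are hypothetical and not part of this repository, and the sample string is hand-written, not real model output.

```python
import json
from typing import Any, Dict, List, Optional

# Field names copied from the system prompt in the Quick Start.
EXPECTED_KEYS = (
    "action_type", "hallucination_detected", "hallucination_type",
    "hallucinated_span", "corrected_text", "reasoning",
)

def parse_detector_output(text: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object embedded in `text`, or None if none parses."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores the rest.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None

def missing_keys(obj: Dict[str, Any]) -> List[str]:
    """List the expected fields that the parsed object does not contain."""
    return [key for key in EXPECTED_KEYS if key not in obj]

# Hand-written sample for illustration only (NOT actual model output).
sample = 'Answer: {"hallucination_detected": true, "hallucinated_span": "London"}'
parsed = parse_detector_output(sample)
print(parsed, missing_keys(parsed) if parsed else None)
```

Returning `None` instead of raising keeps the caller in control of how to handle a reply that contains no parseable object.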
Model Details
| Property | Value |
|---|---|
| Base model | Qwen/Qwen3-0.6B |
| License | Apache 2.0 (ungated) |
| Method | GRPO (Group Relative Policy Optimization) |
| LoRA | rank=16, alpha=32 |
| Training | 3 epochs, lr=5e-6, beta=0.04, 2 generations/prompt |
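To illustrate the "group relative" part of GRPO with the 2 generations per prompt listed above, here is a rough sketch of how rewards within one prompt's group can be normalized into advantages. It is an illustration of the general idea, not the TRL implementation used for this adapter; the function name, the use of the population standard deviation, and the `eps` value are assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence

def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-4) -> List[float]:
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per completion sampled for the same prompt.
    Population std and `eps` are assumptions made for this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Two completions for one prompt: the better one gets a positive advantage.
print(group_relative_advantages([1.0, 0.0]))
# Identical rewards carry no preference between completions: both advantages are zero.
print(group_relative_advantages([0.5, 0.5]))
```

With only two generations per prompt, a group contributes a learning signal only when the two completions receive different rewards.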
Results
| Task | Baseline | GRPO | Δ |
|---|---|---|---|
| Easy: Factual Error | 0.2000 | 0.8000 | +0.6000 |
| Medium: Entity+Factual | 0.2250 | 0.2250 | +0.0000 |
| Hard: Multi-type | 0.8333 | 0.8333 | +0.0000 |
| Average | 0.4194 | 0.6194 | +0.2000 |
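The Average row is the unweighted mean of the three task scores, and the whole gain comes from the Easy task. A short check of that arithmetic, using only the numbers in the table (the variable names are illustrative):

```python
from statistics import mean

# (baseline, GRPO) score per task, copied from the Results table.
scores = {
    "Easy: Factual Error": (0.2000, 0.8000),
    "Medium: Entity+Factual": (0.2250, 0.2250),
    "Hard: Multi-type": (0.8333, 0.8333),
}

baseline_avg = mean(b for b, _ in scores.values())
grpo_avg = mean(g for _, g in scores.values())
delta_avg = grpo_avg - baseline_avg

print(f"{baseline_avg:.4f} {grpo_avg:.4f} {delta_avg:+.4f}")
# prints: 0.4194 0.6194 +0.2000
```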
Training Plots
Reproduce
git clone https://github.com/your-username/hallucination-detector-gym
cd hallucination-detector-gym
cp .env.example .env # Add your HF_TOKEN and HF_USERNAME
pip install -e .
# Open training_hallucination_detector.ipynb and run all cells
Framework
- TRL + PEFT + Transformers (latest)
- Trained on Apple MPS (PyTorch `mps` device) in 42,742 s (about 11.9 hours)
Evaluation results
- Average Score (GRPO), self-reported: 0.619
- Average Score (Baseline), self-reported: 0.419


