🧠 Hallucination Detector Agent – GRPO Fine-tuned Qwen3-0.6B

A GRPO-fine-tuned LoRA adapter for hallucination detection, classification, and correction.

Quick Start

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model + LoRA adapter
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", torch_dtype="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "williyam/hallucination-detector-agent-qwen3-0.6b")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a hallucination detector. Given SOURCE and PASSAGE, output JSON with action_type, hallucination_detected, hallucination_type, hallucinated_span, corrected_text, reasoning."},
    {"role": "user", "content": "SOURCE: The Eiffel Tower is in Paris.\nPASSAGE: The Eiffel Tower is in London."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
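The model replies with a JSON object whose fields match the system prompt above. A minimal parsing sketch (an assumption, not part of the released code; it tolerates stray text around the JSON, and the `example` string below is an illustrative reply, not real model output):

```python
import json

def parse_detection(raw: str) -> dict:
    """Extract and parse the first {...} JSON object from the model's reply."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start : end + 1])

# Hypothetical reply for the Eiffel Tower example above.
example = (
    '{"action_type": "correct", "hallucination_detected": true, '
    '"hallucination_type": "factual_error", '
    '"hallucinated_span": "London", '
    '"corrected_text": "The Eiffel Tower is in Paris.", '
    '"reasoning": "PASSAGE contradicts SOURCE on the location."}'
)
result = parse_detection(example)
print(result["hallucination_detected"], result["hallucinated_span"])
```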

Model Details

Base model Qwen/Qwen3-0.6B
License Apache 2.0 (ungated)
Method GRPO (Group Relative Policy Optimization)
LoRA rank=16, alpha=32
Training 3 epochs, lr=5e-6, beta=0.04, 2 generations/prompt
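The training script is not included in this card, so the following is only a sketch of how the hyperparameters above map onto TRL and PEFT config objects; all names other than the listed hyperparameters are assumptions:

```python
# Sketch only: reconstructs the stated hyperparameters, not the authors' script.
from peft import LoraConfig
from trl import GRPOConfig

lora_cfg = LoraConfig(
    r=16,                 # LoRA rank (from "rank=16" above)
    lora_alpha=32,        # LoRA alpha (from "alpha=32" above)
    task_type="CAUSAL_LM",
)

grpo_cfg = GRPOConfig(
    output_dir="grpo-hallucination-detector",  # hypothetical path
    num_train_epochs=3,
    learning_rate=5e-6,
    beta=0.04,            # KL penalty coefficient
    num_generations=2,    # completions sampled per prompt
)
```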

Results

Task Baseline GRPO Δ
Easy: Factual Error 0.2000 0.8000 +0.6000
Medium: Entity+Factual 0.2250 0.2250 +0.0000
Hard: Multi-type 0.8333 0.8333 +0.0000
Average 0.4194 0.6194 +0.2000
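The reported averages are the unweighted means of the three per-task scores, as a quick check confirms:

```python
baseline = [0.2000, 0.2250, 0.8333]
grpo = [0.8000, 0.2250, 0.8333]

avg_b = sum(baseline) / 3
avg_g = sum(grpo) / 3
print(round(avg_b, 4), round(avg_g, 4), round(avg_g - avg_b, 4))
# → 0.4194 0.6194 0.2
```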

Training Plots

[Plots: training results, training loss, score heatmap]

Reproduce

git clone https://github.com/your-username/hallucination-detector-gym
cd hallucination-detector-gym
cp .env.example .env  # Add your HF_TOKEN and HF_USERNAME
pip install -e .
# Open training_hallucination_detector.ipynb and run all cells

Framework

  • TRL + PEFT + Transformers (latest)
  • Trained on Apple Silicon (MPS backend) in 42,742 s (≈11.9 hours)