🧠 Hallucination Detector Agent – GRPO Fine-tuned Qwen3-0.6B

A GRPO-fine-tuned LoRA adapter for hallucination detection, classification, and correction.

Quick Start

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model + LoRA adapter
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", torch_dtype="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "williyam/hallucination-detector-agent-qwen3-0.6b")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B", trust_remote_code=True)

messages = [
    {"role": "system", "content": "You are a hallucination detector. Given SOURCE and PASSAGE, output JSON with action_type, hallucination_detected, hallucination_type, hallucinated_span, corrected_text, reasoning."},
    {"role": "user", "content": "SOURCE: The Eiffel Tower is in Paris.\nPASSAGE: The Eiffel Tower is in London."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
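The model replies with a JSON object whose fields match the system prompt above. A minimal parsing sketch (an assumption, not part of the released code; it tolerates stray text around the JSON, and the `example` string below is an illustrative reply, not real model output):

```python
import json

def parse_detection(raw: str) -> dict:
    """Extract and parse the first {...} JSON object from the model's reply."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start : end + 1])

# Hypothetical reply for the Eiffel Tower example above.
example = (
    '{"action_type": "correct", "hallucination_detected": true, '
    '"hallucination_type": "factual_error", '
    '"hallucinated_span": "London", '
    '"corrected_text": "The Eiffel Tower is in Paris.", '
    '"reasoning": "PASSAGE contradicts SOURCE on the location."}'
)
result = parse_detection(example)
print(result["hallucination_detected"], result["hallucinated_span"])
```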

Model Details

Base model Qwen/Qwen3-0.6B
License Apache 2.0 (ungated)
Method GRPO (Group Relative Policy Optimization)
LoRA rank=16, alpha=32
Training 3 epochs, lr=5e-6, beta=0.04, 2 generations/prompt
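The training script is not included in this card, so the following is only a sketch of how the hyperparameters above map onto TRL and PEFT config objects; all names other than the listed hyperparameters are assumptions:

```python
# Sketch only: reconstructs the stated hyperparameters, not the authors' script.
from peft import LoraConfig
from trl import GRPOConfig

lora_cfg = LoraConfig(
    r=16,                 # LoRA rank (from "rank=16" above)
    lora_alpha=32,        # LoRA alpha (from "alpha=32" above)
    task_type="CAUSAL_LM",
)

grpo_cfg = GRPOConfig(
    output_dir="grpo-hallucination-detector",  # hypothetical path
    num_train_epochs=3,
    learning_rate=5e-6,
    beta=0.04,            # KL penalty coefficient
    num_generations=2,    # completions sampled per prompt
)
```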

Results

Task Baseline GRPO Δ
Easy: Factual Error 0.2000 0.8000 +0.6000
Medium: Entity+Factual 0.2250 0.2250 +0.0000
Hard: Multi-type 0.8333 0.8333 +0.0000
Average 0.4194 0.6194 +0.2000
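The reported averages are the unweighted means of the three per-task scores, as a quick check confirms:

```python
baseline = [0.2000, 0.2250, 0.8333]
grpo = [0.8000, 0.2250, 0.8333]

avg_b = sum(baseline) / 3
avg_g = sum(grpo) / 3
print(round(avg_b, 4), round(avg_g, 4), round(avg_g - avg_b, 4))
# → 0.4194 0.6194 0.2
```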

Training Plots

[Plots: training results, training loss, score heatmap]

Reproduce

git clone https://github.com/your-username/hallucination-detector-gym
cd hallucination-detector-gym
cp .env.example .env  # Add your HF_TOKEN and HF_USERNAME
pip install -e .
# Open training_hallucination_detector.ipynb and run all cells

Framework

  • TRL + PEFT + Transformers (latest)
  • Trained on Apple Silicon (MPS backend) in 42,742 s (≈11.9 hours)