Introduction

This repository contains a LoRA adapter for Qwen2.5-3B-Instruct, fine-tuned with Direct Preference Optimization (DPO) on preference data produced by a Retrieval-Augmented Generation (RAG) evaluation pipeline over the Natural Questions (NQ) validation set.

Training Pipeline

  1. Base Model: Qwen/Qwen2.5-3B-Instruct

  2. RAG responses generated over NQ validation split

  3. Responses scored using custom reward signals:

  • Faithfulness

  • Citation usage

  • Hallucination detection

  • Refusal detection

  4. Preference pairs constructed using margin filtering

  5. LoRA fine-tuning using DPO
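Step 4 above can be sketched in a few lines. The field names and the margin threshold below are illustrative assumptions, not taken from the actual pipeline:

```python
# Build DPO preference pairs by margin filtering: for each prompt, pair the
# highest- and lowest-scoring responses, and keep the pair only if the reward
# gap (margin) clears a threshold. Field names and threshold are hypothetical.

def build_preference_pairs(scored, margin_threshold=1.0):
    """scored: list of dicts with 'prompt', 'response', and 'reward' keys."""
    by_prompt = {}
    for row in scored:
        by_prompt.setdefault(row["prompt"], []).append(row)

    pairs = []
    for prompt, rows in by_prompt.items():
        rows.sort(key=lambda r: r["reward"], reverse=True)
        chosen, rejected = rows[0], rows[-1]
        margin = chosen["reward"] - rejected["reward"]
        if margin >= margin_threshold:  # drop low-signal pairs
            pairs.append({
                "prompt": prompt,
                "chosen": chosen["response"],
                "rejected": rejected["response"],
                "margin": margin,
            })
    return pairs
```

Filtering on the margin trades dataset size for signal quality, which is consistent with the small-but-clean training set described below.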

Dataset Lineage

This model was trained and evaluated using:

Dataset repository:

AnjanSB/NQ-RAG-DPO-Evaluation

Configurations used:

  • rag_responses (base + trained generations)

  • responses_scores (reward signals)

  • dpo_train_data (preference dataset)

  • comparison_metrics (evaluation results)
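The lineage between these configurations can be illustrated with a minimal sketch: generations from rag_responses are joined with their reward signals from responses_scores before preference pairs are built. The per-record fields here are assumptions for illustration, not the dataset's real schema:

```python
# Hypothetical lineage step: attach reward signals to generations by joining
# the two configurations on a shared prompt id. All field names below are
# illustrative, not the actual schema of the dataset repository.

def attach_scores(responses, scores):
    """Join RAG generations with their reward signals on a shared id."""
    score_by_id = {s["id"]: s for s in scores}
    merged = []
    for r in responses:
        s = score_by_id.get(r["id"])
        if s is None:
            continue  # unscored responses are dropped
        merged.append({**r, "total_reward": s["total_reward"]})
    return merged
```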

Evaluation Summary

| Metric                | Base Model | Trained Model | Train Data |
|-----------------------|------------|---------------|------------|
| Mean Responses Margin | 0.4246     | 0.3612        | 2.1835     |
| Total Prompts         | 1500       | 1500          | 228        |
| Mean Total Reward     | 0.8378     | 0.8356        | 0.8582     |
| Faithfulness          | 0.4300     | 0.4370        | 0.5307     |
| Citation Score        | 0.8353     | 0.8683        | 0.8035     |
| Hallucination         | 0.1974     | 0.1842        | 0.1095     |

The DPO training data is small (228 records) but high quality: the mean response margin of 2.18 between chosen and rejected responses for the same prompt indicates a clear preference signal.
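The base-versus-trained deltas behind these observations can be recomputed directly from the Evaluation Summary table:

```python
# Recompute base-vs-trained deltas from the Evaluation Summary table above.
base    = {"margin": 0.4246, "reward": 0.8378, "faith": 0.4300,
           "citation": 0.8353, "halluc": 0.1974}
trained = {"margin": 0.3612, "reward": 0.8356, "faith": 0.4370,
           "citation": 0.8683, "halluc": 0.1842}

# Positive delta = higher after training; negative = lower after training.
deltas = {k: round(trained[k] - base[k], 4) for k in base}
```

Hallucination and the response margin go down after training, while citation score and faithfulness go up; total reward is essentially unchanged.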

The trained model shows:

  • A lower mean response margin (0.42 → 0.36), indicating more stable, consistent responses

  • Improved citation usage

  • Reduced hallucination

  • Slightly improved faithfulness

  • Stable refusal behavior
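For reference, the per-pair DPO objective used in step 5 of the pipeline can be sketched in a few lines; the beta value is an illustrative assumption, not the training configuration:

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Arguments are summed log-probabilities of the chosen/rejected responses
    under the policy and the frozen reference model; beta (hypothetical here)
    controls how strongly the policy may deviate from the reference.
    """
    logits = beta * ((pol_chosen - pol_rejected) - (ref_chosen - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log sigmoid(logits)

# When the policy still matches the reference (no learned preference), the
# loss is -log(0.5) = ln 2 ≈ 0.6931, close to the training loss of 0.694
# reported in the evaluation results below.
```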

πŸ‘€ Author

AnjanSB

Experiment repo: https://dagshub.com/AnjanSB/RAG-DPO-PEFT-LLMOPS

Profile: https://www.linkedin.com/in/anjansb/


Evaluation results (self-reported, on the NQ-RAG-DPO-Evaluation dataset)

  • Mean Responses Margin (Metrics & Inference): 0.361

  • Mean Total Reward (Metrics & Inference): 0.836

  • Faithfulness (Metrics & Inference): 0.437

  • Citation Score (Metrics & Inference): 0.868

  • Hallucination Score (Metrics & Inference): 0.184

  • Training Loss (Training Subset): 0.694