meseretbolled
/

Tenacious-Qwen3-DPO-v01

text-generation-inference

Model card Files Files and versions

Tenacious-Qwen3-DPO-v01

A 16-bit LoRA adapter fine-tuned on unsloth/Qwen3-1.7B via Direct Preference Optimization (DPO) for B2B sales outreach policy compliance.

Trained as part of Tenacious-Bench v0.1 — a domain-specific benchmark for Tenacious-style outreach evaluation.

Evaluation Results (52 held-out tasks)

Metric	Score
Base model (Qwen3-1.7B)	0.751
This adapter	0.941
Delta A	+0.1904
95% CI (10k bootstrap)	[0.1115, 0.2788]
p-value (one-tailed)	0.0000

Training Details

Setting	Value
Algorithm	DPO (Rafailov et al., NeurIPS 2023)
Base model	unsloth/Qwen3-1.7B
Quantization	None — 16-bit LoRA (fp16)
LoRA rank	r=16, alpha=32
Training pairs	159 preference pairs
Steps	60 (3 epochs, batch size 8)
Final loss	0.1035
Hardware	Google Colab T4 (free tier)
Training time	11.6 minutes
Framework	Unsloth + TRL PatchDPOTrainer

What it learns

The adapter trains the model to:

Avoid banned phrases (urgency language, over-commitment)
Ground every claim in the supplied hiring signal brief
Never reference a prospect's layoffs as a buying signal
Always include a calendar link
Match Tenacious tone markers (professional, signal-specific, brief)

Dataset

Tenacious-Bench v0.1 — 238 tasks, 159 DPO preference pairs used for training.

Made with Unsloth

This model was trained 2x faster with Unsloth.

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for meseretbolled/Tenacious-Qwen3-DPO-v01

Base model

Qwen/Qwen3-1.7B-Base

Finetuned

Qwen/Qwen3-1.7B

Finetuned

unsloth/Qwen3-1.7B

Adapter

(11)

this model