Tenacious-Qwen3-DPO-v01

A 16-bit LoRA adapter fine-tuned on unsloth/Qwen3-1.7B via Direct Preference Optimization (DPO) for B2B sales outreach policy compliance.

Trained as part of Tenacious-Bench v0.1 โ€” a domain-specific benchmark for Tenacious-style outreach evaluation.

Evaluation Results (52 held-out tasks)

Metric Score
Base model (Qwen3-1.7B) 0.751
This adapter 0.941
Delta A +0.1904
95% CI (10k bootstrap) [0.1115, 0.2788]
p-value (one-tailed) 0.0000

Training Details

Setting Value
Algorithm DPO (Rafailov et al., NeurIPS 2023)
Base model unsloth/Qwen3-1.7B
Quantization None โ€” 16-bit LoRA (fp16)
LoRA rank r=16, alpha=32
Training pairs 159 preference pairs
Steps 60 (3 epochs, batch size 8)
Final loss 0.1035
Hardware Google Colab T4 (free tier)
Training time 11.6 minutes
Framework Unsloth + TRL PatchDPOTrainer

What it learns

The adapter trains the model to:

  • Avoid banned phrases (urgency language, over-commitment)
  • Ground every claim in the supplied hiring signal brief
  • Never reference a prospect's layoffs as a buying signal
  • Always include a calendar link
  • Match Tenacious tone markers (professional, signal-specific, brief)

Dataset

Tenacious-Bench v0.1 โ€” 238 tasks, 159 DPO preference pairs used for training.

Made with Unsloth

This model was trained 2x faster with Unsloth.

Made with Unsloth

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for meseretbolled/Tenacious-Qwen3-DPO-v01

Finetuned
Qwen/Qwen3-1.7B
Adapter
(11)
this model