meseretbolled/Tenacious-Qwen3-DPO-v01


meseretbolled committed 20 days ago

Commit 97fd35d · verified · 1 Parent(s): b28ff11

Update README.md

Files changed (1)

README.md +55 -9

README.md

@@ -1,22 +1,68 @@
 ---
 base_model: unsloth/Qwen3-1.7B
 tags:
 - text-generation-inference
 - transformers
 - unsloth
 - qwen3
 - trl
-license: apache-2.0
-language:
-- en
 ---
-# Uploaded  model
-- **Developed by:** meseretbolled
-- **License:** apache-2.0
-- **Finetuned from model :** unsloth/Qwen3-1.7B
-This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth)
-[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

 ---
 base_model: unsloth/Qwen3-1.7B
+language:
+- en
+license: apache-2.0
 tags:
 - text-generation-inference
 - transformers
 - unsloth
 - qwen3
 - trl
+- dpo
+- b2b-sales
+- lora
 ---
+# Tenacious-Qwen3-DPO-v01
+A 16-bit LoRA adapter fine-tuned from [unsloth/Qwen3-1.7B](https://huggingface.co/unsloth/Qwen3-1.7B)
+via Direct Preference Optimization (DPO) for **B2B sales outreach policy compliance**.
+Trained as part of [Tenacious-Bench v0.1](https://github.com/Meseretbolled/Sales-Agent-Evaluation-Bench) —
+a domain-specific benchmark for Tenacious-style outreach evaluation.
+## Evaluation Results (52 held-out tasks)
+| Metric | Score |
+|--------|-------|
+| Base model (Qwen3-1.7B) | 0.751 |
+| This adapter | **0.941** |
+| Delta A (adapter minus base) | **+0.1904** |
+| 95% CI (10k bootstrap) | [0.1115, 0.2788] |
+| p-value (one-tailed) | < 0.0001 |
+## Training Details
+| Setting | Value |
+|---------|-------|
+| Algorithm | DPO (Rafailov et al., NeurIPS 2023) |
+| Base model | unsloth/Qwen3-1.7B |
+| Quantization | None — 16-bit LoRA (fp16) |
+| LoRA rank | r=16, alpha=32 |
+| Training pairs | 159 preference pairs |
+| Steps | 60 (3 epochs, batch size 8) |
+| Final loss | 0.1035 |
+| Hardware | Google Colab T4 (free tier) |
+| Training time | 11.6 minutes |
+| Framework | Unsloth (`PatchDPOTrainer`) + TRL `DPOTrainer` |
+## What it learns
+The adapter steers the model to:
+- Avoid banned phrases (urgency language, over-commitment)
+- Ground every claim in the supplied hiring signal brief
+- Never reference a prospect's layoffs as a buying signal
+- Always include a calendar link
+- Match Tenacious tone markers (professional, signal-specific, brief)
+## Dataset
+[Tenacious-Bench v0.1](https://github.com/Meseretbolled/Sales-Agent-Evaluation-Bench) —
+238 tasks, 159 DPO preference pairs used for training.
+## Made with Unsloth
+This model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth).
+[![Made with Unsloth](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20badge.png)](https://github.com/unslothai/unsloth)
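
Reviewer note on the evaluation table in this diff: the README reports a 95% CI from 10k bootstrap resamples and a one-tailed p-value, but does not say how they were computed. The sketch below shows one common way to get such numbers, a paired percentile bootstrap over per-task score differences. It is an assumption, not the author's script: the function name `paired_bootstrap`, the fixed seed, and the placeholder scores are all hypothetical, and the real per-task scores are not part of this commit.

```python
import random


def paired_bootstrap(base_scores, adapter_scores, n_resamples=10_000, seed=0):
    """Percentile bootstrap for the mean per-task difference (adapter - base).

    Hypothetical helper: assumes both lists hold one score per held-out task,
    in the same task order.
    """
    if len(base_scores) != len(adapter_scores):
        raise ValueError("score lists must cover the same tasks")

    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(adapter_scores, base_scores)]
    n = len(diffs)
    observed = sum(diffs) / n

    # Resample tasks with replacement and record the mean difference each time.
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(diffs, k=n)
        means.append(sum(sample) / n)
    means.sort()

    # 95% percentile interval from the sorted resampled means.
    lo = means[int(0.025 * n_resamples)]
    hi = means[int(0.975 * n_resamples) - 1]

    # One-tailed p-value: share of resampled deltas at or below zero.
    p_value = sum(m <= 0 for m in means) / n_resamples
    return observed, (lo, hi), p_value


# Placeholder scores for 52 tasks (synthetic, NOT the benchmark's real data).
base = [0.6] * 52
adapter = [0.6 + 0.1 * (i % 3) for i in range(52)]
delta, ci, p = paired_bootstrap(base, adapter)
print(f"delta={delta:.4f}  95% CI=[{ci[0]:.4f}, {ci[1]:.4f}]  p={p:.4f}")
```

Under this scheme the p-value can only resolve steps of 1/10,000, so a printed `0.0000` means that no resample produced a delta at or below zero.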