meseretbolled committed on
Commit 97fd35d · verified · 1 Parent(s): b28ff11

Update README.md

Files changed (1)
  1. README.md +55 -9
README.md CHANGED
@@ -1,22 +1,68 @@
  ---
  base_model: unsloth/Qwen3-1.7B
  tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen3
  - trl
- license: apache-2.0
- language:
- - en
  ---
 
- # Uploaded model
 
- - **Developed by:** meseretbolled
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/Qwen3-1.7B
 
- This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth)
 
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
  ---
  base_model: unsloth/Qwen3-1.7B
+ language:
+ - en
+ license: apache-2.0
  tags:
  - text-generation-inference
  - transformers
  - unsloth
  - qwen3
  - trl
+ - dpo
+ - b2b-sales
+ - lora
  ---
 
+ # Tenacious-Qwen3-DPO-v01
+
+ A 16-bit LoRA adapter for [unsloth/Qwen3-1.7B](https://huggingface.co/unsloth/Qwen3-1.7B),
+ trained with Direct Preference Optimization (DPO) for **B2B sales outreach policy compliance**.
+
+ Trained as part of [Tenacious-Bench v0.1](https://github.com/Meseretbolled/Sales-Agent-Evaluation-Bench),
+ a domain-specific benchmark for evaluating Tenacious-style sales outreach.
+
+ ## Evaluation Results (52 held-out tasks)
+
+ | Metric | Value |
+ |--------|-------|
+ | Base model (Qwen3-1.7B) score | 0.751 |
+ | This adapter's score | **0.941** |
+ | Delta A (adapter − base) | **+0.1904** |
+ | 95% CI for Delta A (10k bootstrap resamples) | [0.1115, 0.2788] |
+ | p-value (one-tailed) | < 0.0001 |
+
+ ## Training Details
+
+ | Setting | Value |
+ |---------|-------|
+ | Algorithm | DPO (Rafailov et al., NeurIPS 2023) |
+ | Base model | unsloth/Qwen3-1.7B |
+ | Quantization | None (16-bit LoRA, fp16) |
+ | LoRA rank and alpha | r=16, alpha=32 |
+ | Training pairs | 159 preference pairs |
+ | Steps | 60 (3 epochs, batch size 8) |
+ | Final training loss | 0.1035 |
+ | Hardware | Google Colab T4 (free tier) |
+ | Training time | 11.6 minutes |
+ | Framework | Unsloth (`PatchDPOTrainer`) + TRL `DPOTrainer` |
+
+ ## What it learns
+
+ The adapter trains the model to:
+ - Avoid banned phrases (urgency language, over-commitment)
+ - Ground every claim in the supplied hiring signal brief
+ - Never reference a prospect's layoffs as a buying signal
+ - Always include a calendar link
+ - Match Tenacious tone markers (professional, signal-specific, brief)
+
+ ## Dataset
+
+ [Tenacious-Bench v0.1](https://github.com/Meseretbolled/Sales-Agent-Evaluation-Bench):
+ 238 tasks in total; 159 DPO preference pairs were used for training.
 
+ ## Made with Unsloth
 
+ This model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth).
 
+ [![Made with Unsloth](https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20badge.png)](https://github.com/unslothai/unsloth)
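The Evaluation Results table in the updated README reports Delta A with a 95% CI from 10,000 bootstrap resamples and a one-tailed p-value, but the commit does not say how either number was computed. The sketch below shows one common approach, a paired percentile bootstrap over per-task score differences. It assumes each held-out task has exactly one base-model score and one adapter score. `paired_bootstrap` is a hypothetical helper, not code from Tenacious-Bench, and its p-value is the simple bootstrap approximation (the share of resampled mean differences at or below zero).

```python
import random
import statistics


def paired_bootstrap(base, adapter, n_resamples=10_000, alpha=0.05, seed=0):
    """Hypothetical helper: paired bootstrap for mean(adapter - base).

    Assumes base[i] and adapter[i] are scores on the same held-out task i.
    Returns (delta, (ci_low, ci_high), one_tailed_p).
    """
    if len(base) != len(adapter) or len(base) == 0:
        raise ValueError("need two non-empty, equal-length score lists")

    diffs = [a - b for a, b in zip(adapter, base)]
    n = len(diffs)
    delta = statistics.fmean(diffs)

    rng = random.Random(seed)
    # Each replicate resamples whole tasks with replacement, so pairs stay intact.
    boot = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )

    # Percentile interval read off the sorted replicate means.
    ci_low = boot[int(alpha / 2 * n_resamples)]
    ci_high = boot[int((1 - alpha / 2) * n_resamples) - 1]

    # One-tailed p for H1: delta > 0, approximated as the share of
    # replicate means at or below zero.
    p_one_tailed = sum(m <= 0 for m in boot) / n_resamples
    return delta, (ci_low, ci_high), p_one_tailed
```

With 10,000 resamples the smallest nonzero p-value this sketch can return is 1/10,000, so a result of exactly zero is best reported as p < 0.0001.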
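The Training Details table names DPO (Rafailov et al., NeurIPS 2023) as the training algorithm. For a prompt x with a preferred completion y_w and a rejected completion y_l, DPO minimizes -log σ(β · [(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where π_ref is the frozen reference model (here, the base model). The sketch below computes that per-pair loss from summed sequence log-probabilities. It illustrates the published objective and is not TRL's implementation; `dpo_loss` is a hypothetical name, and `beta=0.1` is an assumption because the commit does not state the β used.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss from summed token log-probs of each completion.

    beta=0.1 is an assumed value; the README does not state it.
    """
    # Log-ratio of policy to reference for each completion.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == softplus(-margin)
    return softplus(-margin)
```

At the start of training the policy equals the reference, so the margin is 0 and the loss is log 2 (about 0.693). The loss falls as the policy raises the likelihood of the chosen completion relative to the rejected one.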
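The step count and LoRA settings in the Training Details table can be cross-checked with a little arithmetic. 159 pairs at batch size 8 gives ceil(159 / 8) = 20 optimizer steps per epoch, and 3 epochs give the reported 60 steps. With r=16 and alpha=32, standard LoRA scales the low-rank update B·A by alpha / r = 2. The hypothetical helpers below reproduce both numbers. They assume that batch size 8 is the effective batch per optimizer step, that the last partial batch of each epoch is kept, and that standard (non-rsLoRA) scaling is used. The commit does not state any of these.

```python
import math


def optimizer_steps(num_pairs, effective_batch_size, epochs):
    """Steps when each epoch keeps its final partial batch (assumption)."""
    return math.ceil(num_pairs / effective_batch_size) * epochs


def lora_scale(r, alpha):
    """Standard LoRA scaling factor in h = W0 x + (alpha / r) * B A x."""
    return alpha / r


print(optimizer_steps(159, 8, 3))  # 60
print(lora_scale(16, 32))  # 2.0
```

If rank-stabilized LoRA were enabled, the factor would be alpha / sqrt(r) = 8 instead.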
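The "What it learns" list describes outreach rules the adapter is trained toward. Some of these rules are mechanical enough to check with simple string matching. The sketch below is a hypothetical checker, not the Tenacious-Bench grader. The commit does not publish the banned-phrase list, any layoff keywords, or a definition of a valid calendar link, so `BANNED_PHRASES`, `LAYOFF_TERMS`, and the URL pattern are illustrative placeholders. Checking that claims are grounded in the hiring signal brief, or that the tone markers match, requires model-based or human judgment and is not covered here.

```python
import re

# Hypothetical placeholders; the real Tenacious policy lists are not in this commit.
BANNED_PHRASES = ("act now", "limited time", "guaranteed results")
LAYOFF_TERMS = ("layoff", "laid off", "downsizing")
CALENDAR_LINK = re.compile(
    r"https?://(?:www\.)?(?:calendly\.com|cal\.com)/\S+", re.IGNORECASE
)


def check_outreach(email_text):
    """Return a list of rule violations found in a draft outreach email."""
    lowered = email_text.lower()
    violations = []
    for phrase in BANNED_PHRASES:
        if phrase in lowered:
            violations.append(f"banned phrase: {phrase!r}")
    # Flags any layoff mention, which is stricter than "not as a buying signal".
    for term in LAYOFF_TERMS:
        if term in lowered:
            violations.append(f"references layoffs: {term!r}")
    if not CALENDAR_LINK.search(email_text):
        violations.append("missing calendar link")
    return violations
```

Flagging every layoff mention is deliberately stricter than the stated rule, because a substring match cannot distinguish how the layoffs are being framed.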