---
license: cc-by-4.0
language:
- en
base_model: unsloth/Qwen2.5-1.5B-Instruct
tags:
- judge
- b2b-sales
- orpo
- lora
- preference-learning
- tenacious-bench
- evaluation
- qwen2.5
- unsloth
datasets:
- rafiakedir/tenacious-bench-v0.1
---
# Tenacious-Bench Judge — ORPO LoRA Adapter (Qwen2.5-1.5B)
A rubric-aware scoring judge for B2B outbound sales emails, trained with ORPO on
[Tenacious-Bench v0.1](https://huggingface.co/datasets/rafiakedir/tenacious-bench-v0.1)
preference pairs. Deployed as a **rejection-sampling gate** in the Tenacious Conversion Engine.
- **Base model:** `unsloth/Qwen2.5-1.5B-Instruct`
- **Adapter type:** LoRA (PEFT); load the base model first, then attach the adapter with `PeftModel.from_pretrained`
- **Training algorithm:** ORPO (no reference model required)
- **Precision:** 4-bit quantized during training (Unsloth), fp16 for inference
- **Training data:** 94 ORPO preference pairs from `rafiakedir/tenacious-bench-v0.1` (train split)
- **Training:** 3 epochs · 36 steps · lr=8e-6 · beta=0.1 · LoRA r=16 alpha=32
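The rejection-sampling gate mentioned above can be sketched roughly as follows. This is an illustration only, not the Tenacious Conversion Engine's code: `gate`, `generate_fn`, `judge_fn`, the 0.5 threshold, and the retry budget are all hypothetical, and `judge_fn` is assumed to return a dict with a `score` key, as in the Quick Start output format.

```python
from typing import Callable, Optional


def gate(
    generate_fn: Callable[[], str],
    judge_fn: Callable[[str], dict],
    threshold: float = 0.5,  # hypothetical pass threshold
    max_attempts: int = 4,   # hypothetical retry budget
) -> Optional[str]:
    """Return the first candidate whose judge score meets the threshold, else None."""
    for _ in range(max_attempts):
        candidate = generate_fn()
        verdict = judge_fn(candidate)
        if float(verdict.get("score", 0.0)) >= threshold:
            return candidate
    return None  # every attempt was rejected


# Stub generator and judge, for illustration only.
drafts = iter(["We guarantee 10x pipeline.", "TalentBridge has 8 open AI/ML roles."])
stub_generate = lambda: next(drafts)
stub_judge = lambda email: {"score": 0.9 if "8 open" in email else 0.1}

accepted = gate(stub_generate, stub_judge)
print(accepted)
```

In a real pipeline `judge_fn` would wrap the adapter's `judge` function for one or more rubric dimensions; returning `None` lets the caller decide whether to escalate or drop the email.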
---
## What It Scores
| Dimension | Trigger Rate (Week 10 probes) | Risk if Missed |
|---|---|---|
| `signal_grounding_fidelity` | 35% | CTO credibility loss |
| `competitor_gap_honesty` | 45% | Irreversible brand damage |
| `icp_segment_appropriateness` | 20% | ~$480K ACV per error |
| `tone_preservation` | 15% | Brand voice violation |
| `bench_commitment_honesty` | 5% | SOW-breach / delivery failure |
---
## Quick Start — Inference
```python
import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "unsloth/Qwen2.5-1.5B-Instruct"
ADAPTER_ID = "rafiakedir/tenacious-bench-adapter"

# The adapter repo ships its own tokenizer files.
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA weights to the fp16 base model.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

JUDGE_SYSTEM = (
    "You are a rubric-aware judge for Tenacious Consulting B2B outbound sales emails. "
    "Given a task context and a candidate email, score it on the specified rubric dimension. "
    "Respond with a JSON object only:\n"
    '{"dimension": "<dim>", "score": <0.0-1.0>, "pass": <true|false>, "reasoning": "<one sentence>"}'
)


def judge(email, context, dimension):
    """Score one email on one rubric dimension and return the parsed JSON verdict."""
    user = (
        f"EVALUATION DIMENSION: {dimension}\n\n"
        f"TASK CONTEXT:\n{context}\n\n"
        f"CANDIDATE EMAIL:\n{email}\n\n"
        f"Score this email on the {dimension} dimension."
    )
    msgs = [
        {"role": "system", "content": JUDGE_SYSTEM},
        {"role": "user", "content": user},
    ]
    text = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs, max_new_tokens=128, temperature=0.1, do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, not the prompt.
    resp = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()
    # Parse the outermost {...} span; fall back to a neutral score if it is missing or invalid.
    s, e = resp.find("{"), resp.rfind("}") + 1
    if 0 <= s < e:
        try:
            return json.loads(resp[s:e])
        except json.JSONDecodeError:
            pass
    return {"score": 0.5, "raw": resp[:200]}


result = judge(
    email="Casey - TalentBridge has 8 open AI/ML roles this quarter. 30-min scoping call: calendly.com/tenacious",
    context="company: TalentBridge, stage: Series A, open_roles: 8, confidence: high",
    dimension="signal_grounding_fidelity",
)
print(result)
```
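Because the judge replies with free-form text that is only expected to be JSON, it can help to normalize each verdict before acting on it. The sketch below scores all five dimensions and coerces each reply into a fixed shape. It is not the repo's `inference_example.py`: `normalize_verdict`, `judge_all`, and the default of deriving `pass` from `score >= 0.5` are assumptions made for illustration, and `judge_fn` stands in for the `judge` function above.

```python
from typing import Callable, Dict

DIMENSIONS = [
    "signal_grounding_fidelity",
    "competitor_gap_honesty",
    "icp_segment_appropriateness",
    "tone_preservation",
    "bench_commitment_honesty",
]


def normalize_verdict(raw: dict, dimension: str) -> dict:
    """Coerce a judge reply into a fixed shape and clamp the score into [0, 1]."""
    try:
        score = float(raw.get("score", 0.5))
    except (TypeError, ValueError):
        score = 0.5  # same neutral fallback the Quick Start uses
    score = min(1.0, max(0.0, score))
    passed = raw.get("pass")
    if not isinstance(passed, bool):
        passed = score >= 0.5  # hypothetical default when "pass" is missing
    return {
        "dimension": dimension,
        "score": score,
        "pass": passed,
        "reasoning": str(raw.get("reasoning", "")),
    }


def judge_all(email: str, context: str,
              judge_fn: Callable[[str, str, str], dict]) -> Dict[str, dict]:
    """Run judge_fn once per rubric dimension and normalize every reply."""
    return {d: normalize_verdict(judge_fn(email, context, d), d) for d in DIMENSIONS}


# Stub judge for illustration; replace with the real `judge` function.
def stub_judge(email, context, dimension):
    if dimension == "tone_preservation":
        return {"score": "1.7"}  # malformed reply: string score, out of range
    return {"score": 0.8, "pass": True, "reasoning": "ok"}


results = judge_all("example email", "example context", stub_judge)
print(results["tone_preservation"])
```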
---
## Training Details
| Parameter | Value |
|---|---|
| Base model | `unsloth/Qwen2.5-1.5B-Instruct` (4-bit during training) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, v_proj |
| LoRA dropout | 0.05 |
| Learning rate | 8e-6 |
| Effective batch size | 8 (batch=2, grad_accum=4) |
| Epochs | 3 |
| Total steps | 36 |
| ORPO beta | 0.1 |
| Max sequence length | 1024 |
| Seed | 42 |
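The step count in the table follows from the dataset size and batch settings. The arithmetic below is a quick check, assuming a single device and that the final partial batch of each epoch counts as one optimizer step; the variable names are illustrative.

```python
import math

num_pairs = 94        # ORPO preference pairs in the train split
per_device_batch = 2  # "batch=2" in the table
grad_accum = 4        # "grad_accum=4" in the table
epochs = 3

effective_batch = per_device_batch * grad_accum
# Assumption: the last, partially filled batch still triggers an optimizer step.
steps_per_epoch = math.ceil(num_pairs / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 8 12 36
```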
- **Training loss:** 2.8676 → 2.9646 → 2.9386 (3 checkpoints)
- **Reward accuracy:** 0.5375 → 0.6026 → 0.5128
- **Training data:** 94 preference pairs from the train partition. To prevent preference leakage, the generator (DeepSeek V3.2) comes from a different model family than the judge (Claude Sonnet 4.6 / `scoring_evaluator.py`). All generation decisions are logged in the dataset repo at `training_data/generation_log.jsonl`.
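For readers unfamiliar with the quantities reported here, the following is a scalar sketch of the ORPO objective as described in the ORPO paper: the usual negative log-likelihood on the chosen response plus a `beta`-weighted odds-ratio penalty. Whether `train_judge.py` computes the loss and reward accuracy in exactly this way is an assumption; the function names and the toy numbers are illustrative only.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(nll_chosen: float, logp_chosen: float, logp_rejected: float,
              beta: float = 0.1) -> float:
    """NLL on the chosen response plus beta times the odds-ratio term."""
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    return nll_chosen - beta * log_sigmoid(ratio)


def reward_accuracy(pairs) -> float:
    """Fraction of (chosen, rejected) average log-prob pairs with chosen > rejected."""
    return sum(c > r for c, r in pairs) / len(pairs)


# Toy average log-probabilities, made up for illustration.
toy_pairs = [(-0.8, -1.2), (-1.5, -1.1), (-0.6, -0.9), (-1.0, -1.4)]
print(reward_accuracy(toy_pairs))
print(orpo_loss(nll_chosen=1.0, logp_chosen=-0.5, logp_rejected=-1.5))
```

Under this definition a reward accuracy of 0.5 corresponds to chance: the model prefers the chosen response no more often than the rejected one.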
---
## Evaluation Results
Evaluated on 59 held-out tasks from `rafiakedir/tenacious-bench-v0.1`.
Full results in `ablation_results.json` in the dataset repo.
**Deployment recommendation:** Run `ablations/run_ablations.py` with this adapter to compute Delta A.
The ablation script loads the adapter via Hugging Face and requires a GPU plus `transformers` and `peft`.
---
## Known Limitations
1. **Dimension coverage gap.** The training set has 0 pairs for `bench_commitment_honesty` and only 4 for `icp_segment_appropriateness`, the result of a scoring-key mismatch during pair construction. The model therefore received no gradient signal on bench commitment honesty.
2. **Backbone below Prometheus-2 threshold.** Prometheus-2 demonstrated rubric matching at 7B+ parameters. At 1.5B, the model may underfit when asked to generalize across dimensions.
3. **Synthetic training distribution.** All pairs derive from synthetic prospect briefs and LLM-generated emails.
4. **Static `bench_summary`.** The `bench_summary` is static, so judge calibration drifts as the real bench composition changes weekly.
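A coverage gap like the one in item 1 can be caught before training by counting pairs per dimension. The sketch below assumes each preference pair is a dict with a `dimension` field, which is a hypothetical schema and not necessarily the one used in the dataset; `coverage_gaps` and `min_pairs` are likewise illustrative. The toy data reproduces the two counts stated above, and the split across the other three dimensions is made up.

```python
from collections import Counter

DIMENSIONS = [
    "signal_grounding_fidelity",
    "competitor_gap_honesty",
    "icp_segment_appropriateness",
    "tone_preservation",
    "bench_commitment_honesty",
]


def coverage_gaps(pairs, min_pairs: int = 5) -> dict:
    """Return {dimension: count} for every dimension with fewer than min_pairs pairs."""
    counts = Counter(p["dimension"] for p in pairs)
    return {d: counts.get(d, 0) for d in DIMENSIONS if counts.get(d, 0) < min_pairs}


# Toy data: 94 pairs; only the 0 and 4 counts come from the model card.
toy_pairs = (
    [{"dimension": "signal_grounding_fidelity"}] * 30
    + [{"dimension": "competitor_gap_honesty"}] * 30
    + [{"dimension": "tone_preservation"}] * 30
    + [{"dimension": "icp_segment_appropriateness"}] * 4
)

gaps = coverage_gaps(toy_pairs)
print(gaps)
```

Failing the build when `gaps` is non-empty would surface a scoring-key mismatch before any training run is launched.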
---
## Files
| File | Description |
|---|---|
| `adapter_config.json` | LoRA configuration (r=16, alpha=32, q_proj+v_proj) |
| `adapter_model.safetensors` | Trained LoRA weights (8.4 MB) |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer (ChatML format) |
| `run_on_colab.ipynb` | End-to-end training + push notebook |
| `train_judge.py` | Training script |
| `inference_example.py` | Per-dimension and all-dimension scoring helper |
Training data: [rafiakedir/tenacious-bench-v0.1](https://huggingface.co/datasets/rafiakedir/tenacious-bench-v0.1)
---
## Citation
```bibtex
@misc{tenacious-bench-adapter-2026,
title = {Tenacious-Bench Judge: ORPO LoRA Adapter for B2B Sales Evaluation},
author = {Kedir, Rafia},
year = {2026},
url = {https://huggingface.co/rafiakedir/tenacious-bench-adapter}
}
```