---
license: cc-by-4.0
language:
- en
base_model: unsloth/Qwen2.5-1.5B-Instruct
tags:
- judge
- b2b-sales
- orpo
- lora
- preference-learning
- tenacious-bench
- evaluation
- qwen2.5
- unsloth
datasets:
- rafiakedir/tenacious-bench-v0.1
---
# Tenacious-Bench Judge: ORPO LoRA Adapter (Qwen2.5-1.5B)

A rubric-aware scoring judge for B2B outbound sales emails, trained with ORPO on
[Tenacious-Bench v0.1](https://huggingface.co/datasets/rafiakedir/tenacious-bench-v0.1)
preference pairs. It is deployed as a **rejection-sampling gate** in the Tenacious Conversion Engine: candidate emails that the judge scores as failing are discarded.
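
A rejection-sampling gate of this kind can be sketched as follows. This is a minimal illustration, not the Conversion Engine's actual code: `generate_candidate`, the 0.7 pass threshold, the four-candidate budget, and the rule that every dimension must pass are all assumptions. `score_fn` stands in for a judge call such as the `judge()` helper in the Quick Start.

```python
from typing import Callable, Dict, List, Optional

# score_fn(email, context, dimension) -> judge verdict dict (e.g. the Quick Start judge()).
ScoreFn = Callable[[str, str, str], Dict]


def rejection_sample(
    generate_candidate: Callable[[], str],  # hypothetical email generator
    score_fn: ScoreFn,
    context: str,
    dimensions: List[str],
    threshold: float = 0.7,  # assumed pass threshold, not from the card
    max_attempts: int = 4,   # assumed candidate budget, not from the card
) -> Optional[str]:
    """Return the first candidate that clears the judge on every dimension, else None."""
    for _ in range(max_attempts):
        email = generate_candidate()
        verdicts = [score_fn(email, context, dim) for dim in dimensions]
        # A verdict without a "score" (e.g. an unparseable judge reply) counts as a failure.
        if all(v.get("score", 0.0) >= threshold for v in verdicts):
            return email
    return None  # nothing passed: escalate or fall back upstream
```

Returning `None` instead of the best-scoring failure keeps the gate strict; a softer variant could return the highest-scoring candidate along with a flag.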

**Base model:** `unsloth/Qwen2.5-1.5B-Instruct`

**Adapter type:** LoRA (PEFT); load it on top of the base model with `PeftModel.from_pretrained`

**Training algorithm:** ORPO (no reference model required)

**Precision:** base model quantized to 4-bit during training (Unsloth); fp16 for inference

**Training data:** 94 ORPO preference pairs from the train split of `rafiakedir/tenacious-bench-v0.1`

**Training:** 3 epochs · 36 steps · lr=8e-6 · beta=0.1 · LoRA r=16, alpha=32
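
ORPO needs no reference model because it adds an odds-ratio penalty directly to the supervised loss on the chosen response: loss = NLL(chosen) − beta · log σ(log odds(chosen) − log odds(rejected)). The sketch below follows that objective as implemented in TRL's `ORPOTrainer`, where `beta` (0.1 here) weights the odds-ratio term and log-probabilities are length-averaged per token. The card does not say which trainer was used, so treat this as an illustration of the objective for a single pair, not a replacement for the training script.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp), the length-normalized sequence probability."""
    if avg_logp >= 0:
        raise ValueError("average log-probability must be negative")
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float,
              chosen_nll: float, beta: float = 0.1) -> float:
    """ORPO loss for one preference pair: SFT NLL on the chosen response minus
    beta times the log-sigmoid of the log odds ratio (chosen vs. rejected)."""
    ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    return chosen_nll - beta * log_sigmoid(ratio)
```

When the chosen and rejected emails are equally likely, the penalty term is beta · log 2; it shrinks toward zero as the model assigns higher odds to the chosen email.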

---

## What It Scores

| Dimension | Trigger Rate (Week 10 probes) | Risk if Missed |
|---|---|---|
| `signal_grounding_fidelity` | 35% | CTO credibility loss |
| `competitor_gap_honesty` | 45% | Irreversible brand damage |
| `icp_segment_appropriateness` | 20% | ~$480K ACV per error |
| `tone_preservation` | 15% | Brand voice violation |
| `bench_commitment_honesty` | 5% | SOW-breach / delivery failure |
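
Each judge call scores a single dimension, so covering the full rubric takes one call per row of the table. The sketch below is in the spirit of the all-dimension helper in `inference_example.py`, whose actual code may differ: `DIMENSIONS`, `score_all`, and `failing_dimensions` are illustrative names, and `score_fn` is assumed to behave like the Quick Start `judge()`. The adapter saw no training pairs for `bench_commitment_honesty` (see Known Limitations), so that verdict deserves extra caution.

```python
from typing import Callable, Dict, List

# Dimension keys from the table above, in table order.
DIMENSIONS = (
    "signal_grounding_fidelity",
    "competitor_gap_honesty",
    "icp_segment_appropriateness",
    "tone_preservation",
    "bench_commitment_honesty",
)


def score_all(email: str, context: str,
              score_fn: Callable[[str, str, str], Dict]) -> Dict[str, Dict]:
    """Run the judge once per dimension and return {dimension: verdict}."""
    return {dim: score_fn(email, context, dim) for dim in DIMENSIONS}


def failing_dimensions(verdicts: Dict[str, Dict]) -> List[str]:
    """Dimensions without an explicit pass=True (unparseable replies count as failures)."""
    return [dim for dim, v in verdicts.items() if v.get("pass") is not True]
```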

---

## Quick Start: Inference

```python
import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "unsloth/Qwen2.5-1.5B-Instruct"
ADAPTER_ID = "rafiakedir/tenacious-bench-adapter"

# The adapter repo ships its own tokenizer files (ChatML chat template).
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA weights to the fp16 base model.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

JUDGE_SYSTEM = (
    "You are a rubric-aware judge for Tenacious Consulting B2B outbound sales emails. "
    "Given a task context and a candidate email, score it on the specified rubric dimension. "
    "Respond with a JSON object only:\n"
    '{"dimension": "<dim>", "score": <0.0-1.0>, "pass": <true|false>, "reasoning": "<one sentence>"}'
)


def judge(email, context, dimension):
    """Score one email on one rubric dimension and return the judge's parsed JSON."""
    user = (
        f"EVALUATION DIMENSION: {dimension}\n\n"
        f"TASK CONTEXT:\n{context}\n\n"
        f"CANDIDATE EMAIL:\n{email}\n\n"
        f"Score this email on the {dimension} dimension."
    )
    msgs = [{"role": "system", "content": JUDGE_SYSTEM},
            {"role": "user", "content": user}]
    text = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(**inputs, max_new_tokens=128, temperature=0.1, do_sample=True,
                             pad_token_id=tokenizer.eos_token_id)
    # Decode only the newly generated tokens, not the prompt.
    resp = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()
    # Extract the outermost {...} span; fall back to a neutral score if it is missing or malformed.
    s, e = resp.find("{"), resp.rfind("}") + 1
    if s >= 0 and e > s:
        try:
            return json.loads(resp[s:e])
        except json.JSONDecodeError:
            pass
    return {"score": 0.5, "raw": resp[:200]}


result = judge(
    email="Casey — TalentBridge has 8 open AI/ML roles this quarter. 30-min scoping call: calendly.com/tenacious",
    context="company: TalentBridge, stage: Series A, open_roles: 8, confidence: high",
    dimension="signal_grounding_fidelity",
)
print(result)
```
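
The judge is prompted to return `dimension`, `score`, `pass`, and `reasoning`, but a 1.5B model may not always comply, and the Quick Start fallback returns a neutral `0.5` score with no `pass` field. Before a verdict gates anything, it is worth checking it against the schema in `JUDGE_SYSTEM`. A minimal stdlib sketch; `validate_verdict` is a hypothetical helper, not part of this repo.

```python
from typing import Dict, List


def validate_verdict(verdict: Dict, expected_dimension: str) -> List[str]:
    """Return a list of schema problems with a judge verdict; empty means usable."""
    problems = []
    if verdict.get("dimension") != expected_dimension:
        problems.append("dimension mismatch or missing")
    score = verdict.get("score")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0.0 <= score <= 1.0:
        problems.append("score missing or outside [0.0, 1.0]")
    if not isinstance(verdict.get("pass"), bool):
        problems.append("pass missing or not a boolean")
    if not isinstance(verdict.get("reasoning"), str):
        problems.append("reasoning missing")
    return problems
```

For example, the Quick Start fallback `{"score": 0.5, "raw": ...}` yields three problems (dimension, pass, reasoning), so a gate using this check would reject it rather than trust the neutral score.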

---

## Training Details

| Parameter | Value |
|---|---|
| Base model | `unsloth/Qwen2.5-1.5B-Instruct` (4-bit during training) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | q_proj, v_proj |
| LoRA dropout | 0.05 |
| Learning rate | 8e-6 |
| Effective batch size | 8 (batch=2, grad_accum=4) |
| Epochs | 3 |
| Total steps | 36 |
| ORPO beta | 0.1 |
| Max sequence length | 1024 |
| Seed | 42 |
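
The step count in the table is consistent with the other hyperparameters: 94 pairs at an effective batch size of 8 give ceil(94 / 8) = 12 optimizer steps per epoch, or 36 over 3 epochs. The snippet reproduces that arithmetic, assuming a trailing partial batch still triggers an update (trainer versions differ in how they round, and dropping over-length pairs would change the count). It also shows the standard LoRA scaling factor alpha / r, which is 2.0 for this configuration.

```python
import math


def optimizer_steps(num_examples: int, per_device_batch: int, grad_accum: int,
                    epochs: int, num_devices: int = 1) -> int:
    """Optimizer steps, assuming a trailing partial batch still triggers an update."""
    effective_batch = per_device_batch * grad_accum * num_devices
    return math.ceil(num_examples / effective_batch) * epochs


def lora_scaling(alpha: float, rank: int) -> float:
    """Standard LoRA multiplies the low-rank update B @ A by alpha / r."""
    return alpha / rank


print(optimizer_steps(94, per_device_batch=2, grad_accum=4, epochs=3))  # 36
print(lora_scaling(32, 16))  # 2.0
```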

**Training loss:** 2.8676 → 2.9646 → 2.9386 (3 checkpoints)

**Reward accuracy:** 0.5375 → 0.6026 → 0.5128
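
In TRL's ORPO implementation, reward accuracy is the fraction of pairs for which the implicit reward beta · (average per-token log-probability) of the chosen email exceeds that of the rejected email; a value near 0.5 means the model ranks chosen above rejected about as often as chance. The card does not state which trainer produced these numbers, so this sketch is an assumption about the metric's definition, not a reproduction of the logged values.

```python
from typing import Sequence


def reward_accuracy(chosen_avg_logps: Sequence[float],
                    rejected_avg_logps: Sequence[float],
                    beta: float = 0.1) -> float:
    """Fraction of pairs where beta * avg log-prob(chosen) > beta * avg log-prob(rejected)."""
    if len(chosen_avg_logps) != len(rejected_avg_logps) or not chosen_avg_logps:
        raise ValueError("need equal-length, non-empty sequences")
    wins = sum(beta * c > beta * r for c, r in zip(chosen_avg_logps, rejected_avg_logps))
    return wins / len(chosen_avg_logps)
```

Because beta is positive, the comparison reduces to comparing the log-probabilities themselves; beta only rescales the logged reward values.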

**Training data:** 94 preference pairs from the train partition. To prevent preference leakage, the pair generator (DeepSeek V3.2) comes from a different model family than the judge (Claude Sonnet 4.6, via `scoring_evaluator.py`). All generation decisions are logged in the dataset repo at `training_data/generation_log.jsonl`.

---

## Evaluation Results

Evaluated on 59 held-out tasks from `rafiakedir/tenacious-bench-v0.1`. Full results are in `ablation_results.json` in the dataset repo.

**Deployment recommendation:** run `ablations/run_ablations.py` with this adapter to compute Delta A. The ablation script loads the adapter from the Hugging Face Hub and requires a GPU plus `transformers` and `peft`.

---

## Known Limitations

1. **Dimension coverage gap.** Because of a scoring-key mismatch during pair construction, the training set contains 0 pairs for `bench_commitment_honesty` and only 4 for `icp_segment_appropriateness`. The model therefore received no gradient signal on bench commitment honesty.
2. **Backbone below the Prometheus-2 threshold.** Prometheus-2 demonstrated rubric-matching at 7B+ parameters. At 1.5B, the model may lack the capacity to generalize across multiple rubric dimensions.
3. **Synthetic training distribution.** All pairs derive from synthetic prospect briefs and LLM-generated emails.
4. **Static bench_summary.** The judge was trained against a static `bench_summary`, so its calibration drifts as the real bench composition changes week to week.
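
Limitation 1 traces back to a key mismatch during pair construction, so a coverage audit before training can catch it early. A minimal sketch over in-memory records; the `dimension` field name, the `audit_coverage` helper, and the minimum of 10 pairs per dimension are assumptions, not the dataset's documented schema.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def audit_coverage(pairs: Sequence[Dict], expected: Sequence[str],
                   min_pairs: int = 10) -> Tuple[Dict[str, int], List[str]]:
    """Return (under-covered expected dimensions with their counts, unrecognized keys)."""
    counts = Counter(p.get("dimension") for p in pairs)  # "dimension" field is an assumption
    known = set(expected)
    gaps = {dim: counts.get(dim, 0) for dim in expected if counts.get(dim, 0) < min_pairs}
    # Keys that match no expected dimension usually indicate a naming mismatch upstream.
    unknown = sorted(str(k) for k in counts if k not in known)
    return gaps, unknown
```

Reporting unrecognized keys alongside the gaps matters here: a dimension with zero pairs next to an unfamiliar key with a nonzero count is the signature of a renamed or misspelled scoring key.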

---

## Files

| File | Description |
|---|---|
| `adapter_config.json` | LoRA configuration (r=16, alpha=32, q_proj+v_proj) |
| `adapter_model.safetensors` | Trained LoRA weights (8.4 MB) |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer (ChatML format) |
| `run_on_colab.ipynb` | End-to-end training + push notebook |
| `train_judge.py` | Training script |
| `inference_example.py` | Per-dimension and all-dimension scoring helper |
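
As a rough sanity check on the adapter size, LoRA adds r × (in_features + out_features) weights per adapted projection. The sketch assumes the published Qwen2.5-1.5B shapes (28 layers, hidden size 1536, 2 key/value heads of dimension 128, so `v_proj` outputs 256 features); these shapes are not stated in this card. Under those assumptions the adapter holds about 2.18M parameters, or about 8.3 MiB at fp32, close to the listed file size; the exact size also depends on the saved dtype and the safetensors header.

```python
# Assumed Qwen2.5-1.5B shapes (not stated in this card).
NUM_LAYERS = 28
HIDDEN = 1536
KV_DIM = 2 * 128  # num_key_value_heads * head_dim

# (in_features, out_features) of each adapted projection in one decoder layer.
TARGETS = {"q_proj": (HIDDEN, HIDDEN), "v_proj": (HIDDEN, KV_DIM)}


def lora_param_count(rank: int) -> int:
    """LoRA adds A (r x in) and B (out x r) per target module: r * (in + out) weights."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in TARGETS.values())
    return per_layer * NUM_LAYERS


n = lora_param_count(16)
print(n)               # 2179072
print(n * 4 / 2**20)   # 8.3125 (fp32 size in MiB)
```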

Training data: [rafiakedir/tenacious-bench-v0.1](https://huggingface.co/datasets/rafiakedir/tenacious-bench-v0.1)

---

## Citation

```bibtex
@misc{tenacious-bench-adapter-2026,
  title  = {Tenacious-Bench Judge: ORPO LoRA Adapter for B2B Sales Evaluation},
  author = {Kedir, Rafia},
  year   = {2026},
  url    = {https://huggingface.co/rafiakedir/tenacious-bench-adapter}
}
```