---
license: cc-by-4.0
language:
- en
base_model: unsloth/Qwen2.5-1.5B-Instruct
tags:
- judge
- b2b-sales
- orpo
- lora
- preference-learning
- tenacious-bench
- evaluation
- qwen2.5
- unsloth
datasets:
- rafiakedir/tenacious-bench-v0.1
---

# Tenacious-Bench Judge — ORPO LoRA Adapter (Qwen2.5-1.5B)

A rubric-aware scoring judge for B2B outbound sales emails, trained with ORPO on
[Tenacious-Bench v0.1](https://huggingface.co/datasets/rafiakedir/tenacious-bench-v0.1)
preference pairs. Deployed as a rejection-sampling gate in the Tenacious Conversion Engine.
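
The card does not show the gate's logic. As a minimal sketch, assume the engine drafts candidate emails lazily and sends the first one the judge accepts, within a fixed attempt budget. `rejection_sampling_gate`, `passes`, and the default `max_attempts=4` are hypothetical names and values, not code from the Conversion Engine.

```python
from itertools import islice
from typing import Callable, Iterable, Optional


def rejection_sampling_gate(
    candidates: Iterable[str],
    passes: Callable[[str], bool],
    max_attempts: int = 4,  # assumed budget, not a documented setting
) -> Optional[str]:
    """Return the first candidate email accepted by the judge, or None.

    `candidates` may be a lazy generator that drafts a new email per
    iteration; islice stops pulling drafts once the budget is spent.
    `passes` wraps the judge and returns True for an acceptable draft.
    """
    for email in islice(candidates, max_attempts):
        if passes(email):
            return email
    return None  # no draft cleared the gate within the budget


if __name__ == "__main__":
    drafts = ["draft A", "draft B", "draft C"]
    print(rejection_sampling_gate(drafts, lambda e: e.endswith("B")))  # prints: draft B
```

In practice `passes` could call the `judge` function from the Quick Start and return its `pass` field.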

- Base model: `unsloth/Qwen2.5-1.5B-Instruct`
- Adapter type: LoRA (PEFT); load the base model, then attach the adapter with `PeftModel.from_pretrained`
- Training algorithm: ORPO (no reference model required)
- Precision: base model 4-bit quantized during training (Unsloth); fp16 for inference
- Training data: 94 ORPO preference pairs from the train split of `rafiakedir/tenacious-bench-v0.1`
- Training: 3 epochs · 36 steps · lr = 8e-6 · beta = 0.1 · LoRA r = 16, alpha = 32

---

## What It Scores

| Dimension | Trigger Rate (Week 10 probes) | Risk if Missed |
|---|---|---|
| `signal_grounding_fidelity` | 35% | CTO credibility loss |
| `competitor_gap_honesty` | 45% | Irreversible brand damage |
| `icp_segment_appropriateness` | 20% | ~$480K ACV per error |
| `tone_preservation` | 15% | Brand voice violation |
| `bench_commitment_honesty` | 5% | SOW breach / delivery failure |

---

## Quick Start — Inference

```python
import json

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "unsloth/Qwen2.5-1.5B-Instruct"
ADAPTER_ID = "rafiakedir/tenacious-bench-adapter"

# The adapter repo ships its own tokenizer files (ChatML format).
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.float16, device_map="auto"
)
# Attach the trained LoRA weights on top of the fp16 base model.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

JUDGE_SYSTEM = (
    "You are a rubric-aware judge for Tenacious Consulting B2B outbound sales emails. "
    "Given a task context and a candidate email, score it on the specified rubric dimension. "
    "Respond with a JSON object only:\n"
    '{"dimension": "<dim>", "score": <0.0-1.0>, "pass": <true|false>, "reasoning": "<one sentence>"}'
)


def judge(email, context, dimension):
    """Score one email on one rubric dimension and return the parsed JSON verdict."""
    user = (
        f"EVALUATION DIMENSION: {dimension}\n\n"
        f"TASK CONTEXT:\n{context}\n\n"
        f"CANDIDATE EMAIL:\n{email}\n\n"
        f"Score this email on the {dimension} dimension."
    )
    msgs = [
        {"role": "system", "content": JUDGE_SYSTEM},
        {"role": "user", "content": user},
    ]
    text = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=128,
            temperature=0.1,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    # Decode only the newly generated tokens, not the prompt.
    resp = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True).strip()
    # Extract the outermost {...} span; fall back to a neutral score if parsing fails.
    s, e = resp.find("{"), resp.rfind("}") + 1
    if s >= 0 and e > s:
        try:
            return json.loads(resp[s:e])
        except json.JSONDecodeError:
            pass
    return {"score": 0.5, "raw": resp[:200]}


result = judge(
    email="Casey — TalentBridge has 8 open AI/ML roles this quarter. 30-min scoping call: calendly.com/tenacious",
    context="company: TalentBridge, stage: Series A, open_roles: 8, confidence: high",
    dimension="signal_grounding_fidelity",
)
print(result)
```
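
The Files table lists an all-dimension scoring helper in `inference_example.py`, but its code is not reproduced on this card. As a hedged stand-in, this sketch loops any per-dimension judge with the same call shape as `judge` above over the five dimensions from the What It Scores table. `judge_all`, `JudgeFn`, and the returned dictionary layout are hypothetical.

```python
from typing import Callable, Dict, List

# The five rubric dimensions from the "What It Scores" table.
DIMENSIONS = (
    "signal_grounding_fidelity",
    "competitor_gap_honesty",
    "icp_segment_appropriateness",
    "tone_preservation",
    "bench_commitment_honesty",
)

# (email, context, dimension) -> verdict dict, e.g. the Quick Start `judge`.
JudgeFn = Callable[[str, str, str], dict]


def judge_all(email: str, context: str, judge_fn: JudgeFn) -> Dict[str, object]:
    """Run a per-dimension judge over every rubric dimension (hypothetical helper)."""
    per_dim: Dict[str, dict] = {}
    for dim in DIMENSIONS:
        verdict = judge_fn(email, context, dim)
        # A missing or non-boolean "pass" (e.g. the parse-failure fallback)
        # counts as a failure, which is the conservative choice for a gate.
        passed = verdict.get("pass") is True
        per_dim[dim] = {"score": verdict.get("score"), "pass": passed}
    failed: List[str] = [d for d, v in per_dim.items() if not v["pass"]]
    return {"dimensions": per_dim, "failed": failed, "all_pass": not failed}
```

With the Quick Start loaded, `judge_all(email, context, judge)` makes five generation calls per email. Since the train split contained no `bench_commitment_honesty` pairs (see Known Limitations), verdicts on that dimension deserve extra caution.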

---

## Training Details

| Parameter | Value |
|---|---|
| Base model | `unsloth/Qwen2.5-1.5B-Instruct` (4-bit during training) |
| LoRA rank | 16 |
| LoRA alpha | 32 |
| Target modules | `q_proj`, `v_proj` |
| LoRA dropout | 0.05 |
| Learning rate | 8e-6 |
| Effective batch size | 8 (per-device batch 2, gradient accumulation 4) |
| Epochs | 3 |
| Total steps | 36 |
| ORPO beta | 0.1 |
| Max sequence length | 1024 tokens |
| Seed | 42 |
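
The reported step count is consistent with the table. Assuming each epoch's final partial batch (94 is not a multiple of 8) still produces one optimizer step, a property of the trainer that the card does not state, 94 pairs at an effective batch of 8 give 12 steps per epoch and 36 over 3 epochs. The sketch also computes the LoRA scaling factor alpha / r, the default weight PEFT applies to the low-rank update.

```python
import math

NUM_PAIRS = 94
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
EPOCHS = 3
LORA_R, LORA_ALPHA = 16, 32

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM           # pairs per optimizer step
steps_per_epoch = math.ceil(NUM_PAIRS / effective_batch)  # assumes the partial last batch counts
total_steps = steps_per_epoch * EPOCHS
lora_scaling = LORA_ALPHA / LORA_R                        # LoRA update scale alpha / r

print(effective_batch, steps_per_epoch, total_steps, lora_scaling)  # prints: 8 12 36 2.0
```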

Metrics logged at the 3 training checkpoints:

| Checkpoint | Training loss | Reward accuracy |
|---|---|---|
| 1 | 2.8676 | 0.5375 |
| 2 | 2.9646 | 0.6026 |
| 3 | 2.9386 | 0.5128 |
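
To make the ORPO objective and the reward-accuracy metric concrete, here is a plain-Python sketch on scalar, length-normalized (mean per-token) log-probabilities. The loss follows the ORPO paper: an SFT term on the chosen response plus a beta-weighted log odds-ratio penalty, with no reference model. Treating reward as beta times the mean log-probability, and counting a pair as correct when the chosen reward beats the rejected one, follows TRL's `ORPOTrainer`; the card does not name TRL, so that mapping is an assumption. Under it, 0.5 corresponds to chance. All function names are hypothetical; real training computes these quantities from token logits in batches.

```python
import math


def log_odds(logp: float) -> float:
    """Return log(p / (1 - p)) given log p; requires logp < 0."""
    return logp - math.log1p(-math.exp(logp))


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def orpo_pair_loss(chosen_logp: float, rejected_logp: float, beta: float = 0.1) -> float:
    """ORPO loss for one pair from mean per-token log-probs.

    SFT term: NLL of the chosen response (-chosen_logp).
    Odds-ratio term: -log sigmoid(log odds(chosen) - log odds(rejected)), weighted by beta.
    """
    ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    return -chosen_logp - beta * log_sigmoid(ratio)


def reward_accuracy(pairs: list[tuple[float, float]], beta: float = 0.1) -> float:
    """Share of (chosen_logp, rejected_logp) pairs where beta * chosen > beta * rejected.

    Ties count as misses. A positive beta does not change the ordering.
    """
    wins = sum(beta * c > beta * r for c, r in pairs)
    return wins / len(pairs)
```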

Training data: 94 preference pairs from the train partition. To prevent preference leakage, the generator model (DeepSeek V3.2) comes from a different family than the judge (Claude Sonnet 4.6 / `scoring_evaluator.py`).
All generation decisions are logged in the dataset repo at `training_data/generation_log.jsonl`.

---

## Evaluation Results

Evaluated on 59 held-out tasks from `rafiakedir/tenacious-bench-v0.1`.
Full results are in `ablation_results.json` in the dataset repo.

Deployment recommendation: run `ablations/run_ablations.py` with this adapter to compute Delta A.
The ablation script loads this adapter from the Hugging Face Hub and requires a GPU plus the `transformers` and `peft` libraries.

---

## Known Limitations

1. Dimension coverage gap. A scoring-key mismatch during pair construction left 0 training pairs for `bench_commitment_honesty` and only 4 for `icp_segment_appropriateness`. The model received no gradient signal on bench commitment honesty.

2. Backbone below the Prometheus-2 threshold. Prometheus-2 demonstrated rubric matching with backbones of 7B parameters and up. At 1.5B, this model may underfit multi-dimension generalization.

3. Synthetic training distribution. All pairs derive from synthetic prospect briefs and LLM-generated emails, so behavior on real prospect data may differ.

4. Static `bench_summary`. The `bench_summary` seen in training is fixed, while the real bench composition changes weekly, so judge calibration drifts over time.
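
A coverage audit run before training could catch a gap like the one in limitation 1 before any gradient steps are spent. This sketch assumes each preference pair is a dict with a `dimension` field (a hypothetical field name; the dataset's real schema may differ) and uses an arbitrary threshold of 10 pairs per dimension. Labels that match no expected dimension are grouped under `<unmatched>`, which is where a scoring-key mismatch would show up.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

EXPECTED_DIMENSIONS = (
    "signal_grounding_fidelity",
    "competitor_gap_honesty",
    "icp_segment_appropriateness",
    "tone_preservation",
    "bench_commitment_honesty",
)


def coverage_report(
    pairs: Iterable[Mapping],
    min_pairs: int = 10,     # arbitrary illustrative threshold
    key: str = "dimension",  # hypothetical field name
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Count pairs per rubric dimension and return (counts, under-covered dims)."""
    counts: Counter = Counter()
    for pair in pairs:
        dim = pair.get(key)
        counts[dim if dim in EXPECTED_DIMENSIONS else "<unmatched>"] += 1
    # Counter returns 0 for dimensions that never appeared.
    under = {d: counts[d] for d in EXPECTED_DIMENSIONS if counts[d] < min_pairs}
    return dict(counts), under
```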

---

## Files

| File | Description |
|---|---|
| `adapter_config.json` | LoRA configuration (r=16, alpha=32, `q_proj` + `v_proj`) |
| `adapter_model.safetensors` | Trained LoRA weights (8.4 MB) |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer (ChatML format) |
| `run_on_colab.ipynb` | End-to-end training and push notebook |
| `train_judge.py` | Training script |
| `inference_example.py` | Per-dimension and all-dimension scoring helper |

Training data: [rafiakedir/tenacious-bench-v0.1](https://huggingface.co/datasets/rafiakedir/tenacious-bench-v0.1)

---

## Citation

```bibtex
@misc{tenacious-bench-adapter-2026,
  title  = {Tenacious-Bench Judge: ORPO LoRA Adapter for B2B Sales Evaluation},
  author = {Kedir, Rafia},
  year   = {2026},
  url    = {https://huggingface.co/rafiakedir/tenacious-bench-adapter}
}
```