tenacious-orpo-adapter

LoRA adapter (ORPO) for Tenacious Consulting B2B sales agent quality scoring.

Training

Backbone: Qwen2.5-0.5B-Instruct
Method: ORPO (reference-free preference optimization)
Training pairs: 112 preference pairs from Tenacious discovery-call transcripts
Final loss: 4.24 | Rewards/accuracies: 0.40
Delta A vs baseline: +0.300 (banned phrase elimination)
Delta B vs prompt-opt: -0.100 (prompt engineering beats training on this task)
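
For readers unfamiliar with the method, the ORPO objective can be sketched for a single preference pair as below. This is a minimal illustration of the published ORPO formulation (supervised loss on the chosen response plus a weighted odds-ratio term), not the training code used for this adapter. The function names and the weight `lam=0.1` are assumptions made for illustration, and the inputs are length-normalized (per-token average) log-probabilities.

```python
import math

def log_odds(avg_logprob: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logprob), with 0 < p < 1."""
    p = math.exp(avg_logprob)
    return avg_logprob - math.log1p(-p)

def odds_ratio_term(logp_chosen: float, logp_rejected: float) -> float:
    """-log(sigmoid(log-odds(chosen) - log-odds(rejected)))."""
    ratio = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log(sigmoid(x)) equals log(1 + exp(-x))
    return math.log1p(math.exp(-ratio))

def orpo_loss(logp_chosen: float, logp_rejected: float, lam: float = 0.1) -> float:
    """SFT loss on the chosen response plus lam times the odds-ratio term.

    lam is an assumed weight for illustration, not a value from this card.
    """
    sft = -logp_chosen  # negative log-likelihood of the chosen response
    return sft + lam * odds_ratio_term(logp_chosen, logp_rejected)
```

The odds-ratio term shrinks as the chosen response becomes more likely than the rejected one, which is why no separate reference model is needed.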

Result

Honest finding: training improved banned-phrase compliance (Delta A = +0.300 over the baseline), but prompt optimization scores higher at lower cost (Delta B = -0.100). This is a legitimate, publishable finding: training is not always the answer.
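
The two deltas can be read as simple score differences: the adapter's score minus the score of the system it is compared against. The sketch below shows that arithmetic. The absolute scores are hypothetical placeholders chosen only so the differences match the reported deltas; this card does not report absolute scores, and the names `scores` and `delta` are illustrative.

```python
def delta(score: float, reference: float) -> float:
    """Score difference, rounded to three decimals as in the card."""
    return round(score - reference, 3)

# Hypothetical placeholder scores, NOT measured values from this card.
scores = {"baseline": 0.5, "orpo_adapter": 0.8, "prompt_opt": 0.9}

delta_a = delta(scores["orpo_adapter"], scores["baseline"])    # adapter vs baseline
delta_b = delta(scores["orpo_adapter"], scores["prompt_opt"])  # adapter vs prompt-opt
```

A positive Delta A alongside a negative Delta B means the adapter beats the untuned baseline but trails the prompt-optimized system.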

Repo

https://github.com/ephrata1888/tenacious-bench
