tenacious-orpo-adapter
LoRA adapter (ORPO) for Tenacious Consulting B2B sales agent quality scoring.
Training
- Backbone: Qwen2.5-0.5B-Instruct
- Method: ORPO (reference-free preference optimization)
- Training pairs: 112 preference pairs from Tenacious discovery-call transcripts
- Final loss: 4.24 | Rewards/accuracies: 0.40
- Delta A vs baseline: +0.300 (banned phrase elimination)
- Delta B vs prompt-opt: -0.100 (prompt engineering beats training on this task)
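For readers unfamiliar with ORPO, the sketch below shows the general shape of its objective for a single preference pair: a standard negative log-likelihood term on the chosen response plus a weighted odds-ratio penalty, with no reference model involved. It is a minimal illustration, not the training code used for this adapter. The function names, the use of length-averaged log-probabilities as inputs, and the weight `lam=0.1` are all assumptions made for the example.

```python
import math


def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) for p = exp(avg_logp); requires avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_logp: float, rejected_logp: float, lam: float = 0.1) -> float:
    """ORPO-style loss for one preference pair (illustrative sketch).

    chosen_logp / rejected_logp: length-averaged log-probabilities the
    policy assigns to the chosen and rejected responses (assumed inputs).
    lam: weight on the odds-ratio term (hypothetical default).
    """
    # Supervised term: negative log-likelihood of the chosen response.
    nll = -chosen_logp
    # Preference term: -log(sigmoid(log-odds ratio)), written as log(1 + exp(-x)).
    log_odds_ratio = log_odds(chosen_logp) - log_odds(rejected_logp)
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return nll + lam * odds_ratio_term


if __name__ == "__main__":
    # Loss is lower when the policy already prefers the chosen response.
    print(orpo_loss(chosen_logp=-0.5, rejected_logp=-2.0))
    print(orpo_loss(chosen_logp=-2.0, rejected_logp=-0.5))
```

Because the penalty depends only on the policy's own probabilities, no frozen reference model has to be kept in memory, which is what "reference-free" refers to in the method line above.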
Result
Honest finding: training improved banned-phrase compliance (Delta A = +0.300 over baseline), but prompt optimization achieves higher scores at lower cost (Delta B = -0.100). This is a legitimate, publishable finding: training is not always the answer.
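One way to read the two deltas is as a difference in mean evaluation score between the adapter and a comparison system, sketched below. This card does not specify how the deltas were computed, so the definition here is an assumption, and the `delta` helper and the score lists are hypothetical placeholders rather than the actual evaluation data.

```python
from statistics import mean


def delta(adapter_scores: list[float], comparison_scores: list[float]) -> float:
    """Mean adapter score minus mean comparison score (assumed definition)."""
    return mean(adapter_scores) - mean(comparison_scores)


if __name__ == "__main__":
    # Placeholder per-example scores, for illustration only.
    adapter = [1.0, 0.5, 1.0, 0.5]
    baseline = [0.5, 0.5, 0.5, 0.5]
    # A positive value means the adapter outscored the comparison system.
    print(delta(adapter, baseline))
```

Under this reading, a positive Delta A means the adapter beat the untrained baseline, and a negative Delta B means the prompt-optimized system beat the adapter.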
Repo