tenacious-orpo-adapter

LoRA adapter (ORPO) for Tenacious Consulting B2B sales agent quality scoring.

Training

  • Backbone: Qwen2.5-0.5B-Instruct
  • Method: ORPO (reference-free preference optimization)
  • Training pairs: 112 preference pairs from Tenacious discovery-call transcripts
  • Final loss: 4.24 | Rewards/accuracies: 0.40
  • Delta A vs baseline: +0.300 (banned phrase elimination)
  • Delta B vs prompt-opt: -0.100 (prompt engineering beats training on this task)
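
For readers unfamiliar with ORPO, the objective can be sketched as the usual negative log-likelihood on the chosen response plus a weighted odds-ratio penalty, with no reference model involved. The snippet below is a minimal illustration in plain Python, not the training code used for this adapter. The function names, the use of average per-token log-probabilities, and the weight `lam=0.1` are assumptions made for the example.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp), where avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(nll_chosen: float,
              avg_logp_chosen: float,
              avg_logp_rejected: float,
              lam: float = 0.1) -> float:
    """Illustrative ORPO loss for a single preference pair.

    nll_chosen        -- supervised NLL on the chosen response
    avg_logp_chosen   -- average per-token log-prob of the chosen response
    avg_logp_rejected -- average per-token log-prob of the rejected response
    lam               -- weight of the odds-ratio term (hypothetical value)
    """
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    # -log(sigmoid(x)) rewritten as log(1 + exp(-x))
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return nll_chosen + lam * odds_ratio_term
```

The penalty shrinks as the model assigns higher odds to the chosen response than to the rejected one, which is what makes the method reference-free.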

Result

Honest finding: training improved banned-phrase compliance (Delta A = +0.300 over the baseline), but prompt optimization achieves higher scores at lower cost (Delta B = -0.100). This is a legitimate, publishable finding: training is not always the answer.

Repo

https://github.com/ephrata1888/tenacious-bench
