Sentence Selection ORPO v3

LoRA adapter for the debate card-cutting (sentence selection) task.

Model Description

This is an ORPO-trained LoRA adapter that selects the sentences in a piece of evidence that support a debate claim. It was trained with ORPO (Odds Ratio Preference Optimization) on a set of 507 DPO-style preference pairs, split into 456 training pairs and 51 validation pairs.

Base Model: Qwen/Qwen3-30B-A3B (with SFT fine-tuning)

Training Details

  • Method: ORPO (Odds Ratio Preference Optimization)
  • Training Data: 507 DPO pairs (456 train, 51 validation)
  • Learning Rate: 5e-7
  • LoRA Rank: 32
  • LoRA Alpha: 64
  • Target Modules: q_proj, k_proj, v_proj, o_proj
  • Epochs: ~0.35 (checkpoint at step 40)
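
The LoRA hyperparameters listed above map onto a PEFT configuration roughly as follows. This is a minimal sketch, not the training script used for this adapter: `task_type` is an assumption, and dropout, bias, and any other setting not listed above are left at the PEFT defaults because this card does not state them.

```python
from peft import LoraConfig

# Rank, alpha, and target modules come from the Training Details list.
# Everything else is an assumption or a PEFT default.
lora_config = LoraConfig(
    r=32,                                                     # LoRA Rank
    lora_alpha=64,                                            # LoRA Alpha
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",                                    # assumed: base model is a causal LM
)
```

With these values the LoRA scaling factor (alpha divided by rank) is 2.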

Training Metrics

Checkpoint  Epoch  Eval Accuracy  Eval Loss  Eval Margin
Step 10     0.09   64.7%          2.481      +0.023
Step 20     0.18   64.7%          2.481      +0.023
Step 30     0.26   64.7%          2.482      +0.023
Step 40     0.35   64.7%          2.482      +0.023
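
The reported figures can be cross-checked with a little arithmetic. The sketch below assumes that eval accuracy is the fraction of validation pairs in which the chosen response is ranked above the rejected one, and that each epoch is one pass over the 456 training pairs. The steps-per-epoch and pairs-per-step values are inferred from the table, not reported by the training run.

```python
# Figures reported in this card.
n_train = 456          # training pairs
n_val = 51             # validation pairs
eval_accuracy = 0.647  # identical at every checkpoint
step, epoch = 40, 0.35 # final checkpoint

# Validation pairs ranked correctly, if accuracy = correct / n_val.
correct_pairs = round(eval_accuracy * n_val)

# Inferred (not reported): optimizer steps per epoch and pairs consumed per step.
steps_per_epoch = step / epoch
pairs_per_step = n_train / steps_per_epoch

print(f"correct validation pairs: {correct_pairs} of {n_val}")
print(f"steps per epoch: about {round(steps_per_epoch)}")
print(f"pairs per step: about {pairs_per_step:.1f}")
```

Under these assumptions, the accuracy corresponds to roughly 33 of the 51 validation pairs, and the epoch column is consistent with about four training pairs per optimizer step.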

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "debaterhub/sentence-selection-orpo-v3")

Task Format

Input format:

Select sentences supporting:
Claim: [claim text]

TEXT ([citation]):
[1] First sentence.
[2] Second sentence.
...

Expected output:

Selected IDs: [1, 3, 5]
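
The input and output formats above can be produced and consumed with two small helpers. This is a sketch under stated assumptions: `build_prompt` and `parse_selected_ids` are hypothetical names that do not ship with this adapter, and the card does not say whether the prompt should also be wrapped in the tokenizer's chat template, so that step is left out.

```python
import re


def build_prompt(claim: str, citation: str, sentences: list[str]) -> str:
    """Format a claim and numbered evidence sentences as shown in Task Format."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(sentences, start=1))
    return (
        "Select sentences supporting:\n"
        f"Claim: {claim}\n\n"
        f"TEXT ({citation}):\n"
        f"{numbered}"
    )


def parse_selected_ids(output: str, num_sentences: int) -> list[int]:
    """Read the IDs from a 'Selected IDs: [...]' line, dropping out-of-range IDs."""
    match = re.search(r"Selected IDs:\s*\[([^\]]*)\]", output)
    if match is None:
        return []  # the model did not follow the expected output format
    ids = [int(tok) for tok in re.findall(r"\d+", match.group(1))]
    return [i for i in ids if 1 <= i <= num_sentences]
```

A prompt built this way can be tokenized and passed to `model.generate`. The decoded completion then goes through `parse_selected_ids` to recover the sentence numbers, and the range check guards against IDs that do not exist in the evidence.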

License

Apache 2.0

Framework Versions

  • PEFT 0.15.2
  • Transformers 4.57.3
  • PyTorch 2.9.0