Safetensors
English
qwen3_5
judge
b2b-sales
orpo
lora
preference-learning
tenacious-bench
evaluation
qwen2.5
unsloth
How to use rafiakedir/tenacious-bench-adapter with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rafiakedir/tenacious-bench-adapter to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rafiakedir/tenacious-bench-adapter to start chatting
Using Hugging Face Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for rafiakedir/tenacious-bench-adapter to start chatting
Load model with FastModel
pip install unsloth

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="rafiakedir/tenacious-bench-adapter",
    max_seq_length=2048,
)
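Since the repository is tagged as a LoRA adapter, it can probably also be attached to its base model with PEFT instead of Unsloth. The following is a minimal sketch, not a tested recipe for this repository. It assumes `transformers` and `peft` are installed and that the base model is `unsloth/Qwen2.5-1.5B-Instruct`, the `model_id` given in the training config. `load_judge` is a hypothetical helper name, not something the repository provides.

```python
def load_judge(
    adapter_id="rafiakedir/tenacious-bench-adapter",
    base_id="unsloth/Qwen2.5-1.5B-Instruct",  # assumption: taken from the training config's model_id
):
    """Load the base model and attach the LoRA adapter on top of it."""
    # Imported lazily so that defining this helper does not require the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # PeftModel.from_pretrained wraps the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference only
    return model, tokenizer
```

Calling `model, tokenizer = load_judge()` downloads both the base weights and the adapter, so it needs network access and enough memory for a 1.5B-parameter model.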
{
  "model_id": "unsloth/Qwen2.5-1.5B-Instruct",
  "training_algorithm": "ORPO",
  "lora": {
    "r": 16,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "v_proj"],
    "lora_dropout": 0.05,
    "bias": "none",
    "task_type": "CAUSAL_LM"
  },
  "orpo_trainer": {
    "learning_rate": 8e-6,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "effective_batch_size": 8,
    "num_train_epochs": 3,
    "warmup_ratio": 0.1,
    "lr_scheduler_type": "cosine",
    "beta": 0.1,
    "max_length": 1024,
    "max_prompt_length": 512,
    "logging_steps": 10,
    "save_steps": 50,
    "seed": 42
  },
  "precision": {
    "bf16": false,
    "fp16": true,
    "note": "T4 GPU: fp16 only. Switch to bf16 on A100/4090."
  },
  "adapter_output_dir": "training/adapter",
  "hub_model_id": "rafiakedir/tenacious-bench-adapter",
  "fixed_seed": 42,
  "rationale": {
    "orpo_vs_dpo": "ORPO chosen over DPO because it requires no reference model, reducing GPU memory footprint by ~40% on T4. Reference-free approach is appropriate for a judge component where the reference policy is undefined.",
    "backbone_choice": "Qwen2.5-1.5B-Instruct selected per Prometheus-2 paper (Kim et al., 2024) showing 7B-class judge viability at 1.5B with preference tuning.",
    "lora_rank": "Rank 16 with alpha 32 (2:1 ratio) is standard for task-specific adaptation. Rank 8 was considered but judge rubric complexity warrants higher rank.",
    "beta_orpo": "Beta=0.1 follows ORPO paper (Hong et al., 2024) recommendation for instruction-following tasks."
  }
}
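Several values in this config are derived from others, so they can be cross-checked. The sketch below uses only the standard library and inlines a subset of the config so that it is self-contained. `derived_values` and `check` are hypothetical helper names. It assumes a single GPU (`num_devices=1`), the standard LoRA scaling of `lora_alpha / r`, and that `max_length` counts prompt plus completion tokens.

```python
import json

# Subset of the training config, inlined so the sketch runs on its own.
CONFIG = json.loads("""
{
  "lora": {"r": 16, "lora_alpha": 32},
  "orpo_trainer": {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "effective_batch_size": 8,
    "max_length": 1024,
    "max_prompt_length": 512
  },
  "precision": {"bf16": false, "fp16": true}
}
""")


def derived_values(cfg, num_devices=1):
    """Recompute quantities that follow from the raw hyperparameters."""
    trainer = cfg["orpo_trainer"]
    lora = cfg["lora"]
    return {
        # Samples contributing to one optimizer step.
        "effective_batch_size": (
            trainer["per_device_train_batch_size"]
            * trainer["gradient_accumulation_steps"]
            * num_devices
        ),
        # Standard LoRA multiplies the low-rank update by alpha / r.
        "lora_scaling": lora["lora_alpha"] / lora["r"],
        # Tokens left for the completion when the prompt uses its full budget.
        "min_completion_budget": trainer["max_length"] - trainer["max_prompt_length"],
    }


def check(cfg, num_devices=1):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    derived = derived_values(cfg, num_devices)
    declared = cfg["orpo_trainer"]["effective_batch_size"]
    if derived["effective_batch_size"] != declared:
        problems.append(
            f"effective_batch_size is {declared}, "
            f"but the batch settings imply {derived['effective_batch_size']}"
        )
    if cfg["precision"]["bf16"] and cfg["precision"]["fp16"]:
        problems.append("bf16 and fp16 are both enabled")
    if derived["min_completion_budget"] <= 0:
        problems.append("max_prompt_length leaves no room for a completion")
    return problems


if __name__ == "__main__":
    print(derived_values(CONFIG))
    print(check(CONFIG))
```

With the values above, the declared `effective_batch_size` of 8 matches 2 × 4 on one device, and the 2:1 alpha-to-rank ratio mentioned in the rationale corresponds to a LoRA scaling factor of 2.0.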