SpeciaRL_SpeciaRL_mixed

This repository provides the LoRA adapter for SpeciaRL trained on a mixture of all available domains; see the SpeciaRL project for more information. The adapter is built on top of Qwen/Qwen2.5-VL-7B-Instruct.
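The snippet below is a minimal sketch of how the adapter could be loaded for inference with Transformers and PEFT. The model class and loading pattern follow the standard Hugging Face API for Qwen2.5-VL; the dtype and device settings are illustrative assumptions.

```python
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

base_id = "Qwen/Qwen2.5-VL-7B-Instruct"
adapter_id = "s-angheben/SpeciaRL_qwen2_5vl-7b_SpeciaRL_mixed"

# Load the base vision-language model, then attach the LoRA adapter on top.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# The processor (tokenizer + image preprocessing) comes from the base model.
processor = AutoProcessor.from_pretrained(base_id)
```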

Training hyperparameters

  • algorithm: GRPO
  • reward_function: best prediction (dedup)
  • learning_rate: 3e-5
  • train_batch_size: 256
  • max_prompt_length: 2048
  • max_response_length: 2048
  • lora_rank: 64
  • lora_alpha: 32
  • target_modules: all-linear (visual layers excluded; see the config sketch after this list)
  • kl_loss: True (coef=0.01, type=low_var_kl)
  • rollout_n: 10
  • num_gpus: 4
  • total_epochs: 15
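
For reference, the LoRA settings above correspond roughly to the following PEFT LoraConfig. This is a hypothetical reconstruction from the listed hyperparameters, not the actual training code; in particular, the exclude_modules pattern used to skip the visual layers is an assumption.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter config from the listed
# hyperparameters; the exclude_modules regex is an assumed pattern for
# skipping the vision tower, not taken from the training code.
lora_config = LoraConfig(
    r=64,                           # lora_rank
    lora_alpha=32,                  # lora_alpha
    target_modules="all-linear",    # wrap every linear layer...
    exclude_modules=r".*visual.*",  # ...except visual layers (assumed regex)
    task_type="CAUSAL_LM",
)
```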

Framework versions

  • VERL
  • PEFT 0.17.1
  • Transformers 4.57.0
  • PyTorch 2.6.0+cu124