SpeciaRL_SpeciaRL_mixed

This repository provides the LoRA adapter for SpeciaRL trained on a mixture of all available domains; see the SpeciaRL project for more information. The adapter is built on top of Qwen/Qwen2.5-VL-7B-Instruct.
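The snippet below is a minimal sketch of how the adapter could be loaded for inference with Transformers and PEFT. The model class and loading pattern follow the standard Hugging Face API for Qwen2.5-VL; the dtype and device settings are illustrative assumptions.

```python
from peft import PeftModel
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

base_id = "Qwen/Qwen2.5-VL-7B-Instruct"
adapter_id = "s-angheben/SpeciaRL_qwen2_5vl-7b_SpeciaRL_mixed"

# Load the base vision-language model, then attach the LoRA adapter on top.
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    base_id, torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# The processor (tokenizer + image preprocessing) comes from the base model.
processor = AutoProcessor.from_pretrained(base_id)
```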

Training hyperparameters

  • algorithm: GRPO
  • reward_function: best prediction (dedup)
  • learning_rate: 3e-5
  • train_batch_size: 256
  • max_prompt_length: 2048
  • max_response_length: 2048
  • lora_rank: 64
  • lora_alpha: 32
  • target_modules: all-linear (visual layers excluded; see the config sketch after this list)
  • kl_loss: True (coef=0.01, type=low_var_kl)
  • rollout_n: 10
  • num_gpus: 4
  • total_epochs: 15
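
For reference, the LoRA settings above correspond roughly to the following PEFT LoraConfig. This is a hypothetical reconstruction from the listed hyperparameters, not the actual training code; in particular, the exclude_modules pattern used to skip the visual layers is an assumption.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter config from the listed
# hyperparameters; the exclude_modules regex is an assumed pattern for
# skipping the vision tower, not taken from the training code.
lora_config = LoraConfig(
    r=64,                           # lora_rank
    lora_alpha=32,                  # lora_alpha
    target_modules="all-linear",    # wrap every linear layer...
    exclude_modules=r".*visual.*",  # ...except visual layers (assumed regex)
    task_type="CAUSAL_LM",
)
```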

Framework versions

  • VERL
  • PEFT 0.17.1
  • Transformers 4.57.0
  • PyTorch 2.6.0+cu124