# grpo_0328_step200

A LoRA adapter trained from the base model Qwen/Qwen3-VL-8B-Instruct.
## Usage

Load this adapter with PEFT on top of the base model:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = "Qwen/Qwen3-VL-8B-Instruct"
adapter_repo = "elaine1wan/grpo_0328_step200"

# Note: Qwen3-VL is a vision-language model; depending on your transformers
# version, you may need AutoModelForImageTextToText and AutoProcessor
# instead of the text-only classes used here.
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# Attach the LoRA adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(model, adapter_repo)
```
## Training checkpoint

Exported from:

```
verl/checkpoints/qwen3_vl_8b_grpo_multiturn_new_0328_run3/global_step_200/actor/lora_adapter
```