---
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
library_name: transformers
tags:
- grpo
- trl
- video
- video-text-to-text
- planner
- long-video
pipeline_tag: video-text-to-text
---

# ToolMerge planner — GRPO-finetuned Qwen3-VL-8B (step 50)

GRPO-finetuned planner from [Qwen3-VL-8B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct), used as the text-only query decomposer in the ToolMerge keyframe-retrieval pipeline. Trained with TRL's GRPO trainer on Molmo-2 Moments (M2M) training data to optimize the combined `frames-in-GT` + `consistency` reward. This checkpoint was saved at `global_step=50`.

## Quick start

```python
from transformers import AutoProcessor, AutoModelForCausalLM

processor = AutoProcessor.from_pretrained("michalsr/toolmerge-planner-grpo")
model = AutoModelForCausalLM.from_pretrained(
    "michalsr/toolmerge-planner-grpo",
    torch_dtype="bfloat16",
)
```

To use the checkpoint inside ToolMerge, override the planner checkpoint at the CLI:

```bash
toolmerge config=configs/m2m/qwen3_8.yaml \
    model.base=michalsr/toolmerge-planner-grpo
```

## Training recipe

| Setting | Value |
|---|---|
| Base model | `Qwen/Qwen3-VL-8B-Instruct` |
| Reward | `frames_in_gt=1.0`, `consistency=1.0` |
| Training data | `train_correct_uniform_8f_clip_max1.json` (filtered M2M train split, ~1500 items) |
| Optimizer | `paged_adamw_8bit`, lr=1e-6, bf16 |
| Compute | 2 nodes × 4 GPUs |
| Step | `global_step=50` |
| Framework | TRL 0.27.2, transformers 4.57.6, PyTorch 2.10.0 |

Full training config: [`training/configs/m2m_grpo.yaml`](https://github.com/michalsr/ToolMerge/blob/main/training/configs/m2m_grpo.yaml) in the ToolMerge repo.

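
The reward row above lists two components with weight 1.0 each. As a rough illustration of how such a weighted reward could be combined, here is a minimal sketch. The function names, the definition of `frames_in_gt` as the fraction of selected frame timestamps that fall inside a ground-truth interval, and the treatment of `consistency` as a precomputed score are all assumptions made for this example; the actual reward definitions live in the ToolMerge training code and may differ.

```python
from typing import Dict, Sequence, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec); hypothetical representation

# Weights taken from the training recipe table.
WEIGHTS: Dict[str, float] = {"frames_in_gt": 1.0, "consistency": 1.0}


def frames_in_gt(frame_times: Sequence[float], gt_intervals: Sequence[Interval]) -> float:
    """Assumed definition: fraction of selected frames inside any ground-truth interval."""
    if not frame_times:
        return 0.0
    hits = sum(
        any(start <= t <= end for start, end in gt_intervals)
        for t in frame_times
    )
    return hits / len(frame_times)


def combined_reward(components: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the named reward components."""
    return sum(weights[name] * value for name, value in components.items())


# Two of four selected frames land inside the single ground-truth interval.
fig = frames_in_gt([1.0, 5.0, 9.0, 20.0], [(0.0, 6.0)])
# The consistency score is passed in as a placeholder value here.
total = combined_reward({"frames_in_gt": fig, "consistency": 1.0}, WEIGHTS)
```

With equal weights, neither component dominates the total, so a rollout has to score well on both to receive a high reward.
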
## Citation

```bibtex
@misc{shlapentokhrothman2026decomposingqueriestoolcalls,
  title         = {Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval},
  author        = {Michal Shlapentokh-Rothman and Prachi Garg and Yu-Xiong Wang and Derek Hoiem},
  year          = {2026},
  eprint        = {2605.23826},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2605.23826},
}
```

To cite the GRPO method:

```bibtex
@article{shao2024deepseekmath,
  title  = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
  author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
  year   = 2024,
  eprint = {arXiv:2402.03300},
}
```

Code repo: <https://github.com/michalsr/ToolMerge>.