# r2egym_8b_rope_65k-step17

Qwen3-8B RL-trained on R2EGym tasks with a 65k-token context window (YaRN RoPE scaling); this checkpoint was taken at training step 17.

## Training Details

| Parameter | Value |
|---|---|
| Base model | laion/GLM-4_7-r2egym_sandboxes-maxeps-131k-lc (Qwen3-8B SFT, long-context) |
| Dataset | R2EGym GPT5/Codex solved tasks (1,785 tasks) |
| Algorithm | RLOO-N (Leave-One-Out with neutral masking) |
| Learning rate | 3.0e-5 |
| Train batch size | 32 |
| Samples per prompt | 8 |
| Max episodes | 64 |
| Max generate length | 8,192 tokens |
| Max input tokens | 57,344 |
| Max model length | 65,536 |
| RoPE scaling | YaRN (factor=4.0, original_max_pos_emb=32,768) |
| KL loss | Disabled |
| Reward shaping | Enabled (pass_ratio) |
| Staleness steps | 16 |
| Policy nodes | 2 (8 GPUs, FSDP2) |
| Inference engines | 20 (TP=1) |
| Training steps | 17 |
| Framework | BenSkyRL + Harbor |
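
The card does not spell out the advantage computation, so the snippet below is a minimal sketch of how an RLOO-style leave-one-out baseline with neutral masking and pass_ratio-shaped rewards could look. The function names (`loo_advantages`, `neutral_mask`) and the specific masking rule are illustrative assumptions, not taken from BenSkyRL or Harbor.

```python
from typing import List

def loo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out baseline over K samples drawn from the same prompt:
    each sample is credited against the mean reward of its K-1 siblings."""
    k = len(rewards)
    if k < 2:
        return [0.0] * k  # no baseline is possible with a single sample
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

def neutral_mask(rewards: List[float]) -> List[bool]:
    """Illustrative 'neutral masking': skip groups where every sample received
    the same reward, since their leave-one-out advantages are identically zero."""
    uninformative = max(rewards) == min(rewards)
    return [not uninformative] * len(rewards)

# Hypothetical pass_ratio-shaped rewards (fraction of tests passed) for the
# 8 samples drawn from one prompt.
rewards = [1.0, 0.25, 0.0, 0.5, 0.0, 1.0, 0.75, 0.0]
print(loo_advantages(rewards))
print(neutral_mask(rewards))
```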

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("laion/r2egym_8b_rope_65k-step17")
tokenizer = AutoTokenizer.from_pretrained("laion/r2egym_8b_rope_65k-step17")
```
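
If the checkpoint's config does not already carry the YaRN settings from the table above, they can be applied to the config before loading. The snippet below mirrors the training values (factor 4.0 over a 32,768-token base, 65,536 max positions); treat it as an optional sketch rather than a required step.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("laion/r2egym_8b_rope_65k-step17")
# YaRN RoPE scaling as listed in the training table.
config.rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}
config.max_position_embeddings = 65536

model = AutoModelForCausalLM.from_pretrained(
    "laion/r2egym_8b_rope_65k-step17",
    config=config,
    torch_dtype=torch.bfloat16,
)
```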