
Baselines

Environment: hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0

EasyR1 version: v0.3.2

Contributions of new data points are welcome!

Algorithm Baselines

Qwen2.5-Instruct on Math12k

| Size | Algorithm | Bits | LR | KL | Test Accuracy |
| --- | --- | --- | --- | --- | --- |
| 7B | GRPO | AMP | 1e-6 | 1e-2 | 0.75 -> 0.77 (+0.02) |

Qwen2.5-VL-Instruct on Geometry3k

| Size | Algorithm | Bits | LR | KL | Test Accuracy |
| --- | --- | --- | --- | --- | --- |
| 7B | GRPO | AMP | 1e-6 | 1e-2 | 0.37 -> 0.48 (+0.11) |
| 7B | GRPO | BF16 | 1e-6 | 1e-2 | 0.37 -> 0.48 (+0.11) |
| 7B | DAPO | AMP | 1e-6 | 1e-2 | 0.37 -> 0.50 (+0.13) |
| 3B | GRPO | AMP | 1e-6 | 1e-2 | 0.24 -> 0.38 (+0.14) |
| 32B | GRPO | BF16 | 1e-6 | 1e-2 | 0.50 -> 0.56 (+0.06) |

All hyper-parameters not listed use their default values.
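For anyone contributing a new row, the Test Accuracy cells above follow a `before -> after (+delta)` pattern. Here is a minimal Python sketch that formats a cell this way. `format_accuracy` is a hypothetical helper written for illustration and is not part of EasyR1.

```python
def format_accuracy(before: float, after: float) -> str:
    """Format a Test Accuracy cell as 'before -> after (+delta)'.

    Hypothetical helper for illustration only; not an EasyR1 function.
    """
    delta = after - before
    # Two decimals to match the tables; the '+' flag keeps an explicit sign.
    return f"{before:.2f} -> {after:.2f} ({delta:+.2f})"


# 7B GRPO on Geometry3k, values taken from the table above.
print(format_accuracy(0.37, 0.48))  # prints "0.37 -> 0.48 (+0.11)"
```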

Performance Baselines

Qwen2.5-VL-Instruct on Geometry3k

| Size | GPU Type | Bits | Batch Size | vLLM TP | Peak Mem | Peak VRAM | Throughput | Sec per step | Actor MFU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3B | 8 * H100 80GB | AMP | 1 / 2 | 2 | 120GB | 54GB | 1800 (+600) | 120s | 8.1% |
| 7B | 8 * H100 80GB | AMP | 1 / 2 | 2 | 120GB | 68GB | 1600 (+400) | 145s | 16.0% |
| 7B | 8 * H100 80GB | AMP | 4 / 8 | 2 | 200GB | 72GB | 2000 (+600) | 120s | 23.2% |
| 7B | 8 * L20 48GB | AMP | 1 / 2 | 2 | 120GB | 42GB | 410 (+0) | 580s | 26.5% |
| 7B | 8 * H100 80GB | BF16 | 1 / 2 | 2 | 120GB | 58GB | 1600 (+320) | 145s | 16.0% |
| 32B | 8 * H100 80GB | BF16 | 1 / 2 | 8 | 260GB | 72GB | 620 (+260) | 530s | 25.8% |

  • Batch Size: micro_batch_size_per_device_for_update / micro_batch_size_per_device_for_experience
  • vLLM TP: rollout.tensor_parallel_size
  • Peak Mem: Peak CPU memory usage
  • Peak VRAM: Peak GPU memory usage
  • Throughput: Tokens per second per GPU over one training step (the value in parentheses is the improvement over the previous version)
  • Sec per step: Average time per step in seconds
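
To relate the Throughput and Sec per step columns, here is a small Python sketch. It assumes throughput equals the total number of tokens processed in one step divided by the step time and the GPU count. The function names are illustrative and are not EasyR1 APIs. Because the table values are rounded, the implied token count is only approximate.

```python
def throughput_per_gpu(total_tokens: int, sec_per_step: float, num_gpus: int) -> float:
    """Tokens per second per GPU for one training step (assumed definition)."""
    return total_tokens / (sec_per_step * num_gpus)


def implied_tokens_per_step(throughput: float, sec_per_step: float, num_gpus: int) -> float:
    """Invert the definition: tokens processed in one step across all GPUs."""
    return throughput * sec_per_step * num_gpus


def previous_throughput(throughput: float, improvement: float) -> float:
    """Throughput of the previous version, given the '(+improvement)' annotation."""
    return throughput - improvement


# 7B on 8 * H100 80GB, AMP, batch size 1 / 2: 1600 (+400) tokens/s/GPU, 145s per step.
tokens = implied_tokens_per_step(1600, 145, 8)
print(tokens)                          # 1856000 tokens per step (approximate)
print(previous_throughput(1600, 400))  # 1200 tokens/s/GPU in the previous version
```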

All hyper-parameters not listed use their default values.