
Baselines

Environment: hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0

EasyR1 version: v0.3.2

Contributions of new data points are welcome!

Algorithm Baselines

Qwen2.5-Instruct on Math12k

| Size | Algorithm | Bits | LR | KL | Test Accuracy |
| --- | --- | --- | --- | --- | --- |
| 7B | GRPO | AMP | 1e-6 | 1e-2 | 0.75 -> 0.77 (+0.02) |

Qwen2.5-VL-Instruct on Geometry3k

| Size | Algorithm | Bits | LR | KL | Test Accuracy |
| --- | --- | --- | --- | --- | --- |
| 7B | GRPO | AMP | 1e-6 | 1e-2 | 0.37 -> 0.48 (+0.11) |
| 7B | GRPO | BF16 | 1e-6 | 1e-2 | 0.37 -> 0.48 (+0.11) |
| 7B | DAPO | AMP | 1e-6 | 1e-2 | 0.37 -> 0.50 (+0.13) |
| 3B | GRPO | AMP | 1e-6 | 1e-2 | 0.24 -> 0.38 (+0.14) |
| 32B | GRPO | BF16 | 1e-6 | 1e-2 | 0.50 -> 0.56 (+0.06) |

All hyperparameters not listed above use their default values.
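The parenthesized figures in the Test Accuracy column are absolute differences between the accuracy before and after training. The short Python sketch below recomputes them from the table values. For comparison, it also prints the relative gain over the base model, which is an extra metric not reported in the tables. The `results` dictionary and the `gains` helper are illustrative names only.

```python
# Test accuracy (before, after) copied from the tables above.
results = {
    "Math12k / 7B / GRPO / AMP": (0.75, 0.77),
    "Geometry3k / 7B / GRPO / AMP": (0.37, 0.48),
    "Geometry3k / 7B / GRPO / BF16": (0.37, 0.48),
    "Geometry3k / 7B / DAPO / AMP": (0.37, 0.50),
    "Geometry3k / 3B / GRPO / AMP": (0.24, 0.38),
    "Geometry3k / 32B / GRPO / BF16": (0.50, 0.56),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain rounded to 2 decimals, relative gain vs. the base model)."""
    absolute = round(after - before, 2)  # what the "(+N)" column reports
    relative = (after - before) / before  # extra metric, not in the tables
    return absolute, relative


for name, (before, after) in results.items():
    abs_gain, rel_gain = gains(before, after)
    print(f"{name}: {before:.2f} -> {after:.2f} (+{abs_gain:.2f}, {rel_gain:+.1%} relative)")
```

The relative view shows, for example, that the 3B Geometry3k run's +0.14 corresponds to a larger proportional improvement than the 32B run's +0.06, because it starts from a lower base accuracy.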

Performance Baselines

Qwen2.5-VL-Instruct on Geometry3k

| Size | GPU Type | Bits | Batch Size | vLLM TP | Peak Mem | Peak VRAM | Throughput | Sec per step | Actor MFU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3B | 8 * H100 80GB | AMP | 1 / 2 | 2 | 120GB | 54GB | 1800 (+600) | 120s | 8.1% |
| 7B | 8 * H100 80GB | AMP | 1 / 2 | 2 | 120GB | 68GB | 1600 (+400) | 145s | 16.0% |
| 7B | 8 * H100 80GB | AMP | 4 / 8 | 2 | 200GB | 72GB | 2000 (+600) | 120s | 23.2% |
| 7B | 8 * L20 48GB | AMP | 1 / 2 | 2 | 120GB | 42GB | 410 (+0) | 580s | 26.5% |
| 7B | 8 * H100 80GB | BF16 | 1 / 2 | 2 | 120GB | 58GB | 1600 (+320) | 145s | 16.0% |
| 32B | 8 * H100 80GB | BF16 | 1 / 2 | 8 | 260GB | 72GB | 620 (+260) | 530s | 25.8% |

- Batch Size: `micro_batch_size_per_device_for_update` / `micro_batch_size_per_device_for_experience`
- vLLM TP: `rollout.tensor_parallel_size`
- Peak Mem: peak CPU memory usage
- Peak VRAM: peak GPU memory usage
- Throughput: tokens processed per second per GPU during one training step; the value in parentheses is the improvement over the previous EasyR1 version
- Sec per step: average time per training step, in seconds
- Actor MFU: model FLOPs utilization (MFU) of the actor

All hyperparameters not listed above use their default values.
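The performance table reports Throughput per GPU and Sec per step separately. Multiplying them by the GPU count gives a rough estimate of the total tokens handled in one step, which is a useful consistency check across rows. The "(+N)" annotation can also be subtracted to recover the previous version's throughput. The following is a minimal Python sketch. It assumes that Throughput is averaged over the full wall-clock step, which the legend does not state explicitly. `PerfRow`, `tokens_per_step`, and `previous_throughput` are hypothetical names and are not part of EasyR1.

```python
from dataclasses import dataclass


@dataclass
class PerfRow:
    """One row of the performance table (hypothetical helper, not an EasyR1 type)."""
    label: str
    num_gpus: int
    throughput: float   # tokens / second / GPU, as reported
    improvement: float  # the "(+N)" gain over the previous EasyR1 version
    sec_per_step: float


def tokens_per_step(row: PerfRow) -> float:
    """Rough total tokens processed per step across all GPUs.

    Assumes the reported throughput is averaged over the full step time.
    """
    return row.throughput * row.sec_per_step * row.num_gpus


def previous_throughput(row: PerfRow) -> float:
    """Throughput of the previous version implied by the "(+N)" annotation."""
    return row.throughput - row.improvement


rows = [
    PerfRow("3B / H100 / AMP / 1-2", 8, 1800, 600, 120),
    PerfRow("7B / H100 / AMP / 1-2", 8, 1600, 400, 145),
    PerfRow("7B / H100 / AMP / 4-8", 8, 2000, 600, 120),
    PerfRow("7B / L20 / AMP / 1-2", 8, 410, 0, 580),
    PerfRow("7B / H100 / BF16 / 1-2", 8, 1600, 320, 145),
    PerfRow("32B / H100 / BF16 / 1-2", 8, 620, 260, 530),
]

for row in rows:
    print(
        f"{row.label}: ~{tokens_per_step(row) / 1e6:.2f}M tokens/step, "
        f"previous throughput {previous_throughput(row):.0f} tokens/s/GPU"
    )
```

Under this assumption, the 3B and 7B rows all work out to roughly 1.7M to 1.9M tokens per step, and the 32B row comes out near 2.6M.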