WangYe007
/

RL

Model card Files Files and versions

RL / model /EasyR1 /assets /baselines.md

WangYe007's picture

Upload folder using huggingface_hub

d65b589 verified 17 days ago

|

history blame contribute delete

3.07 kB

	# Baselines

	Environment: [hiyouga/verl:ngc-th2.7.1-cu12.6-vllm0.10.0](https://hub.docker.com/layers/hiyouga/verl/ngc-th2.7.1-cu12.6-vllm0.10.0/images/sha256-cfc8c1ce3ea52dee0444f3e58e900d0b1d3b6b315deaf5f58c44b5fbb52fa989)

	EasyR1 version: [v0.3.2](https://github.com/hiyouga/EasyR1/tree/v0.3.2)

	Welcome to contribute new data points!

	## Algorithm Baselines

	### [Qwen2.5-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on [Math12k](https://huggingface.co/datasets/hiyouga/math12k)

	\| Size \| Algorithm \| Bits \| LR \| KL \| Test Accuracy \|
	\| ---- \| ----------- \| ---- \| ---- \| ---- \| -------------------- \|
	\| 7B \| GRPO \| AMP \| 1e-6 \| 1e-2 \| 0.75 -> 0.77 (+0.02) \|

	### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)

	\| Size \| Algorithm \| Bits \| LR \| KL \| Test Accuracy \|
	\| ---- \| ----------- \| ---- \| ---- \| ---- \| -------------------- \|
	\| 7B \| GRPO \| AMP \| 1e-6 \| 1e-2 \| 0.37 -> 0.48 (+0.11) \|
	\| 7B \| GRPO \| BF16 \| 1e-6 \| 1e-2 \| 0.37 -> 0.48 (+0.11) \|
	\| 7B \| DAPO \| AMP \| 1e-6 \| 1e-2 \| 0.37 -> 0.50 (+0.13) \|
	\| 3B \| GRPO \| AMP \| 1e-6 \| 1e-2 \| 0.24 -> 0.38 (+0.14) \|
	\| 32B \| GRPO \| BF16 \| 1e-6 \| 1e-2 \| 0.50 -> 0.56 (+0.06) \|

	> [!NOTE]
	> The hyper-parameters not listed are all the same as the default values.

	## Performance Baselines

	### [Qwen2.5-VL-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) on [Geometry3k](https://huggingface.co/datasets/hiyouga/geometry3k)

	\| Size \| GPU Type \| Bits \| Batch Size \| vLLM TP \| Peak Mem \| Peak VRAM \| Throughput \| Sec per step \| Actor MFU \|
	\| ---- \| ------------- \| ---- \| ---------- \| ------- \| -------- \| --------- \| ----------- \| ------------ \| --------- \|
	\| 3B \| 8 * H100 80GB \| AMP \| 1 / 2 \| 2 \| 120GB \| 54GB \| 1800 (+600) \| 120s \| 8.1% \|
	\| 7B \| 8 * H100 80GB \| AMP \| 1 / 2 \| 2 \| 120GB \| 68GB \| 1600 (+400) \| 145s \| 16.0% \|
	\| 7B \| 8 * H100 80GB \| AMP \| 4 / 8 \| 2 \| 200GB \| 72GB \| 2000 (+600) \| 120s \| 23.2% \|
	\| 7B \| 8 * L20 48GB \| AMP \| 1 / 2 \| 2 \| 120GB \| 42GB \| 410 (+0) \| 580s \| 26.5% \|
	\| 7B \| 8 * H100 80GB \| BF16 \| 1 / 2 \| 2 \| 120GB \| 58GB \| 1600 (+320) \| 145s \| 16.0% \|
	\| 32B \| 8 * H100 80GB \| BF16 \| 1 / 2 \| 8 \| 260GB \| 72GB \| 620 (+260) \| 530s \| 25.8% \|

	- Batch Size: micro_batch_size_per_device_for_update / micro_batch_size_per_device_for_experience
	- vLLM TP: rollout.tensor_parallel_size
	- Peak Mem: Peak CPU memory usage
	- Peak VRAM: Peak GPU memory usage
	- Throughput: Number of tokens per second per GPU by one training step (including the improvement compared to the [previous version](https://github.com/hiyouga/EasyR1/blob/v0.3.1/assets/baselines.md))
	- Sec per step: Average time per step in seconds

	> [!NOTE]
	> The hyper-parameters not listed are all the same as the default values.