Commit ee90a38 by TongZheng1999: Create README.md (verified, 6 months ago)

---
license: mit
datasets:
- Leo-Dai/dapo-math-17k_dedup
---
# 🧠 Parallel-R1-Unseen_Step_200

> Mid-training checkpoint of Parallel-R1: Towards Parallel Thinking via Reinforcement Learning
>
> Stage: after 200 RL steps with alternating rewards. This checkpoint shows adaptive parallel reasoning ability and serves as the structure exploration stage.

This checkpoint is provided to help you reproduce the experimental results in Section 4.5 of the Parallel-R1 paper, "Extra Bonus: Parallel Thinking as a Mid-Training Exploration Strategy for RL Training".
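
As a starting point for reproduction, the checkpoint can probably be loaded like any other causal language model on the Hub. The sketch below is a minimal example under stated assumptions, not the authors' evaluation script: the repository id `Parallel-R1/Parallel-R1-Unseen_Step_200` is inferred from this page's path, the model is assumed to be compatible with `AutoModelForCausalLM`, and the plain-text prompt and the helper names `load_checkpoint` and `solve` are hypothetical. It requires `transformers` and `torch`.

```python
# Minimal loading sketch. Assumptions: the repo id below is inferred from the
# page path, and the checkpoint loads with AutoModelForCausalLM.
REPO_ID = "Parallel-R1/Parallel-R1-Unseen_Step_200"


def load_checkpoint(repo_id: str = REPO_ID):
    """Download (or reuse the cached copy of) the tokenizer and model."""
    # Imported lazily so that importing this module stays cheap.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto")
    return tokenizer, model


def solve(question: str, tokenizer, model, max_new_tokens: int = 512) -> str:
    """Generate a completion for a single math question.

    The prompt is passed as plain text; the prompt format used in the paper
    is not given in this card, so adjust it as needed.
    """
    inputs = tokenizer(question, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example usage (downloads the weights on first call):
# tokenizer, model = load_checkpoint()
# print(solve("What is 12 * 13?", tokenizer, model))
```

Keeping the download inside `load_checkpoint` means nothing is fetched until the function is called, which makes it easy to swap in a local path or a different checkpoint from the same training run.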