| --- |
| license: mit |
| datasets: |
| - Leo-Dai/dapo-math-17k_dedup |
| --- |
| # 🧠 Parallel-R1-Unseen_Step_200 |
|
|
| > **Mid-Training Checkpoint of Parallel-R1: Towards Parallel Thinking via Reinforcement Learning** |
| > Stage: **After 200 RL steps with alternating rewards**. At this point the model shows adaptive parallel reasoning ability, and the checkpoint serves as the structure-exploration stage. |
|
|
| This checkpoint is provided to help you reproduce the experimental results in Section 4.5 of the paper, "Extra Bonus: Parallel Thinking as a Mid-Training Exploration Strategy for RL Training". |
|
|
|
|