Add top navigation badges
README.md CHANGED
|
@@ -24,6 +24,23 @@ base_model_relation: finetune
|
|
| 24 |
|
| 25 |
# Qwen3-4B-Base-GRPO
|
| 26 |
|
| 27 |
Qwen3-4B-Base-GRPO is a post-RL checkpoint trained with the **verl** framework.
|
| 28 |
It starts from **Qwen3-4B-Base** and applies GRPO on the **DAPO-Math-17k-Processed** dataset for mathematical reasoning and problem-solving.
|
| 29 |
|
|
@@ -109,4 +126,4 @@ If you use this model, please consider citing the related paper:
|
|
| 109 |
journal={arXiv preprint arXiv:2604.13016},
|
| 110 |
year={2026}
|
| 111 |
}
|
| 112 |
-
```
|
|
|
|
| 24 |
|
| 25 |
# Qwen3-4B-Base-GRPO
|
| 26 |
|
| 27 |
+
<div align="center" style="line-height: 1;">
|
| 28 |
+
<a href="https://arxiv.org/abs/2604.13016" style="margin: 2px;">
|
| 29 |
+
<img alt="Paper" src="https://img.shields.io/badge/paper-A42C25?style=for-the-badge&logo=arxiv&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
| 30 |
+
</a>
|
| 31 |
+
<a href="https://github.com/thunlp/OPD" style="margin: 2px;">
|
| 32 |
+
<img alt="Github" src="https://img.shields.io/badge/OPD-000000?style=for-the-badge&logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
| 33 |
+
</a>
|
| 34 |
+
<a href="https://huggingface.co/papers/2604.13016" style="margin: 2px;">
|
| 35 |
+
<img alt="HF Papers" src="https://img.shields.io/badge/HF--Paper-%23FFD14D?style=for-the-badge&logo=huggingface&logoColor=black" style="display: inline-block; vertical-align: middle;"/>
|
| 36 |
+
</a>
|
| 37 |
+
<a href="https://x.com/HBX_hbx/status/2044464414829777354" style="margin: 2px;">
|
| 38 |
+
<img alt="Twitter" src="https://img.shields.io/badge/Twitter-%23000000.svg?style=for-the-badge&logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
|
| 39 |
+
</a>
|
| 40 |
+
</div>
|
| 41 |
+
|
| 42 |
+
<br>
|
| 43 |
+
|
| 44 |
Qwen3-4B-Base-GRPO is a post-RL checkpoint trained with the **verl** framework.
|
| 45 |
It starts from **Qwen3-4B-Base** and applies GRPO on the **DAPO-Math-17k-Processed** dataset for mathematical reasoning and problem-solving.
|
| 46 |
|
|
|
|
| 126 |
journal={arXiv preprint arXiv:2604.13016},
|
| 127 |
year={2026}
|
| 128 |
}
|
| 129 |
+
```
|