DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
This model is a quantized version of DeepSeek-R1-Distill-Llama-8B.
| Model | GSM8K 5-shot |
|---|---|
| DeepSeek-R1-Distill-Llama-8B | - |
| DeepSeek-R1-Distill-Llama-8B-GPTQ_W8A8_G128 | - |
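The `GPTQ_W8A8_G128` suffix appears to mean 8-bit weights, 8-bit activations, and a quantization group size of 128. That reading is an assumption based on common naming conventions, since the card does not spell it out. The sketch below illustrates only the group-wise part: each run of 128 weights shares one scale. It uses plain symmetric round-to-nearest quantization rather than GPTQ's error-compensating procedure, and the function names are hypothetical, not taken from any quantization library.

```python
import math


def quantize_groupwise_int8(weights, group_size=128):
    """Symmetric round-to-nearest int8 quantization, one scale per group.

    Illustrative only: GPTQ additionally adjusts the remaining weights to
    compensate for rounding error, which this sketch does not do.
    """
    if len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a multiple of group_size")
    q_values, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = round(w / scale)
            q_values.append(max(-127, min(127, q)))
    return q_values, scales


def dequantize_groupwise_int8(q_values, scales, group_size=128):
    """Recover approximate float weights from int8 values and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


# Toy weight row of 256 values, i.e. two groups of 128.
weights = [0.05 * math.sin(i) for i in range(256)]
q_values, scales = quantize_groupwise_int8(weights)
restored = dequantize_groupwise_int8(q_values, scales)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"groups: {len(scales)}, max abs error: {max_error:.6f}")
```

With round-to-nearest, the reconstruction error of each weight is bounded by half of its group's scale, which is why smaller groups (and therefore tighter scales) tend to preserve accuracy better at the cost of storing more scales.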
This code repository and the model weights are licensed under the MIT License. The DeepSeek-R1 series supports commercial use and allows any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base, which is originally licensed under the Llama 3.1 license.
@misc{deepseekai2025deepseekr1incentivizingreasoningcapability,
title={DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning},
author={DeepSeek-AI},
year={2025},
eprint={2501.12948},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.12948},
}
If you have any questions, please raise an issue or contact us on GitHub (Lornatang) or at liuchangyu1111@gmail.com.
Base model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B