EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning
EvoCoT-R1-Qwen-1.5B is trained with the EvoCoT framework, a self-evolving curriculum learning method for LLM reasoning built on two-stage Chain-of-Thought (CoT) optimization: the model improves its reasoning ability by generating and then refining its own step-by-step reasoning paths.
To use this model, load it with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gtxygyzb/EvoCoT-R1-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
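The loaded model can then be queried like any causal LM. The sketch below is illustrative, not from the model card: it assumes the tokenizer ships a chat template (as Qwen-based models typically do), and the prompt and decoding settings are placeholders you should adapt.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gtxygyzb/EvoCoT-R1-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative prompt; any reasoning question works.
messages = [{"role": "user", "content": "What is 17 * 23? Think step by step."}]
# Assumes the tokenizer provides a chat template, as Qwen-based models typically do.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens (the model's reasoning and answer).
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```

Greedy decoding is used here for simplicity; for CoT-style models, sampling with a moderate temperature is a common alternative.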
For more details, please refer to the paper: https://arxiv.org/abs/2508.07809
If you use EvoCoT-R1-Qwen-1.5B in your research, please cite our work:
```bibtex
@misc{liu2025evocotovercomingexplorationbottleneck,
  title={EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning},
  author={Huanyu Liu and Jia Li and Chang Yu and Taozhi Chen and Yihong Dong and Lecheng Wang and XiaoLong Hu and Ge Li},
  year={2025},
  eprint={2508.07809},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.07809},
}
```