EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning
EvoCoT-R1-Qwen-1.5B is trained with the EvoCoT framework, a self-evolving curriculum learning method for LLM reasoning built on two-stage Chain-of-Thought (CoT) optimization: the model improves its reasoning ability by generating and then refining its own step-by-step reasoning paths.
To use this model, load it with the Hugging Face `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gtxygyzb/EvoCoT-R1-Qwen-1.5B"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```
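The loaded model can then be queried like any causal LM. The sketch below is illustrative, not from the model card: it assumes the tokenizer ships a chat template (as Qwen-based models typically do), and the prompt and decoding settings are placeholders you should adapt.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gtxygyzb/EvoCoT-R1-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative prompt; any reasoning question works.
messages = [{"role": "user", "content": "What is 17 * 23? Think step by step."}]
# Assumes the tokenizer provides a chat template, as Qwen-based models typically do.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens (the model's reasoning and answer).
completion = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(completion)
```

Greedy decoding is used here for simplicity; for CoT-style models, sampling with a moderate temperature is a common alternative.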
For more details, please refer to the paper: https://arxiv.org/abs/2508.07809
If you use EvoCoT-R1-Qwen-1.5B in your research, please cite our work:
```bibtex
@misc{liu2025evocotovercomingexplorationbottleneck,
  title={EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning},
  author={Huanyu Liu and Jia Li and Chang Yu and Taozhi Chen and Yihong Dong and Lecheng Wang and XiaoLong Hu and Ge Li},
  year={2025},
  eprint={2508.07809},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.07809},
}
```