Art-Qwen3-4B-Instruct-2507

This is the CoT-efficient (Chain-of-Thought) version of the Qwen3-4B-Instruct-2507 model, developed as part of the research presented in the paper "The Art of Efficient Reasoning: Data, Reward, and Optimization".

Model Description

Art-Qwen3-4B is optimized to produce short yet accurate reasoning trajectories. Training combines reward shaping with Reinforcement Learning (RL) in a two-stage paradigm: length adaptation followed by reasoning refinement. The goal is to retain the accuracy benefits of scaled reasoning while cutting the heavy computational overhead typically associated with long CoT outputs.

The model was trained on the DeepScaleR-Easy dataset.
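Usage

A minimal quick-start sketch for running the model with the Hugging Face transformers library. The model id is taken from this page; the `max_new_tokens` value and the `generate` helper are illustrative choices, not part of the official release — adjust them for your setup (requires `transformers` and `torch` installed).

```python
MODEL_ID = "taki555/Qwen3-4B-Instruct-2507-Art"

def generate(prompt: str, max_new_tokens: int = 512) -> str:
    """Run a single chat turn through the model and return the decoded reply."""
    # Lazy import so the sketch can be read without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )

    # Format the prompt with the model's chat template.
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    # Generate and decode only the newly produced tokens.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Because the model targets short reasoning traces, a modest `max_new_tokens` budget is often sufficient compared with long-CoT models.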

Citation

@inproceedings{wu2026art,
  title={The Art of Efficient Reasoning: Data, Reward, and Optimization},
  author={Taiqiang Wu and Zenan Xu and Bo Zhou and Ngai Wong},
  year={2026},
  url={https://arxiv.org/pdf/2602.20945}
}
Model Details

Model size: 4B parameters
Tensor type: BF16
Format: Safetensors