Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning
📄 Paper 💻 Code

Introduction

Bingo is a reinforcement learning (RL) framework designed to improve the efficiency of reasoning in large language models.
It introduces two key mechanisms:

Significance-aware length reward: Gradually reduces only insignificant tokens while preserving essential reasoning steps.
Dynamic length reward: Encourages detailed reasoning for hard problems in early training, then decays to promote concise outputs.

This approach achieves a favorable balance between accuracy and efficiency, outperforming vanilla rewards and prior length-based reward baselines.
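
To make the two mechanisms above concrete, here is a minimal toy sketch of how such a length reward could be shaped. It is not the paper's formulation: the function names, the linear decay schedule, the `is_hard` flag, the `ref_len` normalizer, and the split of a response into significant and insignificant token counts are all assumptions chosen for illustration.

```python
def dynamic_weight(step: int, total_steps: int,
                   start: float = 1.0, end: float = -1.0) -> float:
    """Hypothetical schedule: linearly move the weight from `start` to `end`."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def length_reward(n_significant: int, n_insignificant: int, is_hard: bool,
                  step: int, total_steps: int, ref_len: int) -> float:
    """Toy length reward combining both ideas (assumed form, not the paper's)."""
    # Significance-aware part: only insignificant tokens are always penalized.
    insignificant_penalty = -n_insignificant / ref_len
    # Dynamic part: for hard problems, significant tokens are rewarded early
    # in training; the weight then decays and eventually favors brevity.
    if is_hard:
        significant_term = dynamic_weight(step, total_steps) * n_significant / ref_len
    else:
        significant_term = 0.0
    return insignificant_penalty + significant_term


# Early in training, a longer significant chain on a hard problem scores higher;
# late in training, the same chain scores lower than a shorter one.
early_long = length_reward(80, 20, True, step=0, total_steps=100, ref_len=100)
early_short = length_reward(40, 20, True, step=0, total_steps=100, ref_len=100)
late_long = length_reward(80, 20, True, step=100, total_steps=100, ref_len=100)
late_short = length_reward(40, 20, True, step=100, total_steps=100, ref_len=100)
```

In this sketch the penalty on insignificant tokens is constant, while the sign of the weight on significant tokens flips over training. A real setup would add this term to a correctness reward.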

Checkpoints

The released checkpoints are trained from DeepSeek-R1-Distill-Qwen-1.5B and target reasoning-intensive tasks:

Bingo-A 🟢 Accuracy-preferred checkpoint, selected at peak validation accuracy.
Bingo-E ⚡ Efficiency-preferred checkpoint, selected when response length stabilizes.

Checkpoints correspond to the folders r1_1.5b_Bingo_A and r1_1.5b_Bingo_E.
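
As a rough sketch of how one of these folders might be loaded, the snippet below uses the `subfolder` argument of the Hugging Face `transformers` loaders. The helper names and the `repo_id` argument are hypothetical. This card does not state the repository ID, so it has to be supplied by the caller, and the sketch assumes each folder contains a full model and tokenizer.

```python
# Folder names as listed in this card.
CHECKPOINT_FOLDERS = {
    "accuracy": "r1_1.5b_Bingo_A",    # Bingo-A
    "efficiency": "r1_1.5b_Bingo_E",  # Bingo-E
}


def checkpoint_folder(preference: str) -> str:
    """Map a preference ("accuracy" or "efficiency") to its checkpoint folder."""
    try:
        return CHECKPOINT_FOLDERS[preference]
    except KeyError:
        raise ValueError(
            f"unknown preference {preference!r}; "
            f"expected one of {sorted(CHECKPOINT_FOLDERS)}"
        ) from None


def load_bingo(repo_id: str, preference: str = "accuracy"):
    """Hypothetical loader: `repo_id` is the hub repo or local path of this model."""
    # Imported lazily so the folder mapping can be used without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    folder = checkpoint_folder(preference)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=folder)
    model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=folder)
    return tokenizer, model
```

Calling `load_bingo(repo_id, "efficiency")` would then select the Bingo-E folder, assuming the repository follows the layout described above.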

License: MIT

Citation

If you use these models, please cite:

@article{liu2025bingo,
    title   = {Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning},
    author  = {Liu, Hanbing and Cao, Lang and Ren, Yuanyi and Zhou, Mengyu and Dong, Haoyu and Ma, Xiaojun and Han, Shi and Zhang, Dongmei},
    journal = {arXiv preprint arXiv:2506.08125},
    year    = {2025}
}
