Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning

📄 Paper 💻 Code


Introduction

Bingo is a reinforcement learning (RL) framework designed to improve the efficiency of reasoning in large language models.
It introduces two key mechanisms:

  • Significance-aware length reward: Gradually reduces only insignificant tokens while preserving essential reasoning steps.
  • Dynamic length reward: Encourages detailed reasoning for hard problems in early training, then decays to promote concise outputs.

This approach achieves a favorable balance between accuracy and efficiency, outperforming vanilla rewards and prior length-based reward baselines.
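
To make the interaction between the two mechanisms concrete, here is a toy sketch in Python. It is not the reward defined in the paper: the function names, the linear decay schedule, the `ref_len` normalizer, and the assumption that each response already comes with counts of significant and insignificant tokens are all hypothetical choices made for illustration.

```python
def dynamic_weight(step: int, total_steps: int,
                   start: float = 1.0, end: float = 0.0) -> float:
    """Linearly interpolate from `start` to `end` over training (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def toy_length_reward(is_hard: bool, n_significant: int, n_insignificant: int,
                      step: int, total_steps: int, ref_len: int = 1000) -> float:
    """Illustrative length reward; NOT the formula from the Bingo paper."""
    # Significance-aware term: only insignificant tokens are penalized,
    # so essential reasoning steps do not lower the reward.
    insig_penalty = -min(n_insignificant / ref_len, 1.0)
    if not is_hard:
        return insig_penalty
    # Dynamic term: on hard problems, significant tokens earn a bonus early
    # in training; the bonus weight decays to zero as training progresses.
    weight = dynamic_weight(step, total_steps)
    sig_bonus = weight * min(n_significant / ref_len, 1.0)
    return insig_penalty + sig_bonus
```

In this sketch, a hard problem answered with long but meaningful reasoning scores higher than an easy one at the start of training, and both converge to the same insignificant-token penalty by the final step.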


Checkpoints

The released checkpoints are trained from DeepSeek-R1-Distill-Qwen-1.5B and target reasoning-intensive tasks:

  • Bingo-A 🟢 Accuracy-preferred checkpoint, selected at peak validation accuracy.
  • Bingo-E ⚡ Efficiency-preferred checkpoint, selected when response length stabilizes.

Bingo-A and Bingo-E are stored in the folders r1_1.5b_Bingo_A and r1_1.5b_Bingo_E, respectively.
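
One possible way to load a checkpoint with the Hugging Face Transformers library is sketched below. It assumes that each folder contains standard Transformers model and tokenizer files and that the library's `subfolder` argument can select it. The helper names `checkpoint_subfolder` and `load_bingo` are hypothetical, and the repository id is left for you to fill in.

```python
def checkpoint_subfolder(variant: str) -> str:
    """Map a variant letter ("A" or "E") to the folder name listed above."""
    folders = {"A": "r1_1.5b_Bingo_A", "E": "r1_1.5b_Bingo_E"}
    return folders[variant.upper()]


def load_bingo(repo_id: str, variant: str = "A"):
    """Load tokenizer and model from one checkpoint folder (assumed layout)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    subfolder = checkpoint_subfolder(variant)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subfolder)
    return tokenizer, model
```

Calling `load_bingo(repo_id, "E")` with this repository's id would then return the efficiency-preferred checkpoint, if the assumptions above hold.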


License: MIT


Citation

If you use these models, please cite:

@article{liu2025bingo,
    title   = {Bingo: Boosting Efficient Reasoning of LLMs via Dynamic and Significance-based Reinforcement Learning},
    author  = {Liu, Hanbing and Cao, Lang and Ren, Yuanyi and Zhou, Mengyu and Dong, Haoyu and Ma, Xiaojun and Han, Shi and Zhang, Dongmei},
    journal = {arXiv preprint arXiv:2506.08125},
    year    = {2025}
}