Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding
Abstract
CoRD is a collaborative multi-teacher decoding framework that synthesizes reasoning trajectories through predictive perplexity scoring and beam search, enabling efficient distillation of large reasoning models with high-quality outputs and generalized performance.
Distilling large reasoning models is essential for making Long-CoT reasoning practical, as full-scale inference remains computationally prohibitive. Existing curation-based approaches select complete reasoning traces post-hoc, overlooking collaboration among heterogeneous teachers and lacking dynamic exploration, which leads to redundant sampling and missed complementary reasoning. We introduce CoRD, a collaborative multi-teacher decoding framework that performs step-wise reasoning synthesis guided by predictive perplexity-based scoring and beam search. This enables heterogeneous LRMs to jointly construct coherent reasoning trajectories while efficiently preserving diverse, high-potential hypotheses. Experiments show that CoRD produces higher-quality reasoning data and achieves near teacher-level student performance with fewer, structured supervision signals, without substantial efficiency overhead. CoRD further generalizes well to out-of-domain and open-ended settings. The dataset and model are available at https://github.com/DISL-Lab/CoRD.
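The step-wise, multi-teacher decoding idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the `Teacher` interface, the toy teachers, and the function names are hypothetical, and plain perplexity over accumulated token log-probabilities stands in for the paper's predictive perplexity-based score, whose exact definition is not given here. The sketch only shows how candidates from several teachers can compete at each step while a beam keeps the lowest-perplexity partial trajectories.

```python
import math
from typing import Callable, List, Tuple

# A candidate step: (step text, per-token log-probabilities for that step).
Step = Tuple[str, List[float]]
# Hypothetical teacher interface: given the steps so far, propose next steps.
Teacher = Callable[[List[str]], List[Step]]


def perplexity(logprobs: List[float]) -> float:
    """exp(-mean token log-prob); lower is better. Empty input scores as inf."""
    if not logprobs:
        return float("inf")
    return math.exp(-sum(logprobs) / len(logprobs))


def stepwise_beam_search(
    teachers: List[Teacher], beam_width: int, max_steps: int
) -> Tuple[List[str], float]:
    """Build one trajectory step by step from several teachers' proposals."""
    # Each beam entry: (steps so far, all token log-probs accumulated so far).
    beams: List[Tuple[List[str], List[float]]] = [([], [])]
    for _ in range(max_steps):
        candidates = []
        for steps, logprobs in beams:
            for teacher in teachers:
                for text, step_logprobs in teacher(steps):
                    candidates.append((steps + [text], logprobs + step_logprobs))
        if not candidates:
            break  # no teacher proposed a continuation
        # Keep the beam_width partial trajectories with the lowest perplexity.
        candidates.sort(key=lambda c: perplexity(c[1]))
        beams = candidates[:beam_width]
    best_steps, best_logprobs = beams[0]
    return best_steps, perplexity(best_logprobs)


# Toy teachers (assumptions for illustration only).
def teacher_a(steps: List[str]) -> List[Step]:
    return [(f"A{len(steps)}", [-0.2, -0.4])]


def teacher_b(steps: List[str]) -> List[Step]:
    # Weak on even-numbered steps, strong on odd-numbered ones.
    logprobs = [-0.1, -0.3] if len(steps) % 2 else [-0.9, -0.7]
    return [(f"B{len(steps)}", logprobs)]


best, ppl = stepwise_beam_search([teacher_a, teacher_b], beam_width=2, max_steps=2)
print(best, round(ppl, 4))
```

With these toy teachers, the selected trajectory takes its first step from `teacher_a` and its second from `teacher_b`, which is the kind of complementary mixing that selecting complete traces post-hoc cannot produce.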
Community
This paper has been accepted to ACL 2026 (Findings, long paper). It addresses Long-CoT (chain-of-thought) distillation from LRMs (Large Reasoning Models). If you have any questions, please feel free to contact us.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Confidence-Aware Alignment Makes Reasoning LLMs More Reliable (2026)
- Student-in-the-Loop Chain-of-Thought Distillation via Generation-Time Selection (2026)
- STOP: Structured On-Policy Pruning of Long-Form Reasoning in Low-Data Regimes (2026)
- CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning (2026)
- Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories (2026)
- HintMR: Eliciting Stronger Mathematical Reasoning in Small Language Models (2026)
- SOD: Step-wise On-policy Distillation for Small Language Model Agents (2026)