view article Article From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate +2 Jun 13, 2024 • 62
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs Paper • 2402.12030 • Published Feb 19, 2024 • 3
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization Paper • 2508.07629 • Published Aug 11, 2025 • 43
ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge Paper • 2507.21990 • Published Jul 29, 2025 • 27