ThinkTwice: Jointly Optimizing Large Language Models for Reasoning and Self-Refinement Paper • 2604.01591 • Published 13 days ago • 40
Embarrassingly Simple Self-Distillation Improves Code Generation Paper • 2604.01193 • Published 13 days ago • 37
Hybrid Architectures for Language Models: Systematic Analysis and Design Insights Paper • 2510.04800 • Published Oct 6, 2025 • 37
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data Paper • 2510.03264 • Published Sep 26, 2025 • 25
andreasskyscanner/llama-31-hhrlhf-squad-rlhf-policy-model Text Generation • 1B • Updated Jul 1, 2025 • 1