Article SyGra: The One-Stop Framework for Building Data for LLMs and SLMs Sep 22, 2025 • 14
Article TRL v1.0: Post-Training Library Built to Move with the Field +2 17 days ago • 49
Article Multimodal Embedding & Reranker Models with Sentence Transformers 8 days ago • 43
Article Efficient LLM Pretraining: Packed Sequences and Masked Attention Oct 7, 2024 • 70
Article 🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It? Mar 17, 2025 • 355
Reasoning Datasets Collection Datasets with reasoning traces across various domains released by the community. • 15 items • Updated Jun 30, 2025 • 3
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 188
Code Conversations Collection List of Evol-Instruct-based code datasets • 7 items • Updated Apr 30, 2024 • 3
Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation +7 Apr 29, 2024 • 79