BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline Paper • 2408.15079 • Published Aug 27, 2024 • 56
Sommelier: Scalable Open Multi-turn Audio Pre-processing for Full-duplex Speech Language Models Paper • 2603.25750 • Published 26 days ago • 36
Cohere Multilingual ASR 🎙 • 103 • Transcribe audio clips to text in many languages
Voxtral TTS Demo ⚡ • 193 • Generate realistic speech from text with custom or preset voices
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published 22 days ago • 62
The Synthetic Data Playbook: Generating Trillions of the Finest Tokens 📝 • 220 • Explore synthetic data experiments on a virtual bookshelf