microsoft/VibeVoice-ASR
Automatic Speech Recognition • 9B • Updated • 720k • 1.03k
Chat with multimodal AI using text, audio, images, and video
Generate custom voices from text using natural language prompts
Create a custom voice clone and synthesize speech
Chat with Kimi-VL: respond to text, images, video, PDFs
Generate custom images from a reference photo and text
New Ghibli EasyControl model is now released!!
Image Generation and Image Editing Arena & Leaderboard