DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion Paper • 2105.13871 • Published May 28, 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio Paper • 2106.06909 • Published Jun 13, 2021 • 1
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis Paper • 2204.09934 • Published Apr 21, 2022
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity Paper • 2302.04023 • Published Feb 8, 2023
Survey of Hallucination in Natural Language Generation Paper • 2202.03629 • Published Feb 8, 2022
SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts Paper • 2105.03036 • Published May 7, 2021 • 2
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis Paper • 2309.12792 • Published Sep 22, 2023 • 1
MM-LLMs: Recent Advances in MultiModal Large Language Models Paper • 2401.13601 • Published Jan 24, 2024 • 47