-
Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
Paper • 2507.08128
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919
Pengi: An Audio Language Model for Audio Tasks
Paper • 2305.11834
Collections
Collections including paper arxiv:2311.07919
-
Qwen Technical Report
Paper • 2309.16609
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919
Qwen2 Technical Report
Paper • 2407.10671
Qwen2-Audio Technical Report
Paper • 2407.10759
-
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Paper • 2311.07919
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Paper • 2404.14700
-
ModaVerse: Efficiently Transforming Modalities with LLMs
Paper • 2401.06395
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Paper • 2401.00246
An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Paper • 2312.03668
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
Paper • 2311.06753
-
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Paper • 1712.05884
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Paper • 2403.16973
High Fidelity Neural Audio Compression
Paper • 2210.13438
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Paper • 2404.03204