Collections
Collections including paper arxiv:2501.13106
-
VideoLLaMA3
💬 84 • Frontier Foundation Models for Video Understanding
-
VideoLLaMA3-Image
💬 23 • Frontier Foundation Models for Video Understanding
-
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
Paper • 2501.13106 • Published • 91
-
DAMO-NLP-SG/VideoLLaMA3-7B
Video-Text-to-Text • 8B • Updated • 78.1k • 75
-
How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 53
-
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
Paper • 2412.12094 • Published • 11
-
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Paper • 2306.07691 • Published • 13
-
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
Paper • 2203.02395 • Published • 1