VITA-MLLM

community

AI & ML interests

Multimodal LLM

Recent Activity

shenyunhang authored a paper 10 days ago

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

BradyFU authored a paper 10 days ago

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

BradyFU authored a paper 10 days ago

A Survey on Multimodal Large Language Models

authored 2 papers 10 days ago

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

Paper • 2306.13394 • Published Jun 23, 2023

A Survey on Multimodal Large Language Models

Paper • 2306.13549 • Published Jun 23, 2023 • 1

authored 2 papers 10 days ago

CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes

Paper • 2310.09761 • Published Oct 15, 2023 • 1

FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning

Paper • 2212.00465 • Published Dec 1, 2022

authored 2 papers 10 days ago

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Paper • 2408.05211 • Published Aug 9, 2024 • 50

MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

Paper • 2411.15296 • Published Nov 22, 2024 • 21

authored 4 papers 10 days ago

T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs

Paper • 2411.19951 • Published Nov 29, 2024

Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification

Paper • 2412.00876 • Published Dec 1, 2024

FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression

Paper • 2412.04317 • Published Dec 5, 2024 • 1

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Paper • 2411.00774 • Published Nov 1, 2024

authored 2 papers 10 days ago

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Paper • 2411.00774 • Published Nov 1, 2024

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Paper • 2501.01957 • Published Jan 3, 2025 • 47

authored a paper 10 days ago

Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Paper • 2502.05177 • Published Feb 7, 2025 • 2

authored a paper 10 days ago

VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model

Paper • 2505.03739 • Published May 6, 2025 • 10

authored 3 papers 10 days ago

What You Perceive Is What You Conceive: A Cognition-Inspired Framework for Open Vocabulary Image Segmentation

Paper • 2505.19569 • Published May 26, 2025

Solving the Catastrophic Forgetting Problem in Generalized Category Discovery

Paper • 2501.05272 • Published Jan 9, 2025 • 1

Aligning and Prompting Everything All at Once for Universal Visual Perception

Paper • 2312.02153 • Published Dec 4, 2023