Collections
Collections including paper arxiv:2504.00927
-
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Paper • 2504.00999 • Published • 96
Multi-Token Attention
Paper • 2504.00927 • Published • 56
Scaling Language-Free Visual Representation Learning
Paper • 2504.01017 • Published • 33
-
Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning
Paper • 2410.06373 • Published • 36
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Paper • 2504.00999 • Published • 96
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Paper • 2503.24235 • Published • 55
MoCha: Towards Movie-Grade Talking Character Synthesis
Paper • 2503.23307 • Published • 141
-
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 109
Multi-Token Attention
Paper • 2504.00927 • Published • 56
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Paper • 2504.02542 • Published • 52
Hyperagents
Paper • 2603.19461 • Published • 50
-
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
Paper • 2410.23743 • Published • 64
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models
Paper • 2411.03884 • Published • 28
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models
Paper • 2502.00698 • Published • 24
-
FAN: Fourier Analysis Networks
Paper • 2410.02675 • Published • 29
Tensor Product Attention Is All You Need
Paper • 2501.06425 • Published • 90
Scalable-Softmax Is Superior for Attention
Paper • 2501.19399 • Published • 25
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Paper • 2502.09509 • Published • 9
-
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper • 2408.11796 • Published • 60
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
Paper • 2408.09174 • Published • 53
To Code, or Not To Code? Exploring Impact of Code in Pre-training
Paper • 2408.10914 • Published • 45
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
Paper • 2408.11878 • Published • 64