- Why Fine-Tuning Encourages Hallucinations and How to Fix It
  Paper • 2604.15574 • Published • 23
- Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
  Paper • 2604.24763 • Published • 68
- Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora
  Paper • 2604.24819 • Published • 86
- GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents
  Paper • 2604.26752 • Published • 97
Collections
Collections including paper arxiv:2604.24763
- CarePilot: A Multi-Agent Framework for Long-Horizon Computer Task Automation in Healthcare
  Paper • 2603.24157 • Published • 10
- TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
  Paper • 2604.04921 • Published • 112
- Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
  Paper • 2604.24763 • Published • 68

- Test-Time Scaling with Reflective Generative Model
  Paper • 2507.01951 • Published • 108
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  Paper • 2502.05171 • Published • 155
- Autoregressive Diffusion Models
  Paper • 2110.02037 • Published
- EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
  Paper • 2502.09509 • Published • 9

- LTX-2: Efficient Joint Audio-Visual Foundation Model
  Paper • 2601.03233 • Published • 177
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head
  Paper • 2601.07832 • Published • 52
- Motion Attribution for Video Generation
  Paper • 2601.08828 • Published • 72
- Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
  Paper • 2601.19895 • Published • 27

- Long-Context Autoregressive Video Modeling with Next-Frame Prediction
  Paper • 2503.19325 • Published • 73
- CoMP: Continual Multimodal Pre-training for Vision Foundation Models
  Paper • 2503.18931 • Published • 30
- One RL to See Them All: Visual Triple Unified Reinforcement Learning
  Paper • 2505.18129 • Published • 62
- SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction
  Paper • 2507.15852 • Published • 38