TriAttention: Efficient Long Reasoning with Trigonometric KV Compression Paper • 2604.04921 • Published 9 days ago • 106
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning Paper • 2602.12099 • Published Feb 12 • 61
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion Paper • 2512.19678 • Published Dec 22, 2025 • 31
What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards Paper • 2512.00425 • Published Nov 29, 2025 • 53
Granite Docling Models Collection Models for parsing complex PDFs and structured documents, designed to complement Docling. • 4 items • Updated 13 days ago • 60
view article Article Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers +5 Sep 11, 2025 • 186
Holo1 Collection Vision-Language Action Model for use in Surfer-H web navigation agent • 6 items • Updated Jun 10, 2025 • 49
view article Article Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H Jun 3, 2025 • 71
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published Mar 6, 2025 • 72
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback Paper • 2502.15027 • Published Feb 20, 2025 • 7
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10, 2025 • 53
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published Dec 10, 2024 • 48