- Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling
  Paper • 2601.02346 • Published • 27
- unsloth/alpaca-cleaned
  Viewer • Updated • 51.8k • 9.83k • 8
- Hierarchical Reasoning Model
  Paper • 2506.21734 • Published • 50
- Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
  Paper • 2507.07955 • Published • 27

Collections including paper arxiv:2507.07955

- Energy-Based Transformers are Scalable Learners and Thinkers
  Paper • 2507.02092 • Published • 69
- MOSPA: Human Motion Generation Driven by Spatial Audio
  Paper • 2507.11949 • Published • 25
- Sound and Complete Neuro-symbolic Reasoning with LLM-Grounded Interpretations
  Paper • 2507.09751 • Published • 2
- Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
  Paper • 2507.07982 • Published • 34

- Byte Latent Transformer: Patches Scale Better Than Tokens
  Paper • 2412.09871 • Published • 108
- Causal Diffusion Transformers for Generative Modeling
  Paper • 2412.12095 • Published • 23
- Tensor Product Attention Is All You Need
  Paper • 2501.06425 • Published • 90
- TransMLA: Multi-head Latent Attention Is All You Need
  Paper • 2502.07864 • Published • 69

- Hierarchical Reasoning Model
  Paper • 2506.21734 • Published • 50
- Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
  Paper • 2507.07955 • Published • 27
- Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
  Paper • 2505.02567 • Published • 82
- Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference
  Paper • 2508.02193 • Published • 138

- MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization
  Paper • 2503.16874 • Published • 45
- System Prompt Optimization with Meta-Learning
  Paper • 2505.09666 • Published • 71
- UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
  Paper • 2505.23380 • Published • 22
- DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning
  Paper • 2505.23754 • Published • 15