Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2602.01734

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 152
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Paper • 2603.15594 • Published Mar 16 • 149
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published Mar 11 • 153
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published Mar 6 • 119

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 322
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Paper • 2512.23988 • Published Dec 30, 2025 • 19
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

Paper • 2512.25075 • Published Dec 31, 2025 • 15
Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Paper • 2512.24176 • Published Dec 30, 2025 • 8

CAME: Confidence-guided Adaptive Memory Efficient Optimization

Paper • 2307.02047 • Published Jul 5, 2023 • 2
Practical Efficiency of Muon for Pretraining

Paper • 2505.02222 • Published May 4, 2025 • 41
AdaMuon: Adaptive Muon Optimizer

Paper • 2507.11005 • Published Jul 15, 2025 • 2
Muon is Scalable for LLM Training

Paper • 2502.16982 • Published Feb 24, 2025 • 12

RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published Nov 19, 2024 • 58
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

Paper • 2411.10958 • Published Nov 17, 2024 • 57
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2, 2025 • 240
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

Paper • 2602.01734 • Published Feb 2 • 32

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 107
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 80
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 43
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 45

MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

Paper • 2602.01734 • Published Feb 2 • 32
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published Feb 5 • 352

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

Paper • 2601.04890 • Published Jan 8 • 44
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

Paper • 2602.01734 • Published Feb 2 • 32
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers

Paper • 2602.15322 • Published Feb 17 • 10
Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Paper • 2603.09229 • Published Mar 10 • 82

Papers Pertinent or Protuberant

The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models

Paper • 2507.23313 • Published Jul 31, 2025 • 1
SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering

Paper • 2508.03448 • Published Aug 5, 2025 • 6
C3D-AD: Toward Continual 3D Anomaly Detection via Kernel Attention with Learnable Advisor

Paper • 2508.01311 • Published Aug 2, 2025 • 2
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Model

Paper • 2505.21179 • Published May 27, 2025 • 13

When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Paper • 2402.17193 • Published Feb 27, 2024 • 26
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

Paper • 2410.23743 • Published Oct 31, 2024 • 64
Direct Preference Optimization Using Sparse Feature-Level Constraints

Paper • 2411.07618 • Published Nov 12, 2024 • 17
Transformer^2: Self-adaptive LLMs

Paper • 2501.06252 • Published Jan 9, 2025 • 55

dLLM: Simple Diffusion Language Modeling

Paper • 2602.22661 • Published Feb 26 • 152
OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

Paper • 2603.15594 • Published Mar 16 • 149
Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published Mar 11 • 153
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders

Paper • 2603.06569 • Published Mar 6 • 119

MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

Paper • 2602.01734 • Published Feb 2 • 32
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published Feb 5 • 352

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 322
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Paper • 2512.23988 • Published Dec 30, 2025 • 19
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

Paper • 2512.25075 • Published Dec 31, 2025 • 15
Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Paper • 2512.24176 • Published Dec 30, 2025 • 8

Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

Paper • 2601.04890 • Published Jan 8 • 44
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

Paper • 2602.01734 • Published Feb 2 • 32
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers

Paper • 2602.15322 • Published Feb 17 • 10
Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Paper • 2603.09229 • Published Mar 10 • 82

CAME: Confidence-guided Adaptive Memory Efficient Optimization

Paper • 2307.02047 • Published Jul 5, 2023 • 2
Practical Efficiency of Muon for Pretraining

Paper • 2505.02222 • Published May 4, 2025 • 41
AdaMuon: Adaptive Muon Optimizer

Paper • 2507.11005 • Published Jul 15, 2025 • 2
Muon is Scalable for LLM Training

Paper • 2502.16982 • Published Feb 24, 2025 • 12

Papers Pertinent or Protuberant

The Cow of Rembrandt - Analyzing Artistic Prompt Interpretation in Text-to-Image Models

Paper • 2507.23313 • Published Jul 31, 2025 • 1
SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering

Paper • 2508.03448 • Published Aug 5, 2025 • 6
C3D-AD: Toward Continual 3D Anomaly Detection via Kernel Attention with Learnable Advisor

Paper • 2508.01311 • Published Aug 2, 2025 • 2
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Model

Paper • 2505.21179 • Published May 27, 2025 • 13

RedPajama: an Open Dataset for Training Large Language Models

Paper • 2411.12372 • Published Nov 19, 2024 • 58
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration

Paper • 2411.10958 • Published Nov 17, 2024 • 57
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens

Paper • 2508.01191 • Published Aug 2, 2025 • 240
MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration

Paper • 2602.01734 • Published Feb 2 • 32

When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method

Paper • 2402.17193 • Published Feb 27, 2024 • 26
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective

Paper • 2410.23743 • Published Oct 31, 2024 • 64
Direct Preference Optimization Using Sparse Feature-Level Constraints

Paper • 2411.07618 • Published Nov 12, 2024 • 17
Transformer^2: Self-adaptive LLMs

Paper • 2501.06252 • Published Jan 9, 2025 • 55

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 107
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

Paper • 2310.11511 • Published Oct 17, 2023 • 80
In-Context Learning Creates Task Vectors

Paper • 2310.15916 • Published Oct 24, 2023 • 43
Matryoshka Diffusion Models

Paper • 2310.15111 • Published Oct 23, 2023 • 45

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs