- Diffusion Language Models Know the Answer Before Decoding
  Paper • 2508.19982 • Published • 27
- ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
  Paper • 2512.13586 • Published • 93
- LSRIF: Logic-Structured Reinforcement Learning for Instruction Following
  Paper • 2601.06431 • Published • 12
- Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning
  Paper • 2601.09088 • Published • 63
Collections
Collections including paper arxiv:2510.25741
- Scaling Latent Reasoning via Looped Language Models
  Paper • 2510.25741 • Published • 229
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
  Paper • 2410.20672 • Published • 7
- The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus
  Paper • 2603.20105 • Published • 37

- GARDO: Reinforcing Diffusion Models without Reward Hacking
  Paper • 2512.24138 • Published • 30
- DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
  Paper • 2512.24165 • Published • 52
- Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
  Paper • 2512.24617 • Published • 66
- Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
  Paper • 2512.23447 • Published • 99

- Scaling Latent Reasoning via Looped Language Models
  Paper • 2510.25741 • Published • 229
- Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
  Paper • 2511.23319 • Published • 24
- Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information
  Paper • 2511.22176 • Published • 5
- FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning
  Paper • 2511.22265 • Published • 2

- Emu3.5: Native Multimodal Models are World Learners
  Paper • 2510.26583 • Published • 114
- RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
  Paper • 2510.20479 • Published • 12
- A Definition of AGI
  Paper • 2510.18212 • Published • 36
- Video-As-Prompt: Unified Semantic Control for Video Generation
  Paper • 2510.20888 • Published • 50

- Attention Is All You Need
  Paper • 1706.03762 • Published • 121
- Scaling Laws for Neural Language Models
  Paper • 2001.08361 • Published • 10
- Training Compute-Optimal Large Language Models
  Paper • 2203.15556 • Published • 11
- Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT
  Paper • 2210.04186 • Published

- LFM2 Technical Report
  Paper • 2511.23404 • Published • 56
- Artificial Hippocampus Networks for Efficient Long-Context Modeling
  Paper • 2510.07318 • Published • 32
- Scaling Latent Reasoning via Looped Language Models
  Paper • 2510.25741 • Published • 229
- Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
  Paper • 2410.20672 • Published • 7

- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
  Paper • 2509.26507 • Published • 550
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 513
- A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
  Paper • 2508.18106 • Published • 350
- Intern-S1: A Scientific Multimodal Foundation Model
  Paper • 2508.15763 • Published • 273