JuanRafap's Collections

Fondation model

RAG

World models

Bim

updated 2 days ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8, 2025 • 421 • 99
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 39
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30, 2025 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23, 2025 • 88
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30, 2025 • 146
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Paper • 2505.24298 • Published May 30, 2025 • 34
GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

Paper • 2505.20355 • Published May 26, 2025 • 36
Interleaved Reasoning for Large Language Models via Reinforcement Learning

Paper • 2505.19640 • Published May 26, 2025 • 15
FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

Paper • 2505.17399 • Published May 23, 2025 • 14
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles

Paper • 2505.19914 • Published May 26, 2025 • 46
One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23, 2025 • 62
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

Paper • 2505.14810 • Published May 20, 2025 • 62
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

Paper • 2505.16410 • Published May 22, 2025 • 58
JULI: Jailbreak Large Language Models by Self-Introspection

Paper • 2505.11790 • Published May 17, 2025 • 1
Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Paper • 2505.13438 • Published May 19, 2025 • 36
Reward Reasoning Model

Paper • 2505.14674 • Published May 20, 2025 • 37
RM-R1: Reward Modeling as Reasoning

Paper • 2505.02387 • Published May 5, 2025 • 81
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models

Paper • 2505.12504 • Published May 18, 2025 • 24
Neuro-Symbolic Query Compiler

Paper • 2505.11932 • Published May 17, 2025 • 18
Ψ-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models

Paper • 2506.01320 • Published Jun 2, 2025 • 16
Aligning Latent Spaces with Flow Priors

Paper • 2506.05240 • Published Jun 5, 2025 • 27
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

Paper • 2506.00070 • Published May 29, 2025 • 29
A Controllable Examination for Long-Context Language Models

Paper • 2506.02921 • Published Jun 3, 2025 • 34
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs

Paper • 2506.01674 • Published Jun 2, 2025 • 28
CodeContests+: High-Quality Test Case Generation for Competitive Programming

Paper • 2506.05817 • Published Jun 6, 2025 • 9
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion

Paper • 2506.01111 • Published Jun 1, 2025 • 31
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 265
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Paper • 2506.08012 • Published Jun 9, 2025 • 7
Dreamland: Controllable World Creation with Simulator and Generative Models

Paper • 2506.08006 • Published Jun 9, 2025 • 7
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Paper • 2506.06444 • Published Jun 6, 2025 • 73
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Paper • 2506.07530 • Published Jun 9, 2025 • 20
Solving Inequality Proofs with Large Language Models

Paper • 2506.07927 • Published Jun 9, 2025 • 20
Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better

Paper • 2506.09040 • Published Jun 10, 2025 • 34
Through the Valley: Path to Effective Long CoT Training for Small Language Models

Paper • 2506.07712 • Published Jun 9, 2025 • 18
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework

Paper • 2506.02454 • Published Jun 3, 2025 • 7
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation

Paper • 2506.04614 • Published Jun 5, 2025 • 19
Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning

Paper • 2506.06205 • Published Jun 6, 2025 • 30
Magistral

Paper • 2506.10910 • Published Jun 12, 2025 • 68
Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team

Paper • 2506.14234 • Published Jun 17, 2025 • 41
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers

Paper • 2506.14702 • Published Jun 17, 2025 • 3
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation

Paper • 2506.06962 • Published Jun 8, 2025 • 28
DoTA-RAG: Dynamic of Thought Aggregation RAG

Paper • 2506.12571 • Published Jun 14, 2025 • 50
syftr: Pareto-Optimal Generative AI

Paper • 2505.20266 • Published May 26, 2025
Scaling Test-time Compute for LLM Agents

Paper • 2506.12928 • Published Jun 15, 2025 • 63
LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning

Paper • 2506.10082 • Published Jun 11, 2025 • 8
General-Reasoner: Advancing LLM Reasoning Across All Domains

Paper • 2505.14652 • Published May 20, 2025 • 24
Optimizing Length Compression in Large Reasoning Models

Paper • 2506.14755 • Published Jun 17, 2025 • 10
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation

Paper • 2506.17202 • Published Jun 20, 2025 • 10
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs

Paper • 2506.18896 • Published Jun 23, 2025 • 29
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding

Paper • 2506.16035 • Published Jun 19, 2025 • 89
Robust Reward Modeling via Causal Rubrics

Paper • 2506.16507 • Published Jun 19, 2025 • 9
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion

Paper • 2507.02813 • Published Jul 3, 2025 • 60
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

Paper • 2507.01953 • Published Jul 2, 2025 • 18
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective

Paper • 2506.17930 • Published Jun 22, 2025 • 19
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30, 2025 • 51
katanemo/Arch-Router-1.5B

Text Generation • 2B • Updated 10 days ago • 5.07k • 249
Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

Paper • 2507.03336 • Published Jul 4, 2025 • 7
SingLoRA: Low Rank Adaptation Using a Single Matrix

Paper • 2507.05566 • Published Jul 8, 2025 • 116
CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8, 2025 • 45
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

Paper • 2507.05687 • Published Jul 8, 2025 • 31
Coding Triangle: How Does Large Language Model Understand Code?

Paper • 2507.06138 • Published Jul 8, 2025 • 22
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

Paper • 2507.05920 • Published Jul 8, 2025 • 12
RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

Paper • 2507.03253 • Published Jul 4, 2025 • 19
Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

Paper • 2507.07996 • Published Jul 10, 2025 • 36
Lumos-1: On Autoregressive Video Generation from a Unified Model Perspective

Paper • 2507.08801 • Published Jul 11, 2025 • 32
A Survey of Context Engineering for Large Language Models

Paper • 2507.13334 • Published Jul 17, 2025 • 263
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization

Paper • 2507.15061 • Published Jul 20, 2025 • 60
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning

Paper • 2507.12841 • Published Jul 17, 2025 • 42
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning

Paper • 2507.16784 • Published Jul 22, 2025 • 123
MUR: Momentum Uncertainty guided Reasoning for Large Language Models

Paper • 2507.14958 • Published Jul 20, 2025 • 47
Does More Inference-Time Compute Really Help Robustness?

Paper • 2507.15974 • Published Jul 21, 2025 • 7
RefCritic: Training Long Chain-of-Thought Critic Models with Refinement Feedback

Paper • 2507.15024 • Published Jul 20, 2025 • 14
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

Paper • 2507.15454 • Published Jul 21, 2025 • 7
Promptomatix: An Automatic Prompt Optimization Framework for Large Language Models

Paper • 2507.14241 • Published Jul 17, 2025 • 18
TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation

Paper • 2507.18537 • Published Jul 24, 2025 • 18
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos

Paper • 2507.15597 • Published Jul 21, 2025 • 34
A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning

Paper • 2507.14295 • Published Jul 18, 2025 • 14
SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction

Paper • 2507.15852 • Published Jul 21, 2025 • 38
FLEXITOKENS: Flexible Tokenization for Evolving Language Models

Paper • 2507.12720 • Published Jul 17, 2025 • 10
RiemannLoRA: A Unified Riemannian Framework for Ambiguity-Free LoRA Optimization

Paper • 2507.12142 • Published Jul 16, 2025 • 36
Replacing thinking with tool usage enables reasoning in small language models

Paper • 2507.05065 • Published Jul 7, 2025 • 16
Lizard: An Efficient Linearization Framework for Large Language Models

Paper • 2507.09025 • Published Jul 11, 2025 • 19
MemOS: A Memory OS for AI System

Paper • 2507.03724 • Published Jul 4, 2025 • 166
Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 161
Deep Researcher with Test-Time Diffusion

Paper • 2507.16075 • Published Jul 21, 2025 • 68
SmallThinker: A Family of Efficient Large Language Models Natively Trained for Local Deployment

Paper • 2507.20984 • Published Jul 28, 2025 • 58
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published Jul 25, 2025 • 33
Geometric-Mean Policy Optimization

Paper • 2507.20673 • Published Jul 28, 2025 • 32
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities

Paper • 2507.19766 • Published Jul 26, 2025 • 15
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning

Paper • 2507.22607 • Published Jul 30, 2025 • 47
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models

Paper • 2508.00819 • Published Aug 1, 2025 • 63
Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following

Paper • 2508.02150 • Published Aug 4, 2025 • 37
Cognitive Kernel-Pro: A Framework for Deep Research Agents and Agent Foundation Models Training

Paper • 2508.00414 • Published Aug 1, 2025 • 94
On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective

Paper • 2507.23632 • Published Jul 31, 2025 • 6
Seed-Prover: Deep and Broad Reasoning for Automated Theorem Proving

Paper • 2507.23726 • Published Jul 31, 2025 • 115
SitEmb-v1.5: Improved Context-Aware Dense Retrieval for Semantic Association and Long Story Comprehension

Paper • 2508.01959 • Published Aug 3, 2025 • 60
Tool-integrated Reinforcement Learning for Repo Deep Search

Paper • 2508.03012 • Published Aug 5, 2025 • 20
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization

Paper • 2508.05731 • Published Aug 7, 2025 • 27
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh

Paper • 2508.01242 • Published Aug 2, 2025 • 11
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11, 2025 • 50
Reinforcement Learning in Vision: A Survey

Paper • 2508.08189 • Published Aug 11, 2025 • 30
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

Paper • 2508.05954 • Published Aug 8, 2025 • 6
Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

Paper • 2508.08791 • Published Aug 12, 2025 • 16
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Paper • 2508.03501 • Published Aug 5, 2025 • 59
Complex Logical Instruction Generation

Paper • 2508.09125 • Published Aug 12, 2025 • 40
Mol-R1: Towards Explicit Long-CoT Reasoning in Molecule Discovery

Paper • 2508.08401 • Published Aug 11, 2025 • 42
Diffusion LLMs Can Do Faster-Than-AR Inference via Discrete Diffusion Forcing

Paper • 2508.09192 • Published Aug 8, 2025 • 30
Inverse-LLaVA: Eliminating Alignment Pre-training Through Text-to-Vision Mapping

Paper • 2508.12466 • Published Aug 17, 2025 • 8
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Paper • 2508.13142 • Published Aug 18, 2025 • 34
VertexRegen: Mesh Generation with Continuous Level of Detail

Paper • 2508.09062 • Published Aug 12, 2025 • 38
XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Paper • 2508.10395 • Published Aug 14, 2025 • 42
STream3R: Scalable Sequential 3D Reconstruction with Causal Transformer

Paper • 2508.10893 • Published Aug 14, 2025 • 31
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction

Paper • 2508.11987 • Published Aug 16, 2025 • 72
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents

Paper • 2508.13186 • Published Aug 14, 2025 • 20
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published Aug 14, 2025 • 29
UI-Venus Technical Report: Building High-performance UI Agents with RFT

Paper • 2508.10833 • Published Aug 14, 2025 • 45
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models

Paper • 2508.09968 • Published Aug 13, 2025 • 15
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Paper • 2508.02091 • Published Aug 4, 2025 • 13
LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Paper • 2508.15760 • Published Aug 21, 2025 • 47
Deep Think with Confidence

Paper • 2508.15260 • Published Aug 21, 2025 • 90
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization

Paper • 2508.14460 • Published Aug 20, 2025 • 85
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Paper • 2508.14896 • Published Aug 20, 2025 • 22
PosterGen: Aesthetic-Aware Paper-to-Poster Generation via Multi-Agent LLMs

Paper • 2508.17188 • Published Aug 24, 2025 • 17
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement Learning for General LLM Reasoning

Paper • 2508.16949 • Published Aug 23, 2025 • 24
Visual-CoG: Stage-Aware Reinforcement Learning with Chain of Guidance for Text-to-Image Generation

Paper • 2508.18032 • Published Aug 25, 2025 • 41
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory and Test-Time Compute Scaling

Paper • 2508.16745 • Published Aug 22, 2025 • 29
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning

Paper • 2508.18756 • Published Aug 26, 2025 • 36
Do What? Teaching Vision-Language-Action Models to Reject the Impossible

Paper • 2508.16292 • Published Aug 22, 2025 • 9
MeshSplat: Generalizable Sparse-View Surface Reconstruction via Gaussian Splatting

Paper • 2508.17811 • Published Aug 25, 2025 • 7
FastMesh: Efficient Artistic Mesh Generation via Component Decoupling

Paper • 2508.19188 • Published Aug 26, 2025 • 17
Spacer: Towards Engineered Scientific Inspiration

Paper • 2508.17661 • Published Aug 25, 2025 • 32
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models

Paper • 2508.18773 • Published Aug 26, 2025 • 16
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26, 2025 • 43
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling

Paper • 2508.17445 • Published Aug 24, 2025 • 80
Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning

Paper • 2508.20751 • Published Aug 28, 2025 • 90
Provable Benefits of In-Tool Learning for Large Language Models

Paper • 2508.20755 • Published Aug 28, 2025 • 11
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models

Paper • 2508.21365 • Published Aug 29, 2025 • 29
Efficient Code Embeddings from Code Generation Models

Paper • 2508.21290 • Published Aug 29, 2025 • 20
CLIPSym: Delving into Symmetry Detection with CLIP

Paper • 2508.14197 • Published Aug 19, 2025 • 8
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

Paper • 2509.02522 • Published Sep 2, 2025 • 25
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

Paper • 2509.02479 • Published Sep 2, 2025 • 84
Universal Deep Research: Bring Your Own Model and Strategy

Paper • 2509.00244 • Published Aug 29, 2025 • 14
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3, 2025 • 24
Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30, 2025 • 72
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?

Paper • 2509.04292 • Published Sep 4, 2025 • 58
Towards a Unified View of Large Language Model Post-Training

Paper • 2509.04419 • Published Sep 4, 2025 • 76
How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on τ-bench

Paper • 2508.20931 • Published Aug 28, 2025 • 16
Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

Paper • 2509.03059 • Published Sep 3, 2025 • 25
NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings

Paper • 2509.04011 • Published Sep 4, 2025 • 29
Symbolic Graphics Programming with Large Language Models

Paper • 2509.05208 • Published Sep 5, 2025 • 47
Bootstrapping Task Spaces for Self-Improvement

Paper • 2509.04575 • Published Sep 4, 2025 • 6
Behavioral Fingerprinting of Large Language Models

Paper • 2509.04504 • Published Sep 2, 2025 • 6
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers

Paper • 2509.06493 • Published Sep 8, 2025 • 12
Reinforcement Learning Foundations for Deep Research Systems: A Survey

Paper • 2509.06733 • Published Sep 8, 2025 • 32
Reconstruction Alignment Improves Unified Multimodal Models

Paper • 2509.07295 • Published Sep 8, 2025 • 40
Visual Representation Alignment for Multimodal Large Language Models

Paper • 2509.07979 • Published Sep 9, 2025 • 84
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models

Paper • 2509.06949 • Published Sep 8, 2025 • 57
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9, 2025 • 105
Towards General Agentic Intelligence via Environment Scaling

Paper • 2509.13311 • Published Sep 16, 2025 • 72
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning

Paper • 2509.13305 • Published Sep 16, 2025 • 91
SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation

Paper • 2509.10708 • Published Sep 12, 2025 • 18
HANRAG: Heuristic Accurate Noise-resistant Retrieval-Augmented Generation for Multi-hop Question Answering

Paper • 2509.09713 • Published Sep 8, 2025 • 25
FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18, 2025 • 118
Single-stream Policy Optimization

Paper • 2509.13232 • Published Sep 16, 2025 • 36
World Modeling with Probabilistic Structure Integration

Paper • 2509.09737 • Published Sep 10, 2025 • 14
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning

Paper • 2509.13755 • Published Sep 17, 2025 • 19
C2-Evo: Co-Evolving Multimodal Data and Model for Self-Improving Reasoning

Paper • 2507.16518 • Published Jul 22, 2025 • 2
WorldForge: Unlocking Emergent 3D/4D Generation in Video Diffusion Model via Training-Free Guidance

Paper • 2509.15130 • Published Sep 18, 2025 • 31
Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Paper • 2509.15194 • Published Sep 18, 2025 • 33
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

Paper • 2509.13761 • Published Sep 17, 2025 • 16
A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning

Paper • 2509.15937 • Published Sep 19, 2025 • 20
BaseReward: A Strong Baseline for Multimodal Reward Model

Paper • 2509.16127 • Published Sep 19, 2025 • 21
MultiEdit: Advancing Instruction-based Image Editing on Diverse and Challenging Tasks

Paper • 2509.14638 • Published Sep 18, 2025 • 14
Video2Roleplay: A Multimodal Dataset and Framework for Video-Guided Role-playing Agents

Paper • 2509.15233 • Published Sep 17, 2025 • 2
MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer

Paper • 2509.16197 • Published Sep 19, 2025 • 58
Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

Paper • 2509.15591 • Published Sep 19, 2025 • 45
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent

Paper • 2509.15566 • Published Sep 19, 2025 • 14
Mano Report

Paper • 2509.17336 • Published Sep 22, 2025 • 10
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback

Paper • 2501.10799 • Published Jan 18, 2025 • 15
Table as Thought: Exploring Structured Thoughts in LLM Reasoning

Paper • 2501.02152 • Published Jan 4, 2025
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning

Paper • 2412.09078 • Published Dec 12, 2024
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge Internalization with Self-Reflection

Paper • 2412.08024 • Published Dec 11, 2024 • 1
LLM2: Let Large Language Models Harness System 2 Reasoning

Paper • 2412.20372 • Published Dec 29, 2024
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective

Paper • 2501.11110 • Published Jan 19, 2025 • 4
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning

Paper • 2412.15797 • Published Dec 20, 2024 • 18
RAG-Star: Enhancing Deliberative Reasoning with Retrieval Augmented Verification and Refinement

Paper • 2412.12881 • Published Dec 17, 2024 • 2
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models

Paper • 2509.17627 • Published Sep 22, 2025 • 66
Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation

Paper • 2509.18824 • Published Sep 23, 2025 • 23
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory

Paper • 2509.14662 • Published Sep 18, 2025 • 13
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Paper • 2509.19803 • Published Sep 24, 2025 • 122
Tree Search for LLM Agent Reinforcement Learning

Paper • 2509.21240 • Published Sep 25, 2025 • 92
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention

Paper • 2509.24006 • Published Sep 28, 2025 • 119
Fine-tuning Done Right in Model Editing

Paper • 2509.22072 • Published Sep 26, 2025 • 28
No Prompt Left Behind: Exploiting Zero-Variance Prompts in LLM Reinforcement Learning via Entropy-Guided Advantage Shaping

Paper • 2509.21880 • Published Sep 26, 2025 • 53
LucidFlux: Caption-Free Universal Image Restoration via a Large-Scale Diffusion Transformer

Paper • 2509.22414 • Published Sep 26, 2025 • 22
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

Paper • 2509.22601 • Published Sep 26, 2025 • 30
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Paper • 2509.22576 • Published Sep 26, 2025 • 137
Variational Reasoning for Language Models

Paper • 2509.22637 • Published Sep 26, 2025 • 69
AutoIntent: AutoML for Text Classification

Paper • 2509.21138 • Published Sep 25, 2025 • 37
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30, 2025 • 55
Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Paper • 2509.26628 • Published Sep 30, 2025 • 17
Sequential Diffusion Language Models

Paper • 2509.24007 • Published Sep 28, 2025 • 47
ReviewScore: Misinformed Peer Review Detection with Large Language Models

Paper • 2509.21679 • Published Sep 25, 2025 • 64
ReviewRL: Towards Automated Scientific Review with RL

Paper • 2508.10308 • Published Aug 14, 2025 • 1
ReportBench: Evaluating Deep Research Agents via Academic Survey Tasks

Paper • 2508.15804 • Published Aug 14, 2025 • 15
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published Sep 29, 2025 • 148
Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

Paper • 2509.25849 • Published Sep 30, 2025 • 48
BroRL: Scaling Reinforcement Learning via Broadened Exploration

Paper • 2510.01180 • Published Oct 1, 2025 • 20
GEM: A Gym for Agentic LLMs

Paper • 2510.01051 • Published Oct 1, 2025 • 91
Interactive Training: Feedback-Driven Neural Network Optimization

Paper • 2510.02297 • Published Oct 2, 2025 • 43
More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models

Paper • 2509.25848 • Published Sep 30, 2025 • 81
CLUE: Non-parametric Verification from Experience via Hidden-State Clustering

Paper • 2510.01591 • Published Oct 2, 2025 • 28
LongCodeZip: Compress Long Context for Code Language Models

Paper • 2510.00446 • Published Oct 1, 2025 • 108
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation

Paper • 2510.00515 • Published Oct 1, 2025 • 42
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models

Paper • 2510.03561 • Published Oct 3, 2025 • 25
Large Language Models as Optimizers

Paper • 2309.03409 • Published Sep 7, 2023 • 79
Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

Paper • 2309.08532 • Published Sep 15, 2023 • 53
PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback

Paper • 2307.14936 • Published Jul 27, 2023 • 42
Factuality Matters: When Image Generation and Editing Meet Structured Visuals

Paper • 2510.05091 • Published Oct 6, 2025 • 20
Reinforce-Ada: An Adaptive Sampling Framework for Reinforce-Style LLM Training

Paper • 2510.04996 • Published Oct 6, 2025 • 16
Paper2Video: Automatic Video Generation from Scientific Papers

Paper • 2510.05096 • Published Oct 6, 2025 • 120
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs

Paper • 2510.05069 • Published Oct 6, 2025 • 13
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information

Paper • 2510.03632 • Published Oct 4, 2025 • 42
Large Reasoning Models Learn Better Alignment from Flawed Thinking

Paper • 2510.00938 • Published Oct 1, 2025 • 60
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 513
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6, 2025 • 31
Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 276
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Paper • 2510.05034 • Published Oct 6, 2025 • 51
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3, 2025 • 76
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published Oct 13, 2025 • 182
PhysToolBench: Benchmarking Physical Tool Understanding for MLLMs

Paper • 2510.09507 • Published Oct 10, 2025 • 11
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6, 2025 • 131
Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models

Paper • 2510.08492 • Published Oct 9, 2025 • 10
Dyna-Mind: Learning to Simulate from Experience for Better AI Agents

Paper • 2510.09577 • Published Oct 10, 2025 • 8
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9, 2025 • 39
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

Paper • 2510.09201 • Published Oct 10, 2025 • 50
Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13, 2025 • 170
UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning

Paper • 2510.13515 • Published Oct 15, 2025 • 12
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training

Paper • 2510.12586 • Published Oct 14, 2025 • 115
Understanding DeepResearch via Reports

Paper • 2510.07861 • Published Oct 9, 2025 • 7
RAG-Anything: All-in-One RAG Framework

Paper • 2510.12323 • Published Oct 14, 2025 • 73
The Art of Scaling Reinforcement Learning Compute for LLMs

Paper • 2510.13786 • Published Oct 15, 2025 • 33
Glyph: Scaling Context Windows via Visual-Text Compression

Paper • 2510.17800 • Published Oct 20, 2025 • 69
LoongRL: Reinforcement Learning for Advanced Reasoning over Long Contexts

Paper • 2510.19363 • Published Oct 22, 2025 • 63
Unified Reinforcement and Imitation Learning for Vision-Language Models

Paper • 2510.19307 • Published Oct 22, 2025 • 32
Attention Is All You Need for KV Cache in Diffusion LLMs

Paper • 2510.14973 • Published Oct 16, 2025 • 42
Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn LLM Agents

Paper • 2510.14967 • Published Oct 16, 2025 • 34
Video Reasoning without Training

Paper • 2510.17045 • Published Oct 19, 2025 • 8
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Paper • 2510.19779 • Published Oct 22, 2025 • 62
Loopholing Discrete Diffusion: Deterministic Bypass of the Sampling Wall

Paper • 2510.19304 • Published Oct 22, 2025 • 24
Every Question Has Its Own Value: Reinforcement Learning with Explicit Human Values

Paper • 2510.20187 • Published Oct 23, 2025 • 19
ReCode: Unify Plan and Action for Universal Granularity Control

Paper • 2510.23564 • Published Oct 27, 2025 • 123
Reasoning with Sampling: Your Base Model is Smarter Than You Think

Paper • 2510.14901 • Published Oct 16, 2025 • 48
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning

Paper • 2510.23473 • Published Oct 27, 2025 • 86
World Simulation with Video Foundation Models for Physical AI

Paper • 2511.00062 • Published Oct 28, 2025 • 46
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows

Paper • 2510.24411 • Published Oct 28, 2025 • 72
The End of Manual Decoding: Towards Truly End-to-End Language Models

Paper • 2510.26697 • Published Oct 30, 2025 • 119
The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms

Paper • 2511.04217 • Published Nov 6, 2025 • 17
Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 132
Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published Oct 29, 2025 • 229
DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

Paper • 2511.06307 • Published Nov 9, 2025 • 53
Black-Box On-Policy Distillation of Large Language Models

Paper • 2511.10643 • Published Nov 13, 2025 • 52
DoPE: Denoising Rotary Position Embedding

Paper • 2511.09146 • Published Nov 12, 2025 • 98
Orion: A Unified Visual Agent for Multimodal Perception, Advanced Visual Reasoning and Execution

Paper • 2511.14210 • Published Nov 18, 2025 • 21
SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

Paper • 2511.15605 • Published Nov 19, 2025 • 25
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

Paper • 2511.16664 • Published Nov 20, 2025 • 29
TiDAR: Think in Diffusion, Talk in Autoregression

Paper • 2511.08923 • Published Nov 12, 2025 • 128
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

Paper • 2511.06805 • Published Nov 10, 2025 • 13
The Path Not Taken: RLVR Provably Learns Off the Principals

Paper • 2511.08567 • Published Nov 11, 2025 • 35
Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning

Paper • 2510.25992 • Published Oct 29, 2025 • 48
FARMER: Flow AutoRegressive Transformer over Pixels

Paper • 2510.23588 • Published Oct 27, 2025 • 59
Parallel Loop Transformer for Efficient Test-Time Computation Scaling

Paper • 2510.24824 • Published Oct 28, 2025 • 17
LLM-guided Hierarchical Retrieval

Paper • 2510.13217 • Published Oct 15, 2025 • 21
DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published Oct 16, 2025 • 18
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence

Paper • 2510.20579 • Published Oct 23, 2025 • 56
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms

Paper • 2511.17592 • Published Nov 17, 2025 • 121
Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 39
Flow Map Distillation Without Data

Paper • 2511.19428 • Published Nov 24, 2025 • 6
Monet: Reasoning in Latent Visual Space Beyond Images and Language

Paper • 2511.21395 • Published Nov 26, 2025 • 19
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs

Paper • 2511.19773 • Published Nov 24, 2025 • 10
SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space

Paper • 2511.20102 • Published Nov 25, 2025 • 28
Architecture Decoupling Is Not All You Need For Unified Multimodal Model

Paper • 2511.22663 • Published Nov 27, 2025 • 29
SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs

Paper • 2512.00722 • Published Nov 30, 2025 • 16
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published Dec 1, 2025 • 106
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models

Paper • 2512.02014 • Published Dec 1, 2025 • 74
OneThinker: All-in-one Reasoning Model for Image and Video

Paper • 2512.03043 • Published Dec 2, 2025 • 34
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Paper • 2512.05591 • Published Dec 5, 2025 • 17
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows

Paper • 2512.05150 • Published Dec 3, 2025 • 76
UltraImage: Rethinking Resolution Extrapolation in Image Diffusion Transformers

Paper • 2512.04504 • Published Dec 4, 2025 • 18
On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

Paper • 2512.04220 • Published Dec 3, 2025 • 16
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling

Paper • 2512.03000 • Published Dec 2, 2025 • 37
PromptBridge: Cross-Model Prompt Transfer for Large Language Models

Paper • 2512.01420 • Published Dec 1, 2025 • 11
PretrainZero: Reinforcement Active Pretraining

Paper • 2512.03442 • Published Dec 3, 2025 • 49
SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

Paper • 2512.02807 • Published Dec 2, 2025 • 9
Semantics Lead the Way: Harmonizing Semantic and Texture Modeling with Asynchronous Latent Diffusion

Paper • 2512.04926 • Published Dec 4, 2025 • 42
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning

Paper • 2512.07461 • Published Dec 8, 2025 • 79
Distribution Matching Variational AutoEncoder

Paper • 2512.07778 • Published Dec 8, 2025 • 29
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

Paper • 2512.08153 • Published Dec 9, 2025 • 8
InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models

Paper • 2512.08829 • Published Dec 9, 2025 • 21
Self-Improving VLM Judges Without Human Annotations

Paper • 2512.05145 • Published Dec 2, 2025 • 20
Rethinking Training Dynamics in Scale-wise Autoregressive Generation

Paper • 2512.06421 • Published Dec 6, 2025 • 7
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Paper • 2512.07802 • Published Dec 8, 2025 • 46
unsloth/Devstral-2-123B-Instruct-2512-GGUF

125B • Updated Dec 15, 2025 • 6.05k • 51
Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforcement Learning

Paper • 2512.10534 • Published Dec 11, 2025 • 32
BEAVER: An Efficient Deterministic LLM Verifier

Paper • 2512.05439 • Published Dec 5, 2025 • 36
Vector Quantization using Gaussian Variational Autoencoder

Paper • 2512.06609 • Published Dec 7, 2025 • 1
Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

Paper • 2512.07525 • Published Dec 8, 2025 • 60
VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction

Paper • 2511.23386 • Published Nov 28, 2025 • 16
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving

Paper • 2512.10739 • Published Dec 11, 2025 • 47
OmniPSD: Layered PSD Generation with Diffusion Transformer

Paper • 2512.09247 • Published Dec 10, 2025 • 50
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Paper • 2512.13586 • Published Dec 15, 2025 • 93
KlingAvatar 2.0 Technical Report

Paper • 2512.13313 • Published Dec 15, 2025 • 44
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Paper • 2512.14067 • Published Dec 16, 2025 • 16
Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published Dec 15, 2025 • 106
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

Paper • 2512.14052 • Published Dec 16, 2025 • 42
Universal Reasoning Model

Paper • 2512.14693 • Published Dec 16, 2025 • 44
Image Diffusion Preview with Consistency Solver

Paper • 2512.13592 • Published Dec 15, 2025 • 8
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling

Paper • 2512.15702 • Published Dec 17, 2025 • 16
STeCa: Step-level Trajectory Calibration for LLM Agent Learning

Paper • 2502.14276 • Published Feb 20, 2025 • 1
Step-GUI Technical Report

Paper • 2512.15431 • Published Dec 17, 2025 • 133
Differences That Matter: Auditing Models for Capability Gap Discovery and Rectification

Paper • 2512.16921 • Published Dec 18, 2025 • 8
Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection

Paper • 2512.16905 • Published Dec 18, 2025 • 32
DiffusionBrowser: Interactive Diffusion Previews via Multi-Branch Decoders

Paper • 2512.13690 • Published Dec 15, 2025 • 3
Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Paper • 2512.13607 • Published Dec 15, 2025 • 38
Adaptation of Agentic AI

Paper • 2512.16301 • Published Dec 18, 2025 • 108
QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management

Paper • 2512.12967 • Published Dec 15, 2025 • 111
CoSPlan: Corrective Sequential Planning via Scene Graph Incremental Updates

Paper • 2512.10342 • Published Dec 11, 2025 • 1
UAGLNet: Uncertainty-Aggregated Global-Local Fusion Network with Cooperative CNN-Transformer for Building Extraction

Paper • 2512.12941 • Published Dec 15, 2025 • 2
TraPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning

Paper • 2512.13106 • Published Dec 15, 2025 • 4
Comparative Analysis of LLM Abliteration Methods: A Cross-Architecture Evaluation

Paper • 2512.13655 • Published Dec 15, 2025 • 3
Janus: Disaggregating Attention and Experts for Scalable MoE Inference

Paper • 2512.13525 • Published Dec 15, 2025 • 6
RePo: Language Models with Context Re-Positioning

Paper • 2512.14391 • Published Dec 16, 2025 • 12
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse

Paper • 2512.14531 • Published Dec 16, 2025 • 15
ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement

Paper • 2512.13303 • Published Dec 15, 2025 • 17
Differentiable Evolutionary Reinforcement Learning

Paper • 2512.13399 • Published Dec 15, 2025 • 22
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

Paper • 2512.14699 • Published Dec 16, 2025 • 28
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics

Paper • 2512.13660 • Published Dec 15, 2025 • 37
MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 121
Hybrid Attribution Priors for Explainable and Robust Model Training

Paper • 2512.14719 • Published Dec 9, 2025 • 3
WAY: Estimation of Vessel Destination in Worldwide AIS Trajectory

Paper • 2512.13190 • Published Dec 15, 2025 • 8
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published Dec 22, 2025 • 67
Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published Dec 18, 2025 • 42
google/gemma-scope-2

Updated Dec 19, 2025 • 80
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Paper • 2512.19673 • Published Dec 22, 2025 • 66
QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmented Generation

Paper • 2512.19134 • Published Dec 22, 2025 • 32
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 222
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows

Paper • 2512.16969 • Published Dec 18, 2025 • 120
LongVideoAgent: Multi-Agent Reasoning with Long Videos

Paper • 2512.20618 • Published Dec 23, 2025 • 56
Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs

Paper • 2512.17206 • Published Dec 19, 2025 • 20
Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs

Paper • 2512.17008 • Published Dec 18, 2025 • 11
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training

Paper • 2510.15859 • Published Oct 17, 2025 • 13
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing

Paper • 2512.14681 • Published Dec 16, 2025 • 42
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Paper • 2512.17351 • Published Dec 19, 2025 • 28
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Paper • 2512.15687 • Published Dec 17, 2025 • 22
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

Paper • 2512.13874 • Published Dec 15, 2025 • 17
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations

Paper • 2512.14080 • Published Dec 16, 2025 • 9
Understanding and Improving Hyperbolic Deep Reinforcement Learning

Paper • 2512.14202 • Published Dec 16, 2025 • 6
SCOPE: Prompt Evolution for Enhancing Agent Effectiveness

Paper • 2512.15374 • Published Dec 17, 2025 • 6
VOYAGER: A Training Free Approach for Generating Diverse Datasets using LLMs

Paper • 2512.12072 • Published Dec 12, 2025 • 18
DEER: Draft with Diffusion, Verify with Autoregressive Models

Paper • 2512.15176 • Published Dec 17, 2025 • 45
TabReX: Tabular Referenceless eXplainable Evaluation

Paper • 2512.15907 • Published Dec 17, 2025 • 2
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers

Paper • 2512.16615 • Published Dec 18, 2025 • 5
AdaTooler-V: Adaptive Tool-Use for Images and Videos

Paper • 2512.16918 • Published Dec 18, 2025 • 14
REGLUE Your Latents with Global and Local Semantics for Entangled Diffusion

Paper • 2512.16636 • Published Dec 18, 2025 • 26
Kling-Omni Technical Report

Paper • 2512.16776 • Published Dec 18, 2025 • 173
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

Paper • 2512.20605 • Published Dec 23, 2025 • 62
Multi-hop Reasoning via Early Knowledge Alignment

Paper • 2512.20144 • Published Dec 23, 2025 • 7
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Paper • 2512.19995 • Published Dec 23, 2025 • 16
TimeBill: Time-Budgeted Inference for Large Language Models

Paper • 2512.21859 • Published Dec 26, 2025 • 25
Dream-VL & Dream-VLA: Open Vision-Language and Vision-Language-Action Models with Diffusion Language Model Backbone

Paper • 2512.22615 • Published Dec 27, 2025 • 50
Training AI Co-Scientists Using Rubric Rewards

Paper • 2512.23707 • Published Dec 29, 2025 • 21
Masking Teacher and Reinforcing Student for Distilling Vision-Language Models

Paper • 2512.22238 • Published Dec 23, 2025 • 30
LLM Swiss Round: Aggregating Multi-Benchmark Performance via Competitive Swiss-System Dynamics

Paper • 2512.21010 • Published Dec 24, 2025 • 4
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published Dec 29, 2025 • 99
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 321
Evaluating Parameter Efficient Methods for RLVR

Paper • 2512.23165 • Published Dec 29, 2025 • 28
SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent

Paper • 2511.16108 • Published Nov 20, 2025
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models

Paper • 2601.03425 • Published Jan 6 • 17
MMFormalizer: Multimodal Autoformalization in the Wild

Paper • 2601.03017 • Published Jan 6 • 106
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs

Paper • 2601.03559 • Published Jan 7 • 14
Token-Level LLM Collaboration via FusionRoute

Paper • 2601.05106 • Published Jan 8 • 40
VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Paper • 2601.05175 • Published Jan 8 • 36
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking

Paper • 2601.06487 • Published Jan 10 • 54
Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers

Paper • 2601.04890 • Published Jan 8 • 44
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head

Paper • 2601.07832 • Published Jan 12 • 52
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 230
RelayLLM: Efficient Reasoning via Collaborative Decoding

Paper • 2601.05167 • Published Jan 8 • 31
LaViT: Aligning Latent Visual Thoughts for Multi-modal Reasoning

Paper • 2601.10129 • Published Jan 15 • 13
Language of Thought Shapes Output Diversity in Large Language Models

Paper • 2601.11227 • Published Jan 16 • 9
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Paper • 2601.11004 • Published Jan 16 • 30
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Paper • 2601.15892 • Published Jan 22 • 53
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models

Paper • 2601.15165 • Published Jan 21 • 73
Learning to Discover at Test Time

Paper • 2601.16175 • Published Jan 22 • 44
ReGuLaR: Variational Latent Reasoning Guided by Rendered Chain-of-Thought

Paper • 2601.23184 • Published Jan 30 • 36
AudioSAE: Towards Understanding of Audio-Processing Models with Sparse AutoEncoders

Paper • 2602.05027 • Published Feb 4 • 63
Think Longer to Explore Deeper: Learn to Explore In-Context via Length-Incentivized Reinforcement Learning

Paper • 2602.11748 • Published Feb 12 • 35
Voxtral Realtime

Paper • 2602.11298 • Published Feb 11 • 24
DFlash: Block Diffusion for Flash Speculative Decoding

Paper • 2602.06036 • Published Feb 5 • 46
InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions

Paper • 2602.06035 • Published Feb 5 • 23
Experiential Reinforcement Learning

Paper • 2602.13949 • Published Feb 15 • 72
REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

Paper • 2602.14234 • Published Feb 15 • 26
Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality

Paper • 2602.14080 • Published Feb 15 • 21
On Surprising Effectiveness of Masking Updates in Adaptive Optimizers

Paper • 2602.15322 • Published Feb 17 • 10
OPT-R: Exploring the Role of Explanations in Finetuning and Prompting for Reasoning Skills of Large Language Models

Paper • 2305.12001 • Published May 19, 2023 • 1
SELF: Language-Driven Self-Evolution for Large Language Model

Paper • 2310.00533 • Published Oct 1, 2023 • 2
DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation

Paper • 2601.22904 • Published Jan 30 • 15
EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots

Paper • 2602.18071 • Published Feb 20 • 22
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Paper • 2601.02256 • Published Jan 5 • 33
GARDO: Reinforcing Diffusion Models without Reward Hacking

Paper • 2512.24138 • Published Dec 30, 2025 • 30
Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling

Paper • 2601.02346 • Published Jan 5 • 27
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

Paper • 2512.20578 • Published Dec 23, 2025 • 86
Recursive Language Models

Paper • 2512.24601 • Published Dec 31, 2025 • 94
COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs

Paper • 2601.01836 • Published Jan 5 • 10
Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion

Paper • 2512.23035 • Published Dec 28, 2025 • 5
Confidence Estimation for LLMs in Multi-turn Interactions

Paper • 2601.02179 • Published Jan 5 • 17
SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Paper • 2601.01426 • Published Jan 4 • 24
OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment

Paper • 2601.01576 • Published Jan 4 • 19
Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents

Paper • 2601.02314 • Published Jan 5 • 2
M-ErasureBench: A Comprehensive Multimodal Evaluation Benchmark for Concept Erasure in Diffusion Models

Paper • 2512.22877 • Published Dec 28, 2025 • 2
Nested Learning: The Illusion of Deep Learning Architectures

Paper • 2512.24695 • Published Dec 31, 2025 • 45
Deep Delta Learning

Paper • 2601.00417 • Published Jan 1 • 34
The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving

Paper • 2601.00747 • Published Jan 2 • 20
InfoSynth: Information-Guided Benchmark Synthesis for LLMs

Paper • 2601.00575 • Published Jan 2 • 3
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Paper • 2512.24617 • Published Dec 31, 2025 • 66
A unified framework for detecting point and collective anomalies in operating system logs via collaborative transformers

Paper • 2512.23380 • Published Dec 29, 2025 • 45
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems

Paper • 2512.24385 • Published Dec 30, 2025 • 8
Scaling Open-Ended Reasoning to Predict the Future

Paper • 2512.25070 • Published Dec 31, 2025 • 20
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Paper • 2512.23988 • Published Dec 30, 2025 • 19
Detecting Anomalies in Machine Learning Infrastructure via Hardware Telemetry

Paper • 2510.26008 • Published Oct 29, 2025
CodeLSI: Leveraging Foundation Models for Automated Code Generation with Low-Rank Optimization and Domain-Specific Instruction Tuning

Paper • 2509.14373 • Published Sep 17, 2025
Big data analysis and distributed deep learning for next-generation intrusion detection system optimization

Paper • 2209.13961 • Published Sep 28, 2022
Qwen/DeepPlanning

Viewer • Updated Mar 3 • 2.14k • 631 • 194
ManCAR: Manifold-Constrained Latent Reasoning with Adaptive Test-Time Computation for Sequential Recommendation

Paper • 2602.20093 • Published Feb 23 • 29
tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction

Paper • 2602.20160 • Published Feb 23 • 10
MTraining: Distributed Dynamic Sparse Attention for Efficient Ultra-Long Context Training

Paper • 2510.18830 • Published Oct 21, 2025
Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Paper • 2603.04257 • Published Mar 4 • 19
AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

Paper • 2603.01236 • Published Mar 1 • 11
V_1: Unifying Generation and Self-Verification for Parallel Reasoners

Paper • 2603.04304 • Published Mar 4 • 14
MUSE: A Run-Centric Platform for Multimodal Unified Safety Evaluation of Large Language Models

Paper • 2603.02482 • Published Mar 3 • 3
BeamPERL: Parameter-Efficient RL with Verifiable Rewards Specializes Compact LLMs for Structured Beam Mechanics Reasoning

Paper • 2603.04124 • Published Mar 4 • 1
Specificity-aware reinforcement learning for fine-grained open-world classification

Paper • 2603.03197 • Published Mar 3 • 16
FireRedTeam/FireRed-OCR

Image-to-Text • 2B • Updated Mar 4 • 25.5k • 152
Dynamic Chunking Diffusion Transformer

Paper • 2603.06351 • Published Mar 6 • 15
π-StepNFT: Wider Space Needs Finer Steps in Online RL for Flow-based VLAs

Paper • 2603.02083 • Published Mar 2 • 9
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs

Paper • 2603.05890 • Published Mar 6 • 92
K-Search: LLM Kernel Generation via Co-Evolving Intrinsic World Model

Paper • 2602.19128 • Published Feb 22 • 7
MM-Zero: Self-Evolving Multi-Model Vision Language Models From Zero Data

Paper • 2603.09206 • Published Mar 10 • 53
On-Policy Self-Distillation for Reasoning Compression

Paper • 2603.05433 • Published Mar 5 • 8
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs

Paper • 2603.09095 • Published Mar 10 • 29
CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing

Paper • 2603.08589 • Published Mar 9 • 38
Believe Your Model: Distribution-Guided Confidence Calibration

Paper • 2603.03872 • Published Mar 4 • 40
Decoupling Reasoning and Confidence: Resurrecting Calibration in Reinforcement Learning from Verifiable Rewards

Paper • 2603.09117 • Published Mar 10 • 10
OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published Mar 10 • 150
In-Context Reinforcement Learning for Tool Use in Large Language Models

Paper • 2603.08068 • Published Mar 9 • 43
EVATok: Adaptive Length Video Tokenization for Efficient Visual Autoregressive Generation

Paper • 2603.12267 • Published about 1 month ago • 13
WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing

Paper • 2603.11593 • Published Mar 12 • 25
Internalizing Meta-Experience into Memory for Guided Reinforcement Learning in Large Language Models

Paper • 2602.10224 • Published Feb 10 • 19
Multimodal OCR: Parse Anything from Documents

Paper • 2603.13032 • Published about 1 month ago • 43
Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Paper • 2512.24176 • Published Dec 30, 2025 • 8
PETS: A Principled Framework Towards Optimal Trajectory Allocation for Efficient Test-Time Self-Consistency

Paper • 2602.16745 • Published Feb 18 • 8
Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context

Paper • 2603.15653 • Published Mar 7 • 12
Complementary Reinforcement Learning

Paper • 2603.17621 • Published 25 days ago • 37
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models

Paper • 2603.15557 • Published 27 days ago • 29
The Art of Efficient Reasoning: Data, Reward, and Optimization

Paper • 2602.20945 • Published Feb 24 • 7
TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents

Paper • 2602.19633 • Published Feb 23 • 8
Matryoshka Gaussian Splatting

Paper • 2603.19234 • Published 24 days ago • 11
Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Paper • 2603.19227 • Published 24 days ago • 42
VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Paper • 2603.15030 • Published 27 days ago • 21
Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs

Paper • 2603.09906 • Published Mar 10 • 75
The Reasoning Trap -- Logical Reasoning as a Mechanistic Pathway to Situational Awareness

Paper • 2603.09200 • Published Mar 10 • 5
Multi-Head Low-Rank Attention

Paper • 2603.02188 • Published Mar 2 • 3
Mario: Multimodal Graph Reasoning with Large Language Models

Paper • 2603.05181 • Published Mar 5 • 9
How Far Can Unsupervised RLVR Scale LLM Training?

Paper • 2603.08660 • Published Mar 9 • 57
Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 193
Alignment Makes Language Models Normative, Not Descriptive

Paper • 2603.17218 • Published 26 days ago • 46
Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

Paper • 2603.20155 • Published 23 days ago • 10
mSFT: Addressing Dataset Mixtures Overfitting Heterogeneously in Multi-task SFT

Paper • 2603.21606 • Published 20 days ago • 38
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs

Paper • 2603.18004 • Published 25 days ago • 13
Reasoning as Compression: Unifying Budget Forcing via the Conditional Information Bottleneck

Paper • 2603.08462 • Published Mar 9 • 22
On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Paper • 2603.22117 • Published 20 days ago • 29
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis

Paper • 2603.20278 • Published 26 days ago • 94
ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Paper • 2603.22281 • Published 20 days ago • 17
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Paper • 2603.23483 • Published 19 days ago • 61
Generalized Discrete Diffusion from Snapshots

Paper • 2603.21342 • Published 21 days ago • 11
From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents

Paper • 2603.22386 • Published 20 days ago • 54
Rethinking Token-Level Policy Optimization for Multimodal Chain-of-Thought

Paper • 2603.22847 • Published 19 days ago • 25
RealChart2Code: Advancing Chart-to-Code Generation with Real Data and Multi-Task Evaluation

Paper • 2603.25804 • Published 17 days ago • 29
Natural-Language Agent Harnesses

Paper • 2603.25723 • Published 17 days ago • 24
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference

Paper • 2603.25730 • Published 17 days ago • 51
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills

Paper • 2603.25158 • Published 17 days ago • 49
On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models

Paper • 2603.27481 • Published 15 days ago • 35
Make Geometry Matter for Spatial Reasoning

Paper • 2603.26639 • Published 16 days ago • 32
TAPS: Task Aware Proposal Distributions for Speculative Sampling

Paper • 2603.27027 • Published 16 days ago • 141
Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design

Paper • 2603.28376 • Published 13 days ago • 22
Embarrassingly Simple Self-Distillation Improves Code Generation

Paper • 2604.01193 • Published 11 days ago • 34
CORAL: Towards Autonomous Multi-Agent Evolution for Open-Ended Discovery

Paper • 2604.01658 • Published 11 days ago • 52
Swift-SVD: Theoretical Optimality Meets Practical Efficiency in Low-Rank LLM Compression

Paper • 2604.01609 • Published 11 days ago • 11
Communicating about Space: Language-Mediated Spatial Integration Across Partial Views

Paper • 2603.27183 • Published 15 days ago • 20
AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks

Paper • 2604.01487 • Published 12 days ago • 9
Test-Time Scaling Makes Overtraining Compute-Optimal

Paper • 2604.01411 • Published 12 days ago • 26
Self-Distilled RLVR

Paper • 2604.03128 • Published 10 days ago • 154
Paper Espresso: From Paper Overload to Research Insight

Paper • 2604.04562 • Published 7 days ago • 9
Can LLMs Learn to Reason Robustly under Noisy Supervision?

Paper • 2604.03993 • Published 8 days ago • 39
Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies

Paper • 2604.00830 • Published 11 days ago • 14
PLUME: Latent Reasoning Based Universal Multimodal Embedding

Paper • 2604.02073 • Published 11 days ago • 15
Self-Execution Simulation Improves Coding Models

Paper • 2604.03253 • Published Mar 11 • 30
SkillX: Automatically Constructing Skill Knowledge Bases for Agents

Paper • 2604.04804 • Published 7 days ago • 28
TriAttention: Efficient Long Reasoning with Trigonometric KV Compression

Paper • 2604.04921 • Published 7 days ago • 101
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale

Paper • 2604.04771 • Published 7 days ago • 114
How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

Paper • 2604.04323 • Published 7 days ago • 37
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Paper • 2604.08377 • Published 4 days ago • 256
Experience Transfer for Multimodal LLM Agents in Minecraft Game

Paper • 2604.05533 • Published 6 days ago • 12
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning

Paper • 2604.05404 • Published 6 days ago • 39
Paper Circle: An Open-source Multi-agent Research Discovery and Analysis Framework

Paper • 2604.06170 • Published 6 days ago • 24
