papers
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper
• 2503.14734
• Published • 7
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Paper
• 2401.02117
• Published • 33
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics
Paper
• 2506.01844
• Published • 158
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Paper
• 2506.16035
• Published • 89
Deep Researcher with Test-Time Diffusion
Paper
• 2507.16075
• Published • 68
The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm
Paper
• 2507.18553
• Published • 41
MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents
Paper
• 2507.19478
• Published • 33
CLEAR: Error Analysis via LLM-as-a-Judge Made Easy
Paper
• 2507.18392
• Published • 20
PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving
Paper
• 2507.17596
• Published • 7
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement
Paper
• 2507.18742
• Published • 6
Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI
Paper
• 2507.10510
• Published • 5
GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning
Paper
• 2507.19457
• Published • 30
Frontier AI Risk Management Framework in Practice: A Risk Analysis Technical Report
Paper
• 2507.16534
• Published • 9
A Survey of Context Engineering for Large Language Models
Paper
• 2507.13334
• Published • 263
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper
• 2507.01006
• Published • 254
Group Sequence Policy Optimization
Paper
• 2507.18071
• Published • 320
Scaling RL to Long Videos
Paper
• 2507.07966
• Published • 162
MemOS: A Memory OS for AI System
Paper
• 2507.03724
• Published • 166
Kwai Keye-VL Technical Report
Paper
• 2507.01949
• Published • 132
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper
• 2507.15846
• Published • 135
Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
Paper
• 2507.16784
• Published • 123
T-LoRA: Single Image Diffusion Model Customization Without Overfitting
Paper
• 2507.05964
• Published • 121
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization
Paper
• 2507.14683
• Published • 136
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory
Paper
• 2410.10813
• Published • 16
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Paper
• 2506.11928
• Published • 25
Defeating Prompt Injections by Design
Paper
• 2503.18813
• Published • 25
Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents
Paper
• 2505.22954
• Published • 15
Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis
Paper
• 2505.11581
• Published • 3
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
Paper
• 2408.06292
• Published • 128
Evaluating Large Language Models Trained on Code
Paper
• 2107.03374
• Published • 10
Self-Refine: Iterative Refinement with Self-Feedback
Paper
• 2303.17651
• Published • 2
Gorilla: Large Language Model Connected with Massive APIs
Paper
• 2305.15334
• Published • 6
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace
Paper
• 2303.17580
• Published • 15
Communicative Agents for Software Development
Paper
• 2307.07924
• Published • 6
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper
• 2308.08155
• Published • 11
The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
Paper
• 2509.09677
• Published • 37
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper
• 2510.05592
• Published • 110
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper
• 2505.03335
• Published • 191
Inference-Time Scaling for Generalist Reward Modeling
Paper
• 2504.02495
• Published • 58
BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues
Paper
• 2501.10836
• Published • 1
Executable Code Actions Elicit Better LLM Agents
Paper
• 2402.01030
• Published • 192
DynaSaur: Large Language Agents Beyond Predefined Actions
Paper
• 2411.01747
• Published • 37
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Paper
• 2401.00812
• Published • 12
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents
Paper
• 2510.24702
• Published • 31
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM
Paper
• 2509.18058
• Published • 12
Speculative Safety-Aware Decoding
Paper
• 2508.17739
• Published
Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs
Paper
• 2508.10029
• Published
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs
Paper
• 2508.10031
• Published
Poison Once, Refuse Forever: Weaponizing Alignment for Injecting Bias in LLMs
Paper
• 2508.20333
• Published
Mitigating Jailbreaks with Intent-Aware LLMs
Paper
• 2508.12072
• Published
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models
Paper
• 2509.17938
• Published • 4
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness
Paper
• 2509.14297
• Published
Less is More: Recursive Reasoning with Tiny Networks
Paper
• 2510.04871
• Published • 513
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
Paper
• 2412.21199
• Published • 13
Solving Inequality Proofs with Large Language Models
Paper
• 2506.07927
• Published • 20
ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization
Paper
• 2510.24592
• Published • 17
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
Paper
• 2506.20920
• Published • 78
GAIA: a benchmark for General AI Assistants
Paper
• 2311.12983
• Published • 247
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance
Paper
• 2506.03828
• Published • 20
MMGR: Multi-Modal Generative Reasoning
Paper
• 2512.14691
• Published • 121
Next-Embedding Prediction Makes Strong Vision Learners
Paper
• 2512.16922
• Published • 89
mHC: Manifold-Constrained Hyper-Connections
Paper
• 2512.24880
• Published • 322
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published • 258
Helios: Real Real-Time Long Video Generation Model
Paper
• 2603.04379
• Published • 186
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation
Paper
• 2604.11804
• Published • 69
ATANT: An Evaluation Framework for AI Continuity
Paper
• 2604.06710
• Published • 1
IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs
Paper
• 2604.10539
• Published • 2
SHARE: Social-Humanities AI for Research and Education
Paper
• 2604.11152
• Published • 1
How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models
Paper
• 2604.04385
• Published • 1
SPASM: Stable Persona-driven Agent Simulation for Multi-turn Dialogue Generation
Paper
• 2604.09212
• Published • 3
Counting to Four is still a Chore for VLMs
Paper
• 2604.10039
• Published • 2
ADD for Multi-Bit Image Watermarking
Paper
• 2604.11491
• Published • 3
Continuous Adversarial Flow Models
Paper
• 2604.11521
• Published • 8
Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation
Paper
• 2604.11290
• Published • 2
CocoaBench: Evaluating Unified Digital Agents in the Wild
Paper
• 2604.11201
• Published • 33
CodeTracer: Towards Traceable Agent States
Paper
• 2604.11641
• Published • 38
Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models
Paper
• 2604.10949
• Published • 38
Zero-shot World Models Are Developmentally Efficient Learners
Paper
• 2604.10333
• Published • 7
Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models
Paper
• 2604.02340
• Published • 7
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks
Paper
• 2604.11778
• Published • 8
SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
Paper
• 2604.09557
• Published • 10
Efficient RL Training for LLMs with Experience Replay
Paper
• 2604.08706
• Published • 17
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models
Paper
• 2604.09459
• Published • 12
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
Paper
• 2604.08121
• Published • 42
Strips as Tokens: Artist Mesh Generation with Native UV Segmentation
Paper
• 2604.09132
• Published • 50
Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation
Paper
• 2604.10098
• Published • 74
Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization
Paper
• 2604.11259
• Published • 12
Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks
Paper
• 2604.11753
• Published • 14
TRACE: Capability-Targeted Agentic Training
Paper
• 2604.05336
• Published • 13
Panoptic Pairwise Distortion Graph
Paper
• 2604.11004
• Published • 2
Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory
Paper
• 2604.11544
• Published • 3
TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction
Paper
• 2604.08921
• Published • 2
SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?
Paper
• 2604.10718
• Published • 4
DiningBench: A Hierarchical Multi-view Benchmark for Perception and Reasoning in the Dietary Domain
Paper
• 2604.10425
• Published • 3
BMdataset: A Musicologically Curated LilyPond Dataset
Paper
• 2604.10628
• Published • 2
Learning Long-term Motion Embeddings for Efficient Kinematics Generation
Paper
• 2604.11737
• Published • 6
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
Paper
• 2604.11446
• Published • 4
SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context
Paper
• 2604.11716
• Published • 4
Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind
Paper
• 2604.11666
• Published • 4
Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series
Paper
• 2604.10799
• Published • 6
Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach
Paper
• 2604.11547
• Published • 5
TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training
Paper
• 2604.10784
• Published • 6
SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting
Paper
• 2604.10688
• Published • 25
Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation
Paper
• 2604.10030
• Published • 14
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators
Paper
• 2604.11805
• Published • 16
Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs
Paper
• 2604.10480
• Published • 20
Introspective Diffusion Language Models
Paper
• 2604.11035
• Published • 20
Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music
Paper
• 2604.10905
• Published • 26
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
Paper
• 2604.11297
• Published • 135
QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
Paper
• 2604.08570
• Published • 121