daily paper - a tyh382596868 Collection

tyh382596868's Collections

daily paper

updated Feb 3

ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning

Paper • 2512.02835 • Published Dec 2, 2025 • 10
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image

Paper • 2512.05044 • Published Dec 4, 2025 • 17
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Paper • 2512.05591 • Published Dec 5, 2025 • 17
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling

Paper • 2512.05343 • Published Dec 5, 2025 • 25
World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty

Paper • 2512.05927 • Published Dec 5, 2025 • 12
Voxify3D: Pixel Art Meets Volumetric Rendering

Paper • 2512.07834 • Published Dec 8, 2025 • 45
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing

Paper • 2512.06065 • Published Dec 5, 2025 • 29
Vector Quantization using Gaussian Variational Autoencoder

Paper • 2512.06609 • Published Dec 7, 2025 • 1
Relational Visual Similarity

Paper • 2512.07833 • Published Dec 8, 2025 • 25
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published Dec 9, 2025 • 134
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory

Paper • 2512.07802 • Published Dec 8, 2025 • 46
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models

Paper • 2512.07843 • Published Nov 24, 2025 • 22
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models

Paper • 2512.08153 • Published Dec 9, 2025 • 8
SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos

Paper • 2512.08406 • Published Dec 9, 2025 • 3
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos

Paper • 2512.10881 • Published Dec 11, 2025 • 30
Evaluating Gemini Robotics Policies in a Veo World Simulator

Paper • 2512.10675 • Published Dec 11, 2025 • 20
Qwen3-VL Technical Report

Paper • 2511.21631 • Published Nov 26, 2025 • 161
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published Dec 2, 2025 • 265
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 304
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing

Paper • 2512.02589 • Published Dec 2, 2025 • 73
Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression

Paper • 2512.05081 • Published Dec 4, 2025 • 33
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics

Paper • 2512.13660 • Published Dec 15, 2025 • 37
MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 121
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives

Paper • 2512.14699 • Published Dec 16, 2025 • 28
Olmo 3

Paper • 2512.13961 • Published Dec 15, 2025 • 32
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed

Paper • 2512.14067 • Published Dec 16, 2025 • 16
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text

Paper • 2512.16924 • Published Dec 18, 2025 • 27
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers

Paper • 2512.16615 • Published Dec 18, 2025 • 5
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

Paper • 2512.16649 • Published Dec 18, 2025 • 27
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Paper • 2512.16561 • Published Dec 18, 2025 • 20
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence

Paper • 2512.16793 • Published Dec 18, 2025 • 76
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times

Paper • 2512.16093 • Published Dec 18, 2025 • 97
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models

Paper • 2512.20557 • Published Dec 23, 2025 • 51
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Paper • 2512.20848 • Published Dec 23, 2025 • 42
NVIDIA Nemotron 3: Efficient and Open Intelligence

Paper • 2512.20856 • Published Dec 24, 2025 • 43
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Paper • 2512.17012 • Published Dec 18, 2025 • 48
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation

Paper • 2601.02204 • Published Jan 5 • 63
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer

Paper • 2601.01425 • Published Jan 4 • 53
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation

Paper • 2601.02256 • Published Jan 5 • 33
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits

Paper • 2512.20578 • Published Dec 23, 2025 • 86
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas

Paper • 2601.21558 • Published Jan 29 • 60