tyh382596868's Collections daily paper
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Paper
• 2512.02835
• Published • 10
Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Paper
• 2512.05044
• Published • 17
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper
• 2512.05591
• Published • 17
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Paper
• 2512.05343
• Published • 25
World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty
Paper
• 2512.05927
• Published • 12
Voxify3D: Pixel Art Meets Volumetric Rendering
Paper
• 2512.07834
• Published • 45
EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
Paper
• 2512.06065
• Published • 29
Vector Quantization using Gaussian Variational Autoencoder
Paper
• 2512.06609
• Published • 1
Relational Visual Similarity
Paper
• 2512.07833
• Published • 25
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Paper
• 2512.08765
• Published • 134
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory
Paper
• 2512.07802
• Published • 46
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
Paper
• 2512.07843
• Published • 22
TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
Paper
• 2512.08153
• Published • 8
SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos
Paper
• 2512.08406
• Published • 3
MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
Paper
• 2512.10881
• Published • 30
Evaluating Gemini Robotics Policies in a Veo World Simulator
Paper
• 2512.10675
• Published • 20
Qwen3-VL Technical Report
Paper
• 2511.21631
• Published • 161
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Paper
• 2512.02556
• Published • 265
From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper
• 2511.18538
• Published • 304
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing
Paper
• 2512.02589
• Published • 73
Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression
Paper
• 2512.05081
• Published • 33
RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
Paper
• 2512.13660
• Published • 37
MMGR: Multi-Modal Generative Reasoning
Paper
• 2512.14691
• Published • 121
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
Paper
• 2512.14699
• Published • 28
Paper
• 2512.13961
• Published • 32
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed
Paper
• 2512.14067
• Published • 16
The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text
Paper
• 2512.16924
• Published • 27
Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers
Paper
• 2512.16615
• Published • 5
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
Paper
• 2512.16649
• Published • 27
N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
Paper
• 2512.16561
• Published • 20
PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
Paper
• 2512.16793
• Published • 76
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
Paper
• 2512.16093
• Published • 97
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
Paper
• 2512.20557
• Published • 51
Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Paper
• 2512.20848
• Published • 42
NVIDIA Nemotron 3: Efficient and Open Intelligence
Paper
• 2512.20856
• Published • 43
4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation
Paper
• 2512.17012
• Published • 48
NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
Paper
• 2601.02204
• Published • 63
DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer
Paper
• 2601.01425
• Published • 53
VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation
Paper
• 2601.02256
• Published • 33
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
Paper
• 2512.20578
• Published • 86
ASTRA: Automated Synthesis of agentic Trajectories and Reinforcement Arenas
Paper
• 2601.21558
• Published • 60