VideoVLA: Video Generators Can Be Generalizable Robot Manipulators Paper • 2512.06963 • Published Dec 7, 2025
Spatia: Video Generation with Updatable Spatial Memory Paper • 2512.15716 • Published Dec 17, 2025
InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation Paper • 2605.14333 • Published 13 days ago
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond Paper • 1904.11492 • Published Apr 25, 2019
A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars Paper • 2401.04730 • Published Jan 9, 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty Paper • 2401.15077 • Published Jan 26, 2024
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls Paper • 2402.04253 • Published Feb 6, 2024
Improving Continuous Sign Language Recognition with Cross-Lingual Signs Paper • 2308.10809 • Published Aug 21, 2023
RAIN: Your Language Models Can Align Themselves without Finetuning Paper • 2309.07124 • Published Sep 13, 2023
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model Paper • 2203.14940 • Published Mar 28, 2022
AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections Paper • 2309.02186 • Published Sep 5, 2023
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension Paper • 2403.07874 • Published Mar 12, 2024
Rethinking Generative Large Language Model Evaluation for Semantic Comprehension Paper • 2403.07872 • Published Mar 12, 2024
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees Paper • 2406.16858 • Published Jun 24, 2024
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% Paper • 2406.11837 • Published Jun 17, 2024