Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling Paper • 2111.14819 • Published Nov 29, 2021
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published 8 days ago • 178
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training Paper • 2603.12255 • Published Mar 12 • 91
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published Aug 16, 2025 • 73
Noise-robust Speech Separation with Fast Generative Correction Paper • 2406.07461 • Published Jun 11, 2024 • 1
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline Paper • 2505.19314 • Published May 25, 2025 • 5
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech Paper • 2506.02863 • Published Jun 3, 2025 • 8
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis Paper • 2409.07556 • Published Sep 11, 2024 • 2
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer Paper • 2409.08425 • Published Sep 12, 2024 • 10
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published Sep 17, 2024 • 18