Ho Kei Cheng PRO

hkchengrex

https://hkchengrex.com/

AI & ML interests

None yet

Recent Activity

upvoted a paper 4 days ago

WildDet3D: Scaling Promptable 3D Detection in the Wild

updated a Space 10 days ago

hkchengrex/MMAudio

new activity 10 days ago

hkchengrex/MMAudio:Fix type annotation

Organizations

upvoted a paper 4 days ago

WildDet3D: Scaling Promptable 3D Detection in the Wild

Paper • 2604.08626 • Published 9 days ago • 237

upvoted a paper 18 days ago

ImagenWorld: Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks

Paper • 2603.27862 • Published 19 days ago • 30

upvoted a paper 21 days ago

UniGRPO: Unified Policy Optimization for Reasoning-Driven Visual Generation

Paper • 2603.23500 • Published 24 days ago • 35

upvoted a paper 24 days ago

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published 26 days ago • 123

upvoted a paper about 2 months ago

VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction

Paper • 2602.13294 • Published Feb 9 • 13

upvoted a paper 3 months ago

VideoMaMa: Mask-Guided Video Matting via Generative Prior

Paper • 2601.14255 • Published Jan 20 • 15

upvoted 2 papers 4 months ago

SAM Audio: Segment Anything in Audio

Paper • 2512.18099 • Published Dec 19, 2025 • 24

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

Paper • 2512.17012 • Published Dec 18, 2025 • 48

upvoted 2 papers 5 months ago

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published Nov 20, 2025 • 134

SAM 3D: 3Dfy Anything in Images

Paper • 2511.16624 • Published Nov 20, 2025 • 114

upvoted a paper 6 months ago

Diffusion Transformers with Representation Autoencoders

Paper • 2510.11690 • Published Oct 13, 2025 • 170

upvoted a paper 10 months ago

The Diffusion Duality

Paper • 2506.10892 • Published Jun 12, 2025 • 37

upvoted 2 papers 12 months ago

Perception Encoder: The best visual embeddings are not at the output of the network

Paper • 2504.13181 • Published Apr 17, 2025 • 36

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17, 2025 • 51

upvoted 6 papers about 1 year ago

The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation

Paper • 2503.10636 • Published Mar 13, 2025 • 3

TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding

Paper • 2502.19400 • Published Feb 26, 2025 • 47

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20, 2025 • 164
