Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2505.14357

Vid2World: Crafting Video Diffusion Models to Interactive World Models

Paper • 2505.14357 • Published May 20, 2025 • 27
thuml/Vid2World-RT1

Updated Nov 10, 2025
thuml/Vid2World-CSGO

Updated Nov 10, 2025
thuml/Vid2World-RECON

Updated Nov 10, 2025

ByteDance/Sa2VA-8B

Image-Text-to-Text • 8B • Updated Sep 8, 2025 • 2.72k • 65
Vid2World: Crafting Video Diffusion Models to Interactive World Models

Paper • 2505.14357 • Published May 20, 2025 • 27

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2, 2025 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10, 2025 • 44
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14, 2025 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 98
IamCreateAI/Ruyi-Mini-7B

Image-to-Video • Updated Dec 25, 2024 • 206 • 610
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Paper • 2412.06016 • Published Dec 8, 2024 • 20
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108

Vid2World: Crafting Video Diffusion Models to Interactive World Models

Paper • 2505.14357 • Published May 20, 2025 • 27
thuml/Vid2World-RT1

Updated Nov 10, 2025
thuml/Vid2World-CSGO

Updated Nov 10, 2025
thuml/Vid2World-RECON

Updated Nov 10, 2025

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2, 2025 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10, 2025 • 44
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14, 2025 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85

ByteDance/Sa2VA-8B

Image-Text-to-Text • 8B • Updated Sep 8, 2025 • 2.72k • 65
Vid2World: Crafting Video Diffusion Models to Interactive World Models

Paper • 2505.14357 • Published May 20, 2025 • 27

GenEx: Generating an Explorable World

Paper • 2412.09624 • Published Dec 12, 2024 • 98
IamCreateAI/Ruyi-Mini-7B

Image-to-Video • Updated Dec 25, 2024 • 206 • 610
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

Paper • 2412.06016 • Published Dec 8, 2024 • 20
Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published Dec 13, 2024 • 108

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 30
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 15
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs