Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2601.10332

about 17 hours ago

AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation

Paper • 2602.17100 • Published Feb 19 • 4
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant

Paper • 2603.01059 • Published Mar 1 • 1
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models

Paper • 2603.00618 • Published Feb 28
Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 194

GARDO: Reinforcing Diffusion Models without Reward Hacking

Paper • 2512.24138 • Published Dec 30, 2025 • 30
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published Dec 30, 2025 • 52
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Paper • 2512.24617 • Published Dec 31, 2025 • 66
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published Dec 29, 2025 • 99

Adversarial Flow Models

Paper • 2511.22475 • Published Nov 27, 2025 • 24
DiP: Taming Diffusion Models in Pixel Space

Paper • 2511.18822 • Published Nov 24, 2025 • 29
Asking like Socrates: Socrates helps VLMs understand remote sensing images

Paper • 2511.22396 • Published Nov 27, 2025 • 5
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Paper • 2512.05591 • Published Dec 5, 2025 • 17

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12, 2025 • 77
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Paper • 2505.15045 • Published May 21, 2025 • 56
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

Paper • 2505.16990 • Published May 22, 2025 • 22
D-AR: Diffusion via Autoregressive Models

Paper • 2505.23660 • Published May 29, 2025 • 34

Txt2img research

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models

Paper • 2312.00079 • Published Nov 30, 2023 • 17
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

Paper • 2402.05195 • Published Feb 7, 2024 • 19
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Paper • 2403.12015 • Published Mar 18, 2024 • 70
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Paper • 2601.10332 • Published Jan 15 • 31

Compositional Generative Modeling: A Single Model is Not All You Need

Paper • 2402.01103 • Published Feb 2, 2024 • 1
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Paper • 2601.10061 • Published Jan 15 • 32
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Paper • 2601.10332 • Published Jan 15 • 31
What Happens Next? Next Scene Prediction with a Unified Video Model

Paper • 2512.13015 • Published Dec 15, 2025

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 121
KlingAvatar 2.0 Technical Report

Paper • 2512.13313 • Published Dec 15, 2025 • 44
SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published Dec 23, 2025 • 95
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 222

Visual Representation Alignment for Multimodal Large Language Models

Paper • 2509.07979 • Published Sep 9, 2025 • 84
Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Paper • 2510.05034 • Published Oct 6, 2025 • 51
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Paper • 2601.10332 • Published Jan 15 • 31

facebook/w2v-bert-2.0

Feature Extraction • 0.6B • Updated Jan 25, 2024 • 3.1M • 213
facebook/metaclip-h14-fullcc2.5b

Zero-Shot Image Classification • 1.0B • Updated Jan 11, 2024 • 10.8k • 49
openai/clip-vit-large-patch14

Zero-Shot Image Classification • 0.4B • Updated Sep 15, 2023 • 21.1M • 1.99k
Salesforce/blip-image-captioning-large

Image-to-Text • 0.5B • Updated Feb 3, 2025 • 1.4M • 1.47k

Running

Agents

9

Hot Or Not

🏢

9

Evaluate hotness, beauty, and attractiveness of an image
Runtime error

Agents

Featured

816

Audioldm Text To Audio Generation

🔊

816

Generate audio from text descriptions
openai/whisper-large-v3

Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 4.86M • • 5.6k
Running on Zero

MCP

Featured

827

Whisper Large V3

🤫

827

Transcribe or translate audio and YouTube videos to text

about 17 hours ago

AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation

Paper • 2602.17100 • Published Feb 19 • 4
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant

Paper • 2603.01059 • Published Mar 1 • 1
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models

Paper • 2603.00618 • Published Feb 28
Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 194

Compositional Generative Modeling: A Single Model is Not All You Need

Paper • 2402.01103 • Published Feb 2, 2024 • 1
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Paper • 2601.10061 • Published Jan 15 • 32
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Paper • 2601.10332 • Published Jan 15 • 31
What Happens Next? Next Scene Prediction with a Unified Video Model

Paper • 2512.13015 • Published Dec 15, 2025

GARDO: Reinforcing Diffusion Models without Reward Hacking

Paper • 2512.24138 • Published Dec 30, 2025 • 30
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models

Paper • 2512.24165 • Published Dec 30, 2025 • 52
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Paper • 2512.24617 • Published Dec 31, 2025 • 66
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published Dec 29, 2025 • 99

MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 121
KlingAvatar 2.0 Technical Report

Paper • 2512.13313 • Published Dec 15, 2025 • 44
SemanticGen: Video Generation in Semantic Space

Paper • 2512.20619 • Published Dec 23, 2025 • 95
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI

Paper • 2512.16676 • Published Dec 18, 2025 • 222

Adversarial Flow Models

Paper • 2511.22475 • Published Nov 27, 2025 • 24
DiP: Taming Diffusion Models in Pixel Space

Paper • 2511.18822 • Published Nov 24, 2025 • 29
Asking like Socrates: Socrates helps VLMs understand remote sensing images

Paper • 2511.22396 • Published Nov 27, 2025 • 5
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Paper • 2512.05591 • Published Dec 5, 2025 • 17

Visual Representation Alignment for Multimodal Large Language Models

Paper • 2509.07979 • Published Sep 9, 2025 • 84
Language Models Can Learn from Verbal Feedback Without Scalar Rewards

Paper • 2509.22638 • Published Sep 26, 2025 • 70
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Paper • 2510.05034 • Published Oct 6, 2025 • 51
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Paper • 2601.10332 • Published Jan 15 • 31

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12, 2025 • 77
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective

Paper • 2505.15045 • Published May 21, 2025 • 56
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

Paper • 2505.16990 • Published May 22, 2025 • 22
D-AR: Diffusion via Autoregressive Models

Paper • 2505.23660 • Published May 29, 2025 • 34

facebook/w2v-bert-2.0

Feature Extraction • 0.6B • Updated Jan 25, 2024 • 3.1M • 213
facebook/metaclip-h14-fullcc2.5b

Zero-Shot Image Classification • 1.0B • Updated Jan 11, 2024 • 10.8k • 49
openai/clip-vit-large-patch14

Zero-Shot Image Classification • 0.4B • Updated Sep 15, 2023 • 21.1M • 1.99k
Salesforce/blip-image-captioning-large

Image-to-Text • 0.5B • Updated Feb 3, 2025 • 1.4M • 1.47k

Txt2img research

HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models

Paper • 2312.00079 • Published Nov 30, 2023 • 17
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

Paper • 2402.05195 • Published Feb 7, 2024 • 19
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Paper • 2403.12015 • Published Mar 18, 2024 • 70
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders

Paper • 2601.10332 • Published Jan 15 • 31

Running

Agents

9

Hot Or Not

🏢

9

Evaluate hotness, beauty, and attractiveness of an image
Runtime error

Agents

Featured

816

Audioldm Text To Audio Generation

🔊

816

Generate audio from text descriptions
openai/whisper-large-v3

Automatic Speech Recognition • 2B • Updated Aug 12, 2024 • 4.86M • • 5.6k
Running on Zero

MCP

Featured

827

Whisper Large V3

🤫

827

Transcribe or translate audio and YouTube videos to text

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs