-
AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation
Paper • 2602.17100 • Published • 4
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant
Paper • 2603.01059 • Published • 1
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models
Paper • 2603.00618 • Published
Heterogeneous Agent Collaborative Reinforcement Learning
Paper • 2603.02604 • Published • 194
Collections including paper arxiv:2601.10332
-
GARDO: Reinforcing Diffusion Models without Reward Hacking
Paper • 2512.24138 • Published • 30
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models
Paper • 2512.24165 • Published • 52
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space
Paper • 2512.24617 • Published • 66
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
Paper • 2512.23447 • Published • 99
-
Adversarial Flow Models
Paper • 2511.22475 • Published • 24
DiP: Taming Diffusion Models in Pixel Space
Paper • 2511.18822 • Published • 29
Asking like Socrates: Socrates helps VLMs understand remote sensing images
Paper • 2511.22396 • Published • 5
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 17
-
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Paper • 2503.09573 • Published • 77
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
Paper • 2505.15045 • Published • 56
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding
Paper • 2505.16990 • Published • 22
D-AR: Diffusion via Autoregressive Models
Paper • 2505.23660 • Published • 34
-
HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models
Paper • 2312.00079 • Published • 17
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space
Paper • 2402.05195 • Published • 19
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Paper • 2403.12015 • Published • 70
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders
Paper • 2601.10332 • Published • 31
-
Compositional Generative Modeling: A Single Model is Not All You Need
Paper • 2402.01103 • Published • 1
CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation
Paper • 2601.10061 • Published • 32
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders
Paper • 2601.10332 • Published • 31
What Happens Next? Next Scene Prediction with a Unified Video Model
Paper • 2512.13015 • Published
-
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 121
KlingAvatar 2.0 Technical Report
Paper • 2512.13313 • Published • 44
SemanticGen: Video Generation in Semantic Space
Paper • 2512.20619 • Published • 95
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 222
-
Visual Representation Alignment for Multimodal Large Language Models
Paper • 2509.07979 • Published • 84
Language Models Can Learn from Verbal Feedback Without Scalar Rewards
Paper • 2509.22638 • Published • 70
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models
Paper • 2510.05034 • Published • 51
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders
Paper • 2601.10332 • Published • 31
-
facebook/w2v-bert-2.0
Feature Extraction • 0.6B • Updated • 3.1M • 213
facebook/metaclip-h14-fullcc2.5b
Zero-Shot Image Classification • 1.0B • Updated • 10.8k • 49
openai/clip-vit-large-patch14
Zero-Shot Image Classification • 0.4B • Updated • 21.1M • 1.99k
Salesforce/blip-image-captioning-large
Image-to-Text • 0.5B • Updated • 1.4M • 1.47k
-
Hot Or Not
🏢 • 9 • Evaluate hotness, beauty, and attractiveness of an image
-
Audioldm Text To Audio Generation
🔊 • 816 • Generate audio from text descriptions
-
openai/whisper-large-v3
Automatic Speech Recognition • 2B • Updated • 4.86M • 5.6k
Whisper Large V3
🤫 • 827 • Transcribe or translate audio and YouTube videos to text