-
UCSC-VLAA/gpt-image-edit-training
Image-to-Image • Updated • 26 -
UCSC-VLAA/GPT-Image-Edit-1.5M
Viewer • Updated • 2.78M • 22.9k • 77 -
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Paper • 2507.21033 • Published • 23 -
UCSC-VLAA/gpt-image-edit-benchmark-results
Viewer • Updated • 1.21k • 134 • 1
Collections
Discover the best community collections!
Collections including paper arxiv:2507.21033
-
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
Paper • 2502.04416 • Published • 12 -
Competitive Programming with Large Reasoning Models
Paper • 2502.06807 • Published • 69 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Paper • 2507.21033 • Published • 23
-
BrushEdit: All-In-One Image Inpainting and Editing
Paper • 2412.10316 • Published • 36 -
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper • 2412.11815 • Published • 26 -
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
Paper • 2412.09611 • Published • 11 -
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
Paper • 2412.07517 • Published • 11
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 30 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 15 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23
-
OmniGen2: Exploration to Advanced Multimodal Generation
Paper • 2506.18871 • Published • 78 -
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 115 -
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
Paper • 2502.05415 • Published • 20 -
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Paper • 2408.12528 • Published • 51
-
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Paper • 2412.14475 • Published • 58 -
How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 53 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
WavePulse: Real-time Content Analytics of Radio Livestreams
Paper • 2412.17998 • Published • 11
-
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Paper • 2407.09413 • Published • 11 -
MAVIS: Mathematical Visual Instruction Tuning
Paper • 2407.08739 • Published • 32 -
Kvasir-VQA: A Text-Image Pair GI Tract Dataset
Paper • 2409.01437 • Published • 71 -
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
Paper • 2409.05840 • Published • 49
-
UCSC-VLAA/gpt-image-edit-training
Image-to-Image • Updated • 26 -
UCSC-VLAA/GPT-Image-Edit-1.5M
Viewer • Updated • 2.78M • 22.9k • 77 -
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Paper • 2507.21033 • Published • 23 -
UCSC-VLAA/gpt-image-edit-benchmark-results
Viewer • Updated • 1.21k • 134 • 1
-
OmniGen2: Exploration to Advanced Multimodal Generation
Paper • 2506.18871 • Published • 78 -
OmniGen: Unified Image Generation
Paper • 2409.11340 • Published • 115 -
Show-o Turbo: Towards Accelerated Unified Multimodal Understanding and Generation
Paper • 2502.05415 • Published • 20 -
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Paper • 2408.12528 • Published • 51
-
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
Paper • 2502.04416 • Published • 12 -
Competitive Programming with Large Reasoning Models
Paper • 2502.06807 • Published • 69 -
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling
Paper • 2502.06703 • Published • 152 -
GPT-IMAGE-EDIT-1.5M: A Million-Scale, GPT-Generated Image Dataset
Paper • 2507.21033 • Published • 23
-
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
Paper • 2412.14475 • Published • 58 -
How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 53 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 46 -
WavePulse: Real-time Content Analytics of Radio Livestreams
Paper • 2412.17998 • Published • 11
-
BrushEdit: All-In-One Image Inpainting and Editing
Paper • 2412.10316 • Published • 36 -
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Paper • 2412.11815 • Published • 26 -
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
Paper • 2412.09611 • Published • 11 -
FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing
Paper • 2412.07517 • Published • 11
-
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Paper • 2407.09413 • Published • 11 -
MAVIS: Mathematical Visual Instruction Tuning
Paper • 2407.08739 • Published • 32 -
Kvasir-VQA: A Text-Image Pair GI Tract Dataset
Paper • 2409.01437 • Published • 71 -
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
Paper • 2409.05840 • Published • 49
-
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Paper • 2402.04252 • Published • 30 -
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 15 -
ScreenAI: A Vision-Language Model for UI and Infographics Understanding
Paper • 2402.04615 • Published • 44 -
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
Paper • 2402.05008 • Published • 23