arxiv:2411.17945

MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation

Published on Nov 26, 2024 · Submitted by Mohammad Sadil Khan on Nov 27, 2024

Abstract

MARVEL-40M+, a novel dataset with extensive text annotations, outperforms existing datasets in annotation quality and diversity, and is accompanied by a two-stage pipeline for high-fidelity 3D content generation from text prompts.

AI-generated summary

Generating high-fidelity 3D content from text prompts remains a significant challenge in computer vision due to the limited size, diversity, and annotation depth of existing datasets. To address this, we introduce MARVEL-40M+, an extensive dataset with 40 million text annotations for over 8.9 million 3D assets aggregated from seven major 3D datasets. Our contribution is a novel multi-stage annotation pipeline that integrates open-source pretrained multi-view VLMs and LLMs to automatically produce multi-level descriptions, ranging from detailed (150-200 words) to concise semantic tags (10-20 words). This structure supports both fine-grained 3D reconstruction and rapid prototyping. Furthermore, we incorporate human metadata from the source datasets into our annotation pipeline to add domain-specific information to our annotations and reduce VLM hallucinations. Additionally, we develop MARVEL-FX3D, a two-stage text-to-3D pipeline. We fine-tune Stable Diffusion with our annotations and use a pretrained image-to-3D network to generate 3D textured meshes within 15s. Extensive evaluations show that MARVEL-40M+ significantly outperforms existing datasets in annotation quality and linguistic diversity, achieving win rates of 72.41% as judged by GPT-4 and 73.40% as judged by human evaluators.
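To make the multi-level annotation idea concrete, here is a minimal Python sketch of how one asset's descriptions might be stored and checked against the word-count ranges quoted in the abstract. This is an illustration under stated assumptions, not the authors' data format: the class `MultiLevelAnnotation`, the level names `detailed` and `concise_tags`, and the helper `pick_level` are all hypothetical, and only the two ranges (150-200 and 10-20 words) come from the abstract.

```python
from dataclasses import dataclass

# Word-count ranges quoted in the abstract for the two ends of the hierarchy.
# The level names are hypothetical labels chosen for this sketch.
LEVEL_WORD_RANGES = {
    "detailed": (150, 200),
    "concise_tags": (10, 20),
}


@dataclass
class MultiLevelAnnotation:
    """Hypothetical record holding several descriptions of one 3D asset."""

    asset_id: str
    descriptions: dict  # level name -> description text

    def out_of_range_levels(self):
        """Return the levels whose word count falls outside the quoted range."""
        bad = []
        for level, text in self.descriptions.items():
            if level not in LEVEL_WORD_RANGES:
                continue  # intermediate levels are not specified in the abstract
            low, high = LEVEL_WORD_RANGES[level]
            n_words = len(text.split())
            if not low <= n_words <= high:
                bad.append(level)
        return bad


def pick_level(annotation, want_fine_detail):
    """Pick the detailed text for fine-grained use, the tags for quick prototyping."""
    key = "detailed" if want_fine_detail else "concise_tags"
    return annotation.descriptions[key]
```

A caller could use `out_of_range_levels()` as a sanity check after generation and `pick_level(...)` to select the prompt length that suits the task, mirroring the abstract's split between fine-grained reconstruction and rapid prototyping.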

Community

I'd like to thank the authors for their work. A question: do you plan to release MARVEL-40M+?

Paper author: We will soon release the dataset, model checkpoints, and outputs from the paper.
