arxiv:2405.08748

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Published on May 14, 2024 · Submitted by AK on May 15, 2024
#3 Paper of the day

Abstract

Hunyuan-DiT, a text-to-image diffusion transformer, achieves state-of-the-art in Chinese-to-image generation by incorporating fine-grained language understanding and multi-turn dialogue capabilities.

AI-generated summary

We present Hunyuan-DiT, a text-to-image diffusion transformer with fine-grained understanding of both English and Chinese. To construct Hunyuan-DiT, we carefully design the transformer structure, text encoder, and positional encoding. We also build from scratch a whole data pipeline to update and evaluate data for iterative model optimization. For fine-grained language understanding, we train a Multimodal Large Language Model to refine the captions of the images. Finally, Hunyuan-DiT can perform multi-turn multimodal dialogue with users, generating and refining images according to the context. Through our holistic human evaluation protocol with more than 50 professional human evaluators, Hunyuan-DiT sets a new state-of-the-art in Chinese-to-image generation compared with other open-source models. Code and pretrained models are publicly available at github.com/Tencent/HunyuanDiT

Community

The first open Stable Diffusion 3-like architecture model 👀 Image quality is good!

That's a fascinating study with impressive details. 👏🏻
Are you considering expanding to other languages as well?
