How to use from the Transformers library
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("hardcoremoore/SenseNova-U1-8B-MoT-Infographic", trust_remote_code=True, dtype="auto")

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

🌟 Overview

🚀 SenseNova U1 is a new series of native multimodal models that unifies multimodal understanding, reasoning, and generation within a single monolithic architecture. It marks a fundamental paradigm shift in multimodal AI, from modality integration to true unification: rather than relying on adapters to translate between modalities, SenseNova U1 models think and act across language and vision natively.

Unifying visual understanding and generation in one end-to-end architecture, from pixels to words, opens up broad possibilities: efficient and strong understanding, generation, and interleaved reasoning, all performed natively across modalities.

๐Ÿ—๏ธ Key Pillars:

At the core of SenseNova U1 is NEO-unify, a novel architecture designed from first principles for multimodal AI. It eliminates both the Visual Encoder (VE) and the Variational Autoencoder (VAE), treating pixel and word information as inherently and deeply correlated. Its key features are:

  • 🔗 Model language and visual information end-to-end as a unified compound.
  • 🖼️ Preserve semantic richness while maintaining pixel-level visual fidelity.
  • 🧠 Reason across modalities with high efficiency and minimal conflict via native MoTs.
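The MoT bullet above can be illustrated with a toy sketch of the general Mixture-of-Transformers idea: every token takes part in one shared self-attention step, but its feed-forward weights are selected by its modality. This is an assumption-laden NumPy illustration, not SenseNova U1's implementation (which this card does not describe); the modality ids, sizes, and the functions `shared_attention` and `mot_ffn` are hypothetical.

```python
import numpy as np

# Hypothetical modality ids for this toy example.
TEXT, IMAGE = 0, 1


def shared_attention(x):
    """Single-head self-attention over all tokens (no Q/K/V projections, for brevity)."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)    # row-wise softmax
    return probs @ x


def mot_ffn(x, modality, weights):
    """Apply each token's modality-specific two-layer ReLU feed-forward network.

    x:        (num_tokens, d) hidden states
    modality: (num_tokens,) modality id per token
    weights:  dict mapping modality id -> (w_in, w_out)
    """
    out = np.zeros_like(x)
    for m, (w_in, w_out) in weights.items():
        mask = modality == m
        if mask.any():
            hidden = np.maximum(x[mask] @ w_in, 0.0)
            out[mask] = hidden @ w_out
    return out


rng = np.random.default_rng(0)
d, d_hidden = 8, 16
weights = {m: (rng.normal(size=(d, d_hidden)), rng.normal(size=(d_hidden, d)))
           for m in (TEXT, IMAGE)}

x = rng.normal(size=(5, d))                            # 2 text tokens + 3 image tokens
modality = np.array([TEXT, TEXT, IMAGE, IMAGE, IMAGE])

h = x + shared_attention(x)                            # all tokens attend to each other
y = h + mot_ffn(h, modality, weights)                  # each token uses its modality's FFN
print(y.shape)  # (5, 8)
```

In the Mixture-of-Transformers literature, attention projections and normalization layers are often made modality-specific as well while attention itself stays global; this sketch separates only the feed-forward weights to keep the routing visible.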

Powered by this new core architecture, SenseNova U1-8B-MoT-Infographic (a version of SenseNova U1-8B-MoT enhanced specifically for infographics) delivers exceptional efficiency and state-of-the-art infographic performance:


Generation Latency vs. Average Performance on Infographic Benchmarks (BizGenEval, IGenBench).

Generation Latency vs. Average Performance on General Benchmarks (OneIG, LongText, CVTG).
  • Benchmark Performance: Compared with the base SenseNova-U1-8B-MoT model, BizGenEval hard/easy increased from 39.8 / 61.1 to 46.6 / 65.4 (+6.8 / +4.3 points), and IGenBench Q-ACC/I-ACC increased from 51.3 / 4.2 to 69.5 / 17.0 (+18.2 / +12.8 points), while maintaining robust visual understanding capabilities without substantial degradation.
  • Generation Quality: The model produces complex infographics across 100+ styles and layouts, with improved visual aesthetics and text rendering, including dense small text such as arXiv-style pages.
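The point gains quoted above follow directly from the base and fine-tuned scores. A few lines of Python reproduce them (scores copied from this section; the variable names are illustrative):

```python
# Scores reported above for the base and infographic-tuned models.
base = {"BizGenEval hard": 39.8, "BizGenEval easy": 61.1,
        "IGenBench Q-ACC": 51.3, "IGenBench I-ACC": 4.2}
infographic = {"BizGenEval hard": 46.6, "BizGenEval easy": 65.4,
               "IGenBench Q-ACC": 69.5, "IGenBench I-ACC": 17.0}

# Round to one decimal to strip floating-point noise from the subtraction.
deltas = {name: round(infographic[name] - base[name], 1) for name in base}
for name, delta in deltas.items():
    print(f"{name}: {delta:+.1f} points")
```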
Benchmark Details

| Model | BizGenEval Avg. (hard / easy) ↑ | IGenBench Q-ACC ↑ | IGenBench I-ACC ↑ | OneIG (EN) ↑ | OneIG (ZH) ↑ |
|---|---|---|---|---|---|
| **Commercial Models** | | | | | |
| Nano-Banana-Pro | 76.7 / 93.7 | 90.6 | 48.8 | 58.1 | 56.8 |
| Nano-Banana-2.0 | 68.5 / 92.5 | 85.6 | 34.4 | 54.0 | 54.9 |
| GPT-Image-1.5 | 35.9 / 81.6 | 55.0 | 12.0 | - | - |
| Qwen-Image-2.0 | 45.5 / 65.8 | 50.0 | 3.0 | 54.1 | 50.9 |
| Seedream-4.5 | 30.1 / 66.2 | 61.0 | 6.0 | 56.4 | 55.0 |
| **Open-source Models** | | | | | |
| SenseNova-U1-8B-MoT-Infographic | 46.6 / 65.4 | 69.5 | 17.0 | 55.6 | 53.3 |
| SenseNova-U1-8B-MoT | 39.8 / 61.1 | 51.3 | 4.2 | 54.5 | 53.8 |
| Z-Image | 8.2 / 43.8 | 30.0 | 1.0 | 54.6 | 53.5 |
| Qwen-Image-2512 | 6.3 / 41.0 | 32.2 | 1.0 | 53.0 | 51.5 |
| Qwen-Image | 2.8 / 23.8 | 36.0 | 0.0 | 53.9 | 54.8 |
| Bagel | 2.0 / 3.7 | 4.9 | 0.0 | 36.1 | 37.0 |

IGenBench scores are reported as percentages. Models are ordered by the arithmetic mean of BizGenEval hard, BizGenEval easy, IGenBench Q-ACC, and IGenBench I-ACC within the commercial and open-source groups separately. OneIG is included as a general generation reference. Full per-category results are intended for the Hugging Face model card.
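The ordering rule stated above (arithmetic mean of the four infographic metrics, computed within each group) can be checked against the table. This sketch uses the table's values; `rank_by_mean` is an illustrative helper, not part of any released tooling:

```python
# (BizGenEval hard, BizGenEval easy, IGenBench Q-ACC, IGenBench I-ACC), copied from the table.
commercial = {
    "Nano-Banana-Pro": (76.7, 93.7, 90.6, 48.8),
    "Nano-Banana-2.0": (68.5, 92.5, 85.6, 34.4),
    "GPT-Image-1.5": (35.9, 81.6, 55.0, 12.0),
    "Qwen-Image-2.0": (45.5, 65.8, 50.0, 3.0),
    "Seedream-4.5": (30.1, 66.2, 61.0, 6.0),
}
open_source = {
    "SenseNova-U1-8B-MoT-Infographic": (46.6, 65.4, 69.5, 17.0),
    "SenseNova-U1-8B-MoT": (39.8, 61.1, 51.3, 4.2),
    "Z-Image": (8.2, 43.8, 30.0, 1.0),
    "Qwen-Image-2512": (6.3, 41.0, 32.2, 1.0),
    "Qwen-Image": (2.8, 23.8, 36.0, 0.0),
    "Bagel": (2.0, 3.7, 4.9, 0.0),
}


def rank_by_mean(group):
    """Return model names sorted by the mean of their four scores (highest first), plus the means."""
    means = {name: sum(scores) / len(scores) for name, scores in group.items()}
    return sorted(means, key=means.get, reverse=True), means


for label, group in (("Commercial", commercial), ("Open-source", open_source)):
    order, means = rank_by_mean(group)
    print(label, [f"{name}: {means[name]:.2f}" for name in order])
```

Both groups come out in the same order as the rows of the table.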

  • 📰 High-density information rendering (specialized): This model demonstrates strong capabilities in dense visual communication, generating richly structured layouts for knowledge illustrations, posters, presentations, comics, resumes, and other information-rich formats.

  • ๐Ÿ† Open-source SoTA: SenseNova U1 sets a new standard for unified multimodal understanding and generation, achieving state-of-the-art infographic performance among open-source models.

🎨 Infographic Showcases

📸 More generation samples: see ✨ Infographic Showcases.

Qualitative Comparison

We present a qualitative comparison between the base SenseNova-U1-8B-MoT and the fine-tuned SenseNova-U1-8B-MoT-Infographic model across five key dimensions: background stability, chart accuracy, text rendering accuracy and size appropriateness, arXiv paper rendering quality, and overall layout and content understanding. For the full comparison, see ✨ Comparison Infographic Cases.

Background Stability

[Images: SenseNova-U1-8B-MoT vs. SenseNova-U1-8B-MoT-Infographic outputs, side by side, for each prompt below]
Prompt
่ฏฅไฟกๆฏๅ›พ้ข˜ไธบโ€œ็‰ˆๆƒ่ง†่ง‰ๆฆ‚่งˆโ€๏ผŒๆ•ดไฝ“้‡‡็”จๆจชๅ‘ๅˆ†ๆ ๅธƒๅฑ€๏ผŒๅˆ†ไธบไธŠไธ‹ไธคไธชไธป่ฆ้ƒจๅˆ†ใ€‚ไธŠๅŠ้ƒจๅˆ†ไธบ่ง†่ง‰ๅŒ–ๆฆ‚่งˆๅŒบ๏ผŒ็”ฑๅ››ไธชๅฝฉ่‰ฒ็ŸฉๅฝขๅŒบๅ—ๅนถๅˆ—็ป„ๆˆ๏ผŒๆฏไธชๅŒบๅ—้€š่ฟ‡ๅ›พๆ ‡ๅ’Œ็ฎ€็Ÿญๆ ‡้ข˜ไผ ่พพไธ€ไธชๆ ธๅฟƒๆฆ‚ๅฟต๏ผ›ไธ‹ๅŠ้ƒจๅˆ†ไธบโ€œใ€็‰ˆๆƒๅŸบ็ก€ๅธธ่ฏ†ใ€‘โ€่ฏฆ็ป†่งฃ้‡ŠๅŒบ๏ผŒๅŒ…ๅซๅ››ไธช็ผ–ๅทๆก็›ฎ๏ผŒๅฏนๅบ”ไธŠๅŠ้ƒจๅˆ†็š„ๅ››ไธชไธป้ข˜๏ผŒๆไพ›ๆ›ด่ฏฆๅฐฝ็š„ๆ–‡ๅญ—่ฏดๆ˜Žใ€‚

**ไธŠๅŠ้ƒจๅˆ†๏ผš็‰ˆๆƒ่ง†่ง‰ๆฆ‚่งˆ**

ๆญคๅŒบๅŸŸ็”ฑๅ››ไธชๆฐดๅนณๆŽ’ๅˆ—็š„ๅฝฉ่‰ฒๆ–นๅ—ๆž„ๆˆ๏ผŒไปŽๅทฆ่‡ณๅณไพๆฌกไธบๆต…่“่‰ฒใ€ๆต…้ป„่‰ฒใ€ๆต…็ปฟ่‰ฒๅ’Œๆต…็ดซ่‰ฒ๏ผŒๆฏไธชๆ–นๅ—ๅ†…ๅซไธ€็ป„ๅ›พๆ ‡ๅ’Œไธ‹ๆ–น็š„ไธญๆ–‡ๆ ‡้ข˜ใ€‚

1. **็ฌฌไธ€ๅ—๏ผˆๆต…่“่‰ฒ๏ผ‰๏ผšๅˆ›ไฝœๅณไบง็”Ÿ**
* **ๅ›พๆ ‡**๏ผšๅทฆไพงๆ˜ฏไธ€ไธชๅ‘ๅ…‰็š„็ฏๆณก๏ผŒไธญ้—ดๆ˜ฏไธ€ไธชๅธฆๆœ‰็ฌ”็š„ๆ–‡ๆกฃๅ›พๆ ‡๏ผŒๅณไพงๆ˜ฏไธ€ไธช้”ๅคดๅ›พๆ ‡๏ผŒไธ‰่€…ไน‹้—ด็”จ็ฎญๅคด่ฟžๆŽฅ๏ผŒ่กจ็คบโ€œๅˆ›ๆ„ โ†’ ๅˆ›ไฝœ โ†’ ไฟๆŠคโ€็š„ๆต็จ‹ใ€‚
* **ๆ–‡ๅญ—**๏ผš
* ๅ›พๆ ‡ไธ‹ๆ–นๆœ‰ๅฐๅญ—โ€œ่‡ชๅŠจไฟๆŠคโ€ใ€‚
* ๆ–นๅ—ๅบ•้ƒจๆœ‰ๅคงๅญ—ๆ ‡้ข˜โ€œๅˆ›ไฝœๅณไบง็”Ÿโ€ใ€‚

2. **็ฌฌไบŒๅ—๏ผˆๆต…้ป„่‰ฒ๏ผ‰๏ผšๆ ธๅฟƒๆƒๅˆฉ**
* **ๅ›พๆ ‡**๏ผšไธญๅฟƒๆ˜ฏไธ€ๅชๆ‰‹ๆŽŒๅ‘ไธŠๆ‰˜ไธพ๏ผŒไธŠๆ–นๆœ‰ๅคšไธชๅ…ƒ็ด ๅ›ด็ป•๏ผšไธ€ไธชๅธฆยฉ็ฌฆๅท็š„ๅœ†ๅœˆใ€ไธ€ไธชๅ–‡ๅญใ€ไธ€ๅ †้‡‘ๅธๅ’Œ็พŽๅ…ƒ็ฌฆๅทใ€ไปฅๅŠๅคšไธชๆŒ‡ๅ‘ไธๅŒๆ–นๅ‘็š„็ฎญๅคด๏ผŒ่ฑกๅพๆƒๅˆฉ็š„ๅคš็ง่กจ็Žฐๅฝขๅผๅ’Œๆ”ถ็›Šใ€‚
* **ๆ–‡ๅญ—**๏ผš
* ๅ›พๆ ‡ไธ‹ๆ–นๆ— ้ขๅค–ๅฐๅญ—ใ€‚
* ๆ–นๅ—ๅบ•้ƒจๆœ‰ๅคงๅญ—ๆ ‡้ข˜โ€œๆ ธๅฟƒๆƒๅˆฉโ€ใ€‚

3. **็ฌฌไธ‰ๅ—๏ผˆๆต…็ปฟ่‰ฒ๏ผ‰๏ผš็‰นๅฎšๆกไปถๅนณ่กก**
* **ๅ›พๆ ‡**๏ผšไธ€ไธชๅคฉๅนณ๏ผŒๅทฆไพงๆ‰˜็›˜ไธŠๆœ‰ๆ‰“ๅผ€็š„ไนฆๆœฌๅ’Œๆ ‡ๆœ‰โ€œNEWSโ€็š„้บฆๅ…‹้ฃŽ๏ผŒไปฃ่กจโ€œๅˆ็†ไฝฟ็”จโ€๏ผ›ๅณไพงๆ‰˜็›˜ไธŠๆœ‰ไธ€ไธชๅธฆ้”็š„ๆ–‡ไปถๅคน๏ผŒไปฃ่กจโ€œๅ—ๆŽงไฝœๅ“โ€ใ€‚ๅคฉๅนณๅ‘ๅณไพงๅ€พๆ–œใ€‚
* **ๆ–‡ๅญ—**๏ผš
* ๅทฆไพงๆ‰˜็›˜ไธ‹ๆ–นๆ ‡ๆณจโ€œๅˆ็†ไฝฟ็”จโ€ใ€‚
* ๅณไพงๆ‰˜็›˜ไธ‹ๆ–นๆ ‡ๆณจโ€œๅ—ๆŽงไฝœๅ“โ€ใ€‚
* ๆ–นๅ—ๅบ•้ƒจๆœ‰ๅคงๅญ—ๆ ‡้ข˜โ€œ็‰นๅฎšๆกไปถๅนณ่กกโ€ใ€‚

4. **็ฌฌๅ››ๅ—๏ผˆๆต…็ดซ่‰ฒ๏ผ‰๏ผšไฟๆŠคๆœŸ้™**
* **ๅ›พๆ ‡**๏ผšๅทฆไพงๆ˜ฏไธ€ไธชๆฒ™ๆผ๏ผŒไธญ้—ดๆ˜ฏไธ€ไธชๅ‘ๅณ็š„็ฒ—็ฎญๅคด๏ผŒๅณไพงๆ˜ฏไธ€ไธชๅข“็ข‘๏ผˆ้กถ้ƒจๆœ‰ๅๅญ—ๆžถ๏ผ‰ใ€‚ๆฒ™ๆผไธ‹ๆ–น่ฟ˜ๆœ‰ไธ€ไธชๆ—ถ้’Ÿๅ›พๆ ‡ใ€‚
* **ๆ–‡ๅญ—**๏ผš
* ๅข“็ข‘ๆ—ๆ ‡ๆณจโ€œไฝœ่€…ๆœ‰็”Ÿไน‹ๅนด + Xๅนดโ€ใ€‚
* ๆ–นๅ—ๅบ•้ƒจๆœ‰ๅคงๅญ—ๆ ‡้ข˜โ€œไฟๆŠคๆœŸ้™โ€ใ€‚

**ไธ‹ๅŠ้ƒจๅˆ†๏ผšใ€็‰ˆๆƒๅŸบ็ก€ๅธธ่ฏ†ใ€‘**

ๆญคๅŒบๅŸŸไฝไบŽไธŠๅŠ้ƒจๅˆ†ไธ‹ๆ–น๏ผŒ่ƒŒๆ™ฏไธบ็™ฝ่‰ฒ๏ผŒๅŒ…ๅซๅ››ไธช็‹ฌ็ซ‹็š„ๆ–‡ๆœฌๆก†๏ผŒๆฏไธชๆ–‡ๆœฌๆก†้ƒฝๆœ‰ไธ€ไธชๅฝฉ่‰ฒๆ ‡้ข˜ๆ ๅ’Œไธ‹ๆ–น็š„่ฏฆ็ป†่ฏดๆ˜Žๆ–‡ๅญ—๏ผŒ้ขœ่‰ฒไธŽไธŠๅŠ้ƒจๅˆ†ๅฏนๅบ”ใ€‚

1. **1. ่‡ชๅŠจ่Žทๅพ—ไฟๆŠค**
* **ๆ ‡้ข˜ๆ **๏ผš่“่‰ฒ่ƒŒๆ™ฏ๏ผŒ็™ฝ่‰ฒๆ–‡ๅญ—โ€œ1. ่‡ชๅŠจ่Žทๅพ—ไฟๆŠคโ€ใ€‚
* **ๆญฃๆ–‡**๏ผšโ€œไฝœๅ“ๅˆ›ไฝœๅฎŒๆˆไน‹ๆ—ถ่ตท๏ผŒๅณ่‡ชๅŠจไบซๆœ‰็‰ˆๆƒ๏ผŒๆ— ้œ€็™ป่ฎฐ๏ผˆ็™ป่ฎฐไธป่ฆๆ˜ฏไธพ่ฏ๏ผ‰ใ€‚โ€

2. **2. ๆ ธๅฟƒๆƒๅˆฉ**
* **ๆ ‡้ข˜ๆ **๏ผšๆฉ™้ป„่‰ฒ่ƒŒๆ™ฏ๏ผŒ็™ฝ่‰ฒๆ–‡ๅญ—โ€œ2. ๆ ธๅฟƒๆƒๅˆฉโ€ใ€‚
* **ๆญฃๆ–‡**๏ผšโ€œๅŒ…ๆ‹ฌไบบ่บซๆƒ๏ผˆๅฆ‚็ฝฒๅๆƒใ€ไฟฎๆ”นๆƒ๏ผ‰ๅ’Œ่ดขไบงๆƒ๏ผˆๅฆ‚ๅคๅˆถๆƒใ€ๅ‘่กŒๆƒใ€ไฟกๆฏ็ฝ‘็ปœไผ ๆ’ญๆƒ๏ผŒๅฏ่ฎธๅฏๆˆ–่ฝฌ่ฎฉ่Žทๅˆฉ๏ผ‰ใ€‚โ€

3. **3. ๅˆ็†ไฝฟ็”จ**
* **ๆ ‡้ข˜ๆ **๏ผš็ปฟ่‰ฒ่ƒŒๆ™ฏ๏ผŒ็™ฝ่‰ฒๆ–‡ๅญ—โ€œ3. ๅˆ็†ไฝฟ็”จโ€ใ€‚
* **ๆญฃๆ–‡**๏ผšโ€œๅœจ็‰นๅฎšๆกไปถไธ‹๏ผˆๅฆ‚ๆ•™ๅญฆใ€ๆ–ฐ้—ปๆŠฅ้“ใ€ไธชไบบๅญฆไน ็ญ‰๏ผ‰๏ผŒๅฏไปฅไธ็ป่ฎธๅฏใ€ไธๆ”ฏไป˜ๆŠฅ้…ฌไฝฟ็”จ๏ผŒไฝ†้œ€ๆŒ‡ๆ˜Žไฝœ่€…ๅ’Œๅ‡บๅค„๏ผŒไธ”ไธๅพ—ไพต็Šฏๅ…ถไป–ๆƒๅˆฉใ€‚โ€

4. **4. ไฟๆŠคๆœŸ้™**
* **ๆ ‡้ข˜ๆ **๏ผš็ดซ่‰ฒ่ƒŒๆ™ฏ๏ผŒ็™ฝ่‰ฒๆ–‡ๅญ—โ€œ4. ไฟๆŠคๆœŸ้™โ€ใ€‚
* **ๆญฃๆ–‡**๏ผšโ€œไธ€่ˆฌไธบไฝœ่€…ๆœ‰็”Ÿไน‹ๅนดๅŠ ๆญปๅŽ50ๅนด๏ผˆไธญๅ›ฝๅคง้™†็ญ‰ๅคšๆ•ฐๅœฐๅŒบ๏ผ‰๏ผŒๆœŸ้™ๅฑŠๆปกๅŽ่ฟ›ๅ…ฅๅ…ฌๆœ‰้ข†ๅŸŸใ€‚โ€

**ๆ•ดไฝ“้ฃŽๆ ผไธŽๆ•ฐๆฎ็ผ–็ **๏ผš
่ฏฅไฟกๆฏๅ›พ้‡‡็”จๆ‰ๅนณๅŒ–่ฎพ่ฎก้ฃŽๆ ผ๏ผŒ่‰ฒๅฝฉ้ฒœๆ˜Žไธ”ๅˆ†ๅŒบๆธ…ๆ™ฐใ€‚้€š่ฟ‡้ขœ่‰ฒ็ผ–็ ๏ผˆ่“ใ€้ป„ใ€็ปฟใ€็ดซ๏ผ‰ๅฐ†ๅ››ไธชไธป้ข˜่ฟ›่กŒ่ง†่ง‰ๅŒบๅˆ†๏ผŒๅนถๅœจไธŠไธ‹ไธค้ƒจๅˆ†ไฟๆŒไธ€่‡ดใ€‚ๅ›พๆ ‡ไฝœไธบไธป่ฆ็š„ๆ•ฐๆฎๅฏ่ง†ๅŒ–ๆ‰‹ๆฎต๏ผŒ็›ด่ง‚ๅœฐ่กจ่พพไบ†ๆŠฝ่ฑกๆฆ‚ๅฟตใ€‚ๆ‰€ๆœ‰ๆ–‡ๅญ—ๅ‡ไธบ็ฎ€ไฝ“ไธญๆ–‡๏ผŒๅ†…ๅฎน็ป“ๆž„ไธฅ่ฐจ๏ผŒ้€ป่พ‘ๆธ…ๆ™ฐ๏ผŒๆ—จๅœจไปฅๅ›พๆ–‡็ป“ๅˆ็š„ๆ–นๅผๆ™ฎๅŠ็‰ˆๆƒๅŸบ็ก€็Ÿฅ่ฏ†ใ€‚
Prompt
่ฏฅไฟกๆฏๅ›พไปฅไธญๆ–‡ไธบไธป่ฆ่ฏญ่จ€๏ผŒ้‡‡็”จๆจชๅ‘ๅ››ๆ ผๅธƒๅฑ€๏ผŒๆธ…ๆ™ฐๅ‘ˆ็Žฐไธ€ไธชๅ“็‰ŒไปŽ่กฐ่ฝๅˆฐๅคๅ…ด็š„ๅ››ไธชๅ…ณ้”ฎ้˜ถๆฎตใ€‚ๆ•ดไฝ“้ฃŽๆ ผไธบๆ‰‹็ป˜ๅก้€šๆ’็”ป๏ผŒ่‰ฒๅฝฉๆŸ”ๅ’Œ๏ผŒ็บฟๆก็ฎ€ๆด๏ผŒๅ…ทๆœ‰ไบฒๅ’ŒๅŠ›ๅ’Œๅ™ไบ‹ๆ€งใ€‚ๆฏไธช้˜ถๆฎต็”ฑไธŠๆ–น็š„ๆ ‡้ข˜ใ€ไธญ้—ด็š„ๆ’ๅ›พๅ’Œไธ‹ๆ–น็š„ๆ–‡ๅญ—่ฏดๆ˜Žไธ‰้ƒจๅˆ†ๆž„ๆˆ๏ผŒ้€š่ฟ‡่™š็บฟๅˆ†้š”๏ผŒ็ป“ๆž„ๅˆ†ๆ˜Žใ€‚

็ฌฌไธ€้˜ถๆฎตๆ ‡้ข˜ไธบโ€œ1. ๆ›พ็ป็š„่พ‰็…ŒไธŽๆฒก่ฝโ€๏ผŒๆ’ๅ›พๆ็ป˜ไบ†ไธ€ๅบง็ ด่ดฅ็š„ๅŸŽๅ ก๏ผŒๅŸŽๅ กไธŠๆŒ‚็€ๆ‚ฒไผค็š„่กจๆƒ…๏ผŒๅ‘จๅ›ดๆ•ฃ่ฝ็€็š‡ๅ† ๏ผŒ่ฑกๅพๆ˜”ๆ—ฅ่ฃ่€€็š„ๆถˆ้€๏ผ›ๆ—่พน็ซ‹ๆœ‰ๆ ‡็‰Œโ€œOLD BRANDโ€๏ผŒ่ƒŒๆ™ฏไธญๅฏ่งๅคงๆœฌ้’Ÿ๏ผŒๆš—็คบไผ ็ปŸๆˆ–ๅކๅฒๅ“็‰Œใ€‚ไธ‹ๆ–นๆ–‡ๅญ—่ฏดๆ˜Ž๏ผšโ€œๆ›พ็ปๆ˜ฏๅธ‚ๅœบ้ข†ๅฏผ่€…๏ผŒไฝ†ๆœช่ƒฝ่ทŸไธŠๆ—ถไปฃๆญฅไผ๏ผŒ้€ๆธ่ขซ้—ๅฟ˜๏ผŒ้ขไธด็”Ÿๅญ˜ๅฑๆœบใ€‚โ€

็ฌฌไบŒ้˜ถๆฎตๆ ‡้ข˜ไธบโ€œ2. ๅˆ›ๆ–ฐไธŽ้‡ๅก‘โ€๏ผŒๆ’ๅ›พๅฑ•็คบๅ››ไบบๅ›ข้˜Ÿๅ›ดๅ่ฎจ่ฎบ๏ผŒๅ…ถไธญไธ€ไบบๆŒ‡ๅ‘็™ฝๆฟไธŠ็š„็ปฟ่‰ฒๅถๅญๆ ‡ๅฟ—่ฎพ่ฎก๏ผŒๅ‘จๅ›ด็Žฏ็ป•้ฝฟ่ฝฎใ€็ฏๆณก๏ผˆไปฃ่กจๅˆ›ๆ„๏ผ‰ๅ’Œๆ ‡็‰Œโ€œNEW IDEASโ€ใ€‚ไธ‹ๆ–นๆ–‡ๅญ—่ฏดๆ˜Ž๏ผšโ€œ่ฟ›่กŒๆทฑๅบฆๅธ‚ๅœบ่ฐƒ็ ”๏ผŒ้‡ๆ–ฐๅฎšไฝๅ“็‰Œ๏ผŒๅผ•ๅ…ฅๅˆ›ๆ–ฐ่ฎพ่ฎกๅ’Œๆ•ฐๅญ—ๅŒ–็ญ–็•ฅ๏ผŒ้‡ๅก‘ๆ ธๅฟƒไปทๅ€ผใ€‚โ€

็ฌฌไธ‰้˜ถๆฎตๆ ‡้ข˜ไธบโ€œ3. ๆˆๅŠŸ็ฟป็›˜โ€๏ผŒๆ’ๅ›พๅŒ…ๅซไธ€ๅชๆตด็ซ้‡็”Ÿ็š„ๅ‡คๅ‡ฐ๏ผŒ่ฑกๅพๆถ…ๆงƒ๏ผ›ๅณไพงๆ˜ฏไธŠๅ‡่ถ‹ๅŠฟ็š„ๆŸฑ็Šถๅ›พ๏ผŒไธ‹ๆ–นๆ˜ฏไธ€ไธชๅธฆๆœ‰็ˆฑๅฟƒ็š„ๅŒ…่ฃน๏ผŒไปฃ่กจไบงๅ“ไบคไป˜๏ผ›ไธ€็พคๆฌขๅ‘ผ็š„ไบบ็พค่กจ่พพๅ–œๆ‚ฆใ€‚ไธ‹ๆ–นๆ–‡ๅญ—่ฏดๆ˜Ž๏ผšโ€œๅ‡ญๅ€Ÿๆ–ฐไบงๅ“ๅ’Œๆ–ฐๅฝข่ฑก้‡่Žทๆถˆ่ดน่€…ไฟกไปป๏ผŒไธš็ปฉ้€†ๅŠฟไธŠๆ‰ฌ๏ผŒ้‡ๆ–ฐ่ตขๅพ—ๅธ‚ๅœบไปฝ้ขใ€‚โ€

็ฌฌๅ››้˜ถๆฎตๆ ‡้ข˜ไธบโ€œ4. ๆœชๆฅๅฑ•ๆœ›โ€๏ผŒๆ’ๅ›พๆ็ป˜ไธ€ๆžš็ซ็ฎญไปŽๅœฐ็ƒ่ฝจ้“ๅ‘ๅฐ„ๅ‡็ฉบ๏ผŒๅ‘จๅ›ดๆœ‰ๆ˜Ÿๆ˜Ÿใ€ไบ‘ๆœตๅ’Œไธ€็‰‡็ปฟๅถ๏ผŒ่ฑกๅพๅฏๆŒ็ปญๅ‘ๅฑ•๏ผ›ไธ‹ๆ–นๆจชๅน…ๅ†™็€โ€œFUTURE READYโ€ใ€‚ไธ‹ๆ–นๆ–‡ๅญ—่ฏดๆ˜Ž๏ผšโ€œๆŒ็ปญๅˆ›ๆ–ฐ๏ผŒๅ…ณๆณจๅฏๆŒ็ปญๅ‘ๅฑ•ๅ’Œ็”จๆˆท่ฟžๆŽฅ๏ผŒ็ซ‹ๅฟ—ๆˆไธบๆ›ดๅ…ทๅฝฑๅ“ๅŠ›็š„ๆœชๆฅๅ“็‰Œใ€‚โ€

ๆ•ดไธชไฟกๆฏๅ›พ้€š่ฟ‡่ง†่ง‰้šๅ–ป๏ผˆๅฆ‚ๅŸŽๅ กใ€ๅ‡คๅ‡ฐใ€็ซ็ฎญ๏ผ‰ๅ’Œๆ•ฐๆฎๅ›พ่กจ๏ผˆๆŸฑ็Šถๅ›พ๏ผ‰็ป“ๅˆ๏ผŒ็”ŸๅŠจ่ฎฒ่ฟฐไบ†ไธ€ไธชๅ“็‰ŒไปŽๅฑๆœบๅˆฐๅคๅ…ด็š„ๅฎŒๆ•ดๆ•…ไบ‹๏ผŒๅผบ่ฐƒๅˆ›ๆ–ฐใ€็”จๆˆทไฟกไปปๅ’ŒๅฏๆŒ็ปญๅ‘ๅฑ•็š„้‡่ฆๆ€งใ€‚ๆ‰€ๆœ‰ๆ–‡ๆœฌๅ‡ไธบ็ฎ€ไฝ“ไธญๆ–‡๏ผŒๆ— ่‹ฑๆ–‡ไปฅๅค–็š„ๅ…ถไป–่ฏญ่จ€ใ€‚
Prompt
The infographic titled "College Entrance Pathway Reforce Comparison" presents a structured comparison of key aspects for prospective students in Guangdong, China, aiming to enter college through a specialized entrance examination. The layout is organized as a multi-column table with four main columns: "Content Item / Evaluation Criteria", "Statistics", "Quotes", and "Key Terms". Each row corresponds to a distinct evaluation criterion or step in the preparation process, with visual icons, text, and data points enhancing clarity.

The infographic uses a clean, minimalist design with black line art icons on a light beige background. Text is primarily in bold sans-serif font, with headings emphasized for readability. Data is encoded using icons (e.g., graduation cap, calendar, books, target, rocket) to visually represent concepts, while numerical values are explicitly labeled for precision.

The first row addresses **Eligibility Criteria**:
- In the "Statistics" column, it features an icon of a person checking a map of Guangdong with the text: "Official Eligibility Requirements Confirm if you qualify to register".
- The "Quotes" column lists three eligible groups with corresponding icons: "Final-Year Guangdong Junior College Student", "Guangdong Resident <2 Years Post Graduation", and "Eligible Retired Military Personnel".
- The "Key Terms" column shows a magnifying glass over a document with the label: "Eligibility Verification".

The second row covers **Exam Structure & Scoring Breakdown**:
- "Statistics" displays icons representing different test types and scores: 100 pts (graduation cap), 200 pts (person at desk), 1000 pts (document with pen), 150 pts (document with pen). Below: "Total 500 points across 4 test papers".
- "Quotes" lists four subject components in document-shaped boxes: "Political Theory (100 pts)", "Major-Aligned Public Subject (100 pts)", "Professional Subject 1 (150 pts)", "Professional Subject 2 (150 pts)".
- "Key Terms" includes a balance scale icon with "Score Distribution".

The third row details the **Official Annual Exam Timeline**:
- "Statistics" contains a horizontal timeline with icons of a calendar and clock, labeled "Annual Key Timeline".
- "Quotes" provides a detailed timeline: Jan: Registration Open → Jan: Admission Open → Mid-Mar: Exam Date → Mid-Apr: Score Release → May-Jun: Admission Offers.
- "Key Terms" shows a calendar and clock with "Critical Dates".

The next three rows outline a three-step preparation strategy:

**Step 1 - Confirm Target Major & Institution**:
- "Statistics": Icon of a person holding a map with a target, text: "Confirm your target 6 months in advance".
- "Quotes": Two bullet points: "Download official exam syllabi and past professional subject papers from the target institution’s admission portal" and "Cross-verify that your junior college major meets the target major’s prerequisite requirements".
- "Key Terms": Clock and books with "Target Selection".

**Step 2 - Public Subject Foundation Building**:
- "Statistics": Icon of a person studying with books and a coffee cup, text: "Complete 3 months of structured public subject study".
- "Quotes": Two bullet points: "Complete 5+ years of past public subject exam papers to identify recurring test points" and "Political Theory allocates 30% of total score to current affairs from the past calendar year".
- "Key Terms": Box with lightbulb and "Core Knowledge".

**Step 3 - Professional Subject Sprint Revision**:
- "Statistics": Icon of a running person with a book and clock, text: "Focus on high-weight professional subjects in the final 2 months".
- "Quotes": Two bullet points: "Practice past professional subject papers from your target institution and review core major textbooks" and "60% of professional subject questions are repeated or adapted from past 3 years of papers for most institutions".
- "Key Terms": Trophy and gears with "Intensive Review".

Red horizontal lines separate the first three criteria from the three-step strategy, while a blue line separates Step 1 from Steps 2 and 3, visually grouping related content. All textual information is preserved exactly as presented, including spelling variations like "Oficial" (likely intended as "Official"). The infographic serves as a strategic roadmap combining official requirements, scoring details, timelines, and actionable preparation steps for candidates.
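The exam-structure row in this prompt states a total of 500 points across four papers. A quick arithmetic check, with values copied from the prompt and illustrative variable names, shows that the four components in the "Quotes" column add up to that total, whereas the point values attached to the "Statistics" icons do not:

```python
# Subject components listed in the "Quotes" column of the exam-structure row.
components = {
    "Political Theory": 100,
    "Major-Aligned Public Subject": 100,
    "Professional Subject 1": 150,
    "Professional Subject 2": 150,
}
# Point values attached to the icons in the "Statistics" column of the same row.
statistics_icons = [100, 200, 1000, 150]

total = sum(components.values())
print(f"{len(components)} papers, {total} points")    # matches "Total 500 points across 4 test papers"
print(f"icon values sum to {sum(statistics_icons)}")  # does not match the stated total
```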
Prompt
The infographic titled "12-Month Market Performance: US vs. Asia" presents a structured, puzzle-piece-based visual analysis comparing the performance of US and Asian equity markets over a 12-month period. The layout is organized into three main steps, arranged in a central vertical flow with interconnected puzzle pieces, emphasizing a modular, analytical approach to market comparison. The design uses clean black-and-white line art with light blue accents for key sections, icons for visual representation, and clear typography for readability.

**Step 1** (top center) introduces the scope of the analysis. It features an illustration of four people examining charts, symbolizing data analysis. To the right, it defines the market indices being compared:
- **US Markets**: S&P 500, NASDAQ
- **Asian Markets**: Nikkei 225, Hang Seng, KOSPI, CSI 300

It also lists the types of data analyzed:
- Trailing Return (represented by a rising bar chart icon)
- Average Daily Volume (represented by a stacked bar chart icon)
- Top Sector Return (represented by a pie chart icon)

**Step 2** (left side, labeled "Metrics that account for 72% of short-term S&P 500 volatility") focuses on US Market Core Driving Indicators. This section contains icons representing industry (factory), finance (bank building), money (hand holding dollar sign), and labor (worker in hard hat). Below these icons, a light blue banner reads "US Market Core Driving Indicators". Specific metrics are listed with red warning triangle icons:
- CPI YoY: 3.2%
- Federal Funds Rate: 5.25–5.5%
- Non-farm Payrolls: +187k July 2024

**Step 3** (right side, labeled "Metrics that predict 68% of MSCI Asia Ex-Japan 3-month forward returns") focuses on Asian Market Core Leading Indicators. This section includes icons for shipping (container), manufacturing (gears), and calculation (calculator). A light blue banner below reads "Asian Market Core Leading Indicators". Specific metrics are listed:
- Manufacturing PMI: 51.2 (with red warning triangle)
- Q2 Export Growth: +6.8% YoY (with red warning triangle)
- Avg Policy Rate: 3.1% (with information circle icon)

At the bottom center, a large puzzle piece titled "Policy Shifts & Market Volatility Correlation" displays a line graph with two fluctuating lines:
- **US VIX (navy line)**: representing US market volatility
- **Asian Avg Volatility (green line)**: representing average Asian market volatility

Arrows connect the two lines, indicating correlation. Below the graph, key insights are provided with red warning triangles:
- Rate hike impact: +27% US VIX
- Trade policy impact: +34% Asian VIX
- Cross-regional sell-off correlation: 0.68

The overall structure visually represents how US and Asian market performances are driven by distinct but interrelated economic indicators, with a central focus on their volatility dynamics and policy impacts. The use of puzzle pieces metaphorically suggests that these components fit together to form a complete picture of global market trends. The infographic employs consistent iconography, color-coding (red for warnings, blue for core sections), and clear textual labeling to convey complex financial data in an accessible format.

Chart Accuracy

[Images: SenseNova-U1-8B-MoT vs. SenseNova-U1-8B-MoT-Infographic outputs, side by side, for each prompt below]
Prompt
Create an infographic that features a title and a subtitle centered at the top, reading 'Fastest Cuisines to Prepare' and 'Average Ghost Kitchen Handover Time by Item Type (Minutes)' respectively. The main visual is a horizontal grouped bar chart combining a Fast-food neon visual style with checkerboard borders along the edges, featuring a centered legend above the chart area for 'QuickEats' (cyan neon border) and 'DashNow' (orange neon border). To the bottom right of the bar chart, there is a simple illustration of two mopeds waiting for orders. The chart's vertical axis lists four categories, each preceded by a simple icon, while the horizontal axis represents handover time in minutes with numerical labels at 0, 5, 10, 15, and 20, supplemented by dotted vertical gridlines. Each category features a pair of black bars representing the two platforms, with exact values displayed directly inside the right end of each bar. For 'Classic Tacos', QuickEats takes 10.0 minutes while DashNow takes 11.5 minutes. 'Supreme Burritos' require the longest preparation, with 17.5 minutes for QuickEats and 19.0 minutes for DashNow. 'Spicy Nachos' take 9.5 minutes on QuickEats and 10.0 minutes on DashNow. Finally, 'Mini Quesadillas' are the fastest, taking 8.0 minutes for QuickEats and 8.5 minutes for DashNow. 
The given data is : [{"category": "Classic Tacos", "platform": "QuickEats", "unit": "Minutes", "value": 10.0}, {"category": "Classic Tacos", "platform": "DashNow", "unit": "Minutes", "value": 11.5}, {"category": "Supreme Burritos", "platform": "QuickEats", "unit": "Minutes", "value": 17.5}, {"category": "Supreme Burritos", "platform": "DashNow", "unit": "Minutes", "value": 19.0}, {"category": "Spicy Nachos", "platform": "QuickEats", "unit": "Minutes", "value": 9.5}, {"category": "Spicy Nachos", "platform": "DashNow", "unit": "Minutes", "value": 10.0}, {"category": "Mini Quesadillas", "platform": "QuickEats", "unit": "Minutes", "value": 8.0}, {"category": "Mini Quesadillas", "platform": "DashNow", "unit": "Minutes", "value": 8.5}]
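For chart-accuracy prompts like this one, the narrative and the attached data should agree. A small consistency check (a hypothetical helper, not part of any evaluation described in this card) regroups the data by category and confirms that Supreme Burritos are the slowest, Mini Quesadillas the fastest, and every value fits on the 0-20 minute axis:

```python
from collections import defaultdict

# (category, platform, minutes), copied from the prompt's data.
data = [
    ("Classic Tacos", "QuickEats", 10.0), ("Classic Tacos", "DashNow", 11.5),
    ("Supreme Burritos", "QuickEats", 17.5), ("Supreme Burritos", "DashNow", 19.0),
    ("Spicy Nachos", "QuickEats", 9.5), ("Spicy Nachos", "DashNow", 10.0),
    ("Mini Quesadillas", "QuickEats", 8.0), ("Mini Quesadillas", "DashNow", 8.5),
]
AXIS_MAX = 20  # last labeled tick on the horizontal axis

by_category = defaultdict(dict)
for category, platform, minutes in data:
    by_category[category][platform] = minutes

# Compare categories by their average handover time across both platforms.
mean_time = {c: sum(v.values()) / len(v) for c, v in by_category.items()}
slowest = max(mean_time, key=mean_time.get)
fastest = min(mean_time, key=mean_time.get)
fits_axis = all(m <= AXIS_MAX for _, _, m in data)

print(f"slowest: {slowest}, fastest: {fastest}, fits 0-{AXIS_MAX} axis: {fits_axis}")
```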
Prompt
Create an infographic that presents a centered title at the top, stating "Übertaktet vs. Standard-Takt", with the subtitle "Temperaturanstieg bei langen Gaming-Sessions" directly below it. The main visual is a line chart spanning the width of the infographic on a dark background, embodying a Gamer Aesthetic with vibrant RGB neon accents. This chart has a vertical axis on the left labeled with numerical values in increments of 10 from 30°C to 100°C, and a horizontal axis at the bottom with time labels: '0m', '15m', '30m', '45m', '60m', '75m', '90m', '105m', and '120m'. Horizontal grid lines mark each 10°C increment. A horizontal legend is positioned under the subtitle, containing a cyan circular marker and line for "Standard-Takt" and a magenta circular marker and line for "Übertaktet (+150MHz)". Two data series are plotted as glowing neon lines with hollow circular markers at each data point, accompanied by gradient shading below each line. The cyan "Standard-Takt" line shows a steep rise from 38°C at 0m to 68°C at 15m, followed by a flat plateau reaching 73.5°C at 120m. The magenta "Übertaktet" line displays a similar initial spike from 42°C to 75°C, but continues with a gradual linear creep up to 93°C at 120m. Spike annotations (callout boxes) point to the final data points on the right, highlighting the peak temperatures: a magenta box reads "Peak: 93°C" and a cyan box reads "Peak: 73.5°C". A stylized thermometer line-art icon is subtly placed in the center of the chart's background.
The given data is : [{"profile": "Standard-Takt", "temperature": 38, "time": "0m"}, {"profile": "Übertaktet", "temperature": 42, "time": "0m"}, {"profile": "Standard-Takt", "temperature": 68, "time": "15m"}, {"profile": "Übertaktet", "temperature": 75, "time": "15m"}, {"profile": "Standard-Takt", "temperature": 71, "time": "30m"}, {"profile": "Übertaktet", "temperature": 79, "time": "30m"}, {"profile": "Standard-Takt", "temperature": 72, "time": "45m"}, {"profile": "Übertaktet", "temperature": 82, "time": "45m"}, {"profile": "Standard-Takt", "temperature": 72.5, "time": "60m"}, {"profile": "Übertaktet", "temperature": 85, "time": "60m"}, {"profile": "Standard-Takt", "temperature": 73, "time": "75m"}, {"profile": "Übertaktet", "temperature": 87, "time": "75m"}, {"profile": "Standard-Takt", "temperature": 73, "time": "90m"}, {"profile": "Übertaktet", "temperature": 89, "time": "90m"}, {"profile": "Standard-Takt", "temperature": 73.5, "time": "105m"}, {"profile": "Übertaktet", "temperature": 91, "time": "105m"}, {"profile": "Standard-Takt", "temperature": 73.5, "time": "120m"}, {"profile": "Übertaktet", "temperature": 93, "time": "120m"}]
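The data for this prompt can be summarized the way the narrative describes it: a sharp rise over the first 15 minutes, then a plateau for Standard-Takt and a steady creep for Übertaktet, with each peak at the final point. The sketch below regroups the given data by profile (values copied from the prompt; the summary keys are illustrative):

```python
# Temperatures (°C) from the prompt's data, regrouped by profile; times in minutes.
times = [0, 15, 30, 45, 60, 75, 90, 105, 120]
temps = {
    "Standard-Takt": [38, 68, 71, 72, 72.5, 73, 73, 73.5, 73.5],
    "Übertaktet": [42, 75, 79, 82, 85, 87, 89, 91, 93],
}

summary = {}
for profile, values in temps.items():
    assert len(values) == len(times)  # one reading per time label
    summary[profile] = {
        "peak": max(values),
        "peak_is_final": max(values) == values[-1],  # callouts point at the last points
        "initial_rise": values[1] - values[0],      # 0m -> 15m
        "later_creep": values[-1] - values[1],      # 15m -> 120m
    }
    print(profile, summary[profile])

# Every value should fit the 30-100 °C vertical axis.
assert all(30 <= v <= 100 for values in temps.values() for v in values)
```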
Prompt
Create an infographic that displays data in a vertical diverging bar chart format. At the top left of the visualization, there is a title: 'Anomalie de l'Atlantique Sud : Dérive magnétique', and a subtitle: 'Vecteurs de dérive vers l'est et l'ouest en kilomètres par rapport à la ligne de base historique'. In the upper left area below the text, an icon of a compass rose is placed within a magnetic field line curve. The main chart features a horizontal zero-axis line, labeled with a '0' on the far left, representing the historical coordinate baseline. The x-axis at the bottom displays the decades '1980', '1990', '2000', '2010', and '2020', each marked with a small vertical tick. For each decade, a vertical bar extends from the zero-axis, with its corresponding data label positioned directly at the end of the bar. The data shows westward drift represented by blue bars extending below the axis for '1980' with a value of '-15 km' and '1990' with a value of '-32 km'. Eastward drift is represented by red bars extending above the axis for '2000' with a value of '+10 km', '2010' with a value of '+45 km', and '2020' with a value of '+68 km'. The overall visual style mimics a geophysical science journal, utilizing compass red and blue color tones. The given data is : [{"decade": "1980", "drift_km": -15}, {"decade": "1990", "drift_km": -32}, {"decade": "2000", "drift_km": 10}, {"decade": "2010", "drift_km": 45}, {"decade": "2020", "drift_km": 68}]
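The encoding rule in this prompt (negative drift as blue bars below the zero axis, positive drift as red bars above it, each labeled with a signed value) can be written as a small mapping. `bar_spec` is a hypothetical helper name; the data is copied from the prompt:

```python
data = [
    {"decade": "1980", "drift_km": -15},
    {"decade": "1990", "drift_km": -32},
    {"decade": "2000", "drift_km": 10},
    {"decade": "2010", "drift_km": 45},
    {"decade": "2020", "drift_km": 68},
]


def bar_spec(row):
    """Map one data point to a diverging-bar description (this data has no zero values)."""
    km = row["drift_km"]
    return {
        "decade": row["decade"],
        "direction": "down" if km < 0 else "up",  # below or above the zero axis
        "color": "blue" if km < 0 else "red",     # westward drift blue, eastward drift red
        "label": f"{km:+d} km",                   # explicit sign, e.g. "-15 km", "+10 km"
    }


specs = [bar_spec(row) for row in data]
for spec in specs:
    print(spec)
```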
Prompt
Create an infographic in a corporate report minimalism style with muted corporate grays and blues, featuring a large title, 'Seasonal Fluctuations in 15-Year Mortgages', at the top. Directly below it is a subtitle, 'Historical prepayment velocities showing seasonal housing market trends'. Underneath the subtitle, a horizontal legend identifies two categories with small square icons: 'Spring/Summer Originations' in lighter gray-blue and 'Fall/Winter Originations' in darker gray-blue. The main visual is a multi-line chart in a wide landscape orientation. The vertical axis has numeric labels at 0.0, 5.0, 10.0, 15.0, and 20.0, with horizontal grid lines extending across the plot. The horizontal axis features labels: 'Jan 2018', 'Apr', 'Jul', 'Oct', 'Jan 2019', 'Apr', and 'Jul'. An icon depicting a sleek house silhouette is positioned in the upper left corner of the chart's plotting area. Two distinct lines represent the categories, characterized by cyclical seasonal bumps in the summer months. Both lines have square markers at each data point, with numerical values displayed near them. The lighter line for 'Spring/Summer Originations' plots a value of 8.0 in Jan 2018, rising to 12.5 in Apr, peaking at 16.0 in Jul, dipping to 11.0 in Oct, dropping further to 7.5 in Jan 2019, climbing to 13.0 in Apr, and reaching 17.5 in Jul. The darker line for 'Fall/Winter Originations' mirrors this pattern, starting at 6.5 in Jan 2018, increasing to 9.0 in Apr, hitting 14.5 in Jul, falling to 10.0 in Oct, bottoming out at 6.0 in Jan 2019, rising to 10.5 in Apr, and ending at 15.0 in Jul. 
The given data is : [{"category": "Spring/Summer Originations", "date": "2018-01", "value": 8.0}, {"category": "Fall/Winter Originations", "date": "2018-01", "value": 6.5}, {"category": "Spring/Summer Originations", "date": "2018-04", "value": 12.5}, {"category": "Fall/Winter Originations", "date": "2018-04", "value": 9.0}, {"category": "Spring/Summer Originations", "date": "2018-07", "value": 16.0}, {"category": "Fall/Winter Originations", "date": "2018-07", "value": 14.5}, {"category": "Spring/Summer Originations", "date": "2018-10", "value": 11.0}, {"category": "Fall/Winter Originations", "date": "2018-10", "value": 10.0}, {"category": "Spring/Summer Originations", "date": "2019-01", "value": 7.5}, {"category": "Fall/Winter Originations", "date": "2019-01", "value": 6.0}, {"category": "Spring/Summer Originations", "date": "2019-04", "value": 13.0}, {"category": "Fall/Winter Originations", "date": "2019-04", "value": 10.5}, {"category": "Spring/Summer Originations", "date": "2019-07", "value": 17.5}, {"category": "Fall/Winter Originations", "date": "2019-07", "value": 15.0}]
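The data here arrives in long format (one row per category and date), while a multi-line chart needs one series per category. A minimal pivot in plain Python (values copied from the prompt; `peak_date` is an illustrative helper) also derives the x-axis tick labels the prompt asks for and checks that each year peaks in July, the summer bump described above:

```python
from collections import defaultdict

SS, FW = "Spring/Summer Originations", "Fall/Winter Originations"

# Long-format rows from the prompt's data: one row per (category, date).
rows = [
    {"category": SS, "date": "2018-01", "value": 8.0}, {"category": FW, "date": "2018-01", "value": 6.5},
    {"category": SS, "date": "2018-04", "value": 12.5}, {"category": FW, "date": "2018-04", "value": 9.0},
    {"category": SS, "date": "2018-07", "value": 16.0}, {"category": FW, "date": "2018-07", "value": 14.5},
    {"category": SS, "date": "2018-10", "value": 11.0}, {"category": FW, "date": "2018-10", "value": 10.0},
    {"category": SS, "date": "2019-01", "value": 7.5}, {"category": FW, "date": "2019-01", "value": 6.0},
    {"category": SS, "date": "2019-04", "value": 13.0}, {"category": FW, "date": "2019-04", "value": 10.5},
    {"category": SS, "date": "2019-07", "value": 17.5}, {"category": FW, "date": "2019-07", "value": 15.0},
]

# Pivot to wide format: category -> {date: value}.
wide = defaultdict(dict)
for row in rows:
    wide[row["category"]][row["date"]] = row["value"]

dates = sorted({row["date"] for row in rows})
series = {cat: [wide[cat][d] for d in dates] for cat in (SS, FW)}

# Tick labels as in the prompt: the year appears only on January ticks.
MONTHS = {"01": "Jan", "04": "Apr", "07": "Jul", "10": "Oct"}
ticks = [f"{MONTHS[d[5:]]} {d[:4]}" if d.endswith("-01") else MONTHS[d[5:]] for d in dates]


def peak_date(category, year):
    """Return the date with the highest value for a category within one year."""
    points = {d: v for d, v in wide[category].items() if d.startswith(year)}
    return max(points, key=points.get)


print(ticks)
print({cat: [peak_date(cat, y) for y in ("2018", "2019")] for cat in (SS, FW)})
```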

Text Rendering Accuracy and Size Appropriateness

[Images: SenseNova-U1-8B-MoT vs. SenseNova-U1-8B-MoT-Infographic outputs, side by side, for each prompt below]
Prompt
่ฏฅไฟกๆฏๅ›พไปฅๆ‰‹็ป˜็ฌ”่ฎฐๆœฌ้ฃŽๆ ผๅ‘ˆ็Žฐ๏ผŒๆ ‡้ข˜ไธบโ€œๅ‰ไผŠๅกๅ“‡ๅธฆไฝ ๆธธ๏ผšๅŠ ๆณฐ็ฝ—ๅฐผไบšๅ›ฝๅฎถ่‰บๆœฏๅš็‰ฉ้ฆ†๏ผˆMNAC๏ผ‰ไธ‰ๅคฉไธคๅคœไธ็ป•่ทฏๆ”ป็•ฅโ€๏ผŒๅ‰ฏๆ ‡้ข˜ไธบโ€œ่กŒ็จ‹่ทฏ็บฟไธŽๆ—ถ้—ดๅฎ‰ๆŽ’๏ผˆไธญๆ–‡ๆธ…ๆ™ฐ็‰ˆ๏ผ‰โ€ใ€‚ๆ•ดไฝ“้‡‡็”จๆš–้ป„่‰ฒ่ฐƒ่ƒŒๆ™ฏ๏ผŒๆญ้…ๆฃ•่‰ฒ่พนๆก†ๅ’Œ่žบๆ—‹่ฃ…่ฎข็บฟ่ฎพ่ฎก๏ผŒ่ฅ้€ ๅ‡บๆธฉ้ฆจๅฏ็ˆฑ็š„ๆ—…่กŒๆ‰‹ๅ†Œๆฐ›ๅ›ดใ€‚ๅ†…ๅฎนๅˆ†ไธบไธ‰ไธชไธป่ฆๅž‚็›ดๅŒบๅ—๏ผŒๅˆ†ๅˆซๅฏนๅบ”DAY 1ใ€DAY 2ใ€DAY 3๏ผŒๆฏไธชๅŒบๅ—้กถ้ƒจๆœ‰ๅœ†ๅฝขๆ—ถ้’Ÿๅ›พๆ ‡ๅ’Œโ€œDAY Xโ€ๆ ‡็ญพ๏ผŒ็ป“ๆž„ๆธ…ๆ™ฐใ€‚

ๆฏไธชๆ—ฅๆœŸๅŒบๅ—ๅ†…ๅ‡ไปฅๆ—ถ้—ด่ฝดๅฝขๅผๅˆ—ๅ‡บๅ…ทไฝ“่กŒ็จ‹๏ผŒไฝฟ็”จๅœ†็‚น่ฟžๆŽฅๆ—ถ้—ด็‚นไธŽๆดปๅŠจๆ่ฟฐ๏ผŒๅณไพง้…ๆœ‰ๅ‰ไผŠๅกๅ“‡็ณปๅˆ—็š„ๅฏ็ˆฑๅก้€šๅฝข่ฑกๆ’็”ป๏ผˆๅฆ‚็™ฝ็†Šใ€่“็Œซใ€ๅ…”ๅญ็ญ‰๏ผ‰๏ผŒๅขžๅผบ่ถฃๅ‘ณๆ€งใ€‚ๆ‰€ๆœ‰ๆ–‡ๅญ—ๅ‡ไธบ็ฎ€ไฝ“ไธญๆ–‡๏ผŒๅญ—ไฝ“ๆธ…ๆ™ฐๆ˜“่ฏป๏ผŒ่ง†่ง‰ๅฑ‚ๆฌกๅˆ†ๆ˜Žใ€‚

---

**DAY 1๏ผšๆŠต่พพไธŽๅˆๆŽข**
- **10:00** ๆŠต่พพๅทดๅกž็ฝ—้‚ฃ๏ผŒ้…’ๅบ—ๅŠž็†ๅ…ฅไฝ (Poble SecๅŒบ) โ€”โ€” ้…ๆœ‰็™ฝ็†Šๆ‹–็€่กŒๆŽ็ฎฑ็š„ๆ’็”ปใ€‚
- **12:00** ๅˆ้ค๏ผš่ฅฟ็ญ็‰™Tapas โ€”โ€” ๆ’็”ปๆœชๆ˜พ็คบใ€‚
- **14:00** ๅ‰ๅพ€่ฅฟ็ญ็‰™ๅนฟๅœบ (Plaza de Espaรฑa)๏ผŒ่ฟœ็œบMNACๅ…จๆ™ฏ โ€”โ€” ้…ๆœ‰่ฅฟ็ญ็‰™ๅนฟๅœบๅปบ็ญ‘ๆ’็”ปๅŠๅœฐๅ›พ็ฎญๅคดใ€‚
- **16:00** ๅ‚่ง‚MNACๅค–้ƒจๅปบ็ญ‘ไธŽๅ‘จๅ›ด่Šฑๅ›ญ โ€”โ€” ้…ๆœ‰่“็Œซๅœจ่Šฑไธ›ไธญ่ทณ่ทƒ็š„ๆ’็”ปใ€‚
- **19:00** ๆฌฃ่ต้ญ”ๅนปๅ–ทๆณ‰่กจๆผ” (Magic Fountain) โ€”โ€” ้…ๆœ‰ๅธฆ้—ชๅ…‰ๆ•ˆๆžœ็š„็™ฝ็†Šๆ’็”ปใ€‚
- **20:30** ๆ™š้ค๏ผš้™„่ฟ‘้คๅŽ… โ€”โ€” ๆ’็”ปๆœชๆ˜พ็คบใ€‚

---

**DAY 2๏ผšMNACๆทฑๅบฆ่‰บๆœฏไน‹ๆ—…**
- **09:30** ๆ—ฉ้ค๏ผŒๆญฅ่กŒ่‡ณMNACๅ…ฅๅฃ โ€”โ€” ้…ๆœ‰็™ฝ็†Šๅƒ้ขๅŒ…็š„ๆ’็”ปใ€‚
- **10:00** ่ฟ›ๅ…ฅMNAC (ๅปบ่ฎฎๆๅ‰่ดญ็ฅจ)๏ผŒๅ‚่ง‚็ฝ—้ฉฌๅผ่‰บๆœฏ้ฆ†่— โ€”โ€” ้…ๆœ‰ๅคๅ…ธๆฒน็”ปๆ’็”ปใ€‚
- **12:30** ้ฆ†ๅ†…็ฎ€้คๆˆ–้™„่ฟ‘ๅˆไผ‘ โ€”โ€” ๆ’็”ปๆœชๆ˜พ็คบใ€‚
- **14:00** ๅ‚่ง‚ๅ“ฅ็‰นๅผใ€ๆ–‡่‰บๅคๅ…ดๅŠๅทดๆด›ๅ…‹่‰บๆœฏ้ฆ†่— โ€”โ€” ้…ๆœ‰่’™ๅจœไธฝ่ŽŽ้ฃŽๆ ผ่‚–ๅƒ็”ปๆ’็”ปๅŠ่“็Œซๅฝข่ฑกใ€‚
- **16:30** ๆŽข็ดข็Žฐไปฃ่‰บๆœฏ้ฆ†่— (ๅŠ ๆณฐ็ฝ—ๅฐผไบš็Žฐไปฃไธปไน‰) โ€”โ€” ้…ๆœ‰ๆŠฝ่ฑก่‰บๆœฏ้ฃŽๆ ผๆ’็”ปใ€‚
- **18:30** ๅ‰ๅพ€MNACๅฑ‹้กถ่ง‚ๆ™ฏๅฐ๏ผŒไฟฏ็žฐๅŸŽๅธ‚ๆ—ฅ่ฝ โ€”โ€” ้…ๆœ‰ๅ…”ๅญไธพๆ‰‹ๆœบๆ‹็…ง็š„ๆ’็”ปใ€‚
- **20:00** ๆ™š้ค๏ผšArenasๅ•†ๅœบ้™„่ฟ‘ โ€”โ€” ๆ’็”ปๆœชๆ˜พ็คบใ€‚

---

**DAY 3๏ผš่’™็‰นๆƒ ๅฅ‡ๅฑฑๅ‘จ่พนไธŽ่ฟ”็จ‹**
- **09:00** ๆ—ฉ้ค๏ผŒ้€€ๆˆฟๅฏ„ๅญ˜่กŒๆŽ โ€”โ€” ๆ’็”ปๆœชๆ˜พ็คบใ€‚
- **10:00** ไน˜ๅ็ผ†่ฝฆๅ‰ๅพ€่’™็‰นๆƒ ๅฅ‡ๅŸŽๅ ก (Montjuรฏc Castle) โ€”โ€” ้…ๆœ‰็ผ†่ฝฆๆ’็”ป๏ผŒๅ†…ๅซไธ‰ๅชๅก้€šๅŠจ็‰ฉใ€‚
- **12:00** ๅ‚่ง‚็ฑณ็ฝ—ๅŸบ้‡‘ไผš (Joan Mirรณ Foundation) โ€”โ€” ้…ๆœ‰็ฑณ็ฝ—้ฃŽๆ ผๆŠฝ่ฑก้›•ๅก‘ๆ’็”ปใ€‚
- **13:30** ๅˆ้ค๏ผšๅฅฅๆž—ๅŒนๅ…‹ๆธฏ้™„่ฟ‘ๆตท้ฒœ้ฅญ โ€”โ€” ๆ’็”ปๆœชๆ˜พ็คบใ€‚
- **15:00** ๆผซๆญฅๅฅฅๆž—ๅŒนๅ…‹ๅ…ฌๅ›ญ โ€”โ€” ๆ’็”ปๆœชๆ˜พ็คบใ€‚
- **16:30** ๆๅ–่กŒๆŽ๏ผŒๅ‰ๅพ€ๆœบๅœบ/่ฝฆ็ซ™่ฟ”็จ‹ โ€”โ€” ้…ๆœ‰ๅผ€ๅฟƒๆŒฅๆ‰‹็š„็™ฝ็†Šๆ’็”ปใ€‚

---

**ๅบ•้ƒจไบค้€š่ดดๅฃซๆ **๏ผš
้…ๆœ‰ๅ…ฌไบค่ฝฆใ€ๅœฐ้“ใ€ๆญฅ่กŒ้ž‹ๅ›พๆ ‡๏ผŒๆ–‡ๅญ—ไธบ๏ผšโ€œไบค้€š่ดดๅฃซ๏ผšๅ–„็”จT-casualไบค้€šๅก๏ผŒๆญฅ่กŒๆŽข็ดขๆ›ดไฝณ๏ผโ€

---

ๆ•ดไฝ“ๅ›พ่กจ็ฑปๅž‹ไธบๆ—ถ้—ดๅบๅˆ—ๆต็จ‹ๅ›พ๏ผŒ้€š่ฟ‡ๅž‚็›ดๅˆ†ๆ ไธŽๆฐดๅนณๆ—ถ้—ด่ฝด็ป“ๅˆ็š„ๆ–นๅผ็ป„็ป‡ไฟกๆฏใ€‚ๆ•ฐๆฎ็ผ–็ ๆ–นๅผๅŒ…ๆ‹ฌๆ—ถ้—ด็‚น๏ผˆ็ฒพ็กฎๅˆฐๅˆ†้’Ÿ๏ผ‰ใ€ๅœฐ็‚นๅ็งฐใ€ๆดปๅŠจๆ่ฟฐๅŠ้…ๅฅ—ๆ’็”ป๏ผŒๆ‰€ๆœ‰ไฟกๆฏๅ‡ๆŒ‰้€ป่พ‘้กบๅบๆŽ’ๅˆ—๏ผŒไพฟไบŽ็”จๆˆทๅฟซ้€Ÿ็†่งฃๅนถๆ‰ง่กŒไธ‰ๅคฉ่กŒ็จ‹่ฎกๅˆ’ใ€‚่ง†่ง‰ๅ…ƒ็ด ไธฐๅฏŒ๏ผŒๅ…ผๅ…ทๅฎž็”จๆ€งๅ’Œ่ถฃๅ‘ณๆ€ง๏ผŒ้€‚ๅˆๆ—…ๆธธๆ”ป็•ฅ็ฑปๅ†…ๅฎนไผ ๆ’ญใ€‚
Prompt
The infographic presents a comprehensive architectural and structural analysis of the Temple of Kom Ombo, an ancient Egyptian temple located on the west bank of the Nile River. The title "TEMPLE OF KOM OMBO" is prominently displayed in a hand-drawn, white-bordered box in the lower-right corner of the image, set against a brown background that mimics sandstone or earth tones. The overall layout is divided into multiple sections: a central photographic image of the temple ruins under a clear blue sky, surrounded by illustrative technical diagrams, annotated floor plans, and textual data blocks, all rendered in white line art and text for high contrast.

The central photograph shows the main hypostyle hall and surrounding structures of the temple, with visitors walking among the columns and courtyards, providing a sense of scale. In the background, the Nile River and palm trees are visible, situating the temple in its natural environment. The ruins are constructed from light-colored sandstone blocks, consistent with the material noted in the text.

In the upper-left quadrant, a 3D axonometric diagram illustrates the overall dimensions of the temple complex: approximately 62 meters by 51 meters, labeled along the axes. Adjacent to this, a list of key structural facts is presented in bullet points:
- TEMPLE AXIS: DOUBLE SANCTUARY FOR SOBEK & HORUS
- OVERALL DIMENSIONS (APPROX. 62M x 51M)
- CONSTRUCTION MATERIAL: SANDSTONE BLOCKS
- COLUMN HEIGHTS: UP TO 12 METERS

Above the central photo, two schematic diagrams illustrate architectural details:
- A top-down view of the hypostyle hall showing 30 columns arranged in a grid, labeled “HYPOSTYLE HALL (30 COLUMNS)” and pointing to “TWO SANCTUARIES.”
- A cross-section labeled “PYLON AND HYPOSTYLE SECTION,” which includes a detailed vertical cutaway showing the roofing system supported by columns, with arrows indicating load paths down to foundations.

To the right of the central image, text notes “TWO ENTRANCES SYMBOLIZING DUALITY,” emphasizing the temple’s unique dual dedication. This concept is reinforced in the lower section of the infographic, where a detailed floor plan is overlaid on the brown ground area.

The floor plan, drawn in white lines, is annotated with various features:
- INNER TEMPLE (FOR SOBEK): marked with a rectangular inner sanctum.
- INNER TEMPLE (FOR HAROERIS): another distinct inner sanctum, indicating the dual religious function.
- NILOMETER: a structure used to measure the Nile’s water level.
- BIRTH HOUSE (MAMMISHI): a smaller chamber associated with fertility rituals.
- MUMMIFIED CROCODILE MUSEUM SITE: indicating a location within the temple complex for sacred crocodile mummies.
- TWO ENTRANCES SYMBOLIZING DUALITY: shown as two separate entryways on the plan.

Surrounding the floor plan are inset images of relief carvings, each labeled:
- MEDICAL INSTRUMENT RELIEFS: depicting figures with tools.
- TWO ENTRANCES RELIEFS: showing doorways flanked by deities.
- CALENDAR RELIEFS: illustrating scenes related to timekeeping or agricultural cycles.

Additional annotations point to structural aspects:
- “STRUCTURAL LOAD PATHS FROM COLUMNS TO FOUNDATIONS”: illustrated with curved arrows tracing the force transfer from columns through the walls to the ground.
- The pylon and hypostyle section diagram also labels “ROOFING SYSTEM” and shows how the roof beams rest on column capitals.

All textual content is in English, using a clean, sans-serif font that enhances readability. The visual style blends real photography with technical illustrations and hand-drawn elements, creating an educational and engaging format suitable for tourists, students, or archaeologists. The infographic effectively communicates both the physical characteristics and symbolic significance of the Temple of Kom Ombo, highlighting its duality, engineering, and cultural importance.
Prompt
The infographic is designed in a chalkboard style and titled “The Complete Guide to Promoting Local Specialties & Events on WeChat Official Accounts”. It uses a hand-drawn chalk-lettering effect throughout, with colorful icons and arrows, and visually simulates real writing on a chalkboard. The content is clearly structured into three main parts linked by gray curved arrows, forming a logical progression: core direction for promotional content → high-conversion event promotion tactics → promotion techniques adapted to the WeChat Official Account ecosystem.

Part 1: “Core direction for promotional content: dig deep into memorable local features”. This part emphasizes three types of high-traffic local content that resonate with users and draw out-of-town visitors to come and check in:
- **Local cuisine** (yellow oval tag): time-honored snack brands, seasonal food customs, and visits to hidden neighborhood eateries; with a bowl of hot soup and chopsticks icon.
- **Culture and local character** (brown oval tag): stories of intangible cultural heritage crafts being passed down, the history of old streets and alleys, and visits to former residences of local notables; with traditional-architecture and cloth-shoe icons.
- **Resident perks** (pink oval tag): local-only consumer vouchers, free-admission policies at scenic spots, festival event previews, and similar content; with coupon and gift-box icons.

Part 2: “3 practical tactics for high-conversion event promotion”, aimed at maximizing participation and conversion:
- **Festival market tactic** (orange oval tag): warm up on the Official Account with early-bird tickets + a comment-section draw for free spots + cashback for on-site check-ins; with lantern and market-stall icons.
- **Intangible heritage experience tactic** (green oval tag): an exclusive sign-up channel on the Official Account + advance announcements about “experience ambassadors” + cashback for user submissions after the event; with pottery and loom icons.
- **Spending boost tactic** (purple oval tag): partner with local merchants to launch Official-Account-exclusive voucher bundles + free custom merchandise when vouchers are redeemed in store; with shopping-bag and bank-card icons.

Part 3: “Promotion techniques adapted to the WeChat Official Account ecosystem”, focused on lowering promotion costs:
- **Content presentation** (blue oval tag): use local landmark buildings/food as the visual symbol in the cover image, place an event countdown poster as the first image, and add a one-click sign-up link at the end of the article; with a mobile-phone icon.
- **Cross-channel coordination** (yellow oval tag): post event highlights on WeChat Channels with the Official Account link attached, target Moments ads at local users aged 18-60, and have local community groups share posts carrying exclusive lottery codes; with a three-person social-network icon.
- **Private-traffic retention** (green oval tag): guide event participants to add the organizer on WeCom, then bring them into a local perks group that keeps receiving event updates; with a WeChat chat-bubble icon.

The whole infographic follows a vertical flow, with curved arrows linking the modules. The right side is dotted with stick figures, exclamation marks, and similar decorative elements that make it more fun and readable. The typography has a clear hierarchy: the main title is bold white, subtitles and key concepts are highlighted in yellow or other colors, and detailed notes are in regular white type. All text is in Chinese, with no English or other languages.
Prompt
The infographic is titled “The Complete Guide to Children’s Nutritional Supplements: Scientific Advice + Key Buying Points”. It uses a comic-book style with vivid colors, mainly red, yellow, and blue. The layout is clearly divided into left and right sections, and each section is subdivided into several modules. Together, text and images present scientific guidance and practical advice on children’s nutritional supplementation.

The overall structure has two core parts, “Scientific Reference Guide” and “Practical Application Guide”. Cartoon illustrations, icons, burst-style speech boxes, tags, and other visual elements make it more readable and appealing.

---

**Part 1: Scientific Reference Guide**

1. **Age-Specific Nutritional Supplement Checklist**
   - Title: “Age-Specific Nutritional Supplement Checklist”; subtitle: “Supplement by age for more precise, efficient nutrition; supplement as needed for each age group and avoid excessive intake”
   - The content is split into three age stages:
     - **0-6 months**: routine daily vitamin D supplement of 400 IU; exclusively breastfed babies also need vitamin K. Illustrations: baby avatar, Vit D syringe, Vit K capsule.
     - **7 months-3 years**: focus on iron (Fe), zinc (Zn), and DHA; keep daily vitamin D at 400-600 IU. Illustrations: toddler avatar, a magnifying glass examining a capsule, Fe and Zn symbols.
     - **4-12 years**: focus on calcium (Ca), vitamin A, and B vitamins (B_B); make sure daily protein intake meets the recommended level. Illustrations: boy avatar, Ca bubble, B_B bubble, egg, milk bottle, eye icon.

2. **Supplementation Principles & Common Pitfalls**
   - Title: “Supplementation Principles & Common Pitfalls”; subtitle: “Supplement scientifically and steer clear of these pitfalls”
   - Two core principles:
     - **Prioritize dietary intake** (green check mark): Core principle 1: a balanced daily diet is the primary source of nutrients, and supplements must not replace three regular meals. Illustration: a child at mealtime with vegetables, fruit, and meat on the plate.
     - **Supplement as needed, in moderation** (red STOP sign): Core principle 2: more is not better; excessive vitamin A, calcium, and the like may cause toxicity or a metabolic burden. Illustration: several supplement bottles covered by a red X.
   - **Pitfalls to avoid** (yellow tag):
     - ① Following supplement trends blindly without a health checkup ❌
     - ② Giving trendy influencer supplements to children as snacks ❌
     - ③ Giving children reduced doses of adult supplements ❌
     - Illustration: a red “Avoid pitfalls” burst box with a lightning effect.

---

**Part 2: Practical Application Guide**

1. **3-Step Method for Choosing Children’s Nutritional Supplements**
   - Title: “3-Step Method for Choosing Children’s Nutritional Supplements”; subtitle: “A 3-step check for buying children’s supplements”
   - Each of the three steps is introduced by a magnifying-glass icon:
     - **Check compliance marks**: prefer health foods bearing the “blue hat” mark, or legitimate products registered specifically for infants/children, and reject unregulated “three-no” products. Illustration: a magnifying glass focused on the “blue hat” logo.
     - **Check the ingredients**: prefer products with no added sucrose, flavorings, artificial colors, or preservatives, and with clearly labeled allergens. Illustration: a document stamped “No additives” with a green check mark.
     - **Check the suitable age**: choose children’s products labeled for the matching age range, and do not give children reduced doses of adult supplements on your own. Illustration: the word “Age” on a bottle label circled in red.

2. **Reference Table: Common Children’s Supplements and When to Use Them**
   - Title: “Reference Table: Common Children’s Supplements and When to Use Them”
   - Two-column table: “Supplement type” on the left and “When to use” on the right, with background colors alternating red and blue.
   - Contents:
     - **Vitamin D drops** → routine daily supplement for children of all ages; prevents rickets and promotes calcium absorption. Illustrations: dropper bottle, bone icon.
     - **Iron supplements** → children diagnosed with iron-deficiency anemia at a checkup, or whose daily intake of red meat and animal liver is insufficient. Illustrations: dropper bottle, child avatar.
     - **DHA algal oil** → children who eat too little deep-sea fish; helps support retinal and brain development. Illustrations: fish-shaped capsule, brain and eye icons.
     - **Calcium supplements** → children with too little daily milk and slow height growth whose calcium deficiency is confirmed at a checkup. Illustrations: white tablet, a child having their height measured.

---

**Visual and layout features:**
- A grid layout overall, with the four main modules arranged in 2x2 quadrants.
- Heavy use of comic elements such as burst boxes, speech bubbles, arrows, exclamation marks, and prohibition signs.
- A rich icon system: Vit D, Fe, Zn, Ca, B_B, blue hat, no additives, age, STOP, and others each have a dedicated graphic mark.
- Bold type, shadows, and borders emphasize key information such as titles, numbers, and warnings.
- Clear color coding: yellow highlights key points, blue explains steps, and red signals warnings or prohibitions.

The infographic is comprehensive, logically organized, and both scientific and practical. It is well suited to helping parents quickly grasp the essentials of children’s nutritional supplementation and how to choose products.

Paper Rendering Quality

Image columns (left to right): U1-8B-MoT, U1-8B-MoT-Infographic, U1-8B-MoT, U1-8B-MoT-Infographic
Prompt
[typesetting]

The page is laid out with two tables at the top, followed by a two-column text layout. The tables span the full width of the text area. The text includes a section heading.

[paragraphs]

the TOPIC MODELER, the GENDER SEGMENTER, and an OTHER module (transcript length and duration). We test for a linear relationship between each pair of variables: $H_O : r = 0$, $H_A : r \neq 0$, where $H_O$ is the original hypothesis, $H_A$ is the alternate hypothesis, and $r$ is the Pearson’s correlation coefficient. We follow Reddy et al. (2021) and Yang et al. (2019) and apply a Bonferroni correction to our $\alpha$ value of $0.05$, setting $\alpha = 0.05/z$, where $z = \binom{124}{2} = 7,626$ for LDA, representing the number of feature relationships we consider. Hence, we reject $H_O$ in favor of $H_A$ if $p \leq \alpha$. Given the largeness of $z$, our $\alpha$ value becomes small, making our criteria for significance strict and thus suitable for investigating our research questions. Furthermore, we filter our correlations $r$, such that $|r| > 0.1$ for our LDA experiments, and $|r| > 0.05$ for our BERTopic experiments (due to the smaller sample size of 10,000 podcasts, and fewer samples may have higher variance). Our results focus on a selection of these significant correlations; the full results are available on the project website: https://www.gendered-discourse.net/extended-results.

### RQ0: How Are Women and Men’s Discourse Different?

Using GDCF, our Gendered Discourse Correlation Framework shown in Figure 2, we then analyze significant correlations between the gender features from the GENDER SEGMENTER module (Doukhan et al. 2018a), and the topic features from the TOPIC MODELER module (Blei, Ng, and Jordan 2003). We use the discourse topics to automatically form gendered discourse word lists via their significant correlations.

Starting with the first row of Table 1, we see that Topic 3’s word list returned by LDA with Non-Contextual Embeddings (Bag-Of-Words) (via the TOPIC MODELER module) contains the words women, woman, men, baby, pregnant, girls, men, doctor, health, birth (in descending weighted order). Based on this word list, we manually interpret this topic as being a content topic, specifically about pregnancy, as noted in the column “Topic N Categories.” Then, we look to the gender correlations in the columns “Gender” and “$r$,” and see that $r(\text{Topic 3, Women}) = +0.15$ and $r(\text{Topic 3, Men}) = -0.14$. This indicates that the topic of pregnancy positively correlates with women (identified via the GENDER SEGMENTER module), and negatively correlates with men. Therefore, we associate Topic 3 (Content - Pregnancy) with Women, as noted in the “Topic N Gender” column. Similarly, we make these associations in the “Topic N Gender” column for Topics 10, 49, and 71.

Next, we focus on the Topic 54 row. This topic is interpreted using the word list get, like, know, right, people, going, podcast, make, want, one. This word list does not refer to any content, hence, we manually interpret this topic as being a discourse topic. Moving to the gender correlations, we see that $r(\text{Topic 54, Women}) = \emptyset$ and $r(\text{Topic 54, Men}) = +0.12$. The reason for $r(\text{Topic 54, Women}) = \emptyset$ is that the correlation between the features Topic 54 and Women did not come back as significant. However, due to the positive correlation of $0.12$ for Topic 54 and Men, we manually associate Topic 54 with Men in the “Topic N Gender” column.

[tables]

Table 1: LDA with Non-Contextual Embeddings (Bag-Of-Words): The complete set of significant correlations between gender features and topic features, both content topics and discourse topics. Based on $r$, the Topic N Gender forms the gendered (discourse) word lists via Topics 54 and 60 (the masculine word lists) and Topic 62 (the feminine word list).

| Topic N | Gender | $r$ | Topic N Word List | Topic N Categories | Topic N Gender |
|---|---|---|---|---|---|
| Topic 3 | Women<br>Men | 0.15<br>-0.14 | women, woman, men, baby, pregnant, girls, men, doctor, health, birth | Content - Pregnancy | Women |
| Topic 10 | Women<br>Men | 0.10<br>-0.12 | energy, body, feel, mind, space, yoga, love, beautiful, feeling, meditation | Content - Yoga | Women |
| Topic 49 | Women<br>Men | -0.21<br>0.17 | game, know, think, team, going, mean, play, year, one, good | Content - Sports | Men |
| Topic 71 | Women<br>Men | 0.14<br>-0.14 | christmas, sex, girl, hair, love, get, date, girls, let, wear | Content - Dating | Women |
| Topic 54 | Women<br>Men | –<br>0.12 | get, like, know, right, people, going, podcast, make, want, one | Discourse | Men |
| Topic 60 | Women<br>Men | -0.27<br>0.20 | going, know, think, get, got, one, really, good, well, yeah | Discourse | Men |
| Topic 62 | Women<br>Men | 0.33<br>-0.28 | like, know, really, going, people, want, think, get, things, life | Discourse | Women |

Table 2: BERTopic with Contextual Embeddings (BERT, ChatGPT, Llama): The complete set of significant correlations between gender features and topic features for discourse topics only (content topics are omitted).

| Topic N | Gender | $r$ | Topic N Word List | Topic N Categories | Topic N Gender |
|---|---|---|---|---|---|
| Topic 0 | Women<br>Men | -0.08<br>0.10 | like, yeah, know, oh, right, podcast, got, going, think, really | Discourse | Men |
| Topic 2 | Women<br>Men | 0.08<br>-0.08 | life, know, things, really, people, feel, like, want, love, going | Discourse | Women |
| Topic 5 | Women<br>Men | 0.08<br>– | like, know, think, yeah, episode, really, going, anchor, kind, right | Discourse | Women |
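
The correlation screen described in this prompt can be sketched in plain Python. The screen computes Pearson's r for every feature pair, applies a Bonferroni-corrected alpha, and requires a minimum |r|. This is a minimal illustration under stated assumptions, not the authors' code. The p-values are assumed to come from an external test such as SciPy's `scipy.stats.pearsonr`. `associate_gender` is a hypothetical rule that picks the gender with the larger positive significant correlation. That rule happens to reproduce the "Topic N Gender" column for every row in the two tables above.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def bonferroni_alpha(n_features, alpha=0.05):
    """Per-test significance level after a Bonferroni correction over all feature pairs."""
    z = math.comb(n_features, 2)  # number of pairwise relationships tested
    return alpha / z


def is_reported(r, p, alpha_corrected, min_abs_r):
    """Keep a correlation only if it is significant (p <= alpha) and |r| exceeds the threshold."""
    return p <= alpha_corrected and abs(r) > min_abs_r


def associate_gender(r_women, r_men):
    """Hypothetical rule: label a topic with the gender it correlates with positively.

    None stands for a correlation that was not significant.
    Returns None when neither gender has a positive significant correlation.
    """
    positive = {g: r for g, r in (("Women", r_women), ("Men", r_men)) if r is not None and r > 0}
    if not positive:
        return None
    return max(positive, key=positive.get)


alpha_lda = bonferroni_alpha(124)       # 0.05 / 7626
print(associate_gender(0.15, -0.14))    # Women (Topic 3)
print(associate_gender(None, 0.12))     # Men (Topic 54)
```

`bonferroni_alpha(124)` evaluates to 0.05 / 7,626, about 6.56e-06, which matches the z given in the text.
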
Prompt
[typesetting]

The page is a standard academic paper layout with a single column. The text is justified and divided into sections and subsections, indicated by numbered headings. Important terms at the beginning of some paragraphs are bolded. A horizontal rule separates the header from the main content, and another rule separates the main content from the footnote at the bottom.

[paragraphs]

Preprint Version.

**Figureโ€“Table Integration.** In addition to textual refinement, we extend the refinement process to include multimodal elements, to further enhance readability. For each section, the model first generates visualization requirements, such as tables with structured comparisons or figures with explanatory diagrams, together with natural language descriptions. Based on these descriptions, candidate figures and tables are synthesized. The compiled outputs are then fed back to an LLM for quality assessment, enabling automatic detection of issues such as oversized layouts or unreadable text. The LLM provides corrective suggestions, which are applied to improve the final visualizations. Finally, the text is refined again to ensure that all generated figures and tables are properly referenced within the survey.

# 4 EXPERIMENTS

## 4.1 EXPERIMENTAL SETTINGS

**Implementation Details.** Following Wang et al. (2024b), we adopt **GPT-4o-mini** as our generation model for its balance of responsiveness and cost. Our retrieval database contains 680K computer science papers from arXiv, with PDFs converted into structured Markdown using MinerU (Wang et al., 2024a) for consistent formatting. The details of the retrieval process are provided in App. A.1. In outline generation, the system consults 1000–1200 papers, with a maximum of 8 sections. For section drafting, each subsection retrieves up to 60 additional relevant papers, combined with those linked during outline generation. Finally, we apply two iterations of the review-and-refine loop to enhance coherence across sections and improve overall readability. Illustrative outputs compared with AutoSurvey are provided in App. A.8.

**Baselines.** We compare IterSurvey with a set of baselines, ranging from simple retrieval-augmented generation (Naive RAG), which directly drafts from retrieved documents, to more advanced state-of-the-art systems. Specifically, we evaluate against AutoSurvey (Wang et al., 2024b), the first systematic framework for this task; SurveyForge (Yan et al., 2025), which combines heuristic outline generation based on the logical structures of human-written surveys with a memory-driven scholar navigation agent for high-quality retrieval; and SurveyGo (Wang et al., 2025), which employs the LLM×MapReduce-V2 algorithm to address the long-context challenge. We also compare with SurveyX (Liang et al., 2025), which introduces an Attribute Tree-based outlining mechanism; however, due to access restrictions, we include SurveyX only in arena experiments. All methods are evaluated on the same retrieval database with generation hyperparameters aligned to their original settings for fairness.

## 4.2 AUTOMATIC EVALUATION RESULTS

**Evaluation Setup.** We employ multiple complementary protocols to evaluate the quality of generated surveys. On the 20-topic suite from Wang et al. (2024b), we adopt multi-dimensional scoring with LLM-as-a-judge. Content quality is assessed along three dimensions (coverage, structure, and relevance), following Wang et al. (2024b). In addition, citation quality is evaluated using the NLI-based protocol of Gao et al. (2023), reporting both recall and precision: _Citation Recall_ measures whether all statements in the generated text are fully supported by the cited passages, while _Citation Precision_ identifies irrelevant citations to ensure that references are pertinent and directly support the claims. To improve scoring stability and reliability, prompts are standardized and judges must provide a rationale before assigning scores. For additional robustness, we aggregate outputs from three judge models: GPT-4o, Claude-3.5-Haiku, and GLM-4.5V.$^1$ Full prompts are provided in App. A.7.

**Results.** The results on the 20 topics from Wang et al. (2024b) are reported in Tab. 1. Statistical significance was confirmed via paired t-tests, indicating that IterSurvey consistently outperforms baseline models ($p < 0.05$). We summarize the main observations below.

- **Overall superiority.** IterSurvey consistently outperforms all baselines across both content and citation quality, achieving the highest overall average score (4.75). This demonstrates that the proposed framework is effective and robust across multiple evaluation dimensions.

[page_number]

6

[footnotes]

$^1$Specifically, we use `chatgpt-4o-latest`, `claude-3-5-haiku-20241022`, and `glm-4.5v`.
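
The evaluation protocol in this prompt combines three steps:

- averaging scores from several judge models;
- computing citation recall and precision from NLI judgments;
- testing significance with a paired t-test over topics.

The sketch below is a simplified reading, not the IterSurvey or Gao et al. (2023) implementation. The NLI support and irrelevance decisions are assumed to be precomputed booleans, all function names are hypothetical, and the scores are toy numbers.

```python
import math
from statistics import mean, stdev


def aggregate_judges(scores_by_judge):
    """Average per-topic scores across judges (one list of topic scores per judge)."""
    return [mean(topic_scores) for topic_scores in zip(*scores_by_judge)]


def citation_recall(statement_supported):
    """Share of statements fully supported by their cited passages (NLI-judged)."""
    return sum(statement_supported) / len(statement_supported)


def citation_precision(citation_irrelevant):
    """Share of citations not flagged as irrelevant (NLI-judged)."""
    return 1 - sum(citation_irrelevant) / len(citation_irrelevant)


def paired_t_statistic(a, b):
    """t statistic of a paired t-test on per-topic scores of two systems."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))


# Toy example: 3 judges x 4 topics for two systems (made-up numbers).
ours = aggregate_judges([[4.8, 4.7, 4.9, 4.6]] * 3)
baseline = aggregate_judges([[4.5, 4.4, 4.7, 4.5]] * 3)
t = paired_t_statistic(ours, baseline)
```

For the 20-topic suite there are 19 degrees of freedom, so a two-sided test at p < 0.05 needs |t| above about 2.09. In practice, `scipy.stats.ttest_rel(a, b)` returns both the statistic and the p-value.
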
Prompt
[typesetting]

This is a single-column page containing mostly text, structured with section headings and bold inline subheadings. URLs are formatted in a monospaced font and hyperlinked.

[paragraphs]

# A Image generation models

This section details the two diffusion image generation models used in this work, namely Stable Diffusion 1.4 and 1.5.

**Stable Diffusion 1.4** The Stable Diffusion model is a text-conditioned image generator model that combines an autoencoder with a diffusion model to create a latent diffusion model. The autoencoder encodes images into latent representations with a reduced dimensionality when compared to the input image, reducing the computational needs during the training phase. Text prompts, on the other hand, are encoded using a text encoder and are then cross-attended by the UNet backbone of the latent diffusion model. Finally, the loss is computed using a reconstruction objective between the noise added to the latent representation and the prediction made by the UNet.
Stable Diffusion 1.4 (https://huggingface.co/CompVis/stable-diffusion-v1-4) had several rounds of training on the LAION dataset (https://laion.ai/), with each round changing the input image dimension, aesthetic score, and the probability of dropping the text-conditioning to improve classifier-free guidance.

**Stable Diffusion 1.5** SD 1.5, in turn, has the same architecture and even the same starting point as 1.4, with the difference being how long the model was fine-tuned on top of SD 1.2. The 1.4 version is fine-tuned for 225 thousand steps at resolution 512x512 on “laion-aesthetics v2 5+” with a 10% probability of dropping the text-conditioning, and version 1.5 for 595 thousand steps.
As demonstrated in Section D, Stable Diffusion 1.4 has better performance than 1.5 in our approach; therefore, we adopt SD 1.4 for most of the experiments in this paper.

# B Large language models

Here we give additional details on the large language models that we used in our experiments.

**Gemma** (Mesnard et al., 2024), trained on a diverse 6 Trillion token dataset comprising web documents, code and mathematical texts. We resorted to the 7 Billion parameter instruction-tuned decoder-only model, named _gemma-7b-it_ (https://huggingface.co/google/gemma-7b-it). This model uses a chat template, which we employ during inference.

**Llama 2** (Touvron et al., 2023), of which we used the 7 Billion parameter, pre-trained-only model, _Llama-2-7b_ (https://huggingface.co/meta-llama/Llama-2-7b-hf). This model was trained with a mix of publicly available data totalling 2 Trillion tokens. While its chat versions employ supervised fine-tuning and reinforcement learning with human feedback for alignment with human preferences in helpfulness and safety, the pre-trained-only model does not. This results in a less constrained model, but it may also cause it to drift away from the task at hand. Since this model is pre-trained only, no chat template is needed.

**Mistral** (Jiang et al., 2023) fine-tuned on various HuggingFace instruction datasets. We resorted to the 7 Billion _Mistral-7B-Instruct-v0.2_ model (https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) and used the respective chat template during inference.

**Phi-2** (Gunasekar et al., 2023) is a compact 2.7 Billion model (https://huggingface.co/microsoft/phi-2). Despite its size, it offers a competitive performance with respect to models several times its size. It was trained on 250 Billion tokens, obtained through a combination of NLP synthetic data created by GPT-3.5 and filtered web data from Falcon RefinedWeb and SlimPajama, which was assessed by GPT-4. This model was not fine-tuned through reinforcement learning from human feedback and does not have guardrails.

**Model ranking**
A ranking of these models in terms of their performance can be found in the HuggingFace leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) which assesses several LLMs that are trained under the same criteria and tested on the same benchmarks, including reasoning
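
Appendix A in this prompt describes two training details: the latent-diffusion loss (added noise vs. UNet prediction) and the 10% text-conditioning dropout that enables classifier-free guidance. The sketch below illustrates those pieces on plain Python lists. It assumes the standard DDPM forward process with a cumulative noise level `alpha_bar`, which the prompt does not specify. The UNet, latents, and text embeddings are stand-ins, not the Stable Diffusion code.

```python
import math
import random


def add_noise(x0, eps, alpha_bar):
    """Forward diffusion step: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def noise_prediction_loss(eps, eps_pred):
    """Mean squared error between the noise that was added and the UNet's prediction."""
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred)) / len(eps)


def maybe_drop_condition(text_emb, null_emb, p_drop=0.1, rng=random):
    """During training, replace the text condition by the empty-prompt embedding with prob. p_drop."""
    return null_emb if rng.random() < p_drop else text_emb


def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Sampling-time blend: eps_uncond + s * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

At sampling time, the model runs twice per step: once with the real prompt and once with the empty prompt. `classifier_free_guidance` then blends the two predictions, and a scale of 1.0 reduces to the conditional prediction alone.
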
Prompt
[typesetting]

The page is a standard academic paper layout, likely from a preprint server like arXiv. It features a title, author list with affiliations, an abstract, and the beginning of the "Introduction" section. A preprint notification ("Preprint. Under review.") is present at the bottom left. The text on the left margin ("arXiv:2502.01522v2 [cs.CV] 30 May 2025") is a vertical stamp typical of arXiv submissions.

[paragraphs]

arXiv:2502.01522v2 [cs.CV] 30 May 2025

# Unpaired Deblurring via Decoupled Diffusion Model

**Junhao Cheng**$^1$, **Wei-Ting Chen**$^2$, **Xi Lu**$^1$, **Ming-Hsuan Yang**$^3$
$^1$Sun Yat-sen University $^2$ Microsoft $^3$ University of California, Merced
https://github.com/donahowe/UID-Diff

**Abstract**

Generative diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. In favor of their ability to supplement missing details and generate aesthetically pleasing contents, recent works have applied them to image deblurring via training an adapter on blurry-sharp image pairs to provide structural conditions for restoration. However, acquiring substantial amounts of realistic paired data is challenging and costly in real-world scenarios. On the other hand, relying solely on synthetic data often results in overfitting, leading to unsatisfactory performance when confronted with unseen blur patterns. To tackle this issue, we propose UID-Diff, a generative-diffusion-based model designed to enhance deblurring performance on unknown domains by decoupling structural features and blur patterns through joint training on three specially designed tasks. We employ two Q-Formers as structural features and blur patterns extractors separately. The features extracted by them will be used for the supervised deblurring task on synthetic data and the unsupervised blur-transfer task by leveraging unpaired blurred images from the target domain simultaneously. We further introduce a reconstruction task to make the structural features and blur patterns complementary. This blur-decoupled learning process enhances the generalization capabilities of UID-Diff when encountering unknown blur patterns. Experiments on real-world datasets demonstrate that UID-Diff outperforms existing state-of-the-art methods in blur removal and structural preservation in various challenging scenarios.

# 1 Introduction

Dynamic blur occurs when the camera and subject move relative to each other during the exposure time, resulting in a smeared and blurred image. Deblurring, the process of removing the blur pattern while preserving the underlying structure of degraded images, is essential for restoring high-quality images for human perception and low-level computer vision applications.

With the rapid advancement of photographic technology, a wide range of imaging devices are now employed to capture images in real-world scenarios. Due to their diverse lenses and structural designs, these devices may produce distinct blur patterns [1, 2, 3]. This diversity makes it challenging to develop an all-in-one method for deblurring images from arbitrary and varied sources. Consequently, focusing on deblurring algorithms tailored to specific domains has become increasingly significant.

As deep learning has advanced in recent years, existing deblurring models predominantly build on data-driven approaches that employ neural networks trained via supervised learning on synthetic paired data. Existing works have made efforts to develop deblurring models upon CNN [4, 5], Transformer [6, 7], and GAN [8, 9]. Recently, a new wave of research [10, 11, 12] has begun to investigate the integration of pre-trained generative diffusion models [13], such as Stable Diffusion (SD) [14], with an adapter designed to provide structural guidance for deblurring. These approaches aim to harness the generative capabilities of diffusion models to supplement missing details and generate aesthetically pleasing outputs. However, since paired blurry-sharp training data is limited in

[footnotes]

Preprint. Under review.

Overall Layout and Content Understanding

Image columns (left to right): U1-8B-MoT, U1-8B-MoT-Infographic, U1-8B-MoT, U1-8B-MoT-Infographic
Prompt
The infographic is titled “Tranilast” and uses a light blue-and-white palette. Its clear layout of modular regions is arranged around a central image of a transparent capsule. The upper-right corner shows the chemical structure of tranilast and its molecular formula, C₁₈H₁₇NO₄.

**1. Active ingredient data (top left)**
- Composition shown as a donut chart:
  - Tranilast >98%
  - Excipients <2%
- Caption below the chart: “High purity, clinical-grade standard”

**2. Indications (top right)**
- Three icons with text labels:
  - Nose icon: allergic diseases
  - Skin-texture icon: fibrosis
  - Scar icon: keloids

**3. Dosage matrix (center left)**
- Table with two columns, “Oral” and “Frequency”:
  - Adults: 100 mg per dose; frequency: 1-3 times per day
  - Children: consult a doctor; frequency: as directed by a physician

**4. Pharmacokinetic timeline (center right)**
- Line chart with time on the x-axis (0 h to 24 h) and concentration on the y-axis (no tick marks):
  - 0 h: absorption begins (water-drop icon)
  - 1-2 h: peak concentration (mountain-peak icon)
  - 4-6 h: distribution/metabolism (circular-arrow icon)
  - 24 h: excretion (trash-can icon)
- Half-life annotated on the chart: ≈ 5-8 h

**5. Warning grid (bottom left)**
- Four quadrants, each with an icon and text:
  - Interactions: CYP enzyme inhibitors/inducers (gear icon)
  - Side effects: gastrointestinal discomfort, rash (stomach icon)
  - Liver function: monitor regularly (liver icon)
  - Kidney function: use with caution (kidney icon)

**6. Patient suitability (bottom center)**
- Two icon groups:
  - Adults: person icon + check mark, labeled “Adults: suitable”
  - Children: person icon + question mark + doctor icon, labeled “Children: consult a doctor”

**7. Storage guide (bottom right)**
- Three icons side by side:
  - Thermometer icon: 2-25℃, room temperature
  - Sealed-bottle icon: keep tightly closed
  - Light-blocking icon (sun with a slash): protect from light

The overall design is modern and professional. Many icons aid understanding and the data visualization is clear, so it suits medical or pharmaceutical promotion. All text is in Chinese and accurately worded, with no redundant description.
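
The half-life annotation in this prompt (≈ 5-8 h) can be sanity-checked against the 24 h mark on the timeline. The sketch assumes simple one-compartment, first-order elimination and ignores absorption. It illustrates the arithmetic only and is not pharmacological data for tranilast.

```python
import math


def fraction_remaining(hours, half_life_h):
    """Fraction of drug remaining after `hours` under first-order elimination."""
    return 0.5 ** (hours / half_life_h)


def elimination_rate_constant(half_life_h):
    """First-order rate constant k = ln(2) / t_half, in 1/h."""
    return math.log(2) / half_life_h


for t_half in (5.0, 8.0):
    print(f"t1/2 = {t_half} h: {fraction_remaining(24, t_half):.1%} left at 24 h")
```

Under that assumption, roughly 3.6% (t½ = 5 h) to 12.5% (t½ = 8 h) of the drug would remain after 24 h.
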
Prompt
The infographic presents an augmented reality (AR) shopping experience overlaid on a real-world retail environment. The scene is set in a brightly lit cosmetics aisle of a store, with shelves stocked with beauty products visible in the background. In the foreground, a pair of hands holds a black rectangular compact labeled "ANASTASIA BEVERLY HILLS BROW POWDER DUO" with "EBONY" and "NET WT. 2.5 OZ." printed below. A gold ring is visible on the left hand’s ring finger, and a black wristband is partially seen on the left wrist.

Overlaid on the image are several semi-transparent, rounded-corner UI elements resembling AR pop-ups or digital cards, providing contextual information about the product and the user’s shopping list.

On the left side, a vertical panel titled "SHOPPING LIST" lists four items:
1. Face Wash: marked with an “X” (completed)
2. Shampoo: marked with an “X” (completed)
2. Eye Cream: marked with an empty checkbox (not completed; duplicated item number)
3. Eye Cream: marked with an empty checkbox (not completed)

This suggests a possible error or duplication in the list, with two entries for "Eye Cream".

In the center-right, a speech-bubble-shaped label displays the price: "$23.00".

To the right of the product, a larger panel titled "PRODUCT DETAILS:" provides information about the "ABH Brow Powder Duo". It features two color swatches:
- Left swatch: labeled "DEEP BROWN"
- Right swatch: labeled "BLACK"

Below the swatches, a star rating system shows four and a half filled stars, accompanied by the text "4.5 out of 5 stars".

Underneath the rating, a section titled "COMMON USES:" states: "DEFINES & FILLS BROWS".

Further down, a smaller rectangular box labeled "KEY INGREDIENTS" lists:
- Vitamin E
- Finely Milled Pigments

At the bottom right, another box titled "APPLICATION TIPS" includes a video icon (a rectangle with a play triangle) and the word "Video", indicating a multimedia tutorial is available.

The overall layout mimics an immersive AR interface, likely from a smart glasses or smartphone application, designed to enhance in-store shopping by providing instant, interactive product data directly within the userโ€™s field of view. The visual style uses dark gray, translucent backgrounds with white text for high contrast and readability against the busy store backdrop. The design emphasizes usability, with clear categorization of information into distinct panels and intuitive icons. All textual content is in English, and no other languages are present.
Prompt
This infographic uses a dark-blue, tech-style background framed by purple and cyan circuit-board patterns, evoking a futuristic digital device. The title "Google's Latest Blood-Oxygen Monitor: Spec Comparison (Social Media Edition)" sits at the top center in glowing white type to emphasize the theme. The overall layout is a horizontal three-column comparison. A column of parameter-category labels runs down the left, and the middle and right show the specifications of three smart wearables.

The left category column stacks icon + text labels vertically:
- Chip (chip icon)
- Battery (battery icon)
- Features (ECG waveform icon)
- Weight (weighing-scale icon)
- Price (price-tag icon)
- Release date (calendar icon)

The three columns in the middle correspond to three products:
1. **Highlighted recommended model: Google Pixel Pulse (latest recommendation)**
- A gold star badge reading "★ Highlighted recommended model" sits above the title, and the column is highlighted with a gold border.
- Chip: custom Tensor G4 chip
- Battery: 7-day battery life, fast charging
- Features: continuous SpO2 monitoring, sleep/stress tracking, AI health coaching
- Weight: 28 g (lightweight)
- Price: ¥1999
- Release date: October 2024

2. **Competitor A (e.g., Apple Watch S9)**
- Chip: S9 SiP
- Battery: 18 hours (typical use)
- Features: on-demand SpO2, ECG app, fall detection
- Weight: 32 g
- Price: ¥3199
- Release date: September 2023

3. **Competitor B (e.g., Garmin Venu 3)**
- Chip: Elevated V5 sensor
- Battery: 14 days (smartwatch mode)
- Features: all-day SpO2, Body Battery, GPS sports tracking
- Weight: 35 g
- Price: ¥2499
- Release date: August 2023

All data is organized with clean horizontal dividers. Each entry is center-aligned in a clean, modern sans-serif font in light blue or white for readability. The highlighted recommended model uses a gold border and brighter text, which makes it the visual focal point.

At the bottom is a footnote line: "Note: The above specs are for reference only; official announcements prevail. #Tech #Health #GoogleNewProducts #BloodOxygenMonitorComparison". It is set in a smaller, darker font as supplementary information.

The overall design is modern with a strong tech feel. Color contrast, highlighted borders, and supporting icons convey how the models differ on key performance metrics. They especially highlight the Google Pixel Pulse's advantages in battery life, price, and feature integration.
Prompt
This infographic is rendered in a retro hand-drawn style. It is laid out like an open, yellowed book page on a beige, aged-paper texture with irregular torn edges. The title "Museum Visit: Extended Activities and Key Points" sits at the top center in dark-brown decorative lettering. Curling flourishes flank it on both sides and make the theme stand out.

The whole image uses a structured six-point layout, with six core modules arranged around the center. Each module has its own illustration, numbered title, and caption. Decorative borders, wreaths, ribbons, and similar elements set the modules apart and embellish them. The overall style is warm and artistic, sprinkled with musical notes, stars, lavender, clouds, and other accents to create a relaxed, cheerful atmosphere of cultural exploration.

The modules are as follows:

1. **Immersive experience**
- Title: "1. Immersive Experience"
- Caption: "Join interactive exhibits and experience recreated historical scenes as if you were there."
- Visuals: On the left, a blond boy holds a magnifying glass over a miniature historical street-scene model (with houses, stalls, and figures). Above him, a thought bubble of gears and a light bulb symbolizes exploration and discovery. On the right is a gift box tied with a pink bow, with a tag reading "SURPRISE".

2. **Themed lectures and workshops**
- Title: "2. Themed Lectures and Workshops"
- Caption: "Hear in-depth talks from experts, make handicrafts yourself, and learn something new."
- Visuals: On the right is a wooden table holding a clay pot, a clay jar, carving knives, and other craft tools, with books and scrolls stacked beside them. An olive-branch wreath surrounds the scene, and a wind chime (with a moon, stars, and bells) hangs above it. Clouds and starlight dot the background.

3. **Exploring the collection's treasures**
- Title: "3. Exploring the Collection's Treasures"
- Caption: "Seek out the museum's signature treasures and dig deep into the stories and cultural value behind them."
- Visuals: On the left is an open wooden treasure chest holding a bronze ding-shaped artifact and a glowing scroll. Beside it lie half of a green jade bi disc, scattered copper coins, and a lit white candle whose holder is decorated with lavender and a small bouquet.

4. **Featured guided routes**
- Title: "4. Featured Guided Routes" (set in a beige ribbon banner)
- Caption: "Follow a curated route to discover hidden corners and unique perspectives, with wonders of a different kind."
- Visuals: Below is an unfolded vintage map marking sights such as an archway, a pavilion, a Buddha statue, and sculptures. A red dashed line connects them, and a compass icon conveys the idea of route planning.

5. **Digital interaction**
- Title: "5. Digital Interaction" (set inside a round polka-dot frame)
- Caption: "Use AR/VR technology to break the limits of time and space and experience virtual reality."
- Visuals: On the right, a person wearing VR goggles touches a floating image of a clay jar. Tech elements such as a Wi-Fi signal, data charts, and a sound-wave graph surround them, depicting a digital interaction scene.

6. **Cultural merchandise**
- Title: "6. Cultural Merchandise"
- Caption: "Pick out unique souvenirs and take your museum memories home so the experience lives on."
- Visuals: The lower-left corner displays various cultural and creative products. These include a canvas tote printed with the museum building (labeled "MUSEUM"), a notebook, postcards, and badges. The lower-right corner shows a plate of dainty sandwiches (with a five-pointed star branded on the bread), served with blueberries and wraps. Next to it stands a white goose wearing a party hat and a blue bow, with musical notes coming out of its beak for a playful touch.

Combining pictures and text, the infographic systematically introduces six extended activities for a museum visit. It conveys practical information with aesthetic appeal and suits brochures, educational posters, or online promotional materials. All text is in Chinese, with no English or numeric codes, in a warm, natural tone suited to a general audience.

🛠️ Quick Start

🌐 Use with SenseNova-Studio

The fastest way to experience SenseNova-U1 is SenseNova-Studio, a 🆓 free online playground. You can try the model directly in your browser with no installation or GPU required.

Note: To serve more users, U1-Fast has undergone step and CFG distillation and is dedicated to infographic generation.

🦞 Use with SenseNova-Skills (OpenClaw)

The easiest way to integrate SenseNova-U1 into your own agent or application is through our companion repository SenseNova-Skills (OpenClaw) 🦞. It ships SenseNova-U1 as a ready-to-use skill with a unified tool-calling interface.

Refer to the SenseNova-Skills README for installation and usage details.


🤗 Run with transformers (Default)

Setup: Follow the Installation Guide to clone the repo and install dependencies with uv.

🌟 Generate High-Quality Infographics

For generating complex infographics, we highly recommend using the following parameters: --cfg_scale 4.0, --timestep_shift 3.0, and --num_steps 50.

python examples/t2i/inference.py \
  --model_path sensenova/SenseNova-U1-8B-MoT-Infographic \
  --prompt "The title of this infographic is “SenseNova-U1”, in a modern minimalist tech-matrix style. The overall layout is a horizontal three-column grid; the background is a matte pure-white premium paper texture with a very light silver-gray fine dot matrix, and the aspect ratio is 16:9.\n\nThe typography follows a rigorous visual hierarchy: the main title uses a bold sans-serif typeface, and body text uses a clean modern monospaced font. The color scheme is highly restrained: pure white as the base, deep charcoal black for primary text and borders, light slate gray for background blocks and to set off secondary information, and icons drawn as delicate silver-gray line art.\n\nCentered at the very top, the large title “SenseNova-U1” is set in striking deep charcoal-black bold type. Directly below the title is a light slate-gray monospaced subtitle: “A new-generation family of end-to-end unified multimodal large models”.\n\nThe main body is divided into three equal vertical information blocks (left, center, right), physically separated by ample negative space.\n\nThe left block covers the overview. At its top is a silver-gray line-art icon of an interwoven magnifying glass and gear, next to the bold subheading “Overview”. Three points are stacked vertically within the block: the first is accompanied by a minimalist icon of a document overlapping a photo, followed by the text “A multimodal model family that unifies text/image understanding and generation”. Below it is an architecture icon of two connected concentric circles with the text “Built on the NEO-Unify architecture (end-to-end unified understanding and generation)”. At the bottom is an icon of a crossed-out eye and a funnel, clearly indicating the text “No visual encoder (VE) or variational autoencoder (VAE) required”.\n\nThe center block shows the model matrix. At its top is a tree-network icon with two branch nodes, next to the bold subheading “Two model sizes”. The block contains two cards, one above the other, each wrapped in a very thin light slate-gray border. The upper card shows a solid geometric cube icon representing high density, labeled in large type “SenseNova-U1-8B-MoT”, with the monospaced caption “8B MoT dense backbone” below. The lower card shows a glowing mesh brain icon with a lightning bolt, labeled in large type “SenseNova-U1-A3B-MoT”, with the monospaced caption “A3B MoT mixture-of-experts (MoE) backbone” below. Directly beneath the two cards, a smiley-face outline icon on the left is paired with the text “To be released on HF and other platforms”, and a written-report icon with a folded corner on the right is paired with the text “Technical report to be released”.\n\nThe right block presents the core strengths. At its top is an ascending staircase line-chart icon representing a peak, next to the bold subheading “Highlights”. Four rectangular blocks with a light slate-gray fill are arranged vertically inside it, each with a specific icon on the left and text on the right. The first contains a seamless Möbius-strip icon, captioned “Native unified architecture, no VE or VAE”. The second contains a trophy icon topped with a star, captioned “A single unified model achieves SOTA performance on both understanding and generation tasks”. The third contains an icon of text lines interleaved with Polaroid photos, captioned “Strong native interleaved reasoning (the model natively generates images to reason)”. The last contains an icon combining a coin with a slice cut out and a detailed pie chart, captioned “Generates complex infographics with excellent cost-effectiveness”." \
  --width 2720 --height 1536 \
  --cfg_scale 4.0 --cfg_norm none --timestep_shift 3.0 --num_steps 50 \
  --output output.png --profile
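The sketch below shows in plain Python what the sampling flags above plausibly control. It assumes standard classifier-free guidance, output = uncond + scale × (cond − uncond). It takes `--cfg_norm none` to mean that no renormalization is applied afterwards. It also assumes an SD3-style timestep shift, σ' = shift·σ / (1 + (shift − 1)·σ). This card does not confirm either formula, and `shifted_sigmas` and `cfg_combine` are hypothetical names, not functions from `examples/t2i/inference.py`.

```python
from typing import List


def shifted_sigmas(num_steps: int, shift: float) -> List[float]:
    """Hypothetical: a linear 1 -> 0 noise schedule with an SD3-style shift.

    shift > 1 spends more of the steps at high noise levels;
    shift == 1 leaves the linear schedule unchanged.
    """
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in sigmas]


def cfg_combine(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance without extra normalization (assumed meaning of --cfg_norm none)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    sigmas = shifted_sigmas(num_steps=50, shift=3.0)  # mirrors --num_steps 50 --timestep_shift 3.0
    print(len(sigmas), sigmas[25])  # the midpoint sits at 0.75 instead of 0.5
    print(cfg_combine([1.0, 2.0], [0.0, 1.0], scale=4.0))  # [4.0, 5.0]
```

With `scale=1.0`, `cfg_combine` returns the conditional prediction unchanged. Larger values push further away from the unconditional branch, which is why guidance-heavy settings like 4.0 tend to follow the prompt more strictly.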

The default resolution is 2048×2048 (1:1). See the supported resolution buckets for other aspect ratios.
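The bucket list itself is not reproduced in this card. Two sizes do appear here: the 2048×2048 default and the 2720×1536 used in the example command. Both hold roughly 4.2 megapixels and have sides that are multiples of 32. The hypothetical helper below assumes the buckets follow that pattern and picks dimensions for another aspect ratio. Treat it as a rough guess and check the real bucket list before relying on it.

```python
import math
from typing import Tuple


def approx_bucket(aspect_w: int, aspect_h: int,
                  target_pixels: int = 2048 * 2048,
                  multiple: int = 32) -> Tuple[int, int]:
    """Hypothetical: (width, height) close to target_pixels for the given
    aspect ratio, with both sides snapped to a multiple of `multiple`.
    This is an assumption, not the official bucket table."""
    ratio = aspect_w / aspect_h
    height = math.sqrt(target_pixels / ratio)
    width = height * ratio

    def snap(value: float) -> int:
        # round to the nearest multiple, never below one multiple
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


if __name__ == "__main__":
    print(approx_bucket(1, 1))   # (2048, 2048)
    print(approx_bucket(16, 9))  # (2720, 1536), the size used in the example command
```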

For high-quality infographics, we recommend applying prompt enhancement before generating images.

💾 Memory-efficient inference (GGUF + VRAM modes)

For users running on a single consumer GPU, two complementary features lower the VRAM footprint of the transformers path. They can be combined freely.

--vram_mode: single-GPU layer offload

Pass --vram_mode to keep the language-model layers resident in pinned CPU memory and stream them onto the GPU on demand during the forward pass. This frees the VRAM held by weights while keeping activations on the GPU.

| Mode | Behavior | When to use |
|---|---|---|
| full (default) | No offload; whole model on GPU | Plenty of VRAM, best speed |
| low | Synchronous per-layer CPU↔GPU swap | Lowest VRAM footprint |
| balanced | Async prefetch overlaps the H2D copy with compute | Tight on VRAM but want to recover speed |

python examples/t2i/inference.py \
  --model_path sensenova/SenseNova-U1-8B-MoT-Infographic \
  --vram_mode balanced \
  --prompt "..." --output output.png

--gguf_checkpoint and --vram_mode can be combined. A Q4 GGUF checkpoint with --vram_mode balanced is the recommended setup for ~10–12 GB consumer cards.
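The sketch below makes the difference between `low` and `balanced` concrete with framework-free per-layer weight streaming. It is an illustration under assumptions, not the repository's implementation. The hypothetical `load` stands in for the pinned-memory H2D copy, and `compute` stands in for one layer's forward pass. A background thread plays the role that a separate CUDA copy stream would play in a real PyTorch implementation.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Optional, Sequence


def run_streamed(
    layers: Sequence[Any],
    x: Any,
    load: Callable[[Any], Any],          # hypothetical: host -> device weight copy
    compute: Callable[[Any, Any], Any],  # hypothetical: one layer's forward pass
    prefetch: bool,
) -> Any:
    """Run layers in order while streaming their weights onto the 'device'."""
    if not prefetch:
        # "low": copy layer i, run it, move on. Copy and compute never overlap,
        # and only one layer's weights are held at a time.
        for layer in layers:
            x = compute(load(layer), x)
        return x

    # "balanced": while layer i computes, the copy of layer i + 1 is in flight,
    # so roughly two layers' weights are held at a time.
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending: Optional[Future] = copier.submit(load, layers[0]) if layers else None
        for i in range(len(layers)):
            weights = pending.result()  # block until this layer's copy is done
            if i + 1 < len(layers):
                pending = copier.submit(load, layers[i + 1])  # start the next copy
            x = compute(weights, x)
    return x


if __name__ == "__main__":
    def fake_load(layer):
        time.sleep(0.01)  # stands in for a pinned-memory H2D copy
        return layer

    def fake_compute(w, x):
        time.sleep(0.01)  # stands in for the layer's forward pass
        return x + w

    for prefetch in (False, True):
        start = time.perf_counter()
        out = run_streamed(list(range(8)), 0, fake_load, fake_compute, prefetch)
        name = "balanced" if prefetch else "low"
        print(f"{name}: result={out}, {time.perf_counter() - start:.3f}s")
```

Both modes return the same result (28 in this demo). Because each simulated copy overlaps the previous layer's compute, the `balanced` run should finish noticeably faster. The trade-off is that it keeps one extra layer's weights resident.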

⚡ Run with LightLLM + LightX2V (Recommended)

For production serving, we co-design a dedicated inference stack on top of LightLLM (understanding) and LightX2V (generation). The two engines are disaggregated so that each path can use its own parallelism and resource budget, with a low-overhead transfer channel in between.

On a single node with TP2 + CFG2, this stack delivers roughly 0.15 s/step and ~9 s end-to-end for a 2048×2048 image on H100 / H200. Our FA3-based hybrid-mask attention gives a ~2.4–3.2× prefill speedup over the Triton baseline. Full per-GPU performance numbers are reported in docs/inference_infra.md.
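The card does not define "hybrid-mask attention." One common reading of the term in unified understanding/generation models combines two rules: causal attention over text tokens, and bidirectional attention among the tokens of each image. The sketch below builds such a mask under that assumption. It produces a plain boolean matrix, not the fused FA3 kernel the stack actually uses, and the `segments` input format is hypothetical.

```python
from typing import List, Sequence, Tuple


def hybrid_mask(segments: Sequence[Tuple[str, int]]) -> List[List[bool]]:
    """mask[q][k] is True when query position q may attend to key position k.

    segments: ordered (kind, length) pairs, kind being "text" or "image"
    (hypothetical format).
    """
    n = sum(length for _, length in segments)
    # Start from a standard causal mask: every token sees itself and the past.
    mask = [[k <= q for k in range(n)] for q in range(n)]
    start = 0
    for kind, length in segments:
        if kind == "image":
            # Tokens inside one image block also see each other (bidirectional).
            for q in range(start, start + length):
                for k in range(start, start + length):
                    mask[q][k] = True
        start += length
    return mask


if __name__ == "__main__":
    m = hybrid_mask([("text", 2), ("image", 3), ("text", 1)])
    for row in m:
        print("".join("x" if allowed else "." for allowed in row))
    # x.....
    # xx....
    # xxxxx.
    # xxxxx.
    # xxxxx.
    # xxxxxx
```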

An official docker image is provided for one-command deployment:

docker pull lightx2v/lightllm_lightx2v:20260407

⚙️ Deployment guide (Docker, launch flags, modes, quantization, API test): see docs/deployment.md.

📖 Full design and performance profiling: see docs/inference_infra.md.

🌍 Join the Community!

Join our growing community to share feedback, get support, and stay updated on the latest SenseNova-U1 developments. We'd love to hear from you!

Discord WeChat Group

⚖️ License

This project is released under the Apache 2.0 License.

Safetensors · Model size: 18B params · Tensor type: BF16