How to use hardcoremoore/SenseNova-U1-8B-MoT-Infographic with Transformers:
```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "hardcoremoore/SenseNova-U1-8B-MoT-Infographic",
    trust_remote_code=True,
    dtype="auto",
)
```
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
English | 简体中文
Updated News
[2026.05.15] Release SenseNova-U1-8B-MoT-Infographic for improved infographic generation. See U1 Infographic Model for details, and Infographic Showcases for 100 generated examples.
Older news:
- [2026.05.10] Release the SenseNova-U1 Technical Report and the weights for SenseNova-U1-A3B-MoT-SFT & SenseNova-U1-A3B-MoT.
- [2026.05.08] Add GGUF quantized checkpoints and layer-offload VRAM modes for low-VRAM single-GPU inference. See Memory-efficient inference. GGUF weights for SenseNova-U1-8B-MoT-Merger are available at smthem/SenseNova-U1-8B-MoT-Merger-gguf; many thanks to @smthem for contributing the quantized weights.
- [2026.05.06] Release SenseNova-U1-8B-MoT-LoRA-8step-V1.0. Please see the example script.
- [2026.04.30] Release the preview version of the 8-step inference model SenseNova-U1-8B-MoT-8step-preview. In most cases, the image generation quality of this model closely matches that of the base model (see comparison and existing issues). To test this model, you can use the inference scripts with the following parameters: `--cfg_scale 1.0 --num_steps 8`.
- [2026.04.27] Initial release of the weights for SenseNova-U1-8B-MoT-SFT and SenseNova-U1-8B-MoT.
- [2026.04.27] Initial release of the inference code for SenseNova-U1.
Overview
SenseNova U1 is a new series of native multimodal models that unifies multimodal understanding, reasoning, and generation within a monolithic architecture. It marks a fundamental paradigm shift in multimodal AI: from modality integration to true unification. Rather than relying on adapters to translate between modalities, SenseNova U1 models think-and-act across language and vision natively.
Unifying visual understanding and generation in an end-to-end architecture from pixel to word opens tremendous possibilities, enabling highly efficient and strong understanding, generation, and interleaved reasoning in a natively multimodal manner.
Key Pillars:
At the core of SenseNova U1 is NEO-unify, a novel architecture designed from first principles for multimodal AI. It eliminates both the Visual Encoder (VE) and the Variational Auto-Encoder (VAE), so that pixel and word information are inherently and deeply correlated. Its key features are as follows:
- Model language and visual information end-to-end as a unified compound.
- Preserve semantic richness while maintaining pixel-level visual fidelity.
- Reason across modalities with high efficiency & minimal conflict via native MoTs.
Powered by this new core architecture, SenseNova-U1-8B-MoT-Infographic (an infographic-enhanced version of SenseNova-U1-8B-MoT) delivers exceptional efficiency and state-of-the-art infographic performance:
Figure: Generation latency vs. average performance on infographic benchmarks (BizGenEval, IGenBench).
Figure: Generation latency vs. average performance on general benchmarks (OneIG, LongText, CVTG).
- Benchmark Performance: Compared with the base SenseNova-U1-8B-MoT model, BizGenEval (hard / easy) rises from 39.8 / 61.1 to 46.6 / 65.4 (+6.8 / +4.3 points), and IGenBench Q-ACC / I-ACC rises from 51.3 / 4.2 to 69.5 / 17.0 (+18.2 / +12.8 points), while visual understanding capabilities are maintained without substantial degradation.
- Generation Quality: The model produces complex infographics across 100+ styles and layouts, with improved visual aesthetics and text rendering, including dense small text such as arXiv-style pages.
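The point gains quoted in the benchmark bullet can be re-derived from the before/after scores. The snippet below is a minimal sketch for checking that arithmetic; the names `scores` and `gains` are illustrative assumptions and are not part of the SenseNova-U1 codebase.

```python
# (base SenseNova-U1-8B-MoT, SenseNova-U1-8B-MoT-Infographic) scores,
# copied from the benchmark bullet above.
scores = {
    "BizGenEval hard": (39.8, 46.6),
    "BizGenEval easy": (61.1, 65.4),
    "IGenBench Q-ACC": (51.3, 69.5),
    "IGenBench I-ACC": (4.2, 17.0),
}


def gains(table):
    """Return the absolute gain in points per metric, rounded to one decimal."""
    return {name: round(new - base, 1) for name, (base, new) in table.items()}


deltas = gains(scores)
for name, delta in deltas.items():
    print(f"{name}: +{delta:.1f} points")
```

Rounding to one decimal matches the precision of the reported scores and hides floating-point noise in the subtraction.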
| Model | BizGenEval Avg. (hard / easy) ↑ | IGenBench Q-ACC ↑ | IGenBench I-ACC ↑ | OneIG (EN) ↑ | OneIG (ZH) ↑ |
|---|---|---|---|---|---|
| Commercial Models | | | | | |
| Nano-Banana-Pro | 76.7 / 93.7 | 90.6 | 48.8 | 58.1 | 56.8 |
| Nano-Banana-2.0 | 68.5 / 92.5 | 85.6 | 34.4 | 54.0 | 54.9 |
| GPT-Image-1.5 | 35.9 / 81.6 | 55.0 | 12.0 | - | - |
| Qwen-Image-2.0 | 45.5 / 65.8 | 50.0 | 3.0 | 54.1 | 50.9 |
| Seedream-4.5 | 30.1 / 66.2 | 61.0 | 6.0 | 56.4 | 55.0 |
| Open-source Models | | | | | |
| SenseNova-U1-8B-MoT-Infographic | 46.6 / 65.4 | 69.5 | 17.0 | 55.6 | 53.3 |
| SenseNova-U1-8B-MoT | 39.8 / 61.1 | 51.3 | 4.2 | 54.5 | 53.8 |
| Z-Image | 8.2 / 43.8 | 30.0 | 1.0 | 54.6 | 53.5 |
| Qwen-Image-2512 | 6.3 / 41.0 | 32.2 | 1.0 | 53.0 | 51.5 |
| Qwen-Image | 2.8 / 23.8 | 36.0 | 0.0 | 53.9 | 54.8 |
| Bagel | 2.0 / 3.7 | 4.9 | 0.0 | 36.1 | 37.0 |
IGenBench scores are reported as percentages. Models are ordered by the arithmetic mean of BizGenEval hard, BizGenEval easy, IGenBench Q-ACC, and IGenBench I-ACC within the commercial and open-source groups separately. OneIG is included as a general generation reference. Full per-category results are intended for the Hugging Face model card.
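The ordering rule described above (arithmetic mean of BizGenEval hard, BizGenEval easy, IGenBench Q-ACC, and IGenBench I-ACC, applied within each group) can be sketched as follows. The numbers are copied from the table; the helper names `mean_score` and `ranked` are hypothetical and only serve this illustration.

```python
# Each entry: (model name, (BizGenEval hard, BizGenEval easy, Q-ACC, I-ACC)),
# listed in the same order as the rows of the table above.
commercial = [
    ("Nano-Banana-Pro", (76.7, 93.7, 90.6, 48.8)),
    ("Nano-Banana-2.0", (68.5, 92.5, 85.6, 34.4)),
    ("GPT-Image-1.5", (35.9, 81.6, 55.0, 12.0)),
    ("Qwen-Image-2.0", (45.5, 65.8, 50.0, 3.0)),
    ("Seedream-4.5", (30.1, 66.2, 61.0, 6.0)),
]
open_source = [
    ("SenseNova-U1-8B-MoT-Infographic", (46.6, 65.4, 69.5, 17.0)),
    ("SenseNova-U1-8B-MoT", (39.8, 61.1, 51.3, 4.2)),
    ("Z-Image", (8.2, 43.8, 30.0, 1.0)),
    ("Qwen-Image-2512", (6.3, 41.0, 32.2, 1.0)),
    ("Qwen-Image", (2.8, 23.8, 36.0, 0.0)),
    ("Bagel", (2.0, 3.7, 4.9, 0.0)),
]


def mean_score(values):
    """Arithmetic mean of the four infographic metrics."""
    return sum(values) / len(values)


def ranked(rows):
    """Model names sorted by descending mean score."""
    ordered = sorted(rows, key=lambda row: mean_score(row[1]), reverse=True)
    return [name for name, _ in ordered]


for group_name, rows in (("Commercial", commercial), ("Open-source", open_source)):
    print(group_name)
    for name in ranked(rows):
        values = dict(rows)[name]
        print(f"  {name}: {mean_score(values):.2f}")
```

OneIG is left out of the mean on purpose, since the note above treats it as a general generation reference and not as part of the ordering.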
High-density information rendering (Specialized): This specific model demonstrates strong capabilities in dense visual communication, generating richly structured layouts for knowledge illustrations, posters, presentations, comics, resumes, and other information-rich formats.
Open-source SoTA: SenseNova U1 sets a new standard for unified multimodal understanding and generation, achieving state-of-the-art infographic performance among open-source models.
Infographic Showcases
More generation samples: see Infographic Showcases.
Qualitative Comparison
We present a qualitative comparison between the base SenseNova-U1-8B-MoT and the fine-tuned SenseNova-U1-8B-MoT-Infographic model across five key dimensions: background stability, chart accuracy, text rendering accuracy and size appropriateness, arXiv paper rendering quality, and overall layout and content understanding. For the full comparison, please refer to Comparison Infographic Cases.
Background Stability
| U1-8B-MoT | 8B-MoT-Infographic | U1-8B-MoT | 8B-MoT-Infographic |
|---|---|---|---|
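Prompt (translated from Chinese): The infographic is titled "Copyright Visual Overview" and uses a horizontal column layout divided into an upper and a lower part. The upper part is a visual overview area made up of four colored rectangular blocks side by side, each conveying one core concept through icons and a short title; the lower part is a detailed explanation area titled "Copyright Basics", containing four numbered entries that correspond to the four themes of the upper part and provide fuller textual explanations. **Upper part: Copyright Visual Overview** This area consists of four horizontally arranged colored blocks, from left to right light blue, light yellow, light green, and light purple; each block contains a group of icons and a Chinese title below them. 1. **Column 1 (light blue): Created Means Protected** * **Icons**: a glowing light bulb on the left, a document icon with a pen in the middle, and a padlock icon on the right, connected by arrows to show the flow "Idea → Creation → Protection". * **Text**: * Small text above the icons reads "Automatic protection". * A large title at the bottom of the block reads "Created Means Protected". 2. **Column 2 (light yellow): Core Rights** * **Icons**: an open palm raised upward in the center, surrounded above by several elements: a circle with the © symbol, a booklet, a stack of coins with a dollar sign, and several arrows pointing in different directions, symbolizing the many forms and returns of rights. * **Text**: * No extra small text above the icons. * A large title at the bottom of the block reads "Core Rights". 3. **Column 3 (light green): Balance Under Specific Conditions** * **Icons**: a balance scale; the left pan holds an open book and a microphone labeled "NEWS", representing "fair use"; the right pan holds a locked folder, representing "controlled works". The scale tilts to the right. * **Text**: * "Fair use" is labeled above the left pan. * "Controlled works" is labeled above the right pan. * A large title at the bottom of the block reads "Balance Under Specific Conditions". 4. **Column 4 (light purple): Term of Protection** * **Icons**: an hourglass on the left, a thick right-pointing arrow in the middle, and a tombstone (with a cross on top) on the right. There is also a clock icon above the hourglass. * **Text**: * "Author's lifetime + X years" is labeled next to the tombstone. * A large title at the bottom of the block reads "Term of Protection". **Lower part: "Copyright Basics"** This area sits below the upper part on a white background and contains four separate text boxes, each with a colored title bar and detailed explanatory text beneath it, with colors matching the upper part. 1. **1. Protection Is Automatic** * **Title bar**: blue background, white text "1. Protection Is Automatic". * **Body**: "Copyright is enjoyed automatically from the moment a work is completed; no registration is required (registration mainly serves as evidence)." 2. **2. Core Rights** * **Title bar**: orange-yellow background, white text "2. Core Rights". * **Body**: "These include personal rights (such as the right of attribution and the right of modification) and property rights (such as the rights of reproduction, distribution, and communication over information networks, which can be licensed or transferred for profit)." 3. **3. Fair Use** * **Title bar**: green background, white text "3. Fair Use". * **Body**: "Under specific conditions (such as teaching, news reporting, or personal study), a work may be used without permission and without payment, but the author and source must be credited and other rights must not be infringed." 4. **4. Term of Protection** * **Title bar**: purple background, white text "4. Term of Protection". * **Body**: "Generally the author's lifetime plus 50 years after death (mainland China and most other regions); after the term expires, the work enters the public domain." **Overall style and data encoding**: The infographic uses a flat design style with vivid colors and clear partitions. Color coding (blue, yellow, green, purple) visually distinguishes the four themes and stays consistent across the upper and lower parts. Icons serve as the main means of data visualization and convey abstract concepts intuitively. All text is in Simplified Chinese; the content is rigorously structured and logically clear, and aims to popularize basic copyright knowledge by combining images and text. |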
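Prompt (translated from Chinese): The infographic uses Chinese as its main language and a horizontal four-panel layout that clearly presents the four key stages of a brand's journey from decline to revival. The overall style is hand-drawn cartoon illustration with bright colors and simple lines, giving it an approachable, narrative feel. Each stage consists of three parts, a title at the top, an illustration in the middle, and explanatory text at the bottom, separated by dashed lines for a clear structure. Stage 1 is titled "1. Former Glory and Decline"; the illustration shows a dilapidated castle with a sad expression, with a crown and a clock symbolizing past glory scattered around it, a sign reading "OLD BRAND" standing beside it, and Big Ben visible in the background, hinting at a traditional or historic brand. The text below reads: "Once a market leader, it failed to keep pace with the times, was gradually forgotten, and faced a survival crisis." Stage 2 is titled "2. Innovation and Reinvention"; the illustration shows a team of four seated in discussion, one of them pointing at a green leaf logo design on a whiteboard, surrounded by gears, a light bulb (representing ideas), and a sign reading "NEW IDEAS". The text below reads: "Conduct in-depth market research, reposition the brand, introduce innovative design and digital strategy, and reshape core values." Stage 3 is titled "3. A Successful Comeback"; the illustration includes a phoenix reborn from fire, symbolizing rebirth; to the right is a bar chart with an upward trend, above it a parcel with a heart, representing product delivery; a cheering crowd expresses joy. The text below reads: "With new products and a new image, the brand regains consumer trust, performance rises against the trend, and market share is won back." Stage 4 is titled "4. Looking Ahead"; the illustration shows a rocket launching from Earth orbit, surrounded by stars, clouds, and rising green leaves symbolizing sustainable development; a banner above reads "FUTURE READY". The text below reads: "Keep innovating, focus on sustainable development and user connection, and aspire to become a more influential brand of the future." Through visual metaphors (such as the castle, the phoenix, and the rocket) combined with a data chart (the bar chart), the infographic vividly tells the complete story of a brand from crisis to revival, emphasizing the importance of innovation, user trust, and sustainable development. All text is in Simplified Chinese, with no languages other than English. |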
Prompt: The infographic titled "College Entrance Pathway Reforce Comparison" presents a structured comparison of key aspects for prospective students in Guangdong, China, aiming to enter college through a specialized entrance examination. The layout is organized as a multi-column table with four main columns: "Content Item / Evaluation Criteria", "Statistics", "Quotes", and "Key Terms". Each row corresponds to a distinct evaluation criterion or step in the preparation process, with visual icons, text, and data points enhancing clarity. The infographic uses a clean, minimalist design with black line art icons on a light beige background. Text is primarily in bold sans-serif font, with headings emphasized for readability. Data is encoded using icons (e.g., graduation cap, calendar, books, target, rocket) to visually represent concepts, while numerical values are explicitly labeled for precision. The first row addresses **Eligibility Criteria**: - In the "Statistics" column, it features an icon of a person checking a map of Guangdong with the text: "Official Eligibility Requirements Confirm if you qualify to register". - The "Quotes" column lists three eligible groups with corresponding icons: "Final-Year Guangdong Junior College Student", "Guangdong Resident <2 Years Post Graduation", and "Eligible Retired Military Personnel". - The "Key Terms" column shows a magnifying glass over a document with the label: "Eligibility Verification". The second row covers **Exam Structure & Scoring Breakdown**: - "Statistics" displays icons representing different test types and scores: 100 pts (graduation cap), 200 pts (person at desk), 1000 pts (document with pen), 150 pts (document with pen). Below: "Total 500 points across 4 test papers". - "Quotes" lists four subject components in document-shaped boxes: "Political Theory (100 pts)", "Major-Aligned Public Subject (100 pts)", "Professional Subject 1 (150 pts)", "Professional Subject 2 (150 pts)". 
- "Key Terms" includes a balance scale icon with "Score Distribution". The third row details the **Official Annual Exam Timeline**: - "Statistics" contains a horizontal timeline with icons of a calendar and clock, labeled "Annual Key Timeline". - "Quotes" provides a detailed timeline: Jan: Registration Open → Jan: Admission Open → Mid-Mar: Exam Date → Mid-Apr: Score Release → May-Jun: Admission Offers. - "Key Terms" shows a calendar and clock with "Critical Dates". The next three rows outline a three-step preparation strategy: **Step 1 - Confirm Target Major & Institution**: - "Statistics": Icon of a person holding a map with a target, text: "Confirm your target 6 months in advance". - "Quotes": Two bullet points: "Download official exam syllabi and past professional subject papers from the target institution's admission portal" and "Cross-verify that your junior college major meets the target major's prerequisite requirements". - "Key Terms": Clock and books with "Target Selection". **Step 2 - Public Subject Foundation Building**: - "Statistics": Icon of a person studying with books and a coffee cup, text: "Complete 3 months of structured public subject study". - "Quotes": Two bullet points: "Complete 5+ years of past public subject exam papers to identify recurring test points" and "Political Theory allocates 30% of total score to current affairs from the past calendar year". - "Key Terms": Box with lightbulb and "Core Knowledge". **Step 3 - Professional Subject Sprint Revision**: - "Statistics": Icon of a running person with a book and clock, text: "Focus on high-weight professional subjects in the final 2 months". - "Quotes": Two bullet points: "Practice past professional subject papers from your target institution and review core major textbooks" and "60% of professional subject questions are repeated or adapted from past 3 years of papers for most institutions". - "Key Terms": Trophy and gears with "Intensive Review". 
Red horizontal lines separate the first three criteria from the three-step strategy, while a blue line separates Step 1 from Steps 2 and 3, visually grouping related content. All textual information is preserved exactly as presented, including spelling variations like "Oficial" (likely intended as "Official"). The infographic serves as a strategic roadmap combining official requirements, scoring details, timelines, and actionable preparation steps for candidates. |
Prompt: The infographic titled "12-Month Market Performance: US vs. Asia" presents a structured, puzzle-piece-based visual analysis comparing the performance of US and Asian equity markets over a 12-month period. The layout is organized into three main steps, arranged in a central vertical flow with interconnected puzzle pieces, emphasizing a modular, analytical approach to market comparison. The design uses clean black-and-white line art with light blue accents for key sections, icons for visual representation, and clear typography for readability. **Step 1** (top center) introduces the scope of the analysis. It features an illustration of four people examining charts, symbolizing data analysis. To the right, it defines the market indices being compared: - **US Markets**: S&P 500, NASDAQ - **Asian Markets**: Nikkei 225, Hang Seng, KOSPI, CSI 300 It also lists the types of data analyzed: - Trailing Return (represented by a rising bar chart icon) - Average Daily Volume (represented by a stacked bar chart icon) - Top Sector Return (represented by a pie chart icon) **Step 2** (left side, labeled "Metrics that account for 72% of short-term S&P 500 volatility") focuses on US Market Core Driving Indicators. This section contains icons representing industry (factory), finance (bank building), money (hand holding dollar sign), and labor (worker in hard hat). Below these icons, a light blue banner reads "US Market Core Driving Indicators". Specific metrics are listed with red warning triangle icons: - CPI YoY: 3.2% - Federal Funds Rate: 5.25–5.5% - Non-farm Payrolls: +187k July 2024 **Step 3** (right side, labeled "Metrics that predict 68% of MSCI Asia Ex-Japan 3-month forward returns") focuses on Asian Market Core Leading Indicators. This section includes icons for shipping (container), manufacturing (gears), and calculation (calculator). A light blue banner below reads "Asian Market Core Leading Indicators". 
Specific metrics are listed: - Manufacturing PMI: 51.2 (with red warning triangle) - Q2 Export Growth: +6.8% YoY (with red warning triangle) - Avg Policy Rate: 3.1% (with information circle icon) At the bottom center, a large puzzle piece titled "Policy Shifts & Market Volatility Correlation" displays a line graph with two fluctuating lines: - **US VIX (navy line)** - representing US market volatility - **Asian Avg Volatility (green line)** - representing average Asian market volatility Arrows connect the two lines, indicating correlation. Below the graph, key insights are provided with red warning triangles: - Rate hike impact: +27% US VIX - Trade policy impact: +34% Asian VIX - Cross-regional sell-off correlation: 0.68 The overall structure visually represents how US and Asian market performances are driven by distinct but interrelated economic indicators, with a central focus on their volatility dynamics and policy impacts. The use of puzzle pieces metaphorically suggests that these components fit together to form a complete picture of global market trends. The infographic employs consistent iconography, color-coding (red for warnings, blue for core sections), and clear textual labeling to convey complex financial data in an accessible format. |
Chart Accuracy
| U1-8B-MoT | 8B-MoT-Infographic | U1-8B-MoT | 8B-MoT-Infographic |
|---|---|---|---|
Prompt: Create an infographic that features a title and a subtitle centered at the top, reading 'Fastest Cuisines to Prepare' and 'Average Ghost Kitchen Handover Time by Item Type (Minutes)' respectively. The main visual is a horizontal grouped bar chart combining a Fast-food neon visual style with checkerboard borders along the edges, featuring a centered legend above the chart area for 'QuickEats' (cyan neon border) and 'DashNow' (orange neon border). To the bottom right of the bar chart, there is a simple illustration of two mopeds waiting for orders. The chart's vertical axis lists four categories, each preceded by a simple icon, while the horizontal axis represents handover time in minutes with numerical labels at 0, 5, 10, 15, and 20, supplemented by dotted vertical gridlines. Each category features a pair of black bars representing the two platforms, with exact values displayed directly inside the right end of each bar. For 'Classic Tacos', QuickEats takes 10.0 minutes while DashNow takes 11.5 minutes. 'Supreme Burritos' require the longest preparation, with 17.5 minutes for QuickEats and 19.0 minutes for DashNow. 'Spicy Nachos' take 9.5 minutes on QuickEats and 10.0 minutes on DashNow. Finally, 'Mini Quesadillas' are the fastest, taking 8.0 minutes for QuickEats and 8.5 minutes for DashNow. 
The given data is : [{"category": "Classic Tacos", "platform": "QuickEats", "unit": "Minutes", "value": 10.0}, {"category": "Classic Tacos", "platform": "DashNow", "unit": "Minutes", "value": 11.5}, {"category": "Supreme Burritos", "platform": "QuickEats", "unit": "Minutes", "value": 17.5}, {"category": "Supreme Burritos", "platform": "DashNow", "unit": "Minutes", "value": 19.0}, {"category": "Spicy Nachos", "platform": "QuickEats", "unit": "Minutes", "value": 9.5}, {"category": "Spicy Nachos", "platform": "DashNow", "unit": "Minutes", "value": 10.0}, {"category": "Mini Quesadillas", "platform": "QuickEats", "unit": "Minutes", "value": 8.0}, {"category": "Mini Quesadillas", "platform": "DashNow", "unit": "Minutes", "value": 8.5}] |
Prompt: Create an infographic that presents a centered title at the top, stating "Übertaktet vs. Standard-Takt", with the subtitle "Temperaturanstieg bei langen Gaming-Sessions" directly below it. The main visual is a line chart spanning the width of the infographic on a dark background, embodying a Gamer Aesthetic with vibrant RGB neon accents. This chart has a vertical axis on the left labeled with numerical values in increments of 10 from 30°C to 100°C, and a horizontal axis at the bottom with time labels: '0m', '15m', '30m', '45m', '60m', '75m', '90m', '105m', and '120m'. Horizontal grid lines mark each 10°C increment. A horizontal legend is positioned under the subtitle, containing a cyan circular marker and line for "Standard-Takt" and a magenta circular marker and line for "Übertaktet (+150MHz)". Two data series are plotted as glowing neon lines with hollow circular markers at each data point, accompanied by gradient shading below each line. The cyan "Standard-Takt" line shows a steep rise from 38°C at 0m to 68°C at 15m, followed by a flat plateau reaching 73.5°C at 120m. The magenta "Übertaktet" line displays a similar initial spike from 42°C to 75°C, but continues with a gradual linear creep up to 93°C at 120m. Spike annotations (callout boxes) point to the final data points on the right, highlighting the peak temperatures: a magenta box reads "Peak: 93°C" and a cyan box reads "Peak: 73.5°C". A stylized thermometer line-art icon is subtly placed in the center of the chart's background. 
The given data is : [{"profile": "Standard-Takt", "temperature": 38, "time": "0m"}, {"profile": "Übertaktet", "temperature": 42, "time": "0m"}, {"profile": "Standard-Takt", "temperature": 68, "time": "15m"}, {"profile": "Übertaktet", "temperature": 75, "time": "15m"}, {"profile": "Standard-Takt", "temperature": 71, "time": "30m"}, {"profile": "Übertaktet", "temperature": 79, "time": "30m"}, {"profile": "Standard-Takt", "temperature": 72, "time": "45m"}, {"profile": "Übertaktet", "temperature": 82, "time": "45m"}, {"profile": "Standard-Takt", "temperature": 72.5, "time": "60m"}, {"profile": "Übertaktet", "temperature": 85, "time": "60m"}, {"profile": "Standard-Takt", "temperature": 73, "time": "75m"}, {"profile": "Übertaktet", "temperature": 87, "time": "75m"}, {"profile": "Standard-Takt", "temperature": 73, "time": "90m"}, {"profile": "Übertaktet", "temperature": 89, "time": "90m"}, {"profile": "Standard-Takt", "temperature": 73.5, "time": "105m"}, {"profile": "Übertaktet", "temperature": 91, "time": "105m"}, {"profile": "Standard-Takt", "temperature": 73.5, "time": "120m"}, {"profile": "Übertaktet", "temperature": 93, "time": "120m"}] |
Prompt: Create an infographic that displays data in a vertical diverging bar chart format. At the top left of the visualization, there is a title: 'Anomalie de l'Atlantique Sud : Dérive magnétique', and a subtitle: 'Vecteurs de dérive vers l'est et l'ouest en kilomètres par rapport à la ligne de base historique'. In the upper left area below the text, an icon of a compass rose is placed within a magnetic field line curve. The main chart features a horizontal zero-axis line, labeled with a '0' on the far left, representing the historical coordinate baseline. The x-axis at the bottom displays the decades '1980', '1990', '2000', '2010', and '2020', each marked with a small vertical tick. For each decade, a vertical bar extends from the zero-axis, with its corresponding data label positioned directly at the end of the bar. The data shows westward drift represented by blue bars extending below the axis for '1980' with a value of '-15 km' and '1990' with a value of '-32 km'. Eastward drift is represented by red bars extending above the axis for '2000' with a value of '+10 km', '2010' with a value of '+45 km', and '2020' with a value of '+68 km'. The overall visual style mimics a geophysical science journal, utilizing compass red and blue color tones. The given data is : [{"decade": "1980", "drift_km": -15}, {"decade": "1990", "drift_km": -32}, {"decade": "2000", "drift_km": 10}, {"decade": "2010", "drift_km": 45}, {"decade": "2020", "drift_km": 68}] |
Prompt: Create an infographic in a corporate report minimalism style with muted corporate grays and blues, featuring a large title, 'Seasonal Fluctuations in 15-Year Mortgages', at the top. Directly below it is a subtitle, 'Historical prepayment velocities showing seasonal housing market trends'. Underneath the subtitle, a horizontal legend identifies two categories with small square icons: 'Spring/Summer Originations' in lighter gray-blue and 'Fall/Winter Originations' in darker gray-blue. The main visual is a multi-line chart in a wide landscape orientation. The vertical axis has numeric labels at 0.0, 5.0, 10.0, 15.0, and 20.0, with horizontal grid lines extending across the plot. The horizontal axis features labels: 'Jan 2018', 'Apr', 'Jul', 'Oct', 'Jan 2019', 'Apr', and 'Jul'. An icon depicting a sleek house silhouette is positioned in the upper left corner of the chart's plotting area. Two distinct lines represent the categories, characterized by cyclical seasonal bumps in the summer months. Both lines have square markers at each data point, with numerical values displayed near them. The lighter line for 'Spring/Summer Originations' plots a value of 8.0 in Jan 2018, rising to 12.5 in Apr, peaking at 16.0 in Jul, dipping to 11.0 in Oct, dropping further to 7.5 in Jan 2019, climbing to 13.0 in Apr, and reaching 17.5 in Jul. The darker line for 'Fall/Winter Originations' mirrors this pattern, starting at 6.5 in Jan 2018, increasing to 9.0 in Apr, hitting 14.5 in Jul, falling to 10.0 in Oct, bottoming out at 6.0 in Jan 2019, rising to 10.5 in Apr, and ending at 15.0 in Jul. 
The given data is : [{"category": "Spring/Summer Originations", "date": "2018-01", "value": 8.0}, {"category": "Fall/Winter Originations", "date": "2018-01", "value": 6.5}, {"category": "Spring/Summer Originations", "date": "2018-04", "value": 12.5}, {"category": "Fall/Winter Originations", "date": "2018-04", "value": 9.0}, {"category": "Spring/Summer Originations", "date": "2018-07", "value": 16.0}, {"category": "Fall/Winter Originations", "date": "2018-07", "value": 14.5}, {"category": "Spring/Summer Originations", "date": "2018-10", "value": 11.0}, {"category": "Fall/Winter Originations", "date": "2018-10", "value": 10.0}, {"category": "Spring/Summer Originations", "date": "2019-01", "value": 7.5}, {"category": "Fall/Winter Originations", "date": "2019-01", "value": 6.0}, {"category": "Spring/Summer Originations", "date": "2019-04", "value": 13.0}, {"category": "Fall/Winter Originations", "date": "2019-04", "value": 10.5}, {"category": "Spring/Summer Originations", "date": "2019-07", "value": 17.5}, {"category": "Fall/Winter Originations", "date": "2019-07", "value": 15.0}] |
Text Rendering Accuracy and Size Appropriateness
| U1-8B-MoT | 8B-MoT-Infographic | U1-8B-MoT | 8B-MoT-Infographic |
|---|---|---|---|
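Prompt (translated from Chinese): The infographic is presented in a hand-drawn notebook style, titled "Travel with Chiikawa: A 3-Day, 2-Night No-Detour Guide to the National Art Museum of Catalonia (MNAC)", with the subtitle "Itinerary Route and Schedule (Clear Chinese Edition)". It uses a warm yellow background with a brown border and a spiral-binding design, creating the cozy, cute feel of a travel journal. The content is divided into three main vertical areas corresponding to DAY 1, DAY 2, and DAY 3; each area has a round clock icon and a "DAY X" label at the top, giving a clear structure. Within each day's block, the itinerary is listed as a timeline, with dots connecting time points to activity descriptions, and cute cartoon character illustrations from the Chiikawa series (such as the white bear, the blue cat, and the rabbit) on the right to add fun. All text is in Simplified Chinese, in a clear and legible font with distinct visual hierarchy. --- **DAY 1: Arrival and First Exploration** - **10:00** Arrive in Barcelona and check in at the hotel (Poble Sec district) - with an illustration of the white bear pulling a suitcase. - **12:00** Lunch: Spanish tapas - no illustration shown. - **14:00** Head to Plaza de España for a panoramic view of MNAC from afar - with an illustration of the Plaza de España buildings and a map arrow. - **16:00** Visit the MNAC exterior and surrounding gardens - with an illustration of the blue cat jumping among flowers. - **19:00** Enjoy the Magic Fountain show - with an illustration of the white bear with a sparkle effect. - **20:30** Dinner: a nearby restaurant - no illustration shown. --- **DAY 2: In-Depth MNAC Art Tour** - **09:30** Breakfast, then walk to the MNAC entrance - with an illustration of the white bear eating bread. - **10:00** Enter MNAC (advance ticket purchase recommended) and visit the Romanesque art collection - with a classical oil painting illustration. - **12:30** Light meal inside the museum or a midday rest nearby - no illustration shown. - **14:00** Visit the Gothic, Renaissance, and Baroque art collections - with a Mona Lisa-style portrait illustration and the blue cat character. - **16:30** Explore the modern art collection (Catalan Modernism) - with an abstract-art-style illustration. - **18:30** Go to the MNAC rooftop viewing terrace to overlook the city at sunset - with an illustration of the rabbit holding up a phone to take photos. - **20:00** Dinner: near the Arenas shopping center - no illustration shown. --- **DAY 3: Around Montjuïc and the Return Trip** - **09:00** Breakfast, check out, and store luggage - no illustration shown. - **10:00** Take the cable car to Montjuïc Castle - with a cable car illustration containing three cartoon animals. - **12:00** Visit the Joan Miró Foundation - with a Miró-style abstract sculpture illustration. - **13:30** Lunch: paella near the Olympic Port - no illustration shown. - **15:00** Stroll through the Olympic Park - no illustration shown. - **16:30** Collect luggage and head to the airport/station for the return trip - with an illustration of the white bear waving happily. --- **Transport tips bar at the bottom**: with bus, metro, and footpath icons, and the text: "Transport tip: make good use of the T-casual travel card; exploring on foot is even better!" --- The overall chart type is a time-series flow chart that organizes information by combining vertical columns with horizontal timelines. The data encoding includes time points (precise to the minute), place names, activity descriptions, and accompanying illustrations; all information is arranged in logical order so that users can quickly understand and carry out the three-day itinerary. The visual elements are rich, both practical and fun, and well suited to travel-guide content. |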
Prompt: The infographic presents a comprehensive architectural and structural analysis of the Temple of Kom Ombo, an ancient Egyptian temple located on the west bank of the Nile River. The title "TEMPLE OF KOM OMBO" is prominently displayed in a hand-drawn, white-bordered box in the lower-right corner of the image, set against a brown background that mimics sandstone or earth tones. The overall layout is divided into multiple sections: a central photographic image of the temple ruins under a clear blue sky, surrounded by illustrative technical diagrams, annotated floor plans, and textual data blocks, all rendered in white line art and text for high contrast. The central photograph shows the main hypostyle hall and surrounding structures of the temple, with visitors walking among the columns and courtyards, providing a sense of scale. In the background, the Nile River and palm trees are visible, situating the temple in its natural environment. The ruins are constructed from light-colored sandstone blocks, consistent with the material noted in the text. In the upper-left quadrant, a 3D axonometric diagram illustrates the overall dimensions of the temple complex: approximately 62 meters by 51 meters, labeled along the axes. Adjacent to this, a list of key structural facts is presented in bullet points: - TEMPLE AXIS: DOUBLE SANCTUARY FOR SOBEK & HORUS - OVERALL DIMENSIONS (APPROX. 62M x 51M) - CONSTRUCTION MATERIAL: SANDSTONE BLOCKS - COLUMN HEIGHTS: UP TO 12 METERS Above the central photo, two schematic diagrams illustrate architectural details: - A top-down view of the hypostyle hall showing 30 columns arranged in a grid, labeled “HYPOSTYLE HALL (30 COLUMNS)” and pointing to “TWO SANCTUARIES.” - A cross-section labeled “PYLON AND HYPOSTYLE SECTION,” which includes a detailed vertical cutaway showing the roofing system supported by columns, with arrows indicating load paths down to foundations. 
To the right of the central image, text notes “TWO ENTRANCES SYMBOLIZING DUALITY,” emphasizing the temple's unique dual dedication. This concept is reinforced in the lower section of the infographic, where a detailed floor plan is overlaid on the brown ground area. The floor plan, drawn in white lines, is annotated with various features: - INNER TEMPLE (FOR SOBEK) - marked with a rectangular inner sanctum. - INNER TEMPLE (FOR HAROERIS) - another distinct inner sanctum, indicating the dual religious function. - NILOMETER - a structure used to measure the Nile's water level. - BIRTH HOUSE (MAMMISHI) - a smaller chamber associated with fertility rituals. - MUMMIFIED CROCODILE MUSEUM SITE - indicating a location within the temple complex for sacred crocodile mummies. - TWO ENTRANCES SYMBOLIZING DUALITY - shown as two separate entryways on the plan. Surrounding the floor plan are inset images of relief carvings, each labeled: - MEDICAL INSTRUMENT RELIEFS - depicting figures with tools. - TWO ENTRANCES RELIEFS - showing doorways flanked by deities. - CALENDAR RELIEFS - illustrating scenes related to timekeeping or agricultural cycles. Additional annotations point to structural aspects: - “STRUCTURAL LOAD PATHS FROM COLUMNS TO FOUNDATIONS” - illustrated with curved arrows tracing the force transfer from columns through the walls to the ground. - The pylon and hypostyle section diagram also labels “ROOFING SYSTEM” and shows how the roof beams rest on column capitals. All textual content is in English, using a clean, sans-serif font that enhances readability. The visual style blends real photography with technical illustrations and hand-drawn elements, creating an educational and engaging format suitable for tourists, students, or archaeologists. The infographic effectively communicates both the physical characteristics and symbolic significance of the Temple of Kom Ombo, highlighting its duality, engineering, and cultural importance. |
||
Prompt่ฏฅไฟกๆฏๅพไปฅ้ปๆฟ้ฃๆ ผ่ฎพ่ฎก๏ผๆ ้ขไธบโๅฐๆน็น่ฒ&ๆดปๅจๅพฎไฟกๅ
ฌไผๅทๆจๅนฟๅ
จๆๅโ๏ผๆดไฝ้็จๆ็ป็ฒ็ฌๅญๆๆ๏ผ้
ไปฅๅฝฉ่ฒๅพๆ ๅ็ฎญๅคด๏ผ่ง่งไธๆจกๆ็ๅฎ้ปๆฟไนฆๅๅบๆฏใๅ
ๅฎน็ปๆๆธ
ๆฐ๏ผๅไธบไธไธชไธป่ฆ้จๅ๏ผ้่ฟ็ฐ่ฒๅผงๅฝข็ฎญๅคด่ฟๆฅ๏ผๅฝขๆ้ป่พ้่ฟๅ
ณ็ณป๏ผไปๆจๅนฟๅ
ๅฎนๆ ธๅฟๆนๅ โ ้ซ่ฝฌๅๆดปๅจๆจๅนฟ็ฉๆณ โ ๅพฎไฟกๅ
ฌไผๅท็ๆ้้
ๆจๅนฟๆๅทงใ ็ฌฌไธ้จๅ๏ผโๆจๅนฟๅ ๅฎนๆ ธๅฟๆนๅ๏ผๆทฑๆๆฌๅฐ็น่ฒ่ฎฐๅฟ็นโ๏ผๅผบ่ฐ้่ฟไธ็ฑป้ซๆต้ๆฌๅฐๅ ๅฎนๅธๅผ็จๆทๅ ฑ้ธฃๅนถๅธๅผๅคๅฐๆธธๅฎขๆๅก๏ผ - **ๆฌๅ็พ้ฃ**๏ผ้ป่ฒๆคญๅๆ ็ญพ๏ผ๏ผๅ ๅซ่ๅญๅทๅฐๅใๅญฃ่ๆง็น่ฒ้ฃไฟใ็คพๅบ้่ๅฐๅบๆขๅบๅ ๅฎน๏ผ้ ๆ็ญๆฑค็ขไธ็ญทๅญๅพๆ ใ - **ไบบๆ้ฃ็ฉ**๏ผๆฃ่ฒๆคญๅๆ ็ญพ๏ผ๏ผๆถต็้้ๆ่บไผ ๆฟๆ ไบใ่่ก่ๅททๅๅฒใๆฌๅฐๅไบบๆงๅฑ ๆข่ฎฟๅ ๅฎน๏ผ้ ๆไผ ็ปๅปบ็ญไธๅธ้ๅพๆ ใ - **ไพฟๆฐ็ฆๅฉ**๏ผ็ฒ่ฒๆคญๅๆ ็ญพ๏ผ๏ผๅ ๆฌๆฌๅฐไธๅฑๆถ่ดนๅธใๆฏๅบๅ ็ฅจๆฟ็ญใ่ๅบๆดปๅจ้ขๅ็ญๅ ๅฎน๏ผ้ ๆไผๆ ๅธไธ็คผ็ๅพๆ ใ ็ฌฌไบ้จๅ๏ผโ้ซ่ฝฌๅๆดปๅจๆจๅนฟ3็งๅฎ็จ็ฉๆณโ๏ผๆจๅจๆๆปกๅไธ่ฝฌๅ็๏ผ - **่ๅบๅธ้็ฉๆณ**๏ผๆฉ่ฒๆคญๅๆ ็ญพ๏ผ๏ผๅ ฌไผๅท้ข็ญๅๆฉ้ธ็ฅจ+็่จๆฝๅ ่ดนๅไธๅ้ข+็ฐๅบๆๅก่ฟ็ฐ๏ผ้ ๆ็ฏ็ฌผไธๆไฝๅพๆ ใ - **้้ไฝ้ช็ฉๆณ**๏ผ็ปฟ่ฒๆคญๅๆ ็ญพ๏ผ๏ผๅผๆพๅ ฌไผๅทไธๅฑๆฅๅ้้+ๆๅๅๅธไฝ้ชๅฎ้ขๅๅ ๅฎน+ๆดปๅจๅ็จๆทๆ็จฟ่ฟ็ฐ๏ผ้ ๆ้ถ่บไธ็ปๅธๆบๅพๆ ใ - **ๆถ่ดนไฟ่ฟ็ฉๆณ**๏ผ็ดซ่ฒๆคญๅๆ ็ญพ๏ผ๏ผ่ๅๆฌๅฐๅๅฎถๆจๅบๅ ฌไผๅทไธๅฑๆถ่ดนๅธๅ +ๅฐๅบๆ ธ้้ๅฎๅถๅจ่พน๏ผ้ ๆ่ดญ็ฉ่ขไธ้ถ่กๅกๅพๆ ใ ็ฌฌไธ้จๅ๏ผโๅพฎไฟกๅ ฌไผๅท็ๆ้้ ๆจๅนฟๆๅทงโ๏ผ่็ฆ้ไฝๆจๅนฟๆๆฌ๏ผ - **ๅ ๅฎนๅ็ฐๆๅทง**๏ผ่่ฒๆคญๅๆ ็ญพ๏ผ๏ผๅฐ้ขๅพ็จๆฌๅฐๆ ๅฟๆงๅปบ็ญ/็พ้ฃๅ่ง่ง็ฌฆๅท๏ผ้ฆๅพๆพ็ฝฎๆดปๅจๅ่ฎกๆถๆตทๆฅ๏ผๆๆซๅ ไธ้ฎๆฅๅ่ทณ่ฝฌ้พๆฅ๏ผ้ ๆๆๆบๅพๆ ใ - **ๆธ ้่ๅจๆๅทง**๏ผ้ป่ฒๆคญๅๆ ็ญพ๏ผ๏ผ่ง้ขๅทๅๅธๆดปๅจ่ฑ็ตฎๆ่ฝฝๅ ฌไผๅท้พๆฅ๏ผๆๅๅๅนฟๅๅฎๅๆจ้็ปๆฌๅฐ18-60ๅฒไบบ็พค๏ผๆฌๅฐ็คพ็พค่ฝฌๅๅธฆไธๅฑๆฝๅฅ็ ๏ผ้ ๆไธไบบ็คพไบค็ฝ็ปๅพๆ ใ - **็งๅ็ๅญๆๅทง**๏ผ็ปฟ่ฒๆคญๅๆ ็ญพ๏ผ๏ผๆดปๅจๅไธ่ ๅผๅฏผๆทปๅ ไผไธๅพฎไฟก๏ผๆๅ ฅๆฌๅฐ็ฆๅฉ็พคๅ็ปญๆ็ปญๆจ้ๆดปๅจไฟกๆฏ๏ผ้ ๆๅพฎไฟกๅฏน่ฏๆฐๆณกๅพๆ ใ ๆดไธชไฟกๆฏๅพๅธๅฑๅๅ็ดๆต็บฟๅ๏ผๅๆจกๅไน้ดไปฅๆฒ็บฟ็ฎญๅคด่ฟๆฅ๏ผๅณไพง็น็ผๆ็ฎ็ฌๅฐไบบๅๆๅนๅท็ญ่ฃ ้ฅฐๅ ็ด ๏ผๅขๅผบ่ถฃๅณๆงๅๅฏ่ฏปๆงใๆๅญๆ็ๅฑๆฌกๅๆ๏ผไธปๆ ้ข็ฝ่ฒ็ฒไฝ๏ผๅฏๆ ้ขไธๆ ธๅฟๆฆๅฟตไฝฟ็จ้ป่ฒๆๅฝฉ่ฒ็ชๅบ๏ผ็ป่่ฏดๆๅไธบ็ฝ่ฒๅธธ่งๅญไฝใๆๆๆๆฌๅไธบไธญๆ๏ผๆ ่ฑๆๆๅ ถไป่ฏญ่จๅ ๅฎนใ |
Prompt่ฏฅไฟกๆฏๅพ้ขไธบใๅฟ็ซฅ่ฅๅ
ป่กฅๅ
ๅ
จๆๅ๏ผ็งๅญฆๅปบ่ฎฎ+ไบงๅ้่ดญ่ฆ็นใ๏ผ้็จๆผซ็ป้ฃๆ ผ่ฎพ่ฎก๏ผ่ฒๅฝฉ้ฒๆ๏ผไปฅ็บขใ้ปใ่ไธบไธป่ฒ่ฐ๏ผๅธๅฑๆธ
ๆฐๅไธบๅทฆๅณไธคๅคงๆฟๅ๏ผๆฏไธชๆฟๅๅ็ปๅไธบๅคไธชๆจกๅ๏ผๅพๆๅนถ่ๅฐๅ็ฐไบๅฟ็ซฅ่ฅๅ
ป่กฅๅ
็็งๅญฆๆๅฏผไธๅฎ็จๅปบ่ฎฎใ ๆดไฝ็ปๆๅไธบโ็งๅญฆๅ่ๆๅผโๅโๅฎๆๅบ็จๆๅโไธคๅคงๆ ธๅฟ้จๅ๏ผ้่ฟๅก้ๆๅพใๅพๆ ใ็็ธๅผๅฏน่ฏๆกใๆ ็ญพ็ญ่ง่งๅ ็ด ๅขๅผบๅฏ่ฏปๆงไธๅธๅผๅใ --- **็ฌฌไธ้จๅ๏ผ็งๅญฆๅ่ๆๅผ** 1. **ๅ้พ่ฅๅ ป่กฅๅ ้็นๆธ ๅ** - ๆ ้ข๏ผโๅ้พ่ฅๅ ป่กฅๅ ้็นๆธ ๅโ๏ผๅฏๆ ้ข๏ผโๅ้พ่กฅ่ฅๅ ป๏ผ็ฒพๅๆด้ซๆ๏ผๅฏนๅบๅนด้พๆฎตๆ้่กฅๅ ๏ผ้ฟๅ ่ฟๅบฆๆๅ ฅโ - ๅ ๅฎนๆๅนด้พๅไธไธช้ถๆฎต๏ผ - **0-6ๆ้พ**๏ผๆฏๆฅๅธธ่ง่กฅๅ ็ปด็็ด D 400IU๏ผ็บฏๆฏไนณๅๅ ปๅฎๅฎ้้ขๅค่กฅๅ ็ปด็็ด Kใ้ ๅพ๏ผๅฉดๅฟๅคดๅใVit DๆณจๅฐๅจใVit K่ถๅใ - **7ๆ้พ-3ๅฒ**๏ผ้็น่กฅๅ ้๏ผFe๏ผใ้๏ผZn๏ผใDHA๏ผๆฏๆฅ็ปด็็ด D่กฅๅ ้็ปดๆๅจ400-600IUใ้ ๅพ๏ผๅนผๅฟๅคดๅใๆพๅคง้่งๅฏ่ถๅใFeๅZn็ฌฆๅทใ - **4-12ๅฒ**๏ผ้็น่กฅๅ ้๏ผCa๏ผใ็ปด็็ด AใBๆ็ปด็็ด ๏ผB_B๏ผ๏ผไฟ่ฏๆฏๆฅ่็ฝ่ดจๆๅ ฅ้่พพๆ ใ้ ๅพ๏ผ็ทๅญฉๅคดๅใCaๆฐๆณกใB_Bๆฐๆณกใ้ธก่ใ็ๅฅถ็ถใ็ผ็ๅพๆ ใ 2. **่ฅๅ ป่กฅๅ ๅๅ&ๅธธ่ง้ฟๅๆๅ** - ๆ ้ข๏ผโ่ฅๅ ป่กฅๅ ๅๅ&ๅธธ่ง้ฟๅๆๅโ๏ผๅฏๆ ้ข๏ผโ็งๅญฆ่กฅ่ฅๅ ป๏ผ่ฟไบๅ่ฆ้ฟๅผโ - ๅ ๅซไธคไธชๆ ธๅฟๅๅ๏ผ - **ไผๅ ่ณ้ฃๆๅ ฅ**๏ผ็ปฟ่ฒๅฏนๅพ๏ผ๏ผๆ ธๅฟๅๅ1๏ผๆฅๅธธๅ่กก้ฅฎ้ฃๆฏ่ฅๅ ปๆๅ ฅ็้ฆ่ฆๆฅๆบ๏ผไธๅฏ็จ่กฅๅ ๅไปฃๆฟๆญฃๅธธไธ้คใ้ ๅพ๏ผๅญฉๅญ็จ้คๅบๆฏ๏ผ็ไธญๆ่ฌ่ใๆฐดๆใ่็ฑปใ - **ๆ้้้่กฅๅ **๏ผ็บข่ฒSTOPๆ ๅฟ๏ผ๏ผๆ ธๅฟๅๅ2๏ผ่ฅๅ ป็ด ่กฅๅ ๅนถ้่ถๅค่ถๅฅฝ๏ผ่ฟ้ๆๅ ฅ็ปด็็ด Aใ้็ญๅฏ่ฝๅผๅไธญๆฏๆไปฃ่ฐข่ดๆ ใ้ ๅพ๏ผๅค็ถ่กฅๅ่ขซ็บข่ฒๅๅท่ฆ็ใ - **้ฟๅๆๅ**๏ผ้ป่ฒๆ ็ญพ๏ผ๏ผ - โ ไธๅไฝๆฃ่ฏไผฐ็ฒ็ฎ่ท้ฃ่กฅ โ - โก ๆ็ฝ็บข่กฅๅๅฝ้ถ้ฃ็ปๅญฉๅญๅ โ - โข ็จๆไบบ่กฅๅ ๅๅ้็ปๅฟ็ซฅๆ็จ โ - ้ ๅพ๏ผ็บข่ฒโ้ฟๅโ็็ธๆก๏ผๅธฆๆ้ช็ตๆๆใ --- **็ฌฌไบ้จๅ๏ผๅฎๆๅบ็จๆๅ** 1. **ๅฟ็ซฅ่ฅๅ ป่กฅๅ ไบงๅ3ๆญฅ้่ดญๆณ** - ๆ ้ข๏ผโๅฟ็ซฅ่ฅๅ ป่กฅๅ ไบงๅ3ๆญฅ้่ดญๆณโ๏ผๅฏๆ ้ข๏ผโๅฟ็ซฅ่กฅๅ้่ดญ3ๆญฅๅคๆญๆณโ - ไธๆญฅๆณๅๅซ็ฑๆพๅคง้ๅพๆ ๅผๅฏผ๏ผ - **็ๅ่งๆ ่ฏ**๏ผไผๅ ้ๆฉๅธฆ่ๅธฝๆ ่ฏ็ไฟๅฅ้ฃๅ๏ผๆๆๅฉดๅนผๅฟ/ๅฟ็ซฅไธ็จๅคๆกๆ ่ฏ็ๆญฃ่งไบงๅ๏ผๆ็ปไธๆ ไบงๅใ้ ๅพ๏ผๆพๅคง้่็ฆโ่ๅธฝโๆ ๅฟใ - **็้ ๆๆๅ**๏ผไผๅ ้ๆฉๆ ้ขๅคๆทปๅ ่็ณใ้ฆ็ฒพใไบบๅทฅ่ฒ็ด ใ้ฒ่ ๅ็ไบงๅ๏ผ่ดๆๅๆ ๆณจๆธ ๆฐๆ็กฎใ้ ๅพ๏ผๆไปถไธ่ดดๆโๆ ๆทปๅ โๅฐ็ซ ๏ผ็ปฟ่ฒๅฏนๅพใ - **็้้ ๅนด้พ**๏ผ้ๆฉๆ ๆณจๅฏนๅบ้็จๅนด้พๆฎต็ๅฟ็ซฅไธ็จไบงๅ๏ผไธ่ฆ่ช่กๅฐๆไบบ่กฅๅ ๅๅ้็ปๅญฉๅญๆ็จใ้ ๅพ๏ผ่ฏ็ถๆ ็ญพไธโๅนด้พโ่ขซ็บขๅ็ชๅบใ 2. 
**ๅธธ่งๅฟ็ซฅ่กฅๅ้็จๅบๆฏๅฏน็ ง่กจ** - ๆ ้ข๏ผโๅธธ่งๅฟ็ซฅ่กฅๅ้็จๅบๆฏๅฏน็ ง่กจโ - ่กจๆ ผๅฝขๅผ๏ผไธคๅ๏ผๅทฆไพงโ่กฅๅ็ฑปๅโ๏ผๅณไพงโ้็จๅบๆฏโ๏ผ่ๆฏ่ฒไบคๆฟไธบ็บขใ่ใ - ๅ ทไฝๅ ๅฎน๏ผ - **็ปด็็ด Dๆปดๅ** โ ๅ จๅนด้พๆฎตๅฟ็ซฅๆฅๅธธๅธธ่ง่กฅๅ ๏ผ้ข้ฒไฝๅป็ ใไฟ่ฟ้ๅธๆถใ้ ๅพ๏ผๆปด็ฎก็ถใ้ชจๅคดๅพๆ ใ - **้ๅ** โ ไฝๆฃ็กฎ่ฏ็ผบ้ๆง่ดซ่ก๏ผๆๆฅๅธธ็บข่ใๅจ็ฉ่่ๆๅ ฅไธ่ถณ็ๅฟ็ซฅใ้ ๅพ๏ผๆปด็ฎก็ถใๅฟ็ซฅๅคดๅใ - **DHA่ปๆฒน** โ ๆฅๅธธๆทฑๆตท้ฑผๆๅ ฅไธ่ถณ็ๅฟ็ซฅ๏ผ่พ ๅฉไฟ่ฟ่ง็ฝ่ๅๅคง่ๅ่ฒใ้ ๅพ๏ผ้ฑผๅฝข่ถๅใๅคง่ไธ็ผ็ๅพๆ ใ - **้ๅ** โ ๆฅๅธธๅฅถ้ไธ่ถณใ่บซ้ซๅข้ฟๅ็ผ๏ผ็ปไฝๆฃ็กฎ่ฎค็ผบ้็ๅฟ็ซฅใ้ ๅพ๏ผ็ฝ่ฒ่ฏ็ใๅฟ็ซฅๆต้่บซ้ซๅพใ --- **่ง่งไธๆ็็นๅพ๏ผ** - ๆดไฝ้็จ็ฝๆ ผๅๅธๅฑ๏ผๅไธชไธป่ฆๆจกๅๅๅธๅจ2x2็่ฑก้ไธญใ - ไฝฟ็จๅคง้ๆผซ็ปๅ ็ด ๏ผๅฆ็็ธๆกใๅฏน่ฏๆฐๆณกใ็ฎญๅคดใๆๅนๅทใ็ฆๆญข็ฌฆๅท็ญใ - ๅพๆ ็ณป็ปไธฐๅฏ๏ผVit DใFeใZnใCaใB_Bใ่ๅธฝใๆ ๆทปๅ ใๅนด้พใSTOP็ญๅๆไธๅฑๅพๅฝขๆ ่ฏใ - ๅญไฝๅ ็ฒใ้ดๅฝฑใ่พนๆกๅผบ่ฐๅ ณ้ฎไฟกๆฏ๏ผๅฆๆ ้ขใๆฐๅญใ่ญฆ็คบ่ฏญใ - ่ฒๅฝฉ็ผ็ ๆ็กฎ๏ผ้ป่ฒ็จไบๆ็คบ้็น๏ผ่่ฒ็จไบ่ฏดๆๆญฅ้ชค๏ผ็บข่ฒ็จไบ่ญฆ็คบๆ็ฆๆญขใ ่ฏฅไฟกๆฏๅพๅ ๅฎนๅ จ้ข๏ผ้ป่พๆธ ๆฐ๏ผๅ ผๅ ท็งๅญฆๆงๅๅฎ็จๆง๏ผ้ๅๅฎถ้ฟๅฟซ้ๆๆกๅฟ็ซฅ่ฅๅ ป่กฅๅ ็ๆ ธๅฟ็ฅ่ฏไธ้่ดญๆๅทงใ |
||
Paper Rendering Quality
| U1-8B-MoT | 8B-MoT-Infographic | U1-8B-MoT | 8B-MoT-Infographic |
|---|---|---|---|
Prompt: [typesetting] The page is laid out with two tables at the top, followed by a two-column text layout. The tables span the full width of the text area. The text includes a section heading. [paragraphs] the TOPIC MODELER, the GENDER SEGMENTER, and an OTHER module (transcript length and duration). We test for a linear relationship between each pair of variables: $H_O : r = 0$, $H_A : r \neq 0$, where $H_O$ is the original hypothesis, $H_A$ is the alternate hypothesis, and $r$ is the Pearson's correlation coefficient. We follow Reddy et al. (2021) and Yang et al. (2019) and apply a Bonferroni correction to our $\alpha$ value of $0.05$, setting $\alpha = 0.05/z$, where $z = \binom{124}{2} = 7,626$ for LDA, representing the number of feature relationships we consider. Hence, we reject $H_O$ in favor of $H_A$ if $p \leq \alpha$. Given the largeness of $z$, our $\alpha$ value becomes small, making our criteria for significance strict and thus suitable for investigating our research questions. Furthermore, we filter our correlations $r$, such that $\Vert r\Vert > 0.1$ for our LDA experiments, and $\Vert r\Vert > 0.05$ for our BERTopic experiments (due to the smaller sample size of 10,000 podcasts, and fewer samples may have higher variance). Our results focus on a selection of these significant correlations; the full results are available on the project website: https://www.gendered-discourse.net/extended-results. ### RQ0: How Are Women and Men's Discourse Different? Using GDCF, our Gendered Discourse Correlation Framework shown in Figure 2, we then analyze significant correlations between the gender features from the GENDER SEGMENTER module (Doukhan et al. 2018a), and the topic features from the TOPIC MODELER module (Blei, Ng, and Jordan 2003). We use the discourse topics to automatically form gendered discourse word lists via their significant correlations. 
Starting with the first row of Table 1, we see that Topic 3's word list returned by LDA with Non-Contextual Embeddings (Bag-Of-Words) (via the TOPIC MODELER module) contains the words women, woman, men, baby, pregnant, girls, men, doctor, health, birth (in descending weighted order). Based on this word list, we manually interpret this topic as being a content topic, specifically about pregnancy, as noted in the column "Topic N Categories." Then, we look to the gender correlations in the columns "Gender" and "$r$," and see that $r(\text{Topic 3, Women}) = +0.15$ and $r(\text{Topic 3, Men}) = -0.14$. This indicates that the topic of pregnancy positively correlates with women (identified via the GENDER SEGMENTER module), and negatively correlates with men. Therefore, we associate Topic 3 (Content - Pregnancy) with Women, as noted in the "Topic N Gender" column. Similarly, we make these associations in the "Topic N Gender" column for Topics 10, 49, and 71. Next, we focus on the Topic 54 row. This topic is interpreted using the word list get, like, know, right, people, going, podcast, make, want, one. This word list does not refer to any content, hence, we manually interpret this topic as being a discourse topic. Moving to the gender correlations, we see that $r(\text{Topic 54, Women}) = \emptyset$ and $r(\text{Topic 3, Men}) = +0.12$. The reason for $r(\text{Topic 54, Women}) = \emptyset$ is because the correlation between the features Topic 54 and Women did not come back as significant. However, due to the positive correlation of $0.12$ for Topic 3 and Men, we manually associate Topic 3 with Men in the "Topic N Gender" column. [tables] Table 1: LDA with Non-Contextual Embeddings (Bag-Of-Words): The complete set of significant correlations between gender features and topic features – both content topics and discourse topics. 
Based on $r$, the Topic N Gender forms the gendered (discourse) word lists via Topics 54 and 60 (the masculine word lists) and Topic 62 (the feminine word list). | Topic N | Gender | $r$ | Topic N Word List | Topic N Categories | Topic N Gender | |---|---|---|---|---|---| | Topic 3 | Women Men | 0.15 -0.14 | women, woman, men, baby, pregnant, girls, men, doctor, health, birth | Content - Pregnancy | Women | | Topic 10 | Women Men | 0.10 -0.12 | energy, body, feel, mind, space, yoga, love, beautiful, feeling, meditation | Content - Yoga | Women | | Topic 49 | Women Men | -0.21 0.17 | game, know, think, team, going, mean, play, year, one, good | Content - Sports | Men | | Topic 71 | Women Men | 0.14 -0.14 | christmas, sex, girl, hair, love, get, date, girls, let, wear | Content - Dating | Women | | Topic 54 | Women Men | ∅ 0.12 | get, like, know, right, people, going, podcast, make, want, one | Discourse | Men | | Topic 60 | Women Men | -0.27 0.20 | going, know, think, get, got, one, really, good, well, yeah | Discourse | Men | | Topic 62 | Women Men | 0.33 -0.28 | like, know, really, going, people, want, think, get, things, life | Discourse | Women | Table 2: BERTopic with Contextual Embeddings (BERT, ChatGPT, Llama): The complete set of significant correlations between gender features and topic features for discourse topics only (content topics are omitted). | Topic N | Gender | $r$ | Topic N Word List | Topic N Categories | Topic N Gender | |---|---|---|---|---|---| | Topic 0 | Women Men | -0.08 0.10 | like, yeah, know, oh, right, podcast, got, going, think, really | Discourse | Men | | Topic 2 | Women Men | 0.08 -0.08 | life, know, things, really, people, feel, like, want, love, going | Discourse | Women | | Topic 5 | Women Men | 0.08 ∅ | like, know, think, yeah, episode, really, going, anchor, kind, right | Discourse | Women | |
Prompt: [typesetting] The page is a standard academic paper layout with a single column. The text is justified and divided into sections and subsections, indicated by numbered headings. Important terms at the beginning of some paragraphs are bolded. A horizontal rule separates the header from the main content, and another rule separates the main content from the footnote at the bottom. [paragraphs] Preprint Version. **Figure–Table Integration.** In addition to textual refinement, we extend the refinement process to include multimodal elements, to further enhance readability. For each section, the model first generates visualization requirements, such as tables with structured comparisons or figures with explanatory diagrams, together with natural language descriptions. Based on these descriptions, candidate figures and tables are synthesized. The compiled outputs are then fed back to an LLM for quality assessment, enabling automatic detection of issues such as oversized layouts or unreadable text. The LLM provides corrective suggestions, which are applied to improve the final visualizations. Finally, the text is refined again to ensure that all generated figures and tables are properly referenced within the survey. # 4 EXPERIMENTS ## 4.1 EXPERIMENTAL SETTINGS **Implementation Details.** Following Wang et al. (2024b), we adopt **GPT-4o-mini** as our generation model for its balance of responsiveness and cost. Our retrieval database contains 680K computer science papers from arXiv, with PDFs converted into structured Markdown using MinerU (Wang et al., 2024a) for consistent formatting. The details of the retrieval process are provided in App. A.1. In outline generation, the system consults 1000–1200 papers, with a maximum of 8 sections. For section drafting, each subsection retrieves up to 60 additional relevant papers, combined with those linked during outline generation. 
Finally, we apply two iterations of the review-and-refine loop to enhance coherence across sections and improve overall readability. Illustrative outputs compared with AutoSurvey are provided in App. A.8. **Baselines.** We compare IterSurvey with a set of baselines, ranging from simple retrieval-augmented generation (Naive RAG), which directly drafts from retrieved documents, to more advanced state-of-the-art systems. Specifically, we evaluate against AutoSurvey (Wang et al., 2024b), the first systematic framework for this task; SurveyForge (Yan et al., 2025), which combines heuristic outline generation based on the logical structures of human-written surveys with a memory-driven scholar navigation agent for high-quality retrieval; and SurveyGo (Wang et al., 2025), which employs the LLM×MapReduce-V2 algorithm to address the long-context challenge. We also compare with SurveyX (Liang et al., 2025), which introduces an Attribute Tree-based outlining mechanism; however, due to access restrictions, we include SurveyX only in arena experiments. All methods are evaluated on the same retrieval database with generation hyperparameters aligned to their original settings for fairness. ## 4.2 AUTOMATIC EVALUATION RESULTS **Evaluation Setup.** We employ multiple complementary protocols to evaluate the quality of generated surveys. On the 20-topic suite from Wang et al. (2024b), we adopt multi-dimensional scoring with LLM-as-a-judge. Content quality is assessed along three dimensions: coverage, structure, and relevance followed from Wang et al. (2024b). Besides, citation quality is evaluated using the NLI-based protocol of Gao et al. (2023), reporting both recall and precision: _Citation Recall_ measures whether all statements in the generated text are fully supported by the cited passages, while _Citation Precision_ identifies irrelevant citations to ensure that references are pertinent and directly support the claims. 
To improve scoring stability and reliability, prompts are standardized and judges must provide a rationale before assigning scores. For additional robustness, we aggregate outputs from three judge models: GPT-4o, Claude-3.5-Haiku, and GLM-4.5V.1 Full prompts are provided in App. A.7. **Results.** The results on the 20 topics from Wang et al. (2024b) are reported in Tab. 1. Statistical significance was confirmed via paired t-tests, indicating that IterSurvey consistently outperforms baseline models ($p < 0.05$). We summarize the main observations below. - **Overall superiority.** IterSurvey consistently outperforms all baselines across both content and citation quality, achieving the highest overall average score (4.75). This demonstrates that the proposed framework is effective and robust across multiple evaluation dimensions. [page_number] 6 [footnotes] 1Specifically, we use `chatgpt-4o-latest`, `claude-3-5-haiku-20241022`, and `glm-4.5v`. |
||
Prompt: [typesetting] This is a single-column page containing mostly text, structured with section headings and bold inline subheadings. URLs are formatted in a monospaced font and hyperlinked. [paragraphs] # A Image generation models This section details the two diffusion image generation models used in this work, namely Stable Diffusion 1.4 and 1.5. **Stable Diffusion 1.4** The Stable Diffusion model is a text-conditioned image generator model that combines an autoencoder with a diffusion model to create a latent diffusion model. The autoencoder encodes images into latent representations with a reduced dimensionality when compared to the input image, reducing the computational needs during the training phase. Text prompts, on the other hand, are encoded using a text encoder and are then cross-attended by the UNet backbone of the latent diffusion model. Finally, the loss is computed using a reconstruction objective between the noise added to the latent representation and the prediction made by the UNet. Stable Diffusion 1.4 (https://huggingface.co/CompVis/stable-diffusion-v1-4) had several rounds of training on the LAION dataset (https://laion.ai/), with each round changing the input image dimension, aesthetic score, and the probability of dropping the text-conditioning to improve classifier-free guidance. **Stable Diffusion 1.5** SD 1.5, in turn, has the same architecture and even the same starting point as 1.4, with the difference being how long the model was fine-tuned on top of SD 1.2. The 1.4 version is fine-tuned for 225 thousand steps at resolution 512x512 on "laion-aesthetics v2 5+" with a 10% probability of dropping the text-conditioning, and version 1.5 for 595 thousand steps. As demonstrated in Section D Stable Diffusion 1.4 has better performance than 1.5 in our approach, therefore, we will adopt SD 1.4 for most of the experiments in this paper. 
# B Large language models Here we give additional details on the large language models that we used in our experiments. **Gemma** (Mesnard et al., 2024), trained on a diverse 6 Trillion token dataset comprising web documents, code and mathematical texts. We resorted to the 7 Billion parameter instruction-tuned decoder-only model, named _gemma-7b-it_ (https://huggingface.co/google/gemma-7b-it). This model uses a chat template, which we employ during inference. **Llama 2** (Touvron et al., 2023), of which we used the 7 Billion parameter, pre-trained-only model, _Llama-2-7b_ (https://huggingface.co/meta-llama/Llama-2-7b-hf). This model was trained with a mix of publicly available data totalling 2 Trillion tokens. While its chat versions employ supervised fine-tuning and reinforcement learning with human feedback for alignment with human preferences in helpfulness and safety, the pre-trained-only model does not. This results in a less constrained model, but it may also cause it to disperse from the task at hand. Since this model is a pre-trained-only no chat template is needed. **Mistral** (Jiang et al., 2023) fine-tuned on various HuggingFace instruction datasets. We resorted to the 7 Billion _Mistral-7B-Instruct-v0.2_ model (https://huggingface.co/mistralai/ Mistral-7B-Instruct-v0.2) and used the respective chat template during inference. **Phi-2** (Gunasekar et al., 2023) is a compact 2.7 Billion model (https://huggingface.co/microsoft/ phi-2). Despite its size, it offers a competitive performance with respect to models several times its size. It was trained on 250 Billion tokens, obtained through a combination of NLP synthetic data created by GPT-3.5 and filtered web data from Falcon RefinedWeb and SlimPajama, which was assessed by GPT-4. This model was not fine-tuned through reinforcement learning from human feedback and does not have guardrails. 
**Model ranking** A ranking of these models in terms of their performance can be found in the HuggingFace leaderboard (https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) which assesses several LLMs that are trained under the same criteria and tested on the same benchmarks, including reasoning |
Prompt[typesetting] The page is a standard academic paper layout, likely from a preprint server like arXiv. It features a title, author list with affiliations, an abstract, and the beginning of the "Introduction" section. A preprint notification ("Preprint. Under review.") is present at the bottom left. The text on the left margin ("arXiv:2502.01522v2 [cs.CV] 30 May 2025") is a vertical stamp typical of arXiv submissions. [paragraphs] arXiv:2502.01522v2 [cs.CV] 30 May 2025 # Unpaired Deblurring via Decoupled Diffusion Model **Junhao Cheng**$^1$, **Wei-Ting Chen**$^2$, **Xi Lu**$^1$, **Ming-Hsuan Yang**$^3$ $^1$Sun Yat-sen University $^2$ Microsoft $^3$ University of California, Merced https://github.com/donahowe/UID-Diff **Abstract** Generative diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. In favor of their ability to supplement missing details and generate aesthetically pleasing contents, recent works have applied them to image deblurring via training an adapter on blurry-sharp image pairs to provide structural conditions for restoration. However, acquiring substantial amounts of realistic paired data is challenging and costly in real-world scenarios. On the other hand, relying solely on synthetic data often results in overfitting, leading to unsatisfactory performance when confronted with unseen blur patterns. To tackle this issue, we propose UID-Diff, a generative-diffusion-based model designed to enhance deblurring performance on unknown domains by decoupling structural features and blur patterns through joint training on three specially designed tasks. We employ two Q-Formers as structural features and blur patterns extractors separately. The features extracted by them will be used for the supervised deblurring task on synthetic data and the unsupervised blur-transfer task by leveraging unpaired blurred images from the target domain simultaneously. 
We further introduce a reconstruction task to make the structural features and blur patterns complementary. This blur-decoupled learning process enhances the generalization capabilities of UID-Diff when encountering unknown blur patterns. Experiments on real-world datasets demonstrate that UID-Diff outperforms existing state-of-the-art methods in blur removal and structural preservation in various challenging scenarios. # 1 Introduction Dynamic blur occurs when the camera and subject move relative to each other during the exposure time, resulting in a smeared and blurred image. Deblurring, the process of removing the blur pattern while preserving the underlying structure of degraded images, is essential for restoring high-quality images for human perception and low-level computer vision applications. With the rapid advancement of photographic technology, a wide range of imaging devices are now employed to capture images in real-world scenarios. Due to their diverse lenses and structural designs, these devices may produce distinct blur patterns [1, 2, 3]. This diversity makes it challenging to develop an all-in-one method for deblurring images from arbitrary and varied sources. Consequently, focusing on deblurring algorithms tailored to specific domains has become increasingly significant. As deep learning has advanced in recent years, existing deblurring models predominantly build on data-driven approaches that employ neural networks trained via supervised learning on synthetic paired data. Existing works have made efforts to develop deblurring models upon CNN [4, 5], Transformer [6, 7], and GAN [8, 9]. Recently, a new wave of research [10, 11, 12] has begun to investigate the integration of pre-trained generative diffusion models [13], such as Stable Diffusion (SD) [14], with an adapter designed to provide structural guidance for deblurring. 
These approaches aim to harness the generative capabilities of diffusion models to supplement missing details and generate aesthetically pleasing outputs. However, since paired blurry-sharp training data is limited in [footnotes] Preprint. Under review. |
||
Overall Layout and Content Understanding
| U1-8B-MoT | 8B-MoT-Infographic | U1-8B-MoT | 8B-MoT-Infographic |
|---|---|---|---|
Prompt่ฏฅไฟกๆฏๅพไปฅโๆฒๅฐผๅธ็นโไธบๆ ้ข๏ผๆดไฝ้็จๆต
่็ฝ่ฒ่ฐ๏ผๅธๅฑๆธ
ๆฐ๏ผๅไธบๅคไธชๆจกๅๅๅบๅ๏ผๅด็ปไธญๅคฎ็้ๆ่ถๅๅพๅๅฑๅผใๅณไธ่งๅฑ็คบๆฒๅฐผๅธ็น็ๅๅญฆ็ปๆๅผๅๅ
ถๅๅญๅผ CโโHโโNOโใ **1. ๆดปๆงๆๅๆฐๆฎ๏ผๅทฆไธ๏ผ** - ไปฅ็ฏๅฝขๅพๅฝขๅผๅฑ็คบๆๅๆฏไพ๏ผ - ๆฒๅฐผๅธ็น >98% - ่พ ๆ <2% - ๅพไธๆนๆ ๆณจ๏ผโ็บฏๅบฆ้ซ๏ผไธดๅบ็บงๆ ๅโ **2. ้ๅบ็๏ผๅณไธ๏ผ** - ้่ฟไธไธชๅพๆ ๅๆๅญ่ฏดๆ๏ผ - ้ผป้จๅพๆ ๏ผ่ฟๆๆง็พ็ - ็ฎ่ค็บน็ๅพๆ ๏ผ็บค็ปดๅ - ็ค็ๅพๆ ๏ผ็ข็็็ฉ **3. ๅ้็ฉ้ต๏ผไธญๅทฆ๏ผ** - ่กจๆ ผๅฝขๅผ๏ผๅ ๅซไธคๅ๏ผโๅฃๆโๅโ้ข็โ - ๆไบบ๏ผ100mg / ๆฌก๏ผ้ข็๏ผ1-3 ๆฌก / ๅคฉ - ๅฟ็ซฅ๏ผๅจ่ฏขๅป็๏ผ้ข็๏ผ้ตๅปๅฑ **4. ่ฏไปฃๅจๅๅญฆๆถ้ด่ฝด๏ผไธญๅณ๏ผ** - ๆ็บฟๅพ๏ผๆจช่ฝดไธบๆถ้ด๏ผ0h ่ณ 24h๏ผ๏ผ็บต่ฝดไธบๆตๅบฆ๏ผๆ ๅปๅบฆ๏ผ - 0h๏ผๅธๆถๅผๅง๏ผๆฐดๆปดๅพๆ ๏ผ - 1-2h๏ผๅณฐๅผๆตๅบฆ๏ผๅฑฑๅณฐๅพๆ ๏ผ - 4-6h๏ผๅๅธ/ไปฃ่ฐข๏ผๅพช็ฏ็ฎญๅคดๅพๆ ๏ผ - 24h๏ผๆๆณ๏ผๅๅพๆกถๅพๆ ๏ผ - ๅพไธญๆ ๆณจๅ่กฐๆ โ 5-8h **5. ่ญฆๅ็ฝๆ ผ๏ผๅทฆไธ๏ผ** - ๅไธบๅไธช่ฑก้๏ผๆฏไธช้ ๆๅพๆ ๅๆๅญ๏ผ - ็ธไบไฝ็จ๏ผCYP้ ถๆๅถๅ/่ฏฑๅฏผๅ๏ผ้ฝฟ่ฝฎๅพๆ ๏ผ - ๅฏไฝ็จ๏ผ่่ ้ไธ้๏ผ็ฎ็น๏ผ่ๅพๆ ๏ผ - ่ๅ่ฝ๏ผๅฎๆ็ๆต๏ผ่่ๅพๆ ๏ผ - ่พๅ่ฝ๏ผๆ ็จ๏ผ่พ่ๅพๆ ๏ผ **6. ๆฃ่ ้็จๆง๏ผไธญไธ๏ผ** - ไธคไธชๅพๆ ็ปๅ๏ผ - ๆไบบ๏ผไบบ็ฉๅพๆ + ๅฏนๅพ๏ผๆ ๆณจโๆไบบ ้็จโ - ๅฟ็ซฅ๏ผไบบ็ฉๅพๆ + ้ฎๅท + ๅป็ๅพๆ ๏ผๆ ๆณจโๅฟ็ซฅ ๅจ่ฏขๅป็โ **7. ๅจๅญๆๅ๏ผๅณไธ๏ผ** - ไธไธชๅพๆ ๅนถๅ๏ผ - ๆธฉๅบฆ่ฎกๅพๆ ๏ผ2-25โ ๅฎคๆธฉ - ๅฏๅฐ็ถๅพๆ ๏ผๅฏ้ญ - ้ฎๅ ๅพๆ ๏ผๅคช้ณๅ ๆ็บฟ๏ผ๏ผ้ฟๅ ๆดไฝ่ฎพ่ฎก้ฃๆ ผ็ฐไปฃใไธไธ๏ผไฝฟ็จๅคง้ๅพๆ ่พ ๅฉ็่งฃ๏ผๆฐๆฎๅฏ่งๅๆธ ๆฐ๏ผ้ๅๅป็ๆ่ฏๅๅฎฃไผ ๅบๆฏใๆๆๆๆฌๅไธบไธญๆ๏ผ่ฏญ่จๅ็กฎ๏ผๆ ๅไฝๆ่ฟฐใ |
Prompt: The infographic presents an augmented reality (AR) shopping experience overlaid on a real-world retail environment. The scene is set in a brightly lit cosmetics aisle of a store, with shelves stocked with beauty products visible in the background. In the foreground, a pair of hands holds a black rectangular compact labeled "ANASTASIA BEVERLY HILLS BROW POWDER DUO" with "EBONY" and "NET WT. 2.5 OZ." printed below. A gold ring is visible on the left hand's ring finger, and a black wristband is partially seen on the left wrist. Overlaid on the image are several semi-transparent, rounded-corner UI elements resembling AR pop-ups or digital cards, providing contextual information about the product and the user's shopping list. On the left side, a vertical panel titled "SHOPPING LIST" lists four items: 1. Face Wash – marked with an "X" (completed) 2. Shampoo – marked with an "X" (completed) 2. Eye Cream – marked with an empty checkbox (not completed; duplicated item number) 3. Eye Cream – marked with an empty checkbox (not completed) This suggests a possible error or duplication in the list, with two entries for "Eye Cream". In the center-right, a speech-bubble-shaped label displays the price: "$23.00". To the right of the product, a larger panel titled "PRODUCT DETAILS:" provides information about the "ABH Brow Powder Duo". It features two color swatches: - Left swatch: labeled "DEEP BROWN" - Right swatch: labeled "BLACK" Below the swatches, a star rating system shows four and a half filled stars, accompanied by the text "4.5 out of 5 stars". Underneath the rating, a section titled "COMMON USES:" states: "DEFINES & FILLS BROWS". Further down, a smaller rectangular box labeled "KEY INGREDIENTS" lists: - Vitamin E - Finely Milled Pigments At the bottom right, another box titled "APPLICATION TIPS" includes a video icon (a rectangle with a play triangle) and the word "Video", indicating a multimedia tutorial is available. 
The overall layout mimics an immersive AR interface, likely from a smart glasses or smartphone application, designed to enhance in-store shopping by providing instant, interactive product data directly within the user's field of view. The visual style uses dark gray, translucent backgrounds with white text for high contrast and readability against the busy store backdrop. The design emphasizes usability, with clear categorization of information into distinct panels and intuitive icons. All textual content is in English, and no other languages are present. |
||
Prompt่ฏฅไฟกๆฏๅพไปฅๆทฑ่่ฒ็งๆๆ่ๆฏไธบไธป๏ผ้
ไปฅ็ดซ่ฒๅ้่ฒ็็ต่ทฏๆฟๅพๆก่พนๆก๏ผ่ฅ้ ๅบๆชๆฅๆฐๅญ่ฎพๅค็่ง่งๆฐๅดใๆ ้ขโ่ฐทๆญๆๆฐ่กๆฐงไปชๆบๅๅๆฐๅฏนๆฏ๏ผ็คพๅช็๏ผโไฝไบ้กถ้จไธญๅคฎ๏ผไฝฟ็จๅๅ
็ฝ่ฒๅญไฝ๏ผ็ชๅบไธป้ขใๆดไฝๅธๅฑไธบๆจชๅไธๆ ๅผๅฏนๆฏ็ปๆ๏ผๅทฆไพงไธบๅๆฐ็ฑปๅซๆ ็ญพๅ๏ผไธญ้ดๅๅณไพงๅๅซไธบไธๆฌพๆบ่ฝ็ฉฟๆด่ฎพๅค็ๅๆฐ่ฏฆๆ
ใ ๅทฆไพงๅๆฐ็ฑปๅซๅไปฅๅพๆ +ๆๅญๅฝขๅผๅ็ดๆๅ๏ผๅ ๆฌ๏ผ - ่ฏ็๏ผๅพๆ ไธบ่ฏ็็ฌฆๅท๏ผ - ็ตๆฑ ๏ผๅพๆ ไธบ็ตๆฑ ็ฌฆๅท๏ผ - ๅ่ฝ๏ผๅพๆ ไธบๅฟ็ตๆณขๅฝข็ฌฆๅท๏ผ - ้้๏ผๅพๆ ไธบ็งค็็ฌฆๅท๏ผ - ไปทๆ ผ๏ผๅพๆ ไธบไปทๆ ผๆ ็ญพ็ฌฆๅท๏ผ - ๅๅฎๆถ้ด๏ผๅพๆ ไธบๆฅๅ็ฌฆๅท๏ผ ไธญ้ดไธๆ ๅๅซๅฏนๅบไธๆฌพไบงๅ๏ผ 1. **้ซไบฎๆจ่ๆบๅ๏ผGoogle Pixel Pulse๏ผๆๆฐๆจ่๏ผ** - ๆ ้ขไธๆนๆ้่ฒๆๅฝขๅพฝ็ซ โโ ้ซไบฎๆจ่ๆบๅโ๏ผๅนถ็จ้่ฒ่พนๆก้ซไบฎๆพ็คบใ - ่ฏ็๏ผTensor G4ๅฎๅถ่ฏ็ - ็ตๆฑ ๏ผ7ๅคฉ็ปญ่ช๏ผๅฟซๅ - ๅ่ฝ๏ผ่ฟ็ปญ่กๆฐง็ๆต๏ผ็ก็ /ๅๅ่ฟฝ่ธช๏ผAIๅฅๅบทๆๅฏผ - ้้๏ผ28ๅ ๏ผ่ฝป็๏ผ - ไปทๆ ผ๏ผยฅ1999 - ๅๅฎๆถ้ด๏ผ2024ๅนด10ๆ 2. **็ซๅA๏ผไพๅฆ๏ผApple Watch S9๏ผ** - ่ฏ็๏ผS9 SiP่ฏ็ - ็ตๆฑ ๏ผ18ๅฐๆถ๏ผๆญฃๅธธไฝฟ็จ๏ผ - ๅ่ฝ๏ผๆ้่กๆฐง๏ผๅฟ็ตๅพAPP๏ผๆๅๆฃๆต - ้้๏ผ32ๅ - ไปทๆ ผ๏ผยฅ3199 - ๅๅฎๆถ้ด๏ผ2023ๅนด9ๆ 3. **็ซๅB๏ผไพๅฆ๏ผGarmin Venu 3๏ผ** - ่ฏ็๏ผElevated V5ไผ ๆๅจ - ็ตๆฑ ๏ผ14ๅคฉ๏ผๆบ่ฝๆจกๅผ๏ผ - ๅ่ฝ๏ผๅ จๅคฉๅ่กๆฐง๏ผ่บซไฝ็ต้๏ผGPS่ฟๅจ - ้้๏ผ35ๅ - ไปทๆ ผ๏ผยฅ2499 - ๅๅฎๆถ้ด๏ผ2023ๅนด8ๆ ๆๆๆฐๆฎๅ้็จๆธ ๆฐ็ๆจชๅๅ้็บฟ็ป็ป๏ผๆฏ้กนๅๆฐๅ ๅฎนๅฑ ไธญๅฏน้ฝ๏ผๅญไฝไธบ็ฎๆด็ฐไปฃ็ๆ ่กฌ็บฟไฝ๏ผ้ข่ฒไธบๆต ่ๆ็ฝ่ฒ๏ผ็กฎไฟๅฏ่ฏปๆงใ้ซไบฎๆจ่ๆบๅไฝฟ็จ้่ฒ่พนๆกๅๆดๆไบฎ็ๆๅญ๏ผๅฝขๆ่ง่ง็ฆ็นใ ๅบ้จๆไธ่กๆณจ้ๆๅญ๏ผโๆณจ๏ผไปฅไธๅๆฐไป ไพๅ่๏ผๅ ทไฝไปฅๅฎๆนๅๅธไธบๅใ#็งๆ #ๅฅๅบท #่ฐทๆญๆฐๅ #่กๆฐงไปชๅฏนๆฏโ๏ผๅญไฝ่พๅฐ๏ผ้ข่ฒ่พๆ๏ผไฝไธบ่กฅๅ ่ฏดๆใ ๆดไฝ่ฎพ่ฎก้ฃๆ ผ็ฐไปฃใ็งๆๆๅผบ๏ผ้่ฟ่ฒๅฝฉๅฏนๆฏใ่พนๆก้ซไบฎๅๅพๆ ่พ ๅฉ๏ผๆๆไผ ่พพไบๅๆบๅๅจๅ ณ้ฎๆง่ฝๆๆ ไธ็ๅทฎๅผ๏ผๅฐคๅ ถ็ชๅบไบGoogle Pixel Pulseๅจ็ปญ่ชใไปทๆ ผๅๅ่ฝ้ๆๆน้ข็ไผๅฟใ |
Prompt่ฏฅไฟกๆฏๅพไปฅๅคๅคๆ็ป้ฃๆ ผๅ็ฐ๏ผๆดไฝๅธๅฑๅฆไธๆฌๆๅผ็ๆณ้ปไนฆ้กต๏ผ่ๆฏไธบ็ฑณ้ป่ฒไปฟๆง็บธๅผ ่ดจๆ๏ผ่พน็ผๅธฆๆไธ่งๅๆ่ฃๆๆใๆ ้ขโๅ็ฉ้ฆๆธธ่งๆฉๅฑๅ
ๅฎนไธ่ฆ็นโไฝไบ้กถ้จไธญๅคฎ๏ผๅญไฝไธบๆทฑๆฃ่ฒ่บๆฏๅญ๏ผไธคไพง้ฅฐๆๅทๆฒ่ฑ็บน่ฃ
้ฅฐ๏ผ่ง่งไธ็ชๅบไธป้ขใ ๅ จๅพ้็จๅ ญ็นๅผ็ปๆๅๅธๅฑ๏ผๅด็ปไธญๅฟๅๅธๅ ญไธชๆ ธๅฟๆจกๅ๏ผๆฏไธชๆจกๅๅ้ ๆ็ฌ็ซๆ็ปใ็ผๅทๆ ้ขๅ่ฏดๆๆๅญ๏ผ้่ฟ่ฃ ้ฅฐๆง่พนๆกใ่ฑ็ฏใไธๅธฆ็ญๅ ็ด ่ฟ่กๅบๅไธ็พๅใๆดไฝ่ฎพ่ฎก้ฃๆ ผๆธฉ้ฆจใๆ่บ๏ผ่ๅไบ้ณไน็ฌฆๅทใๆๆใ่ฐ่กฃ่ใไบๆต็ญ็น็ผๅ ็ด ๏ผ่ฅ้ ๅบ่ฝปๆพๆๆฆ็ๆๅๆข็ดขๆฐๅดใ ๅๆจกๅๅ ๅฎนๅฆไธ๏ผ 1. **ๆฒๆตธๅผไฝ้ช** - ๆ ้ข๏ผโ1. ๆฒๆตธๅผไฝ้ชโ - ่ฏดๆๆๅญ๏ผโๅไธไบๅจๅฑ่ง๏ผๆๅๅๅฒๅบๆฏ่ฟๅ๏ผ่บซไธดๅ ถๅขใโ - ่ง่งๅ ็ด ๏ผๅทฆไพงๆ็ปไธไฝ้ๅ็ทๅญฉๆๆๆพๅคง้่งๅฏไธไธชๅพฎ็ผฉๅๅฒ่กๆฏๆจกๅ๏ผๅ ๅซๆฟๅฑใๆไฝๅไบบ็ฉ๏ผ๏ผไธๆนๆ้ฝฟ่ฝฎไธ็ฏๆณก็ปๆ็ๆ่ๆฐๆณก๏ผ่ฑกๅพๆข็ดขไธๅ็ฐใๅณไพง้ ๆไธไธช็ณป็็ฒ่ฒ่ด่ถ็ป็็คผ็ฉ็๏ผๆ ็ญพๅๆโSURPRISEโใ 2. **ไธป้ข่ฎฒๅบงไธๅทฅไฝๅ** - ๆ ้ข๏ผโ2. ไธป้ข่ฎฒๅบงไธๅทฅไฝๅโ - ่ฏดๆๆๅญ๏ผโ่ๅฌไธๅฎถๆทฑๅบฆ่งฃ่ฏป๏ผไบฒๆๅถไฝๆๅทฅ่บๅ๏ผๅญฆไน ๆฐ็ฅใโ - ่ง่งๅ ็ด ๏ผๅณไพงๅฑ็คบไธๅผ ๆจๆก๏ผๆกไธๆๆพ้ถๅฃถใ้ถ็ฝใๅปๅ็ญๆๅทฅๅทฅๅ ท๏ผๆ่พนๅ ๅ ไนฆ็ฑไธๅท่ฝด๏ผๅจๅด็ฏ็ปๆฉๆฆๆ่ฑ็ฏ๏ผไธๆนๆฌๆไธไธฒ้ฃ้๏ผๅซๆไบฎใๆๆไธ้้๏ผ๏ผ่ๆฏ็น็ผไบๆตไธๆๅ ใ 3. **้ฆ่็ๅๆข็ดข** - ๆ ้ข๏ผโ3. ้ฆ่็ๅๆข็ดขโ - ่ฏดๆๆๅญ๏ผโๅฏปๆพ้้ฆไนๅฎ๏ผไบ่งฃ่ๅ็ๆ ไบไธๆๅไปทๅผ๏ผๆทฑๅบฆๆๆใโ - ่ง่งๅ ็ด ๏ผๅทฆไพงๆฏไธไธชๆๅผ็ๆจ่ดจๅฎ็ฎฑ๏ผๅ ๆ้้้ผ็ถๆ็ฉไธๅๅ ๅท่ฝด๏ผๆๆ็ปฟ่ฒ็็งๅๅ ใๆฃ่ฝ้้ฑ๏ผไปฅๅไธๆฏ็น็็็ฝ่ฒ่ก็๏ผ็ๅฐ่ฃ ้ฅฐๆ่ฐ่กฃ่ไธๅฐ่ฑๆใ 4. **็น่ฒๅฏผ่ง่ทฏ็บฟ** - ๆ ้ข๏ผโ4. ็น่ฒๅฏผ่ง่ทฏ็บฟโ๏ผ็ฝฎไบ็ฑณ่ฒไธๅธฆๆจชๅน ไธญ๏ผ - ่ฏดๆๆๅญ๏ผโ่ท้ๅฎๅถ่ทฏ็บฟ๏ผๅ็ฐ้็ง่ง่ฝไธ็ฌ็น่ง่ง๏ผๅซๆ ท็ฒพๅฝฉใโ - ่ง่งๅ ็ด ๏ผไธๆนๆฏไธๅผ ๅฑๅผ็ๅคๅคๅฐๅพ๏ผๆ ๆๆฑ้จใๅไบญใไฝๅใ้ๅก็ญๆฏ็น๏ผไปฅ็บข่ฒ่็บฟ่ฟๆฅ๏ผๅนถ้ ๆๆๅ้ๅพๆ ๏ผไฝ็ฐ่ทฏๅพ่งๅๆฆๅฟตใ 5. **ๆฐๅญๅไบๅจ** - ๆ ้ข๏ผโ5. ๆฐๅญๅไบๅจโ๏ผ็ฝฎไบๅๅฝขๆณข็น่พนๆกๅ ๏ผ - ่ฏดๆๆๅญ๏ผโๅฉ็จAR/VRๆๆฏ๏ผๆ็ ดๆถ็ฉบ้ๅถ๏ผไฝ้ช่ๆ็ฐๅฎใโ - ่ง่งๅ ็ด ๏ผๅณไพงๆ็ปไธไฝๆดVR็ผ้็ไบบๆญฃๅจ่งฆๆง็ฉบไธญๆฌๆตฎ็้ถ็ฝๅพๅ๏ผๅจๅดๆWi-Fiไฟกๅทใๆฐๆฎๅพ่กจใๅฃฐๆณขๅพ็ญ็งๆๅ ็ด ๏ผไฝ็ฐๆฐๅญไบคไบๅบๆฏใ 6. **ๆๅ่ก็ๅ** - ๆ ้ข๏ผโ6. 
ๆๅ่ก็ๅโ - ่ฏดๆๆๅญ๏ผโ้่ดญ็ฌ็น็บชๅฟตๅ๏ผๅฐๅ็ฉ้ฆ่ฎฐๅฟๅธฆๅๅฎถ๏ผๅปถ็ปญ็พๅฅฝใโ - ่ง่งๅ ็ด ๏ผๅทฆไธ่ง้ๅๅค็งๆๅๅๅ๏ผๅ ๆฌๅฐๆๅ็ฉ้ฆๅปบ็ญๅพๆก็ๅธๅธ่ข๏ผๆ ๆโMUSEUMโ๏ผใ็ฌ่ฎฐๆฌใๆไฟก็ใๅพฝ็ซ ๏ผๅณไธ่งๅๆฏไธ็็ฒพ่ดไธๆๆฒป๏ผ้ขๅ ไธ็ๆไบ่งๆๅพๆก๏ผ๏ผ้ ่่ไธๅท้ฅผ๏ผๆๆไธๅชๆดๆดพๅฏนๅธฝใ็ณป่่ฒ่ด่ถ็ป็็ฝ้น ๏ผๅฃไธญๅทๅบ้ณ็ฌฆ๏ผๅ ๆปก็ซฅ่ถฃใ ๆดๅผ ไฟกๆฏๅพ้่ฟๅพๆ็ปๅ็ๆนๅผ๏ผ็ณป็ปไป็ปไบๅ็ฉ้ฆๅ่ง็ๅ ญๅคงๅปถไผธๆดปๅจ๏ผๆขไผ ่พพๅฎ็จไฟกๆฏ๏ผๅๅ ผๅ ท็พๅญฆๆๆๅ๏ผ้ๅ็จไบๅฎฃไผ ๅใๆ่ฒๆตทๆฅๆ็บฟไธๆจๅนฟๆๆใๆๆๆๆฌๅไธบไธญๆ๏ผๆ ่ฑๆๆๆฐๅญ็ผ็ ๏ผ่ฏญ่จ้ฃๆ ผไบฒๅ่ช็ถ๏ผ็ฌฆๅๅคงไผไผ ๆญ้ๆฑใ |
||
Quick Start
Use with SenseNova-Studio
The fastest way to experience SenseNova-U1 is through SenseNova-Studio, a free online playground where you can try the model directly in your browser, with no installation or GPU required.
Note: To serve more users, U1-Fast has undergone step distillation and classifier-free guidance (CFG) distillation, and is dedicated to infographic generation.
Use with SenseNova-Skills (OpenClaw)
The easiest way to integrate SenseNova-U1 into your own agent or application is through our companion repository SenseNova-Skills (OpenClaw), which ships SenseNova-U1 as a ready-to-use skill with a unified tool-calling interface.
Refer to the SenseNova-Skills README for installation and usage details.
Run with transformers (Default)
Setup: Follow the Installation Guide to clone the repo and install dependencies with uv.
Generate High-Quality Infographics
For generating complex infographics, we highly recommend using the following parameters: --cfg_scale 4.0, --timestep_shift 3.0, and --num_steps 50.
python examples/t2i/inference.py \
--model_path sensenova/SenseNova-U1-8B-MoT-Infographic \
--prompt "่ฟๅผ ไฟกๆฏๅพ็ๆ ้ขๆฏโSenseNova-U1โ๏ผ้็จ็ฐไปฃๆ็ฎ็งๆ็ฉ้ต้ฃๆ ผใๆดไฝๅธๅฑไธบๆฐดๅนณไธๅ็ฝๆ ผ็ปๆ๏ผ่ๆฏๆฏๅธฆๆๆๆต
้ถ็ฐ่ฒ็ปๅฏ็น้ต็ๅๅ
็บฏ็ฝ้ซ็บง็บธๅผ ็บน็๏ผ็ป้ข้ฟๅฎฝๆฏไธบ16:9ใ\n\nๆ็้็จไธฅ่ฐจ็่ง่งๅฑ็บง๏ผไธปๆ ้ขไฝฟ็จ็ฒไฝๆ ่กฌ็บฟ้ปไฝๅญ๏ผๆญฃๆไฝฟ็จๆธ
ๆฐ็็ฐไปฃ็ญๅฎฝๅญไฝใ้
่ฒๆนๆกๆๅ
ถๅ
ๅถ๏ผไปฅ็บฏ็ฝ่ฒไธบๅบ๏ผๆทฑ็ญ้ปไธบไธป่ง่งๆๅญๅ่พนๆก๏ผๆต
็ณๆฟ็ฐ็จไบ่ๆฏ่ฒๅๅๆฌก่ฆไฟกๆฏๅบๅ๏ผๅพๆ ้็จ็ฒพ่ด็้ถ็ฐ่ฒ็บฟๆก็ปๅถใ\n\nๅจ็ป้ขๆญฃไธๆนๅฑ
ไธญไฝ็ฝฎ๏ผไฝฟ็จ้็ฎ็ๆทฑ็ญ้ป็ฒไฝๅญๆๅธ็ๅคงๆ ้ขโSenseNova-U1โใๆ ้ขๆญฃไธๆนๆฏๆต
็ณๆฟ็ฐ่ฒ็็ญๅฎฝๅญไฝๅฏๆ ้ขโๆฐไธไปฃ็ซฏๅฐ็ซฏ็ปไธๅคๆจกๆๅคงๆจกๅๅฎถๆโใ\n\n็ป้ขไธปไฝๅไธบๅทฆใไธญใๅณไธไธช็ธ็ญ็ๅ็ดไฟกๆฏๅบๅ๏ผๅบๅไน้ด้่ฟๅ
่ถณ็่ด็ฉบ้ด่ฟ่ก็ฉ็้็ฆปใ\n\nๅทฆไพงๅบๅ็ไธป้ขๆฏๆฆ่ฟฐใ้กถ้จๆไธไธช้ถ็ฐ่ฒ็บฟๆก็ปๅถ็ใ็ฑๆพๅคง้ๅ้ฝฟ่ฝฎไบค็ป็ๅพๆ ๏ผๆ่พนๆฏ็ฒไฝๅฐๆ ้ขโOverviewโใ่ฏฅๅบๅๅ
ไปไธๅฐไธๅ็ดๆๅ็ไธไธช่ฆ็น๏ผ็ฌฌไธไธช่ฆ็นๆ่พนๆฏไธไธชไปฃ่กจๆๆกฃไธ็
ง็้ๅ ็ๆ็ฎๅพๆ ๏ผ็ดง่ท็ๆๅญโๅคๆจกๆๆจกๅๅฎถๆ๏ผ็ปไธๆๆฌ/ๅพๅ็่งฃๅ็ๆโใๅไธๆฏ็ฑไธคไธช็ธ่ฟ็ๅๅฟๅ็ปๆ็ๆถๆๅพๆ ๏ผ้
ๆๆๅญโๅบไบNEO-Unifyๆถๆ๏ผ็ซฏๅฐ็ซฏ็ปไธ็่งฃๅ็ๆ๏ผโใๆไธๆนๆฏไธไธชๅธฆๆๆ็บฟๅๆ็็ผ็ๅๆผๆๅฝข็ถ็ๅพๆ ๏ผๆ็กฎๆ็คบๆๆฌโๆ ้่ง่ง็ผ็ ๅจ(VE)ๅๅๅ่ช็ผ็ ๅจ(VAE)โใ\n\nไธญ้ดๅบๅๅฑ็คบๆจกๅ็ฉ้ตใ้กถ้จๆฏไธไธชๅ
ๅซไธคไธชๅๆฏ่็น็ๆ ็ถ็ฝ็ปๅพๆ ๏ผๆ่พนๆฏ็ฒไฝๅฐๆ ้ขโไธคไธชๆจกๅ่งๆ ผโใๅบๅๅ
ๅไธบไธไธไธคไธชๅ
่ฃนๅจๆต
็ณๆฟ็ฐ่ฒๆ็ป่พนๆกๅ
็ๅก็ใไธๆน็ๅก็ๅ
็ป็ไธไธชไปฃ่กจ้ซๅฏๅบฆ็ๅฎๅฟๅ ไฝ็ซๆนไฝๅพๆ ๏ผๅคงๅญๆ ๆณจโSenseNova-U1-8B-MoTโ๏ผไธๆนๆฏ็ญๅฎฝๅญไฝ่ฏดๆโ8B MoT ๅฏ้ไธปๅนฒๆจกๅโใไธๆน็ๅก็ๅ
็ป็ไธไธชๅธฆๆ้ช็ต็ฌฆๅท็็ฝ็ถๅๅ
ๅคง่ๅพๆ ๏ผๅคงๅญๆ ๆณจโSenseNova-U1-A3B-MoTโ๏ผไธๆนๆฏ็ญๅฎฝๅญไฝ่ฏดๆโA3B MoT ๆททๅไธๅฎถ๏ผMoE๏ผไธปๅนฒๆจกๅโใๅจ่ฟไธคไธช็ฌ็ซๅก็็ๆญฃไธๆน๏ผๅทฆไพงๆพ็ฝฎไธไธช็ฌ่ธ่ฝฎๅปๅพๆ ๆญ้
ๆๅญโๅฐๅจHF็ญๅนณๅฐๅ
ฌๅผโ๏ผๅณไพงๆพ็ฝฎไธไธชๅธฆๆๆ่ง็ไนฆ้ขๆฅๅๅพๆ ๆญ้
ๆๅญโๅฐๅๅธๆๆฏๆฅๅโใ\n\nๅณไพงๅบๅๅ็ฐๆ ธๅฟไผๅฟใ้กถ้จๆฏไธไธชไปฃ่กจๅท
ๅณฐ็ไธๅ้ถๆขฏๆ็บฟๅพๅพๆ ๏ผๆ่พนๆฏ็ฒไฝๅฐๆ ้ขโHighlightsโใ่ฏฅๅบๅๅ
้จๅ็ดๅๅธ็ๅไธชๅธฆๆๆต
็ณๆฟ็ฐๅบ่ฒ็้ฟๆนๅฝข่ฒๅ๏ผๆฏไธช่ฒๅๅ
้จๅทฆไพงๅฏนๅบไธไธชๅ
ทไฝ็ๅพๆ ๏ผๅณไพงไธบๆๅญใ็ฌฌไธไธช่ฒๅๅ
ๆฏไธไธชๆ ็ผ็ธ่ฟ็่ซๆฏไนๆฏ็ฏๅพๆ ๏ผ้
ๆโๅ็็ปไธๆถๆ๏ผๆ VEๅVAEโใ็ฌฌไบไธช่ฒๅๅ
ๆฏไธไธช้กถ็ซฏๅธฆๆๆๆ็ๅฅๆฏๅพๆ ๏ผ้
ๆโๅไธ็ปไธๆจกๅๅจ็่งฃๅ็ๆไปปๅกไธๅ่พพๅฐSOTAๆง่ฝโใ็ฌฌไธไธช่ฒๅๅ
ๆฏไปฃ่กจๆๆฌ่กไธๆ็ซๅพ็
ง็ไบคๆฟ็ฉฟๆ็ๅพๆ ๏ผ้
ๆโๅผบๅคง็ๅ็ไบค้ๆจ็่ฝๅ๏ผๆจกๅๅ็็ๆๅพๅ่ฟ่กๆจ็๏ผโใๆๅไธไธช่ฒๅๅ
ๆฏไธไธช่ขซๅๅๅบไธๅฐๅ็็กฌๅธไธ่ฏฆ็ป้ฅผ็ถๅพ็ปๅ็ๅพๆ ๏ผ้
ๆโ่ฝ็ๆๅคๆไฟกๆฏๅพ่กจ๏ผๆงไปทๆฏๅบ่ฒโใ" \
--width 2720 --height 1536 \
--cfg_scale 4.0 --cfg_norm none --timestep_shift 3.0 --num_steps 50 \
--output output.png --profile
Default resolution is 2048×2048 (1:1). See supported resolution buckets for other aspect ratios.
For high-quality infographic generation, it is recommended to apply prompt enhancement before generating images.
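The recommended settings above can be collected in a small helper so they are not retyped for every prompt. This is a minimal sketch, not part of the repository: `build_infographic_command` is a hypothetical name, and it only assembles the flags already shown in the command above (the `--profile` flag is left out). The example prompt is a placeholder.

```python
import shlex


def build_infographic_command(
    prompt,
    output="output.png",
    width=2048,   # default resolution stated in the README
    height=2048,
    model_path="sensenova/SenseNova-U1-8B-MoT-Infographic",
):
    """Assemble the inference.py argument list with the recommended infographic settings."""
    return [
        "python", "examples/t2i/inference.py",
        "--model_path", model_path,
        "--prompt", prompt,
        "--width", str(width), "--height", str(height),
        # Recommended values for complex infographics.
        "--cfg_scale", "4.0", "--cfg_norm", "none",
        "--timestep_shift", "3.0", "--num_steps", "50",
        "--output", output,
    ]


# Hypothetical prompt; 2720x1536 is the size used in the example command above.
cmd = build_infographic_command(
    "An infographic titled 'SenseNova-U1'", width=2720, height=1536
)
print(shlex.join(cmd))  # shell-quoted command line, ready to paste or pass to subprocess.run(cmd)
```

Returning a list rather than a single string keeps long multi-line prompts intact when the command is passed to `subprocess.run`, since no shell quoting is involved.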
Memory-efficient inference (GGUF + VRAM modes)
For users running on a single consumer GPU, two complementary features lower the VRAM footprint of the transformers path. They can be combined freely.
--vram_mode: single-GPU layer offload
Pass --vram_mode to keep the language-model layers resident in pinned CPU memory and stream them onto the GPU on demand during the forward pass. This frees the VRAM otherwise held by weights while keeping activations on-device.
| Mode | Behavior | When to use |
|---|---|---|
| full (default) | No offload; whole model on GPU | Plenty of VRAM, best speed |
| low | Synchronous per-layer CPU↔GPU swap | Lowest VRAM footprint |
| balanced | Async prefetch overlaps H2D copy with compute | Tight on VRAM but want to recover speed |
python examples/t2i/inference.py \
--model_path sensenova/SenseNova-U1-8B-MoT-Infographic \
--vram_mode balanced \
--prompt "..." --output output.png
--gguf_checkpoint and --vram_mode compose: a Q4 GGUF + balanced is the recommended setup for ~10-12 GB consumer cards.
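To see why the balanced mode can recover speed relative to low, a toy timing model helps. The sketch below is an illustration only, not the repository's implementation: it assumes a one-layer lookahead, and the per-layer copy and compute times are made-up numbers in milliseconds.

```python
def sync_time(copy_ms, compute_ms):
    """'low'-style schedule: copy a layer to the GPU, run it, then move to the next."""
    return sum(c + k for c, k in zip(copy_ms, compute_ms))


def prefetch_time(copy_ms, compute_ms):
    """'balanced'-style schedule: while layer i runs, layer i+1 is copied in the background."""
    total = copy_ms[0]  # the first copy has nothing to overlap with
    for i, k in enumerate(compute_ms):
        next_copy = copy_ms[i + 1] if i + 1 < len(copy_ms) else 0.0
        # The step ends when both the compute and the overlapped copy are done.
        total += max(k, next_copy)
    return total


# Hypothetical figures: 4 layers, 2 ms host-to-device copy, 3 ms compute each.
copy_ms = [2.0] * 4
compute_ms = [3.0] * 4

print(sync_time(copy_ms, compute_ms))      # 20.0
print(prefetch_time(copy_ms, compute_ms))  # 14.0
```

In this model the copies are fully hidden whenever compute per layer takes at least as long as the copy; if copies are slower than compute, the overlapped schedule is bounded by copy time instead.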
Run with LightLLM + LightX2V (Recommended)
For production serving, we co-design a dedicated inference stack on top of LightLLM (understanding) and LightX2V (generation). The two engines are disaggregated so that each path can use its own parallelism and resource budget, with a low-overhead transfer channel in between.
On a single node with TP2 + CFG2, this stack delivers roughly 0.15 s/step and about 9 s end-to-end for a 2048×2048 image on H100 / H200, with a 2.4-3.2× prefill speedup from our FA3-based hybrid-mask attention over the Triton baseline. Full per-GPU performance numbers are reported in docs/inference_infra.md.
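As a rough sanity check on these figures, the per-step and end-to-end numbers can be related with simple arithmetic. The step count is an assumption: the text does not say how many steps the 9 s figure uses, so the sketch below borrows the 50 steps recommended earlier for infographics.

```python
seconds_per_step = 0.15   # quoted per-step latency
end_to_end_s = 9.0        # quoted end-to-end latency
assumed_steps = 50        # ASSUMPTION: not stated for this benchmark

denoise_s = seconds_per_step * assumed_steps   # time spent in generation steps
other_s = end_to_end_s - denoise_s             # everything else (prefill, decode, transfer)

print(f"generation steps: {denoise_s:.1f} s, remainder: {other_s:.1f} s")
```

Under that assumption, most of the end-to-end time goes to the generation steps; docs/inference_infra.md remains the authoritative source for the actual breakdown.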
An official docker image is provided for one-command deployment:
docker pull lightx2v/lightllm_lightx2v:20260407
Deployment guide (Docker, launch flags, modes, quantization, API test): see docs/deployment.md.
Full design and performance profiling: see docs/inference_infra.md.
Join the Community!
Join our growing community to share feedback, get support, and stay updated on the latest SenseNova-U1 developments. We'd love to hear from you!
Channels: Discord and WeChat Group.
License
This project is released under the Apache 2.0 License.