---
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- graphic-design
- design-generation
- layout-planning
- qwen3
base_model: Qwen/Qwen3-8B
---

# DesignAsCode Semantic Planner

The Semantic Planner for the [DesignAsCode](https://github.com/liuziyuan1109/design-as-code) pipeline. Given a natural-language design request, it generates a structured design plan: layout reasoning, layer grouping, image-generation prompts, and text element specifications.

## Model Details

| | |
|---|---|
| **Base Model** | Qwen3-8B |
| **Fine-tuning** | Supervised Fine-Tuning (SFT) |
| **Size** | 16 GB (fp16) |
| **Context Window** | 8,192 tokens |

## Training Data

Trained on ~10k examples sampled from the [DesignAsCode Training Data](https://huggingface.co/datasets/Tony1109/DesignAsCode-training-data), which contains 19,479 design samples distilled from the [Crello](https://huggingface.co/datasets/cyberagent/crello) dataset using GPT-4o and GPT-o3. No additional data was used.

### Training Format

- **Input:** `prompt` – a natural-language design request
- **Output:** `layout_thought` + `grouping` + `image_generator` + `generate_text`

See the [training data repo](https://huggingface.co/datasets/Tony1109/DesignAsCode-training-data) for field details.

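As a rough illustration, the four output fields are serialized into a single XML-tagged target string. The sample values below are invented for demonstration; only the tag and field names come from this card, and the exact serialization is defined in the training data repo:

```python
import json

# Hypothetical example values -- real records come from the
# DesignAsCode training data; only the tag names are from this card.
layout_thought = "Place the headline top-center, hero image below."
grouping = [{"group": "header", "layers": ["headline", "logo"]}]
image_generator = [{"layer": "hero", "prompt": "a minimalist poster background"}]
generate_text = [{"text": "Summer Sale", "font": "Montserrat", "size": 72}]

# Concatenate the four components into the XML-tagged training target.
target = (
    f"<layout_thought>{layout_thought}</layout_thought>\n"
    f"<grouping>{json.dumps(grouping)}</grouping>\n"
    f"<image_generator>{json.dumps(image_generator)}</image_generator>\n"
    f"<generate_text>{json.dumps(generate_text)}</generate_text>"
)
print(target)
```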
## Training Configuration

| | |
|---|---|
| **Batch Size** | 1 |
| **Gradient Accumulation** | 2 |
| **Learning Rate** | 5e-5 (AdamW) |
| **Epochs** | 2 |
| **Max Sequence Length** | 8,192 tokens |
| **Precision** | bfloat16 |
| **Loss** | Completion-only (only on generated tokens) |

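The completion-only loss in the table amounts to label masking: prompt tokens are assigned an ignore index so the loss is computed only on the generated plan. A minimal, framework-free sketch (real trainers do this on token-ID tensors; the helper name is hypothetical):

```python
IGNORE_INDEX = -100  # conventionally skipped by cross-entropy loss

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids as labels, ignoring the first prompt_len tokens."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])

# Toy sequence: 4 prompt tokens followed by 3 completion (plan) tokens.
ids = [101, 7, 8, 9, 42, 43, 44]
labels = mask_prompt_labels(ids, prompt_len=4)
print(labels)  # [-100, -100, -100, -100, 42, 43, 44]
```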
## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "Tony1109/DesignAsCode-planner"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Generate a design plan from a natural-language request.
messages = [{"role": "user", "content": "Design a summer sale poster for a coffee shop."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For full pipeline usage (plan → implement → reflection), see the [project repo](https://github.com/liuziyuan1109/design-as-code) and [Quick Start](https://github.com/liuziyuan1109/design-as-code#quick-start).

## Outputs

The model generates semi-structured text with XML tags:

- `<layout_thought>...</layout_thought>` – detailed layout reasoning
- `<grouping>...</grouping>` – JSON array grouping related layers with thematic labels
- `<image_generator>...</image_generator>` – JSON array of per-layer image generation prompts
- `<generate_text>...</generate_text>` – JSON array of text element specifications (font, size, alignment, etc.)

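Downstream code can recover the four sections with simple tag matching. A minimal parsing sketch — the tag names are from this card, while the sample plan text and JSON contents are invented for illustration:

```python
import json
import re

def parse_plan(text: str) -> dict:
    """Extract the four XML-tagged sections from a generated plan."""
    plan = {}
    for tag in ("layout_thought", "grouping", "image_generator", "generate_text"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if not m:
            continue
        body = m.group(1).strip()
        # layout_thought is free-form prose; the other three are JSON arrays.
        plan[tag] = body if tag == "layout_thought" else json.loads(body)
    return plan

sample = (
    "<layout_thought>Headline top-center, hero image below.</layout_thought>"
    '<grouping>[{"group": "header", "layers": ["headline"]}]</grouping>'
    '<image_generator>[{"layer": "hero", "prompt": "minimalist background"}]</image_generator>'
    '<generate_text>[{"text": "Summer Sale", "font_size": 72}]</generate_text>'
)
plan = parse_plan(sample)
print(plan["grouping"][0]["group"])  # header
```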
## Ethical Considerations

- Designs should be reviewed by humans before production use.
- Outputs may reflect biases present in the training data.
- Generated content should be checked for copyright compliance.

## Citation

```bibtex
@article{liu2026designascode,
  title   = {DesignAsCode: Bridging Structural Editability and
             Visual Fidelity in Graphic Design Generation},
  author  = {Liu, Ziyuan and Sun, Shizhao and Huang, Danqing
             and Shi, Yingdong and Zhang, Meisheng and Li, Ji
             and Yu, Jingsong and Bian, Jiang},
  journal = {arXiv preprint arXiv:2602.17690},
  year    = {2026},
  url     = {https://arxiv.org/abs/2602.17690}
}
```