Update README.md
README.md
CHANGED
|
@@ -29,31 +29,31 @@ pipeline_tag: image-text-to-text
|
|
| 29 |
[](https://huggingface.co/kadalicious22/snapgate-VL-4B)
|
| 30 |
[](https://snapgate.tech)
|
| 31 |
|
| 32 |
-
**snapgate-code-4B**
|
| 33 |
|
| 34 |
-
*
|
| 35 |
|
| 36 |
</div>
|
| 37 |
|
| 38 |
---
|
| 39 |
|
| 40 |
-
##
|
| 41 |
|
| 42 |
-
|
|
| 43 |
|-----------|-----------|
|
| 44 |
-
| **Code Generation & Review** |
|
| 45 |
-
| **UI/UX Design Analysis** |
|
| 46 |
-
| **Design to Code** |
|
| 47 |
-
| **Diagram & Architecture** |
|
| 48 |
-
| **Code from Image** |
|
| 49 |
-
| **Technical Documentation** |
|
| 50 |
|
| 51 |
---
|
| 52 |
|
| 53 |
## Training Configuration
|
| 54 |
|
| 55 |
<details>
|
| 56 |
-
<summary><b>
|
| 57 |
|
| 58 |
| Parameter | Value |
|
| 59 |
|-----------|-------|
|
|
@@ -71,10 +71,10 @@ pipeline_tag: image-text-to-text
|
|
| 71 |
| Precision | `bfloat16` |
|
| 72 |
| Hardware | NVIDIA T4 · Google Colab |
|
| 73 |
| Dataset | 200 samples (internal Snapgate dataset) |
|
| 74 |
-
|
|
| 75 |
| Format | ShareGPT |
|
| 76 |
|
| 77 |
-
**
|
| 78 |
`code_generation` · `code_review` · `debugging` · `refactoring` · `ui_html_css` · `ui_react` · `ui_tailwind` · `design_system` · `ux_analysis` · `design_to_code`
|
| 79 |
|
| 80 |
</details>
|
|
@@ -83,7 +83,7 @@ pipeline_tag: image-text-to-text
|
|
| 83 |
|
| 84 |
## Training Progress
|
| 85 |
|
| 86 |
-
Loss
|
| 87 |
|
| 88 |
```
|
| 89 |
Step 5  Loss: 1.242
|
|
@@ -105,7 +105,7 @@ Step 75 Los
|
|
| 105 |
|
| 106 |
---
|
| 107 |
|
| 108 |
-
##
|
| 109 |
|
| 110 |
### 1. Install Dependencies
|
| 111 |
|
|
@@ -129,11 +129,11 @@ model = Qwen3VLForConditionalGeneration.from_pretrained(
|
|
| 129 |
trust_remote_code=True,
|
| 130 |
)
|
| 131 |
|
| 132 |
-
SYSTEM_PROMPT = """
|
| 133 |
-
|
| 134 |
```
|
| 135 |
|
| 136 |
-
### 3. Inference
|
| 137 |
|
| 138 |
```python
|
| 139 |
from qwen_vl_utils import process_vision_info
|
|
@@ -144,7 +144,7 @@ messages = [
|
|
| 144 |
"role": "user",
|
| 145 |
"content": [
|
| 146 |
{"type": "image", "image": "path/to/your/image.png"},
|
| 147 |
-
{"type": "text", "text": "
|
| 148 |
],
|
| 149 |
},
|
| 150 |
]
|
|
@@ -166,12 +166,12 @@ response = processor.batch_decode(generated, skip_special_tokens=True)[0]
|
|
| 166 |
print(response)
|
| 167 |
```
|
| 168 |
|
| 169 |
-
### 4.
|
| 170 |
|
| 171 |
```python
|
| 172 |
messages = [
|
| 173 |
{"role": "system", "content": SYSTEM_PROMPT},
|
| 174 |
-
{"role": "user", "content": "
|
| 175 |
]
|
| 176 |
|
| 177 |
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
|
@@ -189,18 +189,18 @@ print(response)
|
|
| 189 |
|
| 190 |
---
|
| 191 |
|
| 192 |
-
##
|
| 193 |
|
| 194 |
-
-
|
| 195 |
-
-
|
| 196 |
-
-
|
| 197 |
-
-
|
| 198 |
|
| 199 |
---
|
| 200 |
|
| 201 |
-
##
|
| 202 |
|
| 203 |
-
|
| 204 |
|
| 205 |
---
|
| 206 |
|
|
@@ -210,7 +210,6 @@ Released under the **Apache 2.0** license, following the base model license of [Qwen3-VL-
|
|
| 210 |
|---|---|
|
| 211 |
| Website | [snapgate.tech](https://snapgate.tech) |
|
| 212 |
| Base Model | [Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
|
| 213 |
-
| Contact | Via
|
| 214 |
-
|
| 215 |
-
---
|
| 216 |
|
|
|
|
|
|
| 29 |
[](https://huggingface.co/kadalicious22/snapgate-VL-4B)
|
| 30 |
[](https://snapgate.tech)
|
| 31 |
|
| 32 |
+
**snapgate-code-4B** is a multimodal vision-language model fine-tuned from [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) using **QLoRA**, specifically optimized for **developers** and **designers**, with an emphasis on accurately understanding both images and text.
|
| 33 |
|
| 34 |
+
*Developed by [Snapgate](https://snapgate.tech) · Made with ❤️ in Indonesia 🇮🇩*
|
| 35 |
|
| 36 |
</div>
|
| 37 |
|
| 38 |
---
|
| 39 |
|
| 40 |
+
## Core Capabilities
|
| 41 |
|
| 42 |
+
| Capability | Description |
|
| 43 |
|-----------|-----------|
|
| 44 |
+
| **Code Generation & Review** | Write, analyze, debug, and optimize code (Python, JS, TS, HTML/CSS, SQL, etc.) |
|
| 45 |
+
| **UI/UX Design Analysis** | Analyze interface screenshots, provide design suggestions, identify UX issues |
|
| 46 |
+
| **Design to Code** | Convert mockups, wireframes, or UI screenshots into HTML/CSS/React/Tailwind code |
|
| 47 |
+
| **Diagram & Architecture** | Understand flowcharts, system architecture, ERDs, and technical diagrams |
|
| 48 |
+
| **Code from Image** | Read and explain code from screenshots or photos |
|
| 49 |
+
| **Technical Documentation** | Generate clear, structured, and professional technical documentation |
|
| 50 |
|
| 51 |
---
|
| 52 |
|
| 53 |
## Training Configuration
|
| 54 |
|
| 55 |
<details>
|
| 56 |
+
<summary><b>Click to view training details</b></summary>
|
| 57 |
|
| 58 |
| Parameter | Value |
|
| 59 |
|-----------|-------|
|
|
|
|
| 71 |
| Precision | `bfloat16` |
|
| 72 |
| Hardware | NVIDIA T4 · Google Colab |
|
| 73 |
| Dataset | 200 samples (internal Snapgate dataset) |
|
| 74 |
+
| Categories | 10 categories · 20 samples each |
|
| 75 |
| Format | ShareGPT |
|
| 76 |
|
| 77 |
+
**Dataset Categories:**
|
| 78 |
`code_generation` · `code_review` · `debugging` · `refactoring` · `ui_html_css` · `ui_react` · `ui_tailwind` · `design_system` · `ux_analysis` · `design_to_code`
|
| 79 |
|
| 80 |
</details>
|
|
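As a rough illustration of the dataset layout described in this hunk (200 samples, 10 categories, 20 samples each, ShareGPT format), the sketch below checks that a set of records is balanced across categories. The `category` field and the `find_unbalanced` and `make_placeholder_record` helpers are assumptions made for illustration; the README does not show the actual record schema, and the `conversations` / `from` / `value` keys follow the common ShareGPT convention rather than anything confirmed for this dataset.

```python
from collections import Counter

# Category names as listed in the README.
CATEGORIES = [
    "code_generation", "code_review", "debugging", "refactoring",
    "ui_html_css", "ui_react", "ui_tailwind", "design_system",
    "ux_analysis", "design_to_code",
]
SAMPLES_PER_CATEGORY = 20  # 10 categories x 20 samples = 200 samples


def find_unbalanced(records, expected=SAMPLES_PER_CATEGORY):
    """Return {category: count} for every category whose count differs from `expected`."""
    # "category" is a hypothetical field name, not taken from the README.
    counts = Counter(record["category"] for record in records)
    return {
        name: counts.get(name, 0)
        for name in CATEGORIES
        if counts.get(name, 0) != expected
    }


def make_placeholder_record(category):
    """Build a ShareGPT-style record with placeholder text (not real training data)."""
    return {
        "category": category,
        "conversations": [
            {"from": "human", "value": f"<prompt for {category}>"},
            {"from": "gpt", "value": f"<response for {category}>"},
        ],
    }


# Synthetic stand-in for the internal dataset, used only to exercise the check.
records = [
    make_placeholder_record(category)
    for category in CATEGORIES
    for _ in range(SAMPLES_PER_CATEGORY)
]
print(len(records), find_unbalanced(records))
```

An empty dictionary from `find_unbalanced` means every listed category has exactly the expected number of samples.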
|
|
| 83 |
|
| 84 |
## Training Progress
|
| 85 |
|
| 86 |
+
Loss decreased consistently throughout training, from **1.242 → 0.444**.
|
| 87 |
|
| 88 |
```
|
| 89 |
Step 5  Loss: 1.242
|
|
|
|
| 105 |
|
| 106 |
---
|
| 107 |
|
| 108 |
+
## Usage
|
| 109 |
|
| 110 |
### 1. Install Dependencies
|
| 111 |
|
|
|
|
| 129 |
trust_remote_code=True,
|
| 130 |
)
|
| 131 |
|
| 132 |
+
SYSTEM_PROMPT = """You are Snapgate AI, a multimodal AI assistant by Snapgate \
|
| 133 |
+
specialized in coding and UI/UX design."""
|
| 134 |
```
|
| 135 |
|
| 136 |
+
### 3. Inference with Image
|
| 137 |
|
| 138 |
```python
|
| 139 |
from qwen_vl_utils import process_vision_info
|
|
|
|
| 144 |
"role": "user",
|
| 145 |
"content": [
|
| 146 |
{"type": "image", "image": "path/to/your/image.png"},
|
| 147 |
+
{"type": "text", "text": "Analyze the UI from this image and generate its HTML/CSS code."},
|
| 148 |
],
|
| 149 |
},
|
| 150 |
]
|
|
|
|
| 166 |
print(response)
|
| 167 |
```
|
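The image and text-only examples in this diff differ only in the shape of the user message, so the construction can be sketched as one helper. `build_messages` is a hypothetical name, not part of the README or of any library, and placing the system message first in the image case is an assumption, since those lines are elided from the diff.

```python
from typing import Any, Dict, List, Optional

SYSTEM_PROMPT = (
    "You are Snapgate AI, a multimodal AI assistant by Snapgate "
    "specialized in coding and UI/UX design."
)


def build_messages(
    prompt: str,
    image: Optional[str] = None,
    system_prompt: str = SYSTEM_PROMPT,
) -> List[Dict[str, Any]]:
    """Build a chat message list for an image+text or a text-only request."""
    if image is None:
        # Text-only request: the user content is a plain string.
        user_content: Any = prompt
    else:
        # Image request: the user content is a list of typed parts.
        user_content = [
            {"type": "image", "image": image},
            {"type": "text", "text": prompt},
        ]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


image_messages = build_messages(
    "Analyze the UI from this image and generate its HTML/CSS code.",
    image="path/to/your/image.png",
)
text_messages = build_messages("Write a Python function to validate email using regex.")
```

Either list can then be passed to `processor.apply_chat_template(...)` in the same way as the hand-written `messages` in the README examples.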
| 168 |
|
| 169 |
+
### 4. Text-Only Inference
|
| 170 |
|
| 171 |
```python
|
| 172 |
messages = [
|
| 173 |
{"role": "system", "content": SYSTEM_PROMPT},
|
| 174 |
+
{"role": "user", "content": "Write a Python function to validate email using regex."},
|
| 175 |
]
|
| 176 |
|
| 177 |
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
|
|
|
|
| 189 |
|
| 190 |
---
|
| 191 |
|
| 192 |
+
## Limitations
|
| 193 |
|
| 194 |
+
- Trained on a relatively small internal Snapgate dataset (200 samples); performance is expected to improve as more data is added
|
| 195 |
+
- Optimized for Indonesian and English; other languages have not been tested
|
| 196 |
+
- Best performance on coding and UI analysis tasks; less optimal for other domains (e.g., science, law, medicine)
|
| 197 |
+
- A GPU with at least 8 GB VRAM is recommended for comfortable inference
|
| 198 |
|
| 199 |
---
|
| 200 |
|
| 201 |
+
## License
|
| 202 |
|
| 203 |
+
Released under the **Apache 2.0** license, following the base model license of [Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct).
|
| 204 |
|
| 205 |
---
|
| 206 |
|
|
|
|
| 210 |
|---|---|
|
| 211 |
| Website | [snapgate.tech](https://snapgate.tech) |
|
| 212 |
| Base Model | [Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) |
|
| 213 |
+
| Contact | Via Snapgate website |
|
|
|
|
|
|
|
| 214 |
|
| 215 |
+
---
|