Complete README with architecture docs, VLM provider findings, and agentic refinement design
README.md
CHANGED
|
@@ -16,38 +16,265 @@ tags:
|
|
| 16 |
- computer-vision
|
| 17 |
---
|
| 18 |
|
| 19 |
-
# 🧵 Garment
|
| 20 |
|
| 21 |
-
Upload a garment image or describe one
|
| 22 |
|
| 23 |
-
##
|
| 24 |
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
|
| 29 |
-
##
|
| 30 |
|
| 31 |
-
|
| 32 |
-
- ✍️ **From Text**: Describe a garment and get pattern pieces
|
| 33 |
-
- 📐 **Manual**: Fine-tune every measurement with sliders
|
| 34 |
-
- Supports: shirts, dresses, skirts, pants, jackets, hoodies, vests
|
| 35 |
-
- Pattern pieces include: cut lines, seam lines, fold lines, grain lines, notches
|
| 36 |
|
| 37 |
-
|
| 38 |
|
| 39 |
-
|
| 40 |
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
|
| 44 |
-
|
|
| 45 |
-
|
|
| 46 |
-
|
|
| 47 |
-
|
| 48 |
|
| 49 |
## Related Resources
|
| 50 |
|
| 51 |
- [ChatGarment Dataset](https://huggingface.co/datasets/sy000/ChatGarmentDataset) — 362GB training data
|
| 52 |
- [GarmageSet](https://huggingface.co/datasets/Style3D/GarmageSet) — 14,801 professional garments
|
| 53 |
- [GarmentCode DSL](https://github.com/maria-korosteleva/GarmentCode) — Parametric pattern compiler
|
| 16 |
- computer-vision
|
| 17 |
---
|
| 18 |
|
| 19 |
+
# 🧵 Garment Pattern Studio
|
| 20 |
|
| 21 |
+
Upload a garment image or describe one → get flat 2D sewing pattern pieces → see them assembled as a 3D garment on a mannequin → iteratively refine until it matches.
|
| 22 |
|
| 23 |
+
## Architecture
|
| 24 |
|
| 25 |
+
```
|
| 26 |
+
┌─────────────┐ ┌──────────────────┐ ┌─────────────────┐ ┌──────────────┐
|
| 27 |
+
│ Input Image │────▶│ VLM Analysis │────▶│ 2D Pattern Gen │────▶│ 3D Assembly │
|
| 28 |
+
│ or Text │ │ (garment_type, │ │ (flat pieces: │ │ (wrap pieces │
|
| 29 |
+
│ or Manual │ │ measurements, │ │ bodice, sleeve,│ │ onto body │
|
| 30 |
+
│ │ │ features → JSON) │ │ skirt, etc.) │ │ mannequin) │
|
| 31 |
+
└─────────────┘ └──────────────────┘ └─────────────────┘ └──────────────┘
|
| 32 |
+
│ │
|
| 33 |
+
▼ ▼
|
| 34 |
+
pattern_generator.py garment_3d.py
|
| 35 |
+
(matplotlib 2D image) (Plotly Mesh3d)
|
| 36 |
+
```
|
| 37 |
|
| 38 |
+
### Key Design: 3D is Built FROM the 2D Pattern Pieces
|
| 39 |
|
| 40 |
+
The 3D view is **not** an independent visualization. Each 3D surface is constructed by:
|
| 41 |
|
| 42 |
+
1. **Triangulating** the 2D pattern piece polygon (UV grid mesh, ~150-200 triangles per piece)
|
| 43 |
+
2. **Wrapping** it onto the correct body region using geometric projection:
|
| 44 |
+
- Front/Back Bodice → cylindrical wrap around torso (front: θ from -π/2 to π/2; back: θ from π/2 to 3π/2)
|
| 45 |
+
- Sleeve → tilted tube from shoulder outward/downward
|
| 46 |
+
- Front/Back Skirt → cone projection with flare from waist down
|
| 47 |
+
- Front/Back Pant → half-tube legs offset from center at ±hip_rx×0.45
|
| 48 |
+
- Collar, Cuff, Waistband, Hood → each wrapped from its actual 2D piece shape
|
| 49 |
+
3. **Rendering** as Plotly `go.Mesh3d` traces — each piece is a named, colored trace in the legend
|
| 50 |
|
| 51 |
+
This means: when you change a measurement, both the 2D pattern AND the 3D garment update consistently.
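
As a rough illustration of steps 1 and 2, the sketch below builds a UV grid mesh and wraps it onto an elliptical cylinder for the front bodice. It is not the code in `garment_3d.py`: the function names `uv_grid_mesh` and `wrap_front_bodice`, the rectangular piece outline, the 10×10 grid, and the axis convention (front of the body facing +y) are all assumptions made for the example.

```python
import numpy as np

def uv_grid_mesh(nu=10, nv=10):
    """Return (uv, tris): grid vertices in [0, 1]^2 and triangle index triples."""
    u, v = np.meshgrid(np.linspace(0.0, 1.0, nu), np.linspace(0.0, 1.0, nv))
    uv = np.column_stack([u.ravel(), v.ravel()])  # u varies fastest
    tris = []
    for j in range(nv - 1):
        for i in range(nu - 1):
            a = j * nu + i   # lower-left corner of the grid cell
            b = a + 1        # lower-right
            c = a + nu       # upper-left
            d = c + 1        # upper-right
            tris.append((a, b, d))  # each cell is split into two triangles
            tris.append((a, d, c))
    return uv, np.array(tris)

def wrap_front_bodice(uv, rx, ry, z_bottom, z_top):
    """Map UV coordinates onto the front half of an elliptical cylinder.

    Assumed convention: u spans theta from -pi/2 to pi/2, v spans height.
    """
    theta = -np.pi / 2 + uv[:, 0] * np.pi
    x = rx * np.sin(theta)                       # left-right
    y = ry * np.cos(theta)                       # front is +y (assumption)
    z = z_bottom + uv[:, 1] * (z_top - z_bottom)
    return np.column_stack([x, y, z])

uv, tris = uv_grid_mesh(10, 10)   # 9 x 9 cells x 2 = 162 triangles
xyz = wrap_front_bodice(uv, rx=10.0, ry=9.0, z_bottom=104.0, z_top=145.0)
```

The resulting `xyz` columns and `tris` columns could be passed to `go.Mesh3d` as `x`, `y`, `z` and `i`, `j`, `k`. Because the 3D vertices are a function of the same UV grid as the 2D piece, a measurement change moves both views together.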
|
| 52 |
|
| 53 |
+
## Files
|
| 54 |
+
|
| 55 |
+
| File | Purpose |
|
| 56 |
+
|------|---------|
|
| 57 |
+
| `app.py` | Gradio UI — 5 tabs (Image, Text, Manual, Chat, Agentic Refinement) |
|
| 58 |
+
| `pattern_generator.py` | Parametric 2D sewing pattern engine (matplotlib) |
|
| 59 |
+
| `garment_3d.py` | 2D pieces → 3D assembly on mannequin (Plotly Mesh3d) |
|
| 60 |
+
| `refinement_loop.py` | Agentic convergence loop (⚠️ WIP — see below) |
|
| 61 |
+
|
| 62 |
+
## Current Status
|
| 63 |
+
|
| 64 |
+
### ✅ Working
|
| 65 |
+
- **From Text** — describe a garment, get 2D pattern + 3D preview
|
| 66 |
+
- **Manual Parameters** — sliders for all measurements, instant 2D+3D
|
| 67 |
+
- **Chat & Edit** — natural language edits ("make sleeves longer", "add hood")
|
| 68 |
+
- **2D→3D pipeline** — pattern pieces correctly assembled on mannequin
|
| 69 |
+
|
| 70 |
+
### ⚠️ Broken: VLM Image Analysis & Agentic Refinement
|
| 71 |
+
The VLM calls via HF Inference Providers are failing. See [VLM Provider Issues](#vlm-provider-issues) below.
|
| 72 |
+
|
| 73 |
+
---
|
| 74 |
+
|
| 75 |
+
## VLM Provider Issues
|
| 76 |
+
|
| 77 |
+
### Problem
|
| 78 |
+
The model+provider combinations tested first either return errors or produce unusable output when called via `router.huggingface.co`:
|
| 79 |
+
|
| 80 |
+
| Model | Provider | Error |
|
| 81 |
+
|-------|----------|-------|
|
| 82 |
+
| `Qwen/Qwen2.5-VL-72B-Instruct` | together | `Model not supported by provider together` |
|
| 83 |
+
| `Qwen/Qwen3.5-9B` | together | Returns text in `reasoning` only, with `content` empty. JSON extraction works, but the model does not appear to see the image (it classifies everything as a shirt) |
|
| 84 |
+
| `google/gemma-4-31B-it` | together | `Unable to access model` (not available on Together's serverless) |
|
| 85 |
+
| `google/gemma-4-31B-it` | novita | `Model not supported by provider novita` |
|
| 86 |
+
| `moonshotai/Kimi-K2.5` | fireworks-ai | `Model not supported by provider fireworks-ai` |
|
| 87 |
+
| `moonshotai/Kimi-K2.6` | together | Returns empty `content`, answer in `reasoning` — may or may not see images |
|
| 88 |
+
|
| 89 |
+
### What Actually Works (verified 2026-04-25)
|
| 90 |
+
|
| 91 |
+
**Confirmed working with image inputs:**
|
| 92 |
+
|
| 93 |
+
| Model | Provider | Image Support | Notes |
|
| 94 |
+
|-------|----------|---------------|-------|
|
| 95 |
+
| `meta-llama/Llama-4-Scout-17B-16E-Instruct` | `nscale` | ✅ YES | Answers in `content` field. Correctly identifies garment types. **Best option.** |
|
| 96 |
+
| `moonshotai/Kimi-K2.6` | `together` | ✅ YES (partial) | Sees images but answers in the `reasoning` field; `content` is empty, so the answer must be extracted from `reasoning`. |
|
| 97 |
+
| `Qwen/Qwen3.5-9B` | `together` | ⚠️ UNCLEAR | Responds in `reasoning` field. May or may not process image pixels. |
|
| 98 |
+
|
| 99 |
+
**How to find current working models:**
|
| 100 |
+
```bash
|
| 101 |
+
# List models available on a provider
|
| 102 |
+
curl -s -H "Authorization: Bearer $HF_TOKEN" \
|
| 103 |
+
"https://router.huggingface.co/{provider}/v1/models"
|
| 104 |
+
|
| 105 |
+
# Test a model with image
|
| 106 |
+
curl -s -X POST "https://router.huggingface.co/{provider}/v1/chat/completions" \
|
| 107 |
+
-H "Authorization: Bearer $HF_TOKEN" \
|
| 108 |
+
-H "Content-Type: application/json" \
|
| 109 |
+
-d '{"model": "{model_id}", "messages": [{"role": "user", "content": [
|
| 110 |
+
{"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{b64}"}},
|
| 111 |
+
{"type": "text", "text": "What garment is this?"}
|
| 112 |
+
]}], "max_tokens": 50}'
|
| 113 |
+
```
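
For scripted checks, the same request body can be assembled in Python. This is a minimal sketch that mirrors the JSON in the second `curl` command above; the helper name `build_vision_payload` is hypothetical, and the three placeholder bytes stand in for a real JPEG file.

```python
import base64
import json

def build_vision_payload(model_id, image_bytes, prompt, max_tokens=50):
    """Build an OpenAI-style chat payload with one inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model_id,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text", "text": prompt},
            ],
        }],
        "max_tokens": max_tokens,
    }

# Placeholder bytes only; read a real image with open(path, "rb").read().
payload = build_vision_payload(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    b"\xff\xd8\xff",
    "What garment is this?",
)
body = json.dumps(payload)
```

The serialized `body` can then be sent to the same `/v1/chat/completions` URL with the `Authorization` and `Content-Type` headers shown in the `curl` example.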
|
| 114 |
+
|
| 115 |
+
### Fix Required
|
| 116 |
+
Update `VISION_MODELS` in `app.py` and the `models` list in `refinement_loop.py`:
|
| 117 |
+
|
| 118 |
+
```python
|
| 119 |
+
VISION_MODELS = [
|
| 120 |
+
("meta-llama/Llama-4-Scout-17B-16E-Instruct", "nscale", "Llama-4-Scout"),
|
| 121 |
+
("moonshotai/Kimi-K2.6", "together", "Kimi K2.6"),
|
| 122 |
+
("Qwen/Qwen3.5-9B", "together", "Qwen 3.5 9B"),
|
| 123 |
+
]
|
| 124 |
+
```
|
| 125 |
+
|
| 126 |
+
Also ensure `_extract_response_text()` checks both `content` and `reasoning` fields (it already does).
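
For reference, a fallback of this kind might look like the sketch below. It is not the project's `_extract_response_text()`; the function names and the assumption that the response is an OpenAI-style dict with `choices[0].message` are illustrative only.

```python
import json

def extract_response_text(response: dict) -> str:
    """Return the first non-empty text among `content` and `reasoning`."""
    choices = response.get("choices") or []
    if not choices:
        return ""
    message = choices[0].get("message") or {}
    for field in ("content", "reasoning"):   # prefer `content` when present
        value = message.get(field)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return ""

def extract_first_json(text: str):
    """Parse the outermost {...} span in `text`, or return None."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

# A reply shaped like the Kimi-K2.6 / Qwen3.5 behaviour described above.
reply = {"choices": [{"message": {
    "content": "",
    "reasoning": 'The garment is {"garment_type": "dress"}',
}}]}
params = extract_first_json(extract_response_text(reply))
```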
|
| 127 |
+
|
| 128 |
+
---
|
| 129 |
+
|
| 130 |
+
## Agentic Refinement Loop — Design Document
|
| 131 |
+
|
| 132 |
+
### Goal
|
| 133 |
+
Iteratively refine garment pattern parameters until the 3D garment projection visually matches the original input image.
|
| 134 |
+
|
| 135 |
+
### Loop Architecture
|
| 136 |
+
|
| 137 |
+
```
|
| 138 |
+
INPUT IMAGE
|
| 139 |
+
│
|
| 140 |
+
▼
|
| 141 |
+
[1] VLM Analysis → initial garment params JSON
|
| 142 |
+
│
|
| 143 |
+
▼
|
| 144 |
+
┌─── LOOP (max 8-15 iterations) ──────────────────────────────────┐
|
| 145 |
+
│ │
|
| 146 |
+
│ [2] Pattern Generator → 2D sewing pattern pieces │
|
| 147 |
+
│ │ │
|
| 148 |
+
│ [3] 3D Assembly → wrap pieces onto mannequin (Plotly Mesh3d) │
|
| 149 |
+
│ │ │
|
| 150 |
+
│ [4] 3D→2D Projection → matplotlib renders front-view PNG │
|
| 151 |
+
│ │ │
|
| 152 |
+
│ [5] Similarity Metrics (CPU): │
|
| 153 |
+
│ • SSIM (structural similarity) │
|
| 154 |
+
│ • Edge-SSIM (Sobel edge comparison) │
|
| 155 |
+
│ • Composite = 0.4×SSIM + 0.3×MSE + 0.3×Edge-SSIM │
|
| 156 |
+
│ │ │
|
| 157 |
+
│ [6] CONVERGENCE CHECK: │
|
| 158 |
+
│ • composite ≥ 0.82 → DONE │
|
| 159 |
+
│ • VLM declares "converged" → DONE │
|
| 160 |
+
│ • Score plateau for 3 iterations → DONE │
|
| 161 |
+
│ • VLM confidence < 0.2 → DONE │
|
| 162 |
+
│ • Max iterations → DONE │
|
| 163 |
+
│ │ │
|
| 164 |
+
│ [7] VLM Visual Comparison: │
|
| 165 |
+
│ Send [original_image, projection_image] to VLM │
|
| 166 |
+
│ VLM returns JSON: │
|
| 167 |
+
│ { │
|
| 168 |
+
│ "differences": ["sleeves too short", "missing collar"], │
|
| 169 |
+
│ "adjustments": {"sleeve_length": 65, "has_collar": true},│
|
| 170 |
+
│ "confidence": 0.7, │
|
| 171 |
+
│ "converged": false │
|
| 172 |
+
│ } │
|
| 173 |
+
│ │ │
|
| 174 |
+
│ [8] Apply Adjustments: │
|
| 175 |
+
│ • Damped update: new = old + 0.7 × (suggested - old) │
|
| 176 |
+
│ • Keep-best tracking (only update if score improves) │
|
| 177 |
+
│ │ │
|
| 178 |
+
│ └─── LOOP BACK to [2] ─────────────────────────────────────────┘
|
| 179 |
+
```
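
The control logic in steps [6] and [8] can be sketched as follows. This is an illustrative reading of the diagram, not the contents of `refinement_loop.py`: the function names, the plateau tolerance `eps`, and the choice to apply non-numeric adjustments (such as `has_collar`) without damping are assumptions.

```python
def _is_number(x):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(x, (int, float)) and not isinstance(x, bool)

def damped_update(old, suggested, damping=0.7):
    """new = old + damping * (suggested - old) for numeric parameters."""
    new = dict(old)
    for key, target in suggested.items():
        current = old.get(key)
        if _is_number(current) and _is_number(target):
            new[key] = current + damping * (target - current)
        else:
            new[key] = target   # booleans / categories are taken as-is
    return new

def is_plateau(scores, window=3, eps=0.005):
    """True if the last `window` scores did not beat the earlier best."""
    if len(scores) <= window:
        return False
    return max(scores[-window:]) <= max(scores[:-window]) + eps

def stop_reason(scores, vlm_reply, iteration, max_iters=8, threshold=0.82):
    """Return the name of the first stop condition that holds, else None.

    `vlm_reply` is the most recent VLM comparison result (may be empty).
    """
    if scores and scores[-1] >= threshold:
        return "threshold"
    if vlm_reply.get("converged"):
        return "vlm_converged"
    if is_plateau(scores):
        return "plateau"
    if vlm_reply.get("confidence", 1.0) < 0.2:
        return "low_confidence"
    if iteration + 1 >= max_iters:
        return "max_iterations"
    return None

params = damped_update(
    {"sleeve_length": 25, "has_collar": False},
    {"sleeve_length": 65, "has_collar": True},
)
```

With the example adjustment from step [7], `sleeve_length` moves 70% of the way from 25 toward 65 rather than jumping straight to the suggested value, which is what keeps a single bad VLM suggestion from dominating an iteration.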
|
| 180 |
+
|
| 181 |
+
### Implementation Status
|
| 182 |
+
|
| 183 |
+
| Component | File | Status |
|
| 184 |
+
|-----------|------|--------|
|
| 185 |
+
| 3D→2D projection (matplotlib renderer) | `refinement_loop.py` → `render_3d_to_image()` | ✅ Working |
|
| 186 |
+
| SSIM + Edge-SSIM metrics | `refinement_loop.py` → `compute_similarity()` | ✅ Working |
|
| 187 |
+
| VLM comparison prompt | `refinement_loop.py` → `vlm_compare_and_adjust()` | ⚠️ Needs working VLM provider |
|
| 188 |
+
| Damped parameter updates | `refinement_loop.py` → `apply_adjustments()` | ✅ Working |
|
| 189 |
+
| Keep-best convergence loop | `refinement_loop.py` → `refinement_loop()` | ✅ Working (logic) |
|
| 190 |
+
| Gradio UI tab | `app.py` → "🔄 Agentic Refinement" tab | ✅ Working (UI) |
|
| 191 |
+
|
| 192 |
+
### What Needs Fixing
|
| 193 |
+
1. **VLM provider** — switch to `meta-llama/Llama-4-Scout-17B-16E-Instruct` via `nscale` (confirmed working with images)
|
| 194 |
+
2. **Similarity metric calibration** — current SSIM scores are very high (~0.95+) even for different garments because both images have large white backgrounds. Options:
|
| 195 |
+
- Crop to garment bounding box before comparison
|
| 196 |
+
- Use foreground mask (non-white pixels) before SSIM
|
| 197 |
+
- Rely primarily on VLM visual comparison, use SSIM only as a secondary signal
|
| 198 |
+
3. **VLM prompt tuning** — the comparison prompt may need iteration to reliably produce valid JSON adjustments
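
For item 2, the first option (cropping to the garment bounding box) could be prototyped along these lines. The helper name `crop_to_foreground`, the white threshold of 250, and the 4-pixel padding are assumptions; the function expects a 2D `uint8` grayscale array.

```python
import numpy as np

def crop_to_foreground(gray, white_threshold=250, pad=4):
    """Crop a grayscale image to the bounding box of its non-white pixels."""
    mask = gray < white_threshold          # foreground = anything not near-white
    if not mask.any():
        return gray                        # nothing to crop; keep the image as-is
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    r0 = max(rows[0] - pad, 0)
    r1 = min(rows[-1] + 1 + pad, gray.shape[0])
    c0 = max(cols[0] - pad, 0)
    c1 = min(cols[-1] + 1 + pad, gray.shape[1])
    return gray[r0:r1, c0:c1]

# Synthetic example: a 30x30 dark "garment" on a 100x100 white canvas.
canvas = np.full((100, 100), 255, dtype=np.uint8)
canvas[30:60, 40:70] = 0
cropped = crop_to_foreground(canvas)
```

Both the input image and the projection would be cropped this way and resized to a common shape before SSIM is computed, so the shared white background no longer inflates the score.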
|
| 199 |
+
|
| 200 |
+
### Research References for the Approach
|
| 201 |
+
|
| 202 |
+
| Paper | Key Idea Used |
|
| 203 |
+
|-------|---------------|
|
| 204 |
+
| [NGL-Prompter](https://arxiv.org/abs/2602.20700) (2025) | Discrete semantic parameter schema — VLMs are better at categorical choices than numeric values |
|
| 205 |
+
| [RRVF](https://arxiv.org/abs/2507.20766) (2025) | Two-role pattern: generator + qualitative assessor. Natural language diff feedback. Score progression: 75→83→85 across 3 turns |
|
| 206 |
+
| [SceneAssistant](https://arxiv.org/abs/2603.12238) (2026) | Constrained action API for VLM (don't ask for full JSON, ask for atomic adjustments). Max 20 steps, VLM calls `Finish` to stop |
|
| 207 |
+
| [AutoFigure](https://arxiv.org/abs/2602.03828) (2026) | Keep-best strategy prevents oscillation. Critic+generator are same LLM with different prompts. 5-10 iterations typical |
|
| 208 |
+
| [RLRF](https://arxiv.org/abs/2505.20793) (2025) | SSIM + Edge-SSIM for CPU similarity. CLIP rewards found NOT effective (ablation study). Canny edge reward + pixel L2 accelerates convergence |
|
| 209 |
+
|
| 210 |
+
---
|
| 211 |
+
|
| 212 |
+
## Body Mannequin
|
| 213 |
+
|
| 214 |
+
18-landmark anatomical profile (female, 170 cm):
|
| 215 |
+
|
| 216 |
+
```
|
| 217 |
+
Z (cm) Body Part RX (cm) RY (cm)
|
| 218 |
+
0 Feet 4.0 3.0
|
| 219 |
+
7 Ankle 3.0 2.5
|
| 220 |
+
30 Calf 4.5 4.0
|
| 221 |
+
45 Knee 4.0 4.0
|
| 222 |
+
65 Thigh 6.5 6.0
|
| 223 |
+
82 Crotch 8.0 7.5
|
| 224 |
+
88 Hip 9.5 8.5
|
| 225 |
+
100 Abdomen 8.0 7.5
|
| 226 |
+
104 Waist 7.0 6.5
|
| 227 |
+
112 Ribs 8.5 8.0
|
| 228 |
+
120 Bust 10.0 9.0
|
| 229 |
+
132 Chest 9.0 8.0
|
| 230 |
+
140 Upper chest 8.5 7.5
|
| 231 |
+
145 Shoulder 11.5 8.0
|
| 232 |
+
150 Neck base 3.5 3.5
|
| 233 |
+
158 Neck 3.0 3.0
|
| 234 |
+
162 Head 5.5 5.0
|
| 235 |
+
170 Head top 4.5 4.5
|
| 236 |
+
```
|
| 237 |
+
|
| 238 |
+
Garment radii are derived from circumference measurements: `radius = circumference / (2π)`. Cross-sections are elliptical, with `RX = circumference / (2π) × 1.1` and `RY = circumference / (2π) × 0.9`.
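
As a worked example of this formula, the snippet below converts a 90 cm bust circumference into the two ellipse radii. The function name `ellipse_radii` is hypothetical; the 1.1 and 0.9 factors are the ones stated above.

```python
import math

def ellipse_radii(circumference_cm, rx_scale=1.1, ry_scale=0.9):
    """Return (RX, RY) in cm for a garment cross-section."""
    r = circumference_cm / (2 * math.pi)   # radius of the equivalent circle
    return r * rx_scale, r * ry_scale

rx, ry = ellipse_radii(90)   # roughly (15.76, 12.89)
```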
|
| 239 |
+
|
| 240 |
+
## Supported Garment Types
|
| 241 |
+
|
| 242 |
+
| Type | Pattern Pieces | 3D Wrapping |
|
| 243 |
+
|------|---------------|-------------|
|
| 244 |
+
| Shirt/Blouse/Top | Front Bodice, Back Bodice, Sleeve ×2, Collar, Cuff ×2 | Cylinder + tilted tubes |
|
| 245 |
+
| Dress | Front Bodice, Back Bodice, Sleeve ×2, Front Skirt, Back Skirt | Cylinder + cone |
|
| 246 |
+
| Skirt | Front Skirt, Back Skirt, Waistband | Cone + cylinder |
|
| 247 |
+
| Pants/Jeans | Front Pant ×2, Back Pant ×2, Waistband | Half-tube legs + cylinder |
|
| 248 |
+
| Jacket/Blazer | Front Bodice, Back Bodice, Sleeve ×2, Collar | Wider cylinder + tubes |
|
| 249 |
+
| Hoodie | Front Bodice, Back Bodice, Sleeve ×2, Hood ×2 | Cylinder + tubes + dome |
|
| 250 |
+
| Vest | Front Bodice, Back Bodice | Cylinder (no sleeves) |
|
| 251 |
|
| 252 |
## Related Resources
|
| 253 |
|
| 254 |
- [ChatGarment Dataset](https://huggingface.co/datasets/sy000/ChatGarmentDataset) — 362GB training data
|
| 255 |
- [GarmageSet](https://huggingface.co/datasets/Style3D/GarmageSet) — 14,801 professional garments
|
| 256 |
- [GarmentCode DSL](https://github.com/maria-korosteleva/GarmentCode) — Parametric pattern compiler
|
| 257 |
+
|
| 258 |
+
## Development
|
| 259 |
+
|
| 260 |
+
```bash
|
| 261 |
+
# Install dependencies for local testing
|
| 262 |
+
pip install gradio Pillow matplotlib numpy scipy plotly scikit-image requests
|
| 263 |
+
|
| 264 |
+
# Run locally
|
| 265 |
+
python app.py
|
| 266 |
+
|
| 267 |
+
# Test pattern generation
|
| 268 |
+
python -c "from pattern_generator import get_pattern_pieces; print(get_pattern_pieces('dress', {'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8,'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}))"
|
| 269 |
+
|
| 270 |
+
# Test 3D assembly
|
| 271 |
+
python -c "
|
| 272 |
+
from pattern_generator import get_pattern_pieces
|
| 273 |
+
from garment_3d import create_3d_figure
|
| 274 |
+
analysis = {'garment_type':'dress','measurements':{'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8},'features':{'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}}
|
| 275 |
+
pieces = get_pattern_pieces('dress', {**analysis['measurements'], **analysis['features']})
|
| 276 |
+
fig = create_3d_figure(analysis, pattern_pieces=pieces)
|
| 277 |
+
fig.write_html('test_dress.html')
|
| 278 |
+
print(f'{len(fig.data)} traces')
|
| 279 |
+
"
|
| 280 |
+
```
|