---
title: Garment Image → 2D Sewing Pattern
emoji: 🧵
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false
tags:
  - fashion
  - sewing-pattern
  - garment
  - pattern-making
  - vlm
  - computer-vision
---

# 🧵 Garment Pattern Studio

Upload a garment image or describe one → get flat 2D sewing pattern pieces → see them assembled as a 3D garment on a mannequin → iteratively refine until it matches.

## Architecture

```
┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐     ┌──────────────┐
│ Input Image │────▶│ VLM Analysis     │────▶│ 2D Pattern Gen  │────▶│ 3D Assembly  │
│ or Text     │     │ (garment_type,   │     │ (flat pieces:   │     │ (wrap pieces │
│ or Manual   │     │  measurements,   │     │  bodice, sleeve,│     │  onto body   │
│             │     │ features → JSON) │     │  skirt, etc.)   │     │  mannequin)  │
└─────────────┘     └──────────────────┘     └─────────────────┘     └──────────────┘
                                                      │                       │
                                                      ▼                       ▼
                                              pattern_generator.py    garment_3d.py
                                              (matplotlib 2D image)   (Plotly Mesh3d)
```

### Key Design: 3D is Built FROM the 2D Pattern Pieces

The 3D view is **not** an independent visualization. Each 3D surface is constructed by:

1. **Triangulating** the 2D pattern piece polygon (UV grid mesh, ~150-200 triangles per piece)
2. **Wrapping** it onto the correct body region using geometric projection:
   - Front/Back Bodice → cylindrical wrap around torso (θ = -π/2 to π/2 / π/2 to 3π/2)
   - Sleeve → tilted tube from shoulder outward/downward
   - Front/Back Skirt → cone projection with flare from waist down
   - Front/Back Pant → half-tube legs offset from center at ±hip_rx×0.45
   - Collar, Cuff, Waistband, Hood → each wrapped from its actual 2D piece shape
3. **Rendering** as Plotly `go.Mesh3d` traces — each piece is a named, colored trace in the legend

This means: when you change a measurement, both the 2D pattern AND the 3D garment update consistently.
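
As a sketch of the wrapping step, here is how a normalized bodice piece could be mapped onto the torso cylinder. This is a minimal illustration, not the actual `garment_3d.py` implementation; `wrap_to_cylinder` and its default radii/heights are hypothetical, while the front-torso angular range θ = -π/2 to π/2 comes from the list above:

```python
import numpy as np

def wrap_to_cylinder(u, v, radius=25.0, z_bottom=104.0, z_top=145.0,
                     theta_start=-np.pi / 2, theta_end=np.pi / 2):
    """Map normalized 2D pattern coordinates (u, v) in [0, 1] onto a
    half-cylinder: u sweeps the angular range, v sweeps the height."""
    theta = theta_start + u * (theta_end - theta_start)
    x = radius * np.cos(theta)
    y = radius * np.sin(theta)
    z = z_bottom + v * (z_top - z_bottom)
    return x, y, z

# Coarse UV grid over the piece, wrapped onto the front of the torso;
# the resulting (x, y, z) arrays feed a go.Mesh3d trace.
u, v = np.meshgrid(np.linspace(0, 1, 10), np.linspace(0, 1, 10))
x, y, z = wrap_to_cylinder(u.ravel(), v.ravel())
```

The same idea generalizes to the other projections: a cone for skirts (radius grows with decreasing z) and a tilted tube for sleeves.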

## Files

| File | Purpose |
|------|---------|
| `app.py` | Gradio UI — 5 tabs (Image, Text, Manual, Chat, Agentic Refinement) |
| `pattern_generator.py` | Parametric 2D sewing pattern engine (matplotlib) |
| `garment_3d.py` | 2D pieces → 3D assembly on mannequin (Plotly Mesh3d) |
| `refinement_loop.py` | Agentic convergence loop (⚠️ WIP — see below) |

## Current Status

### ✅ Working
- **From Text** — describe a garment, get 2D pattern + 3D preview
- **Manual Parameters** — sliders for all measurements, instant 2D+3D
- **Chat & Edit** — natural language edits ("make sleeves longer", "add hood")
- **2D→3D pipeline** — pattern pieces correctly assembled on mannequin

### ⚠️ Broken: VLM Image Analysis & Agentic Refinement
The VLM calls via HF Inference Providers are failing. See [VLM Provider Issues](#vlm-provider-issues) below.

---

## VLM Provider Issues

### Problem
All VLM model+provider combinations tested return errors when called via `router.huggingface.co`:

| Model | Provider | Error |
|-------|----------|-------|
| `Qwen/Qwen2.5-VL-72B-Instruct` | together | `Model not supported by provider together` |
| `Qwen/Qwen3.5-9B` | together | Returns text in `reasoning` only, `content` empty — JSON extraction works but model seems to not actually see images (classifies everything as shirt) |
| `google/gemma-4-31B-it` | together | `Unable to access model` (not available on Together's serverless) |
| `google/gemma-4-31B-it` | novita | `Model not supported by provider novita` |
| `moonshotai/Kimi-K2.5` | fireworks-ai | `Model not supported by provider fireworks-ai` |
| `moonshotai/Kimi-K2.6` | together | Returns empty `content`, answer in `reasoning` — may or may not see images |

### What Actually Works (verified 2026-04-25)

**Confirmed working with image inputs:**

| Model | Provider | Image Support | Notes |
|-------|----------|---------------|-------|
| `meta-llama/Llama-4-Scout-17B-16E-Instruct` | `nscale` | ✅ YES | Answers in `content` field. Correctly identifies garment types. **Best option.** |
| `moonshotai/Kimi-K2.6` | `together` | ✅ YES (partial) | Sees images but answers in `reasoning` field, `content` is empty. Need to extract from `reasoning`. |
| `Qwen/Qwen3.5-9B` | `together` | ⚠️ UNCLEAR | Responds in `reasoning` field. May or may not process image pixels. |

**How to find current working models:**
```bash
# List models available on a provider
curl -s -H "Authorization: Bearer $HF_TOKEN" \
  "https://router.huggingface.co/{provider}/v1/models"

# Test a model with image
curl -s -X POST "https://router.huggingface.co/{provider}/v1/chat/completions" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "{model_id}", "messages": [{"role": "user", "content": [
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{b64}"}},
    {"type": "text", "text": "What garment is this?"}
  ]}], "max_tokens": 50}'
```

### Fix Required
Update `VISION_MODELS` in `app.py` and the `models` list in `refinement_loop.py`:

```python
VISION_MODELS = [
    ("meta-llama/Llama-4-Scout-17B-16E-Instruct", "nscale", "Llama-4-Scout"),
    ("moonshotai/Kimi-K2.6", "together", "Kimi K2.6"),
    ("Qwen/Qwen3.5-9B", "together", "Qwen 3.5 9B"),
]
```

Also ensure `_extract_response_text()` checks both `content` and `reasoning` fields (it already does).
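
A minimal sketch of that fallback (hypothetical helper; the real `_extract_response_text()` in `app.py` may differ in details):

```python
def extract_response_text(response: dict) -> str:
    """Return the assistant text from a chat-completions response,
    falling back to the `reasoning` field for models (e.g. Kimi K2.6
    via together) that leave `content` empty."""
    message = response["choices"][0]["message"]
    content = message.get("content") or ""
    if content.strip():
        return content
    return message.get("reasoning") or ""
```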

---

## Agentic Refinement Loop — Design Document

### Goal
Iteratively refine garment pattern parameters until the 3D garment projection visually matches the original input image.

### Loop Architecture

```
INPUT IMAGE
     │
     ▼
[1] VLM Analysis → initial garment params JSON
     │
     ▼
┌─── LOOP (max 8-15 iterations) ──────────────────────────────────┐
│                                                                   │
│  [2] Pattern Generator → 2D sewing pattern pieces                │
│       │                                                           │
│  [3] 3D Assembly → wrap pieces onto mannequin (Plotly Mesh3d)    │
│       │                                                           │
│  [4] 3D→2D Projection → matplotlib renders front-view PNG        │
│       │                                                           │
│  [5] Similarity Metrics (CPU):                                   │
│       • SSIM (structural similarity)                             │
│       • Edge-SSIM (Sobel edge comparison)                        │
│       • Composite = 0.4×SSIM + 0.3×MSE + 0.3×Edge-SSIM         │
│       │                                                           │
│  [6] CONVERGENCE CHECK:                                          │
│       • composite ≥ 0.82 → DONE                                 │
│       • VLM declares "converged" → DONE                          │
│       • Score plateau for 3 iterations → DONE                    │
│       • VLM confidence < 0.2 → DONE                             │
│       • Max iterations → DONE                                    │
│       │                                                           │
│  [7] VLM Visual Comparison:                                      │
│       Send [original_image, projection_image] to VLM             │
│       VLM returns JSON:                                           │
│       {                                                           │
│         "differences": ["sleeves too short", "missing collar"],  │
│         "adjustments": {"sleeve_length": 65, "has_collar": true},│
│         "confidence": 0.7,                                       │
│         "converged": false                                       │
│       }                                                           │
│       │                                                           │
│  [8] Apply Adjustments:                                          │
│       • Damped update: new = old + 0.7 × (suggested - old)      │
│       • Keep-best tracking (only update if score improves)       │
│       │                                                           │
│  └─── LOOP BACK to [2] ─────────────────────────────────────────┘
```
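
Steps [6]–[8] above can be sketched as follows. This is a simplified stand-in for the real `refinement_loop.py` functions: the 0.7 damping factor, thresholds, and keep-best rule mirror the diagram, but the helper names and callback signatures are hypothetical:

```python
def apply_damped(params: dict, adjustments: dict, damping: float = 0.7) -> dict:
    """Damped update: numeric params move 70% of the way toward the
    suggestion; non-numeric params (booleans, strings) are replaced."""
    out = dict(params)
    for key, suggested in adjustments.items():
        old = out.get(key)
        if (isinstance(old, (int, float)) and not isinstance(old, bool)
                and isinstance(suggested, (int, float))):
            out[key] = old + damping * (suggested - old)
        else:
            out[key] = suggested
    return out

def refine(params, render_and_score, vlm_compare, max_iters=8,
           target=0.82, plateau=3):
    """Keep-best loop: only the highest-scoring params survive."""
    best_params, best_score, flat = dict(params), -1.0, 0
    for _ in range(max_iters):
        score = render_and_score(params)          # steps [2]-[5]
        if score > best_score:
            best_params, best_score, flat = dict(params), score, 0
        else:
            flat += 1
        if best_score >= target or flat >= plateau:
            break                                  # step [6]: converged/plateau
        feedback = vlm_compare(params)             # step [7]
        if feedback.get("converged") or feedback.get("confidence", 1.0) < 0.2:
            break
        params = apply_damped(params, feedback.get("adjustments", {}))  # step [8]
    return best_params, best_score
```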

### Implementation Status

| Component | File | Status |
|-----------|------|--------|
| 3D→2D projection (matplotlib renderer) | `refinement_loop.py` → `render_3d_to_image()` | ✅ Working |
| SSIM + Edge-SSIM metrics | `refinement_loop.py` → `compute_similarity()` | ✅ Working |
| VLM comparison prompt | `refinement_loop.py` → `vlm_compare_and_adjust()` | ⚠️ Needs working VLM provider |
| Damped parameter updates | `refinement_loop.py` → `apply_adjustments()` | ✅ Working |
| Keep-best convergence loop | `refinement_loop.py` → `refinement_loop()` | ✅ Working (logic) |
| Gradio UI tab | `app.py` → "🔄 Agentic Refinement" tab | ✅ Working (UI) |

### What Needs Fixing
1. **VLM provider** — switch to `meta-llama/Llama-4-Scout-17B-16E-Instruct` via `nscale` (confirmed working with images)
2. **Similarity metric calibration** — current SSIM scores are very high (~0.95+) even for different garments because both images have large white backgrounds. Options:
   - Crop to garment bounding box before comparison
   - Use foreground mask (non-white pixels) before SSIM
   - Rely primarily on VLM visual comparison, use SSIM only as a secondary signal
3. **VLM prompt tuning** — the comparison prompt may need iteration to reliably produce valid JSON adjustments
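
The foreground-mask option from point 2 could look like this. A sketch using `scikit-image`, assuming near-white pixels are background; the threshold is illustrative and this is not the current `compute_similarity()`:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def masked_ssim(img_a: np.ndarray, img_b: np.ndarray,
                white_thresh: float = 240.0) -> float:
    """SSIM averaged only over the union of the two garment foregrounds,
    so large white backgrounds cannot inflate the score."""
    gray_a = img_a.mean(axis=2) if img_a.ndim == 3 else img_a.astype(float)
    gray_b = img_b.mean(axis=2) if img_b.ndim == 3 else img_b.astype(float)
    mask = (gray_a < white_thresh) | (gray_b < white_thresh)
    if not mask.any():
        return 0.0  # both images are blank
    # full=True returns the per-pixel SSIM map alongside the mean score
    _, ssim_map = ssim(gray_a, gray_b, data_range=255, full=True)
    return float(ssim_map[mask].mean())
```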

### Research References for the Approach

| Paper | Key Idea Used |
|-------|---------------|
| [NGL-Prompter](https://arxiv.org/abs/2602.20700) (2025) | Discrete semantic parameter schema — VLMs are better at categorical choices than numeric values |
| [RRVF](https://arxiv.org/abs/2507.20766) (2025) | Two-role pattern: generator + qualitative assessor. Natural language diff feedback. Score progression: 75→83→85 across 3 turns |
| [SceneAssistant](https://arxiv.org/abs/2603.12238) (2026) | Constrained action API for VLM (don't ask for full JSON, ask for atomic adjustments). Max 20 steps, VLM calls `Finish` to stop |
| [AutoFigure](https://arxiv.org/abs/2602.03828) (2026) | Keep-best strategy prevents oscillation. Critic+generator are same LLM with different prompts. 5-10 iterations typical |
| [RLRF](https://arxiv.org/abs/2505.20793) (2025) | SSIM + Edge-SSIM for CPU similarity. CLIP rewards found NOT effective (ablation study). Canny edge reward + pixel L2 accelerates convergence |

---

## Body Mannequin

18-landmark anatomical profile (female, 170cm):

```
Z (cm)   Body Part        RX (cm)  RY (cm)
  0      Feet              4.0      3.0
  7      Ankle             3.0      2.5
 30      Calf              4.5      4.0
 45      Knee              4.0      4.0
 65      Thigh             6.5      6.0
 82      Crotch            8.0      7.5
 88      Hip               9.5      8.5
100      Abdomen           8.0      7.5
104      Waist             7.0      6.5
112      Ribs              8.5      8.0
120      Bust             10.0      9.0
132      Chest             9.0      8.0
140      Upper chest       8.5      7.5
145      Shoulder         11.5      8.0
150      Neck base         3.5      3.5
158      Neck              3.0      3.0
162      Head              5.5      5.0
170      Head top          4.5      4.5
```

Garment radii are derived from circumference measurements: `radius = circumference / (2π)`, with elliptical cross-sections (`RX = radius × 1.1`, `RY = radius × 0.9`).
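
In code, that derivation is simply (illustrative helper mirroring the formula above):

```python
import math

def radii_from_circumference(circumference_cm: float) -> tuple[float, float]:
    """Elliptical cross-section radii from a body circumference:
    RX is widened by 10%, RY narrowed by 10%."""
    r = circumference_cm / (2 * math.pi)
    return r * 1.1, r * 0.9

rx, ry = radii_from_circumference(90.0)  # e.g. a 90 cm bust
```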

## Supported Garment Types

| Type | Pattern Pieces | 3D Wrapping |
|------|---------------|-------------|
| Shirt/Blouse/Top | Front Bodice, Back Bodice, Sleeve ×2, Collar, Cuff ×2 | Cylinder + tilted tubes |
| Dress | Front Bodice, Back Bodice, Sleeve ×2, Front Skirt, Back Skirt | Cylinder + cone |
| Skirt | Front Skirt, Back Skirt, Waistband | Cone + cylinder |
| Pants/Jeans | Front Pant ×2, Back Pant ×2, Waistband | Half-tube legs + cylinder |
| Jacket/Blazer | Front Bodice, Back Bodice, Sleeve ×2, Collar | Wider cylinder + tubes |
| Hoodie | Front Bodice, Back Bodice, Sleeve ×2, Hood ×2 | Cylinder + tubes + dome |
| Vest | Front Bodice, Back Bodice | Cylinder (no sleeves) |

## Related Resources

- [ChatGarment Dataset](https://huggingface.co/datasets/sy000/ChatGarmentDataset) — 362GB training data
- [GarmageSet](https://huggingface.co/datasets/Style3D/GarmageSet) — 14,801 professional garments
- [GarmentCode DSL](https://github.com/maria-korosteleva/GarmentCode) — Parametric pattern compiler

## Development

```bash
# Local testing
pip install gradio Pillow matplotlib numpy scipy plotly scikit-image requests

# Run locally
python app.py

# Test pattern generation
python -c "from pattern_generator import get_pattern_pieces; print(get_pattern_pieces('dress', {'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8,'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}))"

# Test 3D assembly
python -c "
from pattern_generator import get_pattern_pieces
from garment_3d import create_3d_figure
analysis = {'garment_type':'dress','measurements':{'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8},'features':{'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}}
pieces = get_pattern_pieces('dress', {**analysis['measurements'], **analysis['features']})
fig = create_3d_figure(analysis, pattern_pieces=pieces)
fig.write_html('test_dress.html')
print(f'{len(fig.data)} traces')
"
```