vikashmakeit committed
Commit 6df9de5 · verified · 1 Parent(s): 99a8fd3

Complete README with architecture docs, VLM provider findings, and agentic refinement design

Files changed (1):
  1. README.md +248 -21
README.md CHANGED
@@ -16,38 +16,265 @@ tags:
  - computer-vision
  ---

- # 🧵 Garment Image → 2D Sewing Pattern
-
- Upload a garment image or describe one get flat 2D sewing pattern pieces with seam allowances, grain lines, and notches.
-
- ## How It Works
-
- 1. **Image Analysis**: A Vision-Language Model (Qwen2.5-VL) analyzes the garment to identify type, style, and proportions
- 2. **Parameter Extraction**: Structured measurements and features are extracted as JSON
- 3. **Pattern Generation**: A parametric pattern engine generates anatomically-correct 2D sewing pattern pieces
-
- ## Features
-
- - 📸 **From Image**: Upload any garment photo for AI-powered analysis
- - ✍️ **From Text**: Describe a garment and get pattern pieces
- - 📐 **Manual**: Fine-tune every measurement with sliders
- - Supports: shirts, dresses, skirts, pants, jackets, hoodies, vests
- - Pattern pieces include: cut lines, seam lines, fold lines, grain lines, notches
-
- ## Research Background
-
- Inspired by the latest research in garment-to-pattern conversion:
-
- | Paper | Year | Approach |
- |-------|------|----------|
- | [ChatGarment](https://arxiv.org/abs/2412.17811) | 2024 | VLM → GarmentCode JSON → sewing patterns |
- | [NGL-Prompter](https://arxiv.org/abs/2602.20700) | 2025 | Training-free VLM + Natural Garment Language |
- | [SewFormer](https://arxiv.org/abs/2311.04218) | 2023 | Two-level Transformer for pattern reconstruction |
- | [GarmentDiffusion](https://arxiv.org/abs/2504.21476) | 2025 | DiT-based multimodal pattern generation |
- | [GarmageNet](https://arxiv.org/abs/2504.01483) | 2025 | Geometry image diffusion for sewing patterns |
+ # 🧵 Garment Pattern Studio
+
+ Upload a garment image or describe one → get flat 2D sewing pattern pieces → see them assembled as a 3D garment on a mannequin → iteratively refine until it matches.
+
+ ## Architecture
+
+ ```
+ ┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐     ┌──────────────┐
+ │ Input Image │────▶│ VLM Analysis     │────▶│ 2D Pattern Gen  │────▶│ 3D Assembly  │
+ │ or Text     │     │ (garment_type,   │     │ (flat pieces:   │     │ (wrap pieces │
+ │ or Manual   │     │ measurements,    │     │  bodice, sleeve,│     │  onto body   │
+ │             │     │ features → JSON) │     │  skirt, etc.)   │     │  mannequin)  │
+ └─────────────┘     └──────────────────┘     └─────────────────┘     └──────────────┘
+                              │                        │
+                              ▼                        ▼
+                    pattern_generator.py         garment_3d.py
+                    (matplotlib 2D image)       (Plotly Mesh3d)
+ ```
+
+ ### Key Design: 3D is Built FROM the 2D Pattern Pieces
+
+ The 3D view is **not** an independent visualization. Each 3D surface is constructed by:
+
+ 1. **Triangulating** the 2D pattern piece polygon (UV grid mesh, ~150-200 triangles per piece)
+ 2. **Wrapping** it onto the correct body region using geometric projection:
+    - Front/Back Bodice → cylindrical wrap around torso (θ = -π/2 to π/2 for front, π/2 to 3π/2 for back)
+    - Sleeve → tilted tube from shoulder outward/downward
+    - Front/Back Skirt → cone projection with flare from waist down
+    - Front/Back Pant → half-tube legs offset from center at ±hip_rx×0.45
+    - Collar, Cuff, Waistband, Hood → each wrapped from its actual 2D piece shape
+ 3. **Rendering** as Plotly `go.Mesh3d` traces — each piece is a named, colored trace in the legend
+
+ This means: when you change a measurement, both the 2D pattern AND the 3D garment update consistently.
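
As an illustration of step 2, a cylindrical front-bodice wrap might look like this. This is a simplified sketch with hypothetical names (the actual `garment_3d.py` API is not shown here): a point (u, v) on the flat piece maps to an angle θ on an elliptical torso cross-section.

```python
import numpy as np

def wrap_bodice_front(height: float, rx: float, ry: float,
                      z_top: float, n: int = 12):
    """Map a UV grid over a flat front-bodice piece onto an elliptical
    cylinder: u in [0, 1] sweeps theta from -pi/2 to pi/2 (front half of
    the torso), v in [0, 1] runs from the top edge down the bodice length."""
    u, v = np.meshgrid(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, n))
    theta = -np.pi / 2 + u * np.pi   # front half of the torso
    x = rx * np.sin(theta)           # across the body
    y = ry * np.cos(theta)           # front of the torso bulges toward +y
    z = z_top - v * height           # drop from the shoulder line downward
    return x, y, z
```

These (x, y, z) grids would then be triangulated into vertex-index arrays and rendered as a `plotly.graph_objects.Mesh3d` trace.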
+
+ ## Files
+
+ | File | Purpose |
+ |------|---------|
+ | `app.py` | Gradio UI with 5 tabs (Image, Text, Manual, Chat, Agentic Refinement) |
+ | `pattern_generator.py` | Parametric 2D sewing pattern engine (matplotlib) |
+ | `garment_3d.py` | 2D pieces → 3D assembly on mannequin (Plotly Mesh3d) |
+ | `refinement_loop.py` | Agentic convergence loop (⚠️ WIP — see below) |
+
+ ## Current Status
+
+ ### ✅ Working
+ - **From Text** — describe a garment, get 2D pattern + 3D preview
+ - **Manual Parameters** — sliders for all measurements, instant 2D+3D
+ - **Chat & Edit** — natural language edits ("make sleeves longer", "add hood")
+ - **2D→3D pipeline** — pattern pieces correctly assembled on mannequin
+
+ ### ⚠️ Broken: VLM Image Analysis & Agentic Refinement
+ The VLM calls via HF Inference Providers are failing. See [VLM Provider Issues](#vlm-provider-issues) below.
+
+ ---
+
+ ## VLM Provider Issues
+
+ ### Problem
+ All VLM model+provider combinations tested return errors when called via `router.huggingface.co`:
+
+ | Model | Provider | Error |
+ |-------|----------|-------|
+ | `Qwen/Qwen2.5-VL-72B-Instruct` | together | `Model not supported by provider together` |
+ | `Qwen/Qwen3.5-9B` | together | Returns text in `reasoning` only with `content` empty — JSON extraction works, but the model appears not to actually see images (it classifies everything as a shirt) |
+ | `google/gemma-4-31B-it` | together | `Unable to access model` (not available on Together's serverless) |
+ | `google/gemma-4-31B-it` | novita | `Model not supported by provider novita` |
+ | `moonshotai/Kimi-K2.5` | fireworks-ai | `Model not supported by provider fireworks-ai` |
+ | `moonshotai/Kimi-K2.6` | together | Returns empty `content`, answer in `reasoning` — may or may not see images |
+
+ ### What Actually Works (verified 2026-04-25)
+
+ **Confirmed working with image inputs:**
+
+ | Model | Provider | Image Support | Notes |
+ |-------|----------|---------------|-------|
+ | `meta-llama/Llama-4-Scout-17B-16E-Instruct` | `nscale` | ✅ YES | Answers in the `content` field. Correctly identifies garment types. **Best option.** |
+ | `moonshotai/Kimi-K2.6` | `together` | ✅ YES (partial) | Sees images but answers in the `reasoning` field; `content` is empty, so the answer must be extracted from `reasoning`. |
+ | `Qwen/Qwen3.5-9B` | `together` | ⚠️ UNCLEAR | Responds in the `reasoning` field. May or may not process image pixels. |
+
+ **How to find current working models:**
+ ```bash
+ # List models available on a provider
+ curl -s -H "Authorization: Bearer $HF_TOKEN" \
+   "https://router.huggingface.co/{provider}/v1/models"
+
+ # Test a model with an image
+ curl -s -X POST "https://router.huggingface.co/{provider}/v1/chat/completions" \
+   -H "Authorization: Bearer $HF_TOKEN" \
+   -H "Content-Type: application/json" \
+   -d '{"model": "{model_id}", "messages": [{"role": "user", "content": [
+     {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{b64}"}},
+     {"type": "text", "text": "What garment is this?"}
+   ]}], "max_tokens": 50}'
+ ```
+
+ ### Fix Required
+ Update `VISION_MODELS` in `app.py` and the `models` list in `refinement_loop.py`:
+
+ ```python
+ VISION_MODELS = [
+     ("meta-llama/Llama-4-Scout-17B-16E-Instruct", "nscale", "Llama-4-Scout"),
+     ("moonshotai/Kimi-K2.6", "together", "Kimi K2.6"),
+     ("Qwen/Qwen3.5-9B", "together", "Qwen 3.5 9B"),
+ ]
+ ```
+
+ Also ensure `_extract_response_text()` checks both `content` and `reasoning` fields (it already does).
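
Since the source of that helper isn't shown here, a minimal sketch of such an extraction (assuming an OpenAI-style chat-completion response dict; the real `_extract_response_text()` in `app.py` may differ):

```python
def extract_response_text(response: dict) -> str:
    """Return the model's answer from an OpenAI-style chat completion,
    falling back to the nonstandard `reasoning` field that some providers
    populate while leaving `content` empty (e.g. Kimi via together)."""
    message = response.get("choices", [{}])[0].get("message", {})
    content = (message.get("content") or "").strip()
    if content:
        return content
    # Fallback: some models put the whole answer in `reasoning`.
    return (message.get("reasoning") or "").strip()
```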
+
+ ---
+
+ ## Agentic Refinement Loop — Design Document
+
+ ### Goal
+ Iteratively refine garment pattern parameters until the 3D garment projection visually matches the original input image.
+
+ ### Loop Architecture
+
+ ```
+ INPUT IMAGE
+      │
+      ▼
+ [1] VLM Analysis → initial garment params JSON
+      │
+      ▼
+ ┌─── LOOP (max 8-15 iterations) ─────────────────────────────────────┐
+ │                                                                    │
+ │  [2] Pattern Generator → 2D sewing pattern pieces                  │
+ │       │                                                            │
+ │  [3] 3D Assembly → wrap pieces onto mannequin (Plotly Mesh3d)      │
+ │       │                                                            │
+ │  [4] 3D→2D Projection → matplotlib renders front-view PNG          │
+ │       │                                                            │
+ │  [5] Similarity Metrics (CPU):                                     │
+ │       • SSIM (structural similarity)                               │
+ │       • Edge-SSIM (Sobel edge comparison)                          │
+ │       • Composite = 0.4×SSIM + 0.3×MSE + 0.3×Edge-SSIM             │
+ │       │                                                            │
+ │  [6] CONVERGENCE CHECK:                                            │
+ │       • composite ≥ 0.82 → DONE                                    │
+ │       • VLM declares "converged" → DONE                            │
+ │       • Score plateau for 3 iterations → DONE                      │
+ │       • VLM confidence < 0.2 → DONE                                │
+ │       • Max iterations → DONE                                      │
+ │       │                                                            │
+ │  [7] VLM Visual Comparison:                                        │
+ │       Send [original_image, projection_image] to VLM               │
+ │       VLM returns JSON:                                            │
+ │       {                                                            │
+ │         "differences": ["sleeves too short", "missing collar"],    │
+ │         "adjustments": {"sleeve_length": 65, "has_collar": true},  │
+ │         "confidence": 0.7,                                         │
+ │         "converged": false                                         │
+ │       }                                                            │
+ │       │                                                            │
+ │  [8] Apply Adjustments:                                            │
+ │       • Damped update: new = old + 0.7 × (suggested - old)         │
+ │       • Keep-best tracking (only update if score improves)         │
+ │       │                                                            │
+ └─── LOOP BACK to [2] ───────────────────────────────────────────────┘
+ ```
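
The step [5] composite can be sketched with numpy only; the weights come from the design above, but `global_ssim` is a simplified whole-image stand-in (the actual `compute_similarity()` presumably uses scikit-image's windowed SSIM and a real Sobel filter), and MSE is mapped to a similarity as 1 - MSE for images in [0, 1]:

```python
import numpy as np

def global_ssim(a: np.ndarray, b: np.ndarray,
                c1: float = 0.01**2, c2: float = 0.03**2) -> float:
    """Single-window SSIM over the whole image (simplified stand-in)."""
    ma, mb = a.mean(), b.mean()
    cov = ((a - ma) * (b - mb)).mean()
    return float(((2 * ma * mb + c1) * (2 * cov + c2))
                 / ((ma**2 + mb**2 + c1) * (a.var() + b.var() + c2)))

def edge_map(img: np.ndarray) -> np.ndarray:
    """Central-difference gradient magnitude (stand-in for a Sobel filter)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

def composite_score(original: np.ndarray, projection: np.ndarray) -> float:
    """Composite = 0.4*SSIM + 0.3*MSE-similarity + 0.3*Edge-SSIM."""
    ssim = global_ssim(original, projection)
    mse_sim = 1.0 - float(np.mean((original - projection) ** 2))
    edge_sim = global_ssim(edge_map(original), edge_map(projection))
    return 0.4 * ssim + 0.3 * mse_sim + 0.3 * edge_sim
```

Identical images score 1.0; the convergence check then compares this value against the 0.82 threshold.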
+
+ ### Implementation Status
+
+ | Component | File | Status |
+ |-----------|------|--------|
+ | 3D→2D projection (matplotlib renderer) | `refinement_loop.py` → `render_3d_to_image()` | ✅ Working |
+ | SSIM + Edge-SSIM metrics | `refinement_loop.py` → `compute_similarity()` | ✅ Working |
+ | VLM comparison prompt | `refinement_loop.py` → `vlm_compare_and_adjust()` | ⚠️ Needs working VLM provider |
+ | Damped parameter updates | `refinement_loop.py` → `apply_adjustments()` | ✅ Working |
+ | Keep-best convergence loop | `refinement_loop.py` → `refinement_loop()` | ✅ Working (logic) |
+ | Gradio UI tab | `app.py` → "🔄 Agentic Refinement" tab | ✅ Working (UI) |
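
A minimal sketch of the damped step [8] update, with the 0.7 damping factor from the design above (the dict shape is hypothetical; the real `apply_adjustments()` is not shown here):

```python
DAMPING = 0.7  # new = old + 0.7 * (suggested - old), per the design

def apply_adjustments(params: dict, adjustments: dict) -> dict:
    """Blend VLM-suggested numeric values toward the current ones;
    booleans and categorical features switch outright."""
    updated = dict(params)
    for key, suggested in adjustments.items():
        old = updated.get(key)
        numeric = (isinstance(old, (int, float)) and not isinstance(old, bool)
                   and isinstance(suggested, (int, float))
                   and not isinstance(suggested, bool))
        if numeric:
            updated[key] = old + DAMPING * (suggested - old)
        else:
            updated[key] = suggested  # e.g. has_collar flips directly
    return updated
```

Keep-best tracking then only promotes the new parameter set if its composite score beats the best seen so far, which prevents oscillation.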
+
+ ### What Needs Fixing
+ 1. **VLM provider** — switch to `meta-llama/Llama-4-Scout-17B-16E-Instruct` via `nscale` (confirmed working with images)
+ 2. **Similarity metric calibration** — current SSIM scores are very high (~0.95+) even for different garments because both images have large white backgrounds. Options:
+    - Crop to the garment bounding box before comparison
+    - Use a foreground mask (non-white pixels) before SSIM
+    - Rely primarily on VLM visual comparison, using SSIM only as a secondary signal
+ 3. **VLM prompt tuning** — the comparison prompt may need iteration to reliably produce valid JSON adjustments
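
The crop-to-bounding-box option can be sketched as follows (assuming grayscale images in [0, 1] with white ≈ 1.0; the threshold and padding are assumptions):

```python
import numpy as np

def crop_to_foreground(img: np.ndarray, white_thresh: float = 0.95,
                       pad: int = 2) -> np.ndarray:
    """Crop a grayscale image to the bounding box of non-white (garment)
    pixels so large white backgrounds don't inflate SSIM."""
    mask = img < white_thresh
    if not mask.any():
        return img  # nothing but background; leave untouched
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1 = max(rows[0] - pad, 0), min(rows[-1] + pad + 1, img.shape[0])
    c0, c1 = max(cols[0] - pad, 0), min(cols[-1] + pad + 1, img.shape[1])
    return img[r0:r1, c0:c1]
```

Cropping both images this way before scoring makes the metric sensitive to the garment silhouette rather than the shared background.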
+
+ ### Research References for the Approach
+
+ | Paper | Key Idea Used |
+ |-------|---------------|
+ | [NGL-Prompter](https://arxiv.org/abs/2602.20700) (2025) | Discrete semantic parameter schema — VLMs are better at categorical choices than numeric values |
+ | [RRVF](https://arxiv.org/abs/2507.20766) (2025) | Two-role pattern: generator + qualitative assessor. Natural language diff feedback. Score progression: 75→83→85 across 3 turns |
+ | [SceneAssistant](https://arxiv.org/abs/2603.12238) (2026) | Constrained action API for VLM (don't ask for full JSON, ask for atomic adjustments). Max 20 steps, VLM calls `Finish` to stop |
+ | [AutoFigure](https://arxiv.org/abs/2602.03828) (2026) | Keep-best strategy prevents oscillation. Critic+generator are same LLM with different prompts. 5-10 iterations typical |
+ | [RLRF](https://arxiv.org/abs/2505.20793) (2025) | SSIM + Edge-SSIM for CPU similarity. CLIP rewards found NOT effective (ablation study). Canny edge reward + pixel L2 accelerates convergence |
+
+ ---
+
+ ## Body Mannequin
+
+ Anatomical profile with 18 landmarks (female, 170 cm):
+
+ ```
+ Z (cm)   Body Part      RX (cm)   RY (cm)
+     0    Feet             4.0       3.0
+     7    Ankle            3.0       2.5
+    30    Calf             4.5       4.0
+    45    Knee             4.0       4.0
+    65    Thigh            6.5       6.0
+    82    Crotch           8.0       7.5
+    88    Hip              9.5       8.5
+   100    Abdomen          8.0       7.5
+   104    Waist            7.0       6.5
+   112    Ribs             8.5       8.0
+   120    Bust            10.0       9.0
+   132    Chest            9.0       8.0
+   140    Upper chest      8.5       7.5
+   145    Shoulder        11.5       8.0
+   150    Neck base        3.5       3.5
+   158    Neck             3.0       3.0
+   162    Head             5.5       5.0
+   170    Head top         4.5       4.5
+ ```
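
Cross-sections between landmarks can be obtained by linear interpolation over this table (a sketch; the actual mannequin construction in `garment_3d.py` is not shown here). Using a subset of the landmarks above:

```python
import numpy as np

# (Z height, RX radius) pairs in cm, taken from the landmark table.
LANDMARK_Z = np.array([82.0, 88.0, 100.0, 104.0, 112.0, 120.0])
LANDMARK_RX = np.array([8.0, 9.5, 8.0, 7.0, 8.5, 10.0])

def rx_at(z: float) -> float:
    """Linearly interpolate the mannequin's X radius at height z (cm)."""
    return float(np.interp(z, LANDMARK_Z, LANDMARK_RX))
```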
+
+ Garment radii are derived from circumference measurements, `radius = circumference / (2π)`, flattened into elliptical cross-sections: `RX = 1.1 × radius`, `RY = 0.9 × radius`.
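
Written out directly (a transcription of the formula above, not the project's actual code):

```python
import math

def ellipse_radii(circumference: float) -> tuple:
    """Base radius r = C / (2*pi); the cross-section is made elliptical
    by stretching X by 1.1 and shrinking Y by 0.9."""
    r = circumference / (2 * math.pi)
    return (1.1 * r, 0.9 * r)

rx, ry = ellipse_radii(90.0)  # e.g. a 90 cm bust measurement
```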
+
+ ## Supported Garment Types
+
+ | Type | Pattern Pieces | 3D Wrapping |
+ |------|---------------|-------------|
+ | Shirt/Blouse/Top | Front Bodice, Back Bodice, Sleeve ×2, Collar, Cuff ×2 | Cylinder + tilted tubes |
+ | Dress | Front Bodice, Back Bodice, Sleeve ×2, Front Skirt, Back Skirt | Cylinder + cone |
+ | Skirt | Front Skirt, Back Skirt, Waistband | Cone + cylinder |
+ | Pants/Jeans | Front Pant ×2, Back Pant ×2, Waistband | Half-tube legs + cylinder |
+ | Jacket/Blazer | Front Bodice, Back Bodice, Sleeve ×2, Collar | Wider cylinder + tubes |
+ | Hoodie | Front Bodice, Back Bodice, Sleeve ×2, Hood ×2 | Cylinder + tubes + dome |
+ | Vest | Front Bodice, Back Bodice | Cylinder (no sleeves) |

  ## Related Resources

  - [ChatGarment Dataset](https://huggingface.co/datasets/sy000/ChatGarmentDataset) — 362GB training data
  - [GarmageSet](https://huggingface.co/datasets/Style3D/GarmageSet) — 14,801 professional garments
  - [GarmentCode DSL](https://github.com/maria-korosteleva/GarmentCode) — Parametric pattern compiler
+
+ ## Development
+
+ ```bash
+ # Local testing
+ pip install gradio Pillow matplotlib numpy scipy plotly scikit-image requests
+
+ # Run locally
+ python app.py
+
+ # Test pattern generation
+ python -c "from pattern_generator import get_pattern_pieces; print(get_pattern_pieces('dress', {'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8,'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}))"
+
+ # Test 3D assembly
+ python -c "
+ from pattern_generator import get_pattern_pieces
+ from garment_3d import create_3d_figure
+ analysis = {'garment_type':'dress','measurements':{'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8},'features':{'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}}
+ pieces = get_pattern_pieces('dress', {**analysis['measurements'], **analysis['features']})
+ fig = create_3d_figure(analysis, pattern_pieces=pieces)
+ fig.write_html('test_dress.html')
+ print(f'{len(fig.data)} traces')
+ "
+ ```