vikashmakeit committed
Commit 6df9de5 · verified · 1 Parent(s): 99a8fd3

Complete README with architecture docs, VLM provider findings, and agentic refinement design

Files changed (1):
  1. README.md +248 -21
README.md CHANGED
@@ -16,38 +16,265 @@ tags:
  - computer-vision
  ---

- # 🧵 Garment Image → 2D Sewing Pattern
-
- Upload a garment image or describe one get flat 2D sewing pattern pieces with seam allowances, grain lines, and notches.
-
- ## How It Works
-
- 1. **Image Analysis**: A Vision-Language Model (Qwen2.5-VL) analyzes the garment to identify type, style, and proportions
- 2. **Parameter Extraction**: Structured measurements and features are extracted as JSON
- 3. **Pattern Generation**: A parametric pattern engine generates anatomically-correct 2D sewing pattern pieces
-
- ## Features
-
- - 📸 **From Image**: Upload any garment photo for AI-powered analysis
- - ✍️ **From Text**: Describe a garment and get pattern pieces
- - 📐 **Manual**: Fine-tune every measurement with sliders
- - Supports: shirts, dresses, skirts, pants, jackets, hoodies, vests
- - Pattern pieces include: cut lines, seam lines, fold lines, grain lines, notches
-
- ## Research Background
-
- Inspired by the latest research in garment-to-pattern conversion:
-
- | Paper | Year | Approach |
- |-------|------|----------|
- | [ChatGarment](https://arxiv.org/abs/2412.17811) | 2024 | VLM → GarmentCode JSON → sewing patterns |
- | [NGL-Prompter](https://arxiv.org/abs/2602.20700) | 2025 | Training-free VLM + Natural Garment Language |
- | [SewFormer](https://arxiv.org/abs/2311.04218) | 2023 | Two-level Transformer for pattern reconstruction |
- | [GarmentDiffusion](https://arxiv.org/abs/2504.21476) | 2025 | DiT-based multimodal pattern generation |
- | [GarmageNet](https://arxiv.org/abs/2504.01483) | 2025 | Geometry image diffusion for sewing patterns |
+ # 🧵 Garment Pattern Studio
+
+ Upload a garment image or describe one → get flat 2D sewing pattern pieces → see them assembled as a 3D garment on a mannequin → iteratively refine until it matches.
+
+ ## Architecture
+
+ ```
+ ┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐     ┌──────────────┐
+ │ Input Image │────▶│ VLM Analysis     │────▶│ 2D Pattern Gen  │────▶│ 3D Assembly  │
+ │ or Text     │     │ (garment_type,   │     │ (flat pieces:   │     │ (wrap pieces │
+ │ or Manual   │     │ measurements,    │     │  bodice, sleeve,│     │  onto body   │
+ │             │     │ features → JSON) │     │  skirt, etc.)   │     │  mannequin)  │
+ └─────────────┘     └──────────────────┘     └─────────────────┘     └──────────────┘
+                              │                        │
+                              ▼                        ▼
+                    pattern_generator.py         garment_3d.py
+                    (matplotlib 2D image)       (Plotly Mesh3d)
+ ```
+
+ ### Key Design: 3D is Built FROM the 2D Pattern Pieces
+
+ The 3D view is **not** an independent visualization. Each 3D surface is constructed by:
+
+ 1. **Triangulating** the 2D pattern piece polygon (UV grid mesh, ~150-200 triangles per piece)
+ 2. **Wrapping** it onto the correct body region using geometric projection:
+    - Front/Back Bodice → cylindrical wrap around torso (θ = -π/2 to π/2 for front, π/2 to 3π/2 for back)
+    - Sleeve → tilted tube from shoulder outward/downward
+    - Front/Back Skirt → cone projection with flare from waist down
+    - Front/Back Pant → half-tube legs offset from center at ±hip_rx×0.45
+    - Collar, Cuff, Waistband, Hood → each wrapped from its actual 2D piece shape
+ 3. **Rendering** as Plotly `go.Mesh3d` traces — each piece is a named, colored trace in the legend
+
+ This means: when you change a measurement, both the 2D pattern AND the 3D garment update consistently.
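
As an illustration of step 2, a cylindrical front-bodice wrap might look like this. This is a simplified sketch with hypothetical names (the actual `garment_3d.py` API is not shown here): a point (u, v) on the flat piece maps to an angle θ on an elliptical torso cross-section.

```python
import numpy as np

def wrap_bodice_front(height: float, rx: float, ry: float,
                      z_top: float, n: int = 12):
    """Map a UV grid over a flat front-bodice piece onto an elliptical
    cylinder: u in [0, 1] sweeps theta from -pi/2 to pi/2 (front half of
    the torso), v in [0, 1] runs from the top edge down the bodice length."""
    u, v = np.meshgrid(np.linspace(0.0, 1.0, n), np.linspace(0.0, 1.0, n))
    theta = -np.pi / 2 + u * np.pi   # front half of the torso
    x = rx * np.sin(theta)           # across the body
    y = ry * np.cos(theta)           # front of the torso bulges toward +y
    z = z_top - v * height           # drop from the shoulder line downward
    return x, y, z
```

These (x, y, z) grids would then be triangulated into vertex-index arrays and rendered as a `plotly.graph_objects.Mesh3d` trace.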
+
+ ## Files
+
+ | File | Purpose |
+ |------|---------|
+ | `app.py` | Gradio UI with 5 tabs (Image, Text, Manual, Chat, Agentic Refinement) |
+ | `pattern_generator.py` | Parametric 2D sewing pattern engine (matplotlib) |
+ | `garment_3d.py` | 2D pieces → 3D assembly on mannequin (Plotly Mesh3d) |
+ | `refinement_loop.py` | Agentic convergence loop (⚠️ WIP — see below) |
+
+ ## Current Status
+
+ ### ✅ Working
+ - **From Text** — describe a garment, get 2D pattern + 3D preview
+ - **Manual Parameters** — sliders for all measurements, instant 2D+3D
+ - **Chat & Edit** — natural language edits ("make sleeves longer", "add hood")
+ - **2D→3D pipeline** — pattern pieces correctly assembled on mannequin
+
+ ### ⚠️ Broken: VLM Image Analysis & Agentic Refinement
+ The VLM calls via HF Inference Providers are failing. See [VLM Provider Issues](#vlm-provider-issues) below.
+
+ ---
+
+ ## VLM Provider Issues
+
+ ### Problem
+ All VLM model+provider combinations tested return errors when called via `router.huggingface.co`:
+
+ | Model | Provider | Error |
+ |-------|----------|-------|
+ | `Qwen/Qwen2.5-VL-72B-Instruct` | together | `Model not supported by provider together` |
+ | `Qwen/Qwen3.5-9B` | together | Returns text in `reasoning` only with `content` empty — JSON extraction works, but the model appears not to actually see images (it classifies everything as a shirt) |
+ | `google/gemma-4-31B-it` | together | `Unable to access model` (not available on Together's serverless) |
+ | `google/gemma-4-31B-it` | novita | `Model not supported by provider novita` |
+ | `moonshotai/Kimi-K2.5` | fireworks-ai | `Model not supported by provider fireworks-ai` |
+ | `moonshotai/Kimi-K2.6` | together | Returns empty `content`, answer in `reasoning` — may or may not see images |
+
+ ### What Actually Works (verified 2026-04-25)
+
+ **Confirmed working with image inputs:**
+
+ | Model | Provider | Image Support | Notes |
+ |-------|----------|---------------|-------|
+ | `meta-llama/Llama-4-Scout-17B-16E-Instruct` | `nscale` | ✅ YES | Answers in the `content` field. Correctly identifies garment types. **Best option.** |
+ | `moonshotai/Kimi-K2.6` | `together` | ✅ YES (partial) | Sees images but answers in the `reasoning` field; `content` is empty, so the answer must be extracted from `reasoning`. |
+ | `Qwen/Qwen3.5-9B` | `together` | ⚠️ UNCLEAR | Responds in the `reasoning` field. May or may not process image pixels. |
+
+ **How to find current working models:**
+ ```bash
+ # List models available on a provider
+ curl -s -H "Authorization: Bearer $HF_TOKEN" \
+   "https://router.huggingface.co/{provider}/v1/models"
+
+ # Test a model with an image
+ curl -s -X POST "https://router.huggingface.co/{provider}/v1/chat/completions" \
+   -H "Authorization: Bearer $HF_TOKEN" \
+   -H "Content-Type: application/json" \
+   -d '{"model": "{model_id}", "messages": [{"role": "user", "content": [
+     {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{b64}"}},
+     {"type": "text", "text": "What garment is this?"}
+   ]}], "max_tokens": 50}'
+ ```
+
+ ### Fix Required
+ Update `VISION_MODELS` in `app.py` and the `models` list in `refinement_loop.py`:
+
+ ```python
+ VISION_MODELS = [
+     ("meta-llama/Llama-4-Scout-17B-16E-Instruct", "nscale", "Llama-4-Scout"),
+     ("moonshotai/Kimi-K2.6", "together", "Kimi K2.6"),
+     ("Qwen/Qwen3.5-9B", "together", "Qwen 3.5 9B"),
+ ]
+ ```
+
+ Also ensure `_extract_response_text()` checks both `content` and `reasoning` fields (it already does).
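
Since the source of that helper isn't shown here, a minimal sketch of such an extraction (assuming an OpenAI-style chat-completion response dict; the real `_extract_response_text()` in `app.py` may differ):

```python
def extract_response_text(response: dict) -> str:
    """Return the model's answer from an OpenAI-style chat completion,
    falling back to the nonstandard `reasoning` field that some providers
    populate while leaving `content` empty (e.g. Kimi via together)."""
    message = response.get("choices", [{}])[0].get("message", {})
    content = (message.get("content") or "").strip()
    if content:
        return content
    # Fallback: some models put the whole answer in `reasoning`.
    return (message.get("reasoning") or "").strip()
```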
+
+ ---
+
+ ## Agentic Refinement Loop — Design Document
+
+ ### Goal
+ Iteratively refine garment pattern parameters until the 3D garment projection visually matches the original input image.
+
+ ### Loop Architecture
+
+ ```
+ INPUT IMAGE
+      │
+      ▼
+ [1] VLM Analysis → initial garment params JSON
+      │
+      ▼
+ ┌─── LOOP (max 8-15 iterations) ─────────────────────────────────────┐
+ │                                                                    │
+ │  [2] Pattern Generator → 2D sewing pattern pieces                  │
+ │       │                                                            │
+ │  [3] 3D Assembly → wrap pieces onto mannequin (Plotly Mesh3d)      │
+ │       │                                                            │
+ │  [4] 3D→2D Projection → matplotlib renders front-view PNG          │
+ │       │                                                            │
+ │  [5] Similarity Metrics (CPU):                                     │
+ │       • SSIM (structural similarity)                               │
+ │       • Edge-SSIM (Sobel edge comparison)                          │
+ │       • Composite = 0.4×SSIM + 0.3×MSE + 0.3×Edge-SSIM             │
+ │       │                                                            │
+ │  [6] CONVERGENCE CHECK:                                            │
+ │       • composite ≥ 0.82 → DONE                                    │
+ │       • VLM declares "converged" → DONE                            │
+ │       • Score plateau for 3 iterations → DONE                      │
+ │       • VLM confidence < 0.2 → DONE                                │
+ │       • Max iterations → DONE                                      │
+ │       │                                                            │
+ │  [7] VLM Visual Comparison:                                        │
+ │       Send [original_image, projection_image] to VLM               │
+ │       VLM returns JSON:                                            │
+ │       {                                                            │
+ │         "differences": ["sleeves too short", "missing collar"],    │
+ │         "adjustments": {"sleeve_length": 65, "has_collar": true},  │
+ │         "confidence": 0.7,                                         │
+ │         "converged": false                                         │
+ │       }                                                            │
+ │       │                                                            │
+ │  [8] Apply Adjustments:                                            │
+ │       • Damped update: new = old + 0.7 × (suggested - old)         │
+ │       • Keep-best tracking (only update if score improves)         │
+ │       │                                                            │
+ └─── LOOP BACK to [2] ───────────────────────────────────────────────┘
+ ```
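
The step [5] composite can be sketched with numpy only; the weights come from the design above, but `global_ssim` is a simplified whole-image stand-in (the actual `compute_similarity()` presumably uses scikit-image's windowed SSIM and a real Sobel filter), and MSE is mapped to a similarity as 1 - MSE for images in [0, 1]:

```python
import numpy as np

def global_ssim(a: np.ndarray, b: np.ndarray,
                c1: float = 0.01**2, c2: float = 0.03**2) -> float:
    """Single-window SSIM over the whole image (simplified stand-in)."""
    ma, mb = a.mean(), b.mean()
    cov = ((a - ma) * (b - mb)).mean()
    return float(((2 * ma * mb + c1) * (2 * cov + c2))
                 / ((ma**2 + mb**2 + c1) * (a.var() + b.var() + c2)))

def edge_map(img: np.ndarray) -> np.ndarray:
    """Central-difference gradient magnitude (stand-in for a Sobel filter)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    return np.hypot(gx, gy)

def composite_score(original: np.ndarray, projection: np.ndarray) -> float:
    """Composite = 0.4*SSIM + 0.3*MSE-similarity + 0.3*Edge-SSIM."""
    ssim = global_ssim(original, projection)
    mse_sim = 1.0 - float(np.mean((original - projection) ** 2))
    edge_sim = global_ssim(edge_map(original), edge_map(projection))
    return 0.4 * ssim + 0.3 * mse_sim + 0.3 * edge_sim
```

Identical images score 1.0; the convergence check then compares this value against the 0.82 threshold.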
+
+ ### Implementation Status
+
+ | Component | File | Status |
+ |-----------|------|--------|
+ | 3D→2D projection (matplotlib renderer) | `refinement_loop.py` → `render_3d_to_image()` | ✅ Working |
+ | SSIM + Edge-SSIM metrics | `refinement_loop.py` → `compute_similarity()` | ✅ Working |
+ | VLM comparison prompt | `refinement_loop.py` → `vlm_compare_and_adjust()` | ⚠️ Needs working VLM provider |
+ | Damped parameter updates | `refinement_loop.py` → `apply_adjustments()` | ✅ Working |
+ | Keep-best convergence loop | `refinement_loop.py` → `refinement_loop()` | ✅ Working (logic) |
+ | Gradio UI tab | `app.py` → "🔄 Agentic Refinement" tab | ✅ Working (UI) |
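
A minimal sketch of the damped step [8] update, with the 0.7 damping factor from the design above (the dict shape is hypothetical; the real `apply_adjustments()` is not shown here):

```python
DAMPING = 0.7  # new = old + 0.7 * (suggested - old), per the design

def apply_adjustments(params: dict, adjustments: dict) -> dict:
    """Blend VLM-suggested numeric values toward the current ones;
    booleans and categorical features switch outright."""
    updated = dict(params)
    for key, suggested in adjustments.items():
        old = updated.get(key)
        numeric = (isinstance(old, (int, float)) and not isinstance(old, bool)
                   and isinstance(suggested, (int, float))
                   and not isinstance(suggested, bool))
        if numeric:
            updated[key] = old + DAMPING * (suggested - old)
        else:
            updated[key] = suggested  # e.g. has_collar flips directly
    return updated
```

Keep-best tracking then only promotes the new parameter set if its composite score beats the best seen so far, which prevents oscillation.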
+
+ ### What Needs Fixing
+ 1. **VLM provider** — switch to `meta-llama/Llama-4-Scout-17B-16E-Instruct` via `nscale` (confirmed working with images)
+ 2. **Similarity metric calibration** — current SSIM scores are very high (~0.95+) even for different garments because both images have large white backgrounds. Options:
+    - Crop to the garment bounding box before comparison
+    - Use a foreground mask (non-white pixels) before SSIM
+    - Rely primarily on VLM visual comparison, using SSIM only as a secondary signal
+ 3. **VLM prompt tuning** — the comparison prompt may need iteration to reliably produce valid JSON adjustments
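
The crop-to-bounding-box option can be sketched as follows (assuming grayscale images in [0, 1] with white ≈ 1.0; the threshold and padding are assumptions):

```python
import numpy as np

def crop_to_foreground(img: np.ndarray, white_thresh: float = 0.95,
                       pad: int = 2) -> np.ndarray:
    """Crop a grayscale image to the bounding box of non-white (garment)
    pixels so large white backgrounds don't inflate SSIM."""
    mask = img < white_thresh
    if not mask.any():
        return img  # nothing but background; leave untouched
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1 = max(rows[0] - pad, 0), min(rows[-1] + pad + 1, img.shape[0])
    c0, c1 = max(cols[0] - pad, 0), min(cols[-1] + pad + 1, img.shape[1])
    return img[r0:r1, c0:c1]
```

Cropping both images this way before scoring makes the metric sensitive to the garment silhouette rather than the shared background.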
+
+ ### Research References for the Approach
+
+ | Paper | Key Idea Used |
+ |-------|---------------|
+ | [NGL-Prompter](https://arxiv.org/abs/2602.20700) (2025) | Discrete semantic parameter schema — VLMs are better at categorical choices than numeric values |
+ | [RRVF](https://arxiv.org/abs/2507.20766) (2025) | Two-role pattern: generator + qualitative assessor. Natural language diff feedback. Score progression: 75→83→85 across 3 turns |
+ | [SceneAssistant](https://arxiv.org/abs/2603.12238) (2026) | Constrained action API for VLM (don't ask for full JSON, ask for atomic adjustments). Max 20 steps, VLM calls `Finish` to stop |
+ | [AutoFigure](https://arxiv.org/abs/2602.03828) (2026) | Keep-best strategy prevents oscillation. Critic+generator are same LLM with different prompts. 5-10 iterations typical |
+ | [RLRF](https://arxiv.org/abs/2505.20793) (2025) | SSIM + Edge-SSIM for CPU similarity. CLIP rewards found NOT effective (ablation study). Canny edge reward + pixel L2 accelerates convergence |
+
+ ---
+
+ ## Body Mannequin
+
+ Anatomical profile with 18 landmarks (female, 170 cm):
+
+ ```
+ Z (cm)   Body Part      RX (cm)   RY (cm)
+     0    Feet             4.0       3.0
+     7    Ankle            3.0       2.5
+    30    Calf             4.5       4.0
+    45    Knee             4.0       4.0
+    65    Thigh            6.5       6.0
+    82    Crotch           8.0       7.5
+    88    Hip              9.5       8.5
+   100    Abdomen          8.0       7.5
+   104    Waist            7.0       6.5
+   112    Ribs             8.5       8.0
+   120    Bust            10.0       9.0
+   132    Chest            9.0       8.0
+   140    Upper chest      8.5       7.5
+   145    Shoulder        11.5       8.0
+   150    Neck base        3.5       3.5
+   158    Neck             3.0       3.0
+   162    Head             5.5       5.0
+   170    Head top         4.5       4.5
+ ```
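
Cross-sections between landmarks can be obtained by linear interpolation over this table (a sketch; the actual mannequin construction in `garment_3d.py` is not shown here). Using a subset of the landmarks above:

```python
import numpy as np

# (Z height, RX radius) pairs in cm, taken from the landmark table.
LANDMARK_Z = np.array([82.0, 88.0, 100.0, 104.0, 112.0, 120.0])
LANDMARK_RX = np.array([8.0, 9.5, 8.0, 7.0, 8.5, 10.0])

def rx_at(z: float) -> float:
    """Linearly interpolate the mannequin's X radius at height z (cm)."""
    return float(np.interp(z, LANDMARK_Z, LANDMARK_RX))
```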
+
+ Garment radii are derived from circumference measurements, `radius = circumference / (2π)`, flattened into elliptical cross-sections: `RX = 1.1 × radius`, `RY = 0.9 × radius`.
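
Written out directly (a transcription of the formula above, not the project's actual code):

```python
import math

def ellipse_radii(circumference: float) -> tuple:
    """Base radius r = C / (2*pi); the cross-section is made elliptical
    by stretching X by 1.1 and shrinking Y by 0.9."""
    r = circumference / (2 * math.pi)
    return (1.1 * r, 0.9 * r)

rx, ry = ellipse_radii(90.0)  # e.g. a 90 cm bust measurement
```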
+
+ ## Supported Garment Types
+
+ | Type | Pattern Pieces | 3D Wrapping |
+ |------|---------------|-------------|
+ | Shirt/Blouse/Top | Front Bodice, Back Bodice, Sleeve ×2, Collar, Cuff ×2 | Cylinder + tilted tubes |
+ | Dress | Front Bodice, Back Bodice, Sleeve ×2, Front Skirt, Back Skirt | Cylinder + cone |
+ | Skirt | Front Skirt, Back Skirt, Waistband | Cone + cylinder |
+ | Pants/Jeans | Front Pant ×2, Back Pant ×2, Waistband | Half-tube legs + cylinder |
+ | Jacket/Blazer | Front Bodice, Back Bodice, Sleeve ×2, Collar | Wider cylinder + tubes |
+ | Hoodie | Front Bodice, Back Bodice, Sleeve ×2, Hood ×2 | Cylinder + tubes + dome |
+ | Vest | Front Bodice, Back Bodice | Cylinder (no sleeves) |

  ## Related Resources

  - [ChatGarment Dataset](https://huggingface.co/datasets/sy000/ChatGarmentDataset) — 362GB training data
  - [GarmageSet](https://huggingface.co/datasets/Style3D/GarmageSet) — 14,801 professional garments
  - [GarmentCode DSL](https://github.com/maria-korosteleva/GarmentCode) — Parametric pattern compiler
+
+ ## Development
+
+ ```bash
+ # Local testing
+ pip install gradio Pillow matplotlib numpy scipy plotly scikit-image requests
+
+ # Run locally
+ python app.py
+
+ # Test pattern generation
+ python -c "from pattern_generator import get_pattern_pieces; print(get_pattern_pieces('dress', {'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8,'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}))"
+
+ # Test 3D assembly
+ python -c "
+ from pattern_generator import get_pattern_pieces
+ from garment_3d import create_3d_figure
+ analysis = {'garment_type':'dress','measurements':{'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8},'features':{'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}}
+ pieces = get_pattern_pieces('dress', {**analysis['measurements'], **analysis['features']})
+ fig = create_3d_figure(analysis, pattern_pieces=pieces)
+ fig.write_html('test_dress.html')
+ print(f'{len(fig.data)} traces')
+ "
+ ```