---
title: Garment Image → 2D Sewing Pattern
emoji: 🧵
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false
tags:
- fashion
- sewing-pattern
- garment
- pattern-making
- vlm
- computer-vision
---
# 🧵 Garment Pattern Studio

Upload a garment image or describe one → get flat 2D sewing pattern pieces → see them assembled as a 3D garment on a mannequin → iteratively refine until it matches.
## Architecture

```
┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐     ┌──────────────┐
│ Input Image │────▶│ VLM Analysis     │────▶│ 2D Pattern Gen  │────▶│ 3D Assembly  │
│  or Text    │     │ (garment_type,   │     │ (flat pieces:   │     │ (wrap pieces │
│  or Manual  │     │  measurements,   │     │  bodice, sleeve,│     │  onto body   │
│             │     │ features → JSON) │     │  skirt, etc.)   │     │  mannequin)  │
└─────────────┘     └──────────────────┘     └─────────────────┘     └──────────────┘
                                                      │                     │
                                                      ▼                     ▼
                                            pattern_generator.py      garment_3d.py
                                           (matplotlib 2D image)     (Plotly Mesh3d)
```
### Key Design: 3D is Built FROM the 2D Pattern Pieces

The 3D view is **not** an independent visualization. Each 3D surface is constructed by:

1. **Triangulating** the 2D pattern piece polygon (UV grid mesh, ~150-200 triangles per piece)
2. **Wrapping** it onto the correct body region using geometric projection:
   - Front/Back Bodice → cylindrical wrap around the torso (front: θ = −π/2 to π/2; back: θ = π/2 to 3π/2)
   - Sleeve → tilted tube from shoulder outward/downward
   - Front/Back Skirt → cone projection with flare from waist down
   - Front/Back Pant → half-tube legs offset from center at ±hip_rx×0.45
   - Collar, Cuff, Waistband, Hood → each wrapped from its actual 2D piece shape
3. **Rendering** as Plotly `go.Mesh3d` traces — each piece is a named, colored trace in the legend

This means that when you change a measurement, both the 2D pattern AND the 3D garment update consistently.
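The cylindrical wrap in step 2 can be sketched as a mapping from a piece's normalized pattern coordinates to torso angle and height. This is a minimal illustration, not the actual `garment_3d.py` implementation — the function name and parameters here are hypothetical:

```python
import numpy as np

def wrap_to_cylinder(u, v, rx, ry, z_base,
                     theta_start=-np.pi / 2, theta_end=np.pi / 2):
    """Map normalized pattern coordinates onto an elliptical cylinder:
    u in [0, 1] spans the angular range, v is height (cm) above z_base."""
    theta = theta_start + u * (theta_end - theta_start)
    x = rx * np.cos(theta)   # elliptical cross-section: rx side-to-side
    y = ry * np.sin(theta)   # ry front-to-back
    z = z_base + v           # stack vertically along the body axis
    return x, y, z

# Front bodice: theta in [-pi/2, pi/2] covers the front half of the torso.
# rx/ry/z_base values here are illustrative (roughly waist level).
u = np.linspace(0.0, 1.0, 5)
v = np.zeros_like(u)
x, y, z = wrap_to_cylinder(u, v, rx=10.0, ry=9.0, z_base=104.0)
```

The back piece would use the same function with `theta_start=np.pi / 2, theta_end=3 * np.pi / 2`, so the two halves meet at the side seams.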
## Files

| File | Purpose |
|------|---------|
| `app.py` | Gradio UI — 5 tabs (Image, Text, Manual, Chat, Agentic Refinement) |
| `pattern_generator.py` | Parametric 2D sewing pattern engine (matplotlib) |
| `garment_3d.py` | 2D pieces → 3D assembly on mannequin (Plotly Mesh3d) |
| `refinement_loop.py` | Agentic convergence loop (⚠️ WIP — see below) |
## Current Status

### ✅ Working

- **From Text** — describe a garment, get 2D pattern + 3D preview
- **Manual Parameters** — sliders for all measurements, instant 2D+3D
- **Chat & Edit** — natural language edits ("make sleeves longer", "add hood")
- **2D→3D pipeline** — pattern pieces correctly assembled on mannequin

### ⚠️ Broken: VLM Image Analysis & Agentic Refinement

The VLM calls via HF Inference Providers are failing. See [VLM Provider Issues](#vlm-provider-issues) below.
---
## VLM Provider Issues

### Problem

All VLM model+provider combinations tested return errors when called via `router.huggingface.co`:

| Model | Provider | Error |
|-------|----------|-------|
| `Qwen/Qwen2.5-VL-72B-Instruct` | together | `Model not supported by provider together` |
| `Qwen/Qwen3.5-9B` | together | Returns text in `reasoning` only, `content` empty — JSON extraction works, but the model seems not to actually see images (classifies everything as a shirt) |
| `google/gemma-4-31B-it` | together | `Unable to access model` (not available on Together's serverless tier) |
| `google/gemma-4-31B-it` | novita | `Model not supported by provider novita` |
| `moonshotai/Kimi-K2.5` | fireworks-ai | `Model not supported by provider fireworks-ai` |
| `moonshotai/Kimi-K2.6` | together | Returns empty `content`, answer in `reasoning` — may or may not see images |
### What Actually Works (verified 2026-04-25)

**Confirmed working with image inputs:**

| Model | Provider | Image Support | Notes |
|-------|----------|---------------|-------|
| `meta-llama/Llama-4-Scout-17B-16E-Instruct` | `nscale` | ✅ YES | Answers in `content` field. Correctly identifies garment types. **Best option.** |
| `moonshotai/Kimi-K2.6` | `together` | ✅ YES (partial) | Sees images but answers in `reasoning` field; `content` is empty. Need to extract from `reasoning`. |
| `Qwen/Qwen3.5-9B` | `together` | ⚠️ UNCLEAR | Responds in `reasoning` field. May or may not process image pixels. |
**How to find current working models:**

```bash
# List models available on a provider
curl -s -H "Authorization: Bearer $HF_TOKEN" \
  "https://router.huggingface.co/{provider}/v1/models"

# Test a model with an image
curl -s -X POST "https://router.huggingface.co/{provider}/v1/chat/completions" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "{model_id}", "messages": [{"role": "user", "content": [
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{b64}"}},
    {"type": "text", "text": "What garment is this?"}
  ]}], "max_tokens": 50}'
```
### Fix Required

Update `VISION_MODELS` in `app.py` and the `models` list in `refinement_loop.py`:

```python
VISION_MODELS = [
    ("meta-llama/Llama-4-Scout-17B-16E-Instruct", "nscale", "Llama-4-Scout"),
    ("moonshotai/Kimi-K2.6", "together", "Kimi K2.6"),
    ("Qwen/Qwen3.5-9B", "together", "Qwen 3.5 9B"),
]
```

Also ensure `_extract_response_text()` checks both `content` and `reasoning` fields (it already does).
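The content-then-reasoning fallback looks roughly like this sketch (illustrative; not the actual `_extract_response_text()` in `app.py`):

```python
def extract_response_text(response: dict) -> str:
    """Return the assistant text from a chat-completions response,
    falling back to the `reasoning` field for models (e.g. Kimi K2.6
    via together) that leave `content` empty."""
    message = response["choices"][0]["message"]
    content = message.get("content") or ""
    if content.strip():
        return content
    # Some reasoning models put the full answer here instead
    return message.get("reasoning") or ""

# Kimi-K2.6-style response: empty content, answer in reasoning
resp = {"choices": [{"message": {"content": "", "reasoning": "a dress"}}]}
text = extract_response_text(resp)
```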
---
## Agentic Refinement Loop — Design Document

### Goal

Iteratively refine garment pattern parameters until the 3D garment projection visually matches the original input image.

### Loop Architecture

```
INPUT IMAGE
     │
     ▼
[1] VLM Analysis → initial garment params JSON
     │
     ▼
┌─── LOOP (max 8-15 iterations) ───────────────────────────────────┐
│                                                                  │
│  [2] Pattern Generator → 2D sewing pattern pieces                │
│        │                                                         │
│  [3] 3D Assembly → wrap pieces onto mannequin (Plotly Mesh3d)    │
│        │                                                         │
│  [4] 3D→2D Projection → matplotlib renders front-view PNG        │
│        │                                                         │
│  [5] Similarity Metrics (CPU):                                   │
│      • SSIM (structural similarity)                              │
│      • Edge-SSIM (Sobel edge comparison)                         │
│      • Composite = 0.4×SSIM + 0.3×MSE + 0.3×Edge-SSIM            │
│        │                                                         │
│  [6] CONVERGENCE CHECK:                                          │
│      • composite ≥ 0.82 → DONE                                   │
│      • VLM declares "converged" → DONE                           │
│      • Score plateau for 3 iterations → DONE                     │
│      • VLM confidence < 0.2 → DONE                               │
│      • Max iterations → DONE                                     │
│        │                                                         │
│  [7] VLM Visual Comparison:                                      │
│      Send [original_image, projection_image] to VLM              │
│      VLM returns JSON:                                           │
│      {                                                           │
│        "differences": ["sleeves too short", "missing collar"],   │
│        "adjustments": {"sleeve_length": 65, "has_collar": true}, │
│        "confidence": 0.7,                                        │
│        "converged": false                                        │
│      }                                                           │
│        │                                                         │
│  [8] Apply Adjustments:                                          │
│      • Damped update: new = old + 0.7 × (suggested - old)        │
│      • Keep-best tracking (only update if score improves)        │
│        │                                                         │
└─── LOOP BACK to [2] ─────────────────────────────────────────────┘
```
### Implementation Status

| Component | File | Status |
|-----------|------|--------|
| 3D→2D projection (matplotlib renderer) | `refinement_loop.py` → `render_3d_to_image()` | ✅ Working |
| SSIM + Edge-SSIM metrics | `refinement_loop.py` → `compute_similarity()` | ✅ Working |
| VLM comparison prompt | `refinement_loop.py` → `vlm_compare_and_adjust()` | ⚠️ Needs working VLM provider |
| Damped parameter updates | `refinement_loop.py` → `apply_adjustments()` | ✅ Working |
| Keep-best convergence loop | `refinement_loop.py` → `refinement_loop()` | ✅ Working (logic) |
| Gradio UI tab | `app.py` → "🔄 Agentic Refinement" tab | ✅ Working (UI) |
### What Needs Fixing

1. **VLM provider** — switch to `meta-llama/Llama-4-Scout-17B-16E-Instruct` via `nscale` (confirmed working with images)
2. **Similarity metric calibration** — current SSIM scores are very high (~0.95+) even for different garments, because both images have large white backgrounds. Options:
   - Crop to the garment bounding box before comparison
   - Apply a foreground mask (non-white pixels) before SSIM
   - Rely primarily on VLM visual comparison, using SSIM only as a secondary signal
3. **VLM prompt tuning** — the comparison prompt may need iteration to reliably produce valid JSON adjustments
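The crop-before-compare option above could look like this sketch, using `scikit-image` (already in the dependency list). The function names are illustrative, not existing `refinement_loop.py` code, and both inputs are assumed to be grayscale `uint8` arrays on white backgrounds:

```python
import numpy as np
from skimage.metrics import structural_similarity

def crop_to_foreground(img: np.ndarray, white_thresh: int = 245) -> np.ndarray:
    """Crop a grayscale uint8 image to the bounding box of non-white pixels."""
    mask = img < white_thresh
    if not mask.any():
        return img  # blank image: nothing to crop
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def cropped_ssim(a: np.ndarray, b: np.ndarray) -> float:
    """SSIM after cropping each image to its garment bounding box and
    resampling both crops to a shared size via nearest-neighbor indexing.
    (Crops must end up at least 7x7 for the default SSIM window.)"""
    a, b = crop_to_foreground(a), crop_to_foreground(b)
    h = min(a.shape[0], b.shape[0])
    w = min(a.shape[1], b.shape[1])
    ri = lambda img: img[np.linspace(0, img.shape[0] - 1, h).astype(int)][
        :, np.linspace(0, img.shape[1] - 1, w).astype(int)]
    return structural_similarity(ri(a), ri(b), data_range=255)
```

Cropping removes the shared white margin that otherwise dominates the SSIM score, so two different garments no longer both score ~0.95.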
### Research References for the Approach

| Paper | Key Idea Used |
|-------|---------------|
| [NGL-Prompter](https://arxiv.org/abs/2602.20700) (2025) | Discrete semantic parameter schema — VLMs are better at categorical choices than numeric values |
| [RRVF](https://arxiv.org/abs/2507.20766) (2025) | Two-role pattern: generator + qualitative assessor. Natural-language diff feedback. Score progression: 75→83→85 across 3 turns |
| [SceneAssistant](https://arxiv.org/abs/2603.12238) (2026) | Constrained action API for the VLM (don't ask for full JSON; ask for atomic adjustments). Max 20 steps; the VLM calls `Finish` to stop |
| [AutoFigure](https://arxiv.org/abs/2602.03828) (2026) | Keep-best strategy prevents oscillation. Critic + generator are the same LLM with different prompts. 5-10 iterations typical |
| [RLRF](https://arxiv.org/abs/2505.20793) (2025) | SSIM + Edge-SSIM for CPU similarity. CLIP rewards found NOT effective (ablation study). Canny edge reward + pixel L2 accelerates convergence |
---
## Body Mannequin

18-landmark anatomical profile (female, 170 cm):

```
Z (cm)   Body Part      RX (cm)   RY (cm)
  0      Feet            4.0       3.0
  7      Ankle           3.0       2.5
 30      Calf            4.5       4.0
 45      Knee            4.0       4.0
 65      Thigh           6.5       6.0
 82      Crotch          8.0       7.5
 88      Hip             9.5       8.5
100      Abdomen         8.0       7.5
104      Waist           7.0       6.5
112      Ribs            8.5       8.0
120      Bust           10.0       9.0
132      Chest           9.0       8.0
140      Upper chest     8.5       7.5
145      Shoulder       11.5       8.0
150      Neck base       3.5       3.5
158      Neck            3.0       3.0
162      Head            5.5       5.0
170      Head top        4.5       4.5
```
Garment radii are derived from circumference measurements: `radius = circumference / (2π)`, with elliptical cross-sections (`RX = radius × 1.1`, `RY = radius × 0.9`).
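As a worked example of that derivation, a 90 cm bust circumference gives a base radius of 90 / (2π) ≈ 14.32 cm, hence RX ≈ 15.76 cm and RY ≈ 12.89 cm (the function name below is illustrative):

```python
import math

def elliptical_radii(circumference_cm: float) -> tuple:
    """Base radius from circumference, stretched into an ellipse:
    wider side-to-side (RX) than front-to-back (RY)."""
    r = circumference_cm / (2 * math.pi)
    return r * 1.1, r * 0.9  # (RX, RY)

rx, ry = elliptical_radii(90.0)  # bust = 90 cm
```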
## Supported Garment Types

| Type | Pattern Pieces | 3D Wrapping |
|------|----------------|-------------|
| Shirt/Blouse/Top | Front Bodice, Back Bodice, Sleeve ×2, Collar, Cuff ×2 | Cylinder + tilted tubes |
| Dress | Front Bodice, Back Bodice, Sleeve ×2, Front Skirt, Back Skirt | Cylinder + cone |
| Skirt | Front Skirt, Back Skirt, Waistband | Cone + cylinder |
| Pants/Jeans | Front Pant ×2, Back Pant ×2, Waistband | Half-tube legs + cylinder |
| Jacket/Blazer | Front Bodice, Back Bodice, Sleeve ×2, Collar | Wider cylinder + tubes |
| Hoodie | Front Bodice, Back Bodice, Sleeve ×2, Hood ×2 | Cylinder + tubes + dome |
| Vest | Front Bodice, Back Bodice | Cylinder (no sleeves) |
## Related Resources

- [ChatGarment Dataset](https://huggingface.co/datasets/sy000/ChatGarmentDataset) — 362GB training data
- [GarmageSet](https://huggingface.co/datasets/Style3D/GarmageSet) — 14,801 professional garments
- [GarmentCode DSL](https://github.com/maria-korosteleva/GarmentCode) — Parametric pattern compiler
## Development

```bash
# Local testing
pip install gradio Pillow matplotlib numpy scipy plotly scikit-image requests

# Run locally
python app.py

# Test pattern generation
python -c "from pattern_generator import get_pattern_pieces; print(get_pattern_pieces('dress', {'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8,'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}))"

# Test 3D assembly
python -c "
from pattern_generator import get_pattern_pieces
from garment_3d import create_3d_figure
analysis = {'garment_type':'dress','measurements':{'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8},'features':{'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}}
pieces = get_pattern_pieces('dress', {**analysis['measurements'], **analysis['features']})
fig = create_3d_figure(analysis, pattern_pieces=pieces)
fig.write_html('test_dress.html')
print(f'{len(fig.data)} traces')
"
```