---
title: Garment Image → 2D Sewing Pattern
emoji: 🧵
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false
tags:
  - fashion
  - sewing-pattern
  - garment
  - pattern-making
  - vlm
  - computer-vision
---

# 🧵 Garment Pattern Studio

Upload a garment image or describe one → get flat 2D sewing pattern pieces → see them assembled as a 3D garment on a mannequin → iteratively refine until it matches.

## Architecture

```
┌─────────────┐     ┌──────────────────┐     ┌─────────────────┐     ┌──────────────┐
│ Input Image │────▶│ VLM Analysis     │────▶│ 2D Pattern Gen  │────▶│ 3D Assembly  │
│ or Text     │     │ (garment_type,   │     │ (flat pieces:   │     │ (wrap pieces │
│ or Manual   │     │  measurements,   │     │  bodice, sleeve,│     │  onto body   │
│             │     │ features → JSON) │     │  skirt, etc.)   │     │  mannequin)  │
└─────────────┘     └──────────────────┘     └─────────────────┘     └──────────────┘
                                                      │                       │
                                                      ▼                       ▼
                                              pattern_generator.py    garment_3d.py
                                              (matplotlib 2D image)   (Plotly Mesh3d)
```

### Key Design: 3D is Built FROM the 2D Pattern Pieces

The 3D view is not an independent visualization. Each 3D surface is constructed by:

1. **Triangulating** the 2D pattern piece polygon (UV grid mesh, ~150-200 triangles per piece)
2. **Wrapping** it onto the correct body region using geometric projection:
   - Front/Back Bodice → cylindrical wrap around the torso (front: θ from -π/2 to π/2; back: θ from π/2 to 3π/2)
   - Sleeve → tilted tube extending outward/downward from the shoulder
   - Front/Back Skirt → cone projection flaring from the waist down
   - Front/Back Pant → half-tube legs offset from center at ±hip_rx × 0.45
   - Collar, Cuff, Waistband, Hood → each wrapped from its actual 2D piece shape
3. **Rendering** as Plotly `go.Mesh3d` traces — each piece is a named, colored trace in the legend

This means that when you change a measurement, both the 2D pattern and the 3D garment update consistently.
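
The bodice wrap in step 2 can be sketched as follows. This is a simplified stand-in, not the actual `garment_3d.py` code: `wrap_piece_cylindrical` and the rectangular UV grid are illustrative, and the real wrap uses elliptical cross-sections that vary per body landmark.

```python
import numpy as np

def wrap_piece_cylindrical(grid_2d, radius, z_base=104.0, back=False):
    """Map a (rows, cols, 2) UV grid of 2D pattern points onto a torso cylinder.

    x spans the piece width (becomes arc length around the torso),
    y spans the piece height (becomes vertical position above z_base).
    """
    x, y = grid_2d[..., 0], grid_2d[..., 1]
    # Normalize the piece width onto the front half-circle: theta in [-pi/2, pi/2]
    theta = (x - x.min()) / (x.max() - x.min()) * np.pi - np.pi / 2
    if back:
        theta += np.pi                       # back piece occupies [pi/2, 3pi/2]
    return np.stack([radius * np.cos(theta),
                     radius * np.sin(theta),
                     z_base + y], axis=-1)

# Rectangular stand-in for a front bodice piece: 40 cm wide, 42 cm tall
u, v = np.meshgrid(np.linspace(0, 40, 15), np.linspace(0, 42, 15))
front = wrap_piece_cylindrical(np.stack([u, v], axis=-1), radius=14.3)
```

Triangulating the resulting (15, 15, 3) grid into `i`/`j`/`k` index arrays is what feeds `go.Mesh3d`.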

## Files

| File | Purpose |
|------|---------|
| `app.py` | Gradio UI — 5 tabs (Image, Text, Manual, Chat, Agentic Refinement) |
| `pattern_generator.py` | Parametric 2D sewing pattern engine (matplotlib) |
| `garment_3d.py` | 2D pieces → 3D assembly on mannequin (Plotly Mesh3d) |
| `refinement_loop.py` | Agentic convergence loop (⚠️ WIP — see below) |

## Current Status

### ✅ Working

- **From Text** — describe a garment, get 2D pattern + 3D preview
- **Manual Parameters** — sliders for all measurements, instant 2D+3D
- **Chat & Edit** — natural language edits ("make sleeves longer", "add hood")
- **2D→3D pipeline** — pattern pieces correctly assembled on mannequin

### ⚠️ Broken: VLM Image Analysis & Agentic Refinement

The VLM calls via HF Inference Providers are failing. See VLM Provider Issues below.


## VLM Provider Issues

### Problem

All VLM model+provider combinations tested return errors or otherwise unusable responses when called via `router.huggingface.co`:

| Model | Provider | Error |
|-------|----------|-------|
| Qwen/Qwen2.5-VL-72B-Instruct | together | Model not supported by provider together |
| Qwen/Qwen3.5-9B | together | Returns text in `reasoning` only, `content` empty — JSON extraction works, but the model does not appear to actually see images (classifies everything as a shirt) |
| google/gemma-4-31B-it | together | Unable to access model (not available on Together's serverless) |
| google/gemma-4-31B-it | novita | Model not supported by provider novita |
| moonshotai/Kimi-K2.5 | fireworks-ai | Model not supported by provider fireworks-ai |
| moonshotai/Kimi-K2.6 | together | Returns empty `content`, answer in `reasoning` — may or may not see images |

### What Actually Works (verified 2026-04-25)

Confirmed working with image inputs:

| Model | Provider | Image Support | Notes |
|-------|----------|---------------|-------|
| meta-llama/Llama-4-Scout-17B-16E-Instruct | nscale | ✅ Yes | Answers in the `content` field. Correctly identifies garment types. Best option. |
| moonshotai/Kimi-K2.6 | together | ✅ Yes (partial) | Sees images but answers in the `reasoning` field; `content` is empty. Needs extraction from `reasoning`. |
| Qwen/Qwen3.5-9B | together | ⚠️ Unclear | Responds in the `reasoning` field. May or may not process image pixels. |

**How to find current working models:**

```bash
# List models available on a provider
curl -s -H "Authorization: Bearer $HF_TOKEN" \
  "https://router.huggingface.co/{provider}/v1/models"

# Test a model with an image
curl -s -X POST "https://router.huggingface.co/{provider}/v1/chat/completions" \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "{model_id}", "messages": [{"role": "user", "content": [
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,{b64}"}},
    {"type": "text", "text": "What garment is this?"}
  ]}], "max_tokens": 50}'
```
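
The same probe in Python, as a sketch against the router endpoint shown above; `build_vlm_payload` and `probe_vlm` are illustrative helpers, not part of the repo.

```python
import base64
import os

def build_vlm_payload(model, b64_jpeg, question="What garment is this?"):
    """OpenAI-style chat payload with one image part and one text part."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64_jpeg}"}},
            {"type": "text", "text": question},
        ]}],
        "max_tokens": 50,
    }

def probe_vlm(model, provider, image_path):
    """POST the payload to the HF router; return whichever field carries text."""
    import requests  # deferred so the payload helper has no hard dependency
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = requests.post(
        f"https://router.huggingface.co/{provider}/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
        json=build_vlm_payload(model, b64),
        timeout=60,
    )
    resp.raise_for_status()
    msg = resp.json()["choices"][0]["message"]
    # Some models answer in `reasoning` instead of `content` (see table above)
    return msg.get("content") or msg.get("reasoning") or ""
```

Example: `probe_vlm("meta-llama/Llama-4-Scout-17B-16E-Instruct", "nscale", "shirt.jpg")`.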

### Fix Required

Update `VISION_MODELS` in `app.py` and the models list in `refinement_loop.py`:

```python
VISION_MODELS = [
    ("meta-llama/Llama-4-Scout-17B-16E-Instruct", "nscale", "Llama-4-Scout"),
    ("moonshotai/Kimi-K2.6", "together", "Kimi K2.6"),
    ("Qwen/Qwen3.5-9B", "together", "Qwen 3.5 9B"),
]
```

Also ensure `_extract_response_text()` checks both the `content` and `reasoning` fields (it already does).
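
A minimal standalone sketch of that fallback plus JSON extraction (hypothetical; the real helpers live in `app.py` / `refinement_loop.py`):

```python
import json
import re

def extract_response_text(message: dict) -> str:
    """Prefer `content`; fall back to `reasoning` for models that answer there."""
    return (message.get("content") or message.get("reasoning") or "").strip()

def extract_json(text: str):
    """Pull the first {...} block out of a free-form model reply, if any.

    Greedy match: good enough for single-object replies without trailing braces.
    """
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```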


## Agentic Refinement Loop — Design Document

### Goal

Iteratively refine the garment pattern parameters until the 3D garment projection visually matches the original input image.

### Loop Architecture

```
INPUT IMAGE
     │
     ▼
[1] VLM Analysis → initial garment params JSON
     │
     ▼
┌─── LOOP (max 8-15 iterations) ──────────────────────────────────┐
│                                                                 │
│  [2] Pattern Generator → 2D sewing pattern pieces               │
│       │                                                         │
│  [3] 3D Assembly → wrap pieces onto mannequin (Plotly Mesh3d)   │
│       │                                                         │
│  [4] 3D→2D Projection → matplotlib renders front-view PNG       │
│       │                                                         │
│  [5] Similarity Metrics (CPU):                                  │
│       • SSIM (structural similarity)                            │
│       • Edge-SSIM (Sobel edge comparison)                       │
│       • Composite = 0.4×SSIM + 0.3×MSE + 0.3×Edge-SSIM          │
│       │                                                         │
│  [6] CONVERGENCE CHECK:                                         │
│       • composite ≥ 0.82 → DONE                                 │
│       • VLM declares "converged" → DONE                         │
│       • Score plateau for 3 iterations → DONE                   │
│       • VLM confidence < 0.2 → DONE                             │
│       • Max iterations → DONE                                   │
│       │                                                         │
│  [7] VLM Visual Comparison:                                     │
│       Send [original_image, projection_image] to VLM            │
│       VLM returns JSON:                                         │
│       {                                                         │
│         "differences": ["sleeves too short", "missing collar"], │
│         "adjustments": {"sleeve_length": 65, "has_collar": true}│
│         "confidence": 0.7,                                      │
│         "converged": false                                      │
│       }                                                         │
│       │                                                         │
│  [8] Apply Adjustments:                                         │
│       • Damped update: new = old + 0.7 × (suggested - old)      │
│       • Keep-best tracking (only update if score improves)      │
│       │                                                         │
└──── LOOP BACK to [2] ───────────────────────────────────────────┘
```
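
The control flow of steps [2]-[8] can be sketched as follows. `render_and_score` and `vlm_adjust` are stand-ins for the real pipeline stages; the constants mirror the diagram.

```python
def refine(params, render_and_score, vlm_adjust, max_iters=10,
           target=0.82, damping=0.7, plateau=3):
    """Keep-best damped refinement loop (sketch, not the repo's implementation).

    render_and_score(params) -> composite similarity in [0, 1]
    vlm_adjust(params)       -> (adjustments dict, confidence, converged flag)
    """
    best_params, best_score, flat = dict(params), render_and_score(params), 0
    for _ in range(max_iters):
        if best_score >= target:                     # composite threshold reached
            break
        adjustments, confidence, converged = vlm_adjust(best_params)
        if converged or confidence < 0.2:            # VLM-driven stop conditions
            break
        candidate = dict(best_params)
        for key, suggested in adjustments.items():
            old = candidate.get(key)
            if isinstance(old, (int, float)) and not isinstance(old, bool) \
                    and isinstance(suggested, (int, float)):
                candidate[key] = old + damping * (suggested - old)  # damped step
            else:
                candidate[key] = suggested           # flags/strings applied directly
        score = render_and_score(candidate)
        if score > best_score:                       # keep-best tracking
            best_params, best_score, flat = candidate, score, 0
        else:
            flat += 1
            if flat >= plateau:                      # score plateaued
                break
    return best_params, best_score
```

With a toy scorer that prefers `sleeve_length` near 65 and a VLM stub suggesting 65, one damped step moves 40 to 40 + 0.7 × 25 = 57.5.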

### Implementation Status

| Component | File | Status |
|-----------|------|--------|
| 3D→2D projection (matplotlib renderer) | `refinement_loop.py` → `render_3d_to_image()` | ✅ Working |
| SSIM + Edge-SSIM metrics | `refinement_loop.py` → `compute_similarity()` | ✅ Working |
| VLM comparison prompt | `refinement_loop.py` → `vlm_compare_and_adjust()` | ⚠️ Needs working VLM provider |
| Damped parameter updates | `refinement_loop.py` → `apply_adjustments()` | ✅ Working |
| Keep-best convergence loop | `refinement_loop.py` → `refinement_loop()` | ✅ Working (logic) |
| Gradio UI tab | `app.py` → "🔄 Agentic Refinement" tab | ✅ Working (UI) |

### What Needs Fixing

1. **VLM provider** — switch to `meta-llama/Llama-4-Scout-17B-16E-Instruct` via `nscale` (confirmed working with images).
2. **Similarity metric calibration** — current SSIM scores are very high (~0.95+) even for different garments because both images have large white backgrounds. Options:
   - Crop to the garment bounding box before comparison
   - Use a foreground mask (non-white pixels) before SSIM
   - Rely primarily on VLM visual comparison, using SSIM only as a secondary signal
3. **VLM prompt tuning** — the comparison prompt may need iteration to reliably produce valid JSON adjustments.
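
Combining the first two options (mask, then crop to the foreground bounding box before SSIM) might look like this, using scikit-image, which is already a dependency; `white_thresh` is an illustrative cutoff, and this is a sketch rather than the planned fix.

```python
import numpy as np
from skimage.metrics import structural_similarity

def masked_ssim(img_a, img_b, white_thresh=240):
    """SSIM on the garment region of two same-shape uint8 grayscale images.

    Crops both images to the bounding box of non-white pixels so large
    white backgrounds stop inflating the score.
    """
    fg = (img_a < white_thresh) | (img_b < white_thresh)
    if not fg.any():
        return 0.0                        # both images blank: nothing to compare
    rows = np.flatnonzero(fg.any(axis=1))
    cols = np.flatnonzero(fg.any(axis=0))
    a = img_a[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    b = img_b[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    if min(a.shape) < 7:                  # SSIM needs at least a 7x7 window
        return 0.0
    return structural_similarity(a, b, data_range=255)
```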

### Research References for the Approach

| Paper | Key Idea Used |
|-------|---------------|
| NGL-Prompter (2025) | Discrete semantic parameter schema — VLMs are better at categorical choices than numeric values |
| RRVF (2025) | Two-role pattern: generator + qualitative assessor. Natural-language diff feedback. Score progression 75→83→85 across 3 turns |
| SceneAssistant (2026) | Constrained action API for the VLM (don't ask for full JSON; ask for atomic adjustments). Max 20 steps; the VLM calls `Finish` to stop |
| AutoFigure (2026) | Keep-best strategy prevents oscillation. Critic + generator are the same LLM with different prompts. 5-10 iterations typical |
| RLRF (2025) | SSIM + Edge-SSIM for CPU similarity. CLIP rewards found NOT effective (ablation study). Canny edge reward + pixel L2 accelerates convergence |

## Body Mannequin

18-landmark anatomical profile (female, 170 cm):

| Z (cm) | Body Part | RX (cm) | RY (cm) |
|-------:|-----------|--------:|--------:|
| 0 | Feet | 4.0 | 3.0 |
| 7 | Ankle | 3.0 | 2.5 |
| 30 | Calf | 4.5 | 4.0 |
| 45 | Knee | 4.0 | 4.0 |
| 65 | Thigh | 6.5 | 6.0 |
| 82 | Crotch | 8.0 | 7.5 |
| 88 | Hip | 9.5 | 8.5 |
| 100 | Abdomen | 8.0 | 7.5 |
| 104 | Waist | 7.0 | 6.5 |
| 112 | Ribs | 8.5 | 8.0 |
| 120 | Bust | 10.0 | 9.0 |
| 132 | Chest | 9.0 | 8.0 |
| 140 | Upper chest | 8.5 | 7.5 |
| 145 | Shoulder | 11.5 | 8.0 |
| 150 | Neck base | 3.5 | 3.5 |
| 158 | Neck | 3.0 | 3.0 |
| 162 | Head | 5.5 | 5.0 |
| 170 | Head top | 4.5 | 4.5 |

Garment radii are derived from circumference measurements: radius = circumference / (2π), with elliptical cross-sections: RX = (circumference / 2π) × 1.1, RY = (circumference / 2π) × 0.9.
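
As a worked example of this formula (`ellipse_radii` is an illustrative helper, not a repo function):

```python
import math

def ellipse_radii(circumference, rx_scale=1.1, ry_scale=0.9):
    """Elliptical cross-section radii derived from a circumference measurement."""
    r = circumference / (2 * math.pi)   # circle of equal circumference
    return r * rx_scale, r * ry_scale

rx, ry = ellipse_radii(90)   # bust = 90 cm
# rx ≈ 15.76 cm, ry ≈ 12.89 cm
```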

## Supported Garment Types

| Type | Pattern Pieces | 3D Wrapping |
|------|----------------|-------------|
| Shirt/Blouse/Top | Front Bodice, Back Bodice, Sleeve ×2, Collar, Cuff ×2 | Cylinder + tilted tubes |
| Dress | Front Bodice, Back Bodice, Sleeve ×2, Front Skirt, Back Skirt | Cylinder + cone |
| Skirt | Front Skirt, Back Skirt, Waistband | Cone + cylinder |
| Pants/Jeans | Front Pant ×2, Back Pant ×2, Waistband | Half-tube legs + cylinder |
| Jacket/Blazer | Front Bodice, Back Bodice, Sleeve ×2, Collar | Wider cylinder + tubes |
| Hoodie | Front Bodice, Back Bodice, Sleeve ×2, Hood ×2 | Cylinder + tubes + dome |
| Vest | Front Bodice, Back Bodice | Cylinder (no sleeves) |
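
The table amounts to a piece-list lookup per garment type; a hypothetical sketch of that mapping (`pattern_generator.py` is the source of truth and also consults feature flags such as `has_collar` and `has_hood`):

```python
# Illustrative mapping mirroring the table above; NOT the repo's data structure.
PIECES_BY_TYPE = {
    "shirt":  ["Front Bodice", "Back Bodice", "Sleeve", "Sleeve", "Collar", "Cuff", "Cuff"],
    "dress":  ["Front Bodice", "Back Bodice", "Sleeve", "Sleeve", "Front Skirt", "Back Skirt"],
    "skirt":  ["Front Skirt", "Back Skirt", "Waistband"],
    "pants":  ["Front Pant", "Front Pant", "Back Pant", "Back Pant", "Waistband"],
    "jacket": ["Front Bodice", "Back Bodice", "Sleeve", "Sleeve", "Collar"],
    "hoodie": ["Front Bodice", "Back Bodice", "Sleeve", "Sleeve", "Hood", "Hood"],
    "vest":   ["Front Bodice", "Back Bodice"],
}
```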


## Development

```bash
# Local testing
pip install gradio Pillow matplotlib numpy scipy plotly scikit-image requests

# Run locally
python app.py

# Test pattern generation
python -c "from pattern_generator import get_pattern_pieces; print(get_pattern_pieces('dress', {'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8,'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}))"

# Test 3D assembly
python -c "
from pattern_generator import get_pattern_pieces
from garment_3d import create_3d_figure
analysis = {'garment_type':'dress','measurements':{'bust':90,'waist':72,'hip':96,'shoulder_width':40,'bodice_length':42,'sleeve_length':25,'skirt_length':55,'neckline_depth':12,'neckline_width':8,'bicep':28,'wrist':17,'cap_height':12,'flare':8},'features':{'has_collar':False,'has_cuffs':False,'has_pockets':False,'has_hood':False,'fit':'fitted'}}
pieces = get_pattern_pieces('dress', {**analysis['measurements'], **analysis['features']})
fig = create_3d_figure(analysis, pattern_pieces=pieces)
fig.write_html('test_dress.html')
print(f'{len(fig.data)} traces')
"
"