NVFP4 post-training quantized version of Denali-AI/qwen3-vl-2b-sft-grpo-v9, the top-scoring 2B model on eval_hard_3500 (79.9% weighted). Quantized with NVIDIA ModelOpt using 512 calibration samples, it retains a 99.9% JSON parse rate and achieves a 74.6% weighted score, ranking #6 of 21 overall.

| Property | Value |
|---|---|
| Architecture | Qwen3-VL |
| Parameters | 2B |
| Base Model | Qwen/Qwen3-VL-2B |
| Quantized From | Denali-AI/qwen3-vl-2b-sft-grpo-v9 |
| Training | SFT (LoRA) + GRPO (full precision) |
| Quantization | NVFP4 PTQ (ModelOpt, 512 calibration samples) |
| Task | Garment Attribute Extraction (9-field JSON) |
| Output Format | Structured JSON |
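
The JSON Parse Rate reported for this model is the share of outputs that are valid JSON. The following is a minimal sketch of such a check. It assumes the nine keys match the field names in the per-field table (type, color, pattern, closure, sleeve, neckline, defect, brand, size), and it counts an output only if all nine are present. The harness's exact criterion is not stated, so treat both as assumptions.

```python
import json

# Assumed key names, taken from the per-field results table.
FIELDS = ("type", "color", "pattern", "closure", "sleeve",
          "neckline", "defect", "brand", "size")


def parse_garment_json(text):
    """Return the parsed dict if `text` is a JSON object with all 9 fields, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not all(k in obj for k in FIELDS):
        return None
    return obj


def json_parse_rate(outputs):
    """Fraction of raw model outputs that parse into a complete 9-field object."""
    if not outputs:
        return 0.0
    ok = sum(parse_garment_json(o) is not None for o in outputs)
    return ok / len(outputs)


# Hypothetical outputs: one valid object and one refusal.
example = json.dumps({k: "unknown" for k in FIELDS})
print(json_parse_rate([example, "Sorry, I can't describe this image."]))  # 0.5
```
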

Ranked #6 of 21 on eval_hard_3500. NVFP4 PTQ compared with the full-precision parent model:

| Metric | NVFP4 PTQ | Full Precision | Delta |
|---|---|---|---|
| Weighted Score | 74.6% | 79.9% | -5.3pp |
| SBERT+NLI Combined | 74.1% | 78.5% | -4.4pp |
| JSON Parse Rate | 99.9% | 100% | -0.1pp |
| Throughput (samples/s) | 17.2 | 15.9 | +1.3 |
| Inference Time (total, s) | 203 | 220 | -17 |
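
The throughput and inference-time rows are consistent with throughput being the number of evaluated samples divided by total wall-clock inference time. The check below assumes eval_hard_3500 contains 3,500 samples, which is inferred from its name rather than stated in the card:

```python
# Assumption: eval_hard_3500 has 3,500 samples and "Inference Time" is the
# total wall-clock time in seconds for the whole set.
N_SAMPLES = 3500


def throughput(total_seconds, n_samples=N_SAMPLES):
    """Samples per second over the full evaluation run."""
    return n_samples / total_seconds


nvfp4 = throughput(203)
full_precision = throughput(220)
print(f"NVFP4: {nvfp4:.1f}/s, full precision: {full_precision:.1f}/s, "
      f"speedup: {nvfp4 / full_precision:.2f}x")
# NVFP4: 17.2/s, full precision: 15.9/s, speedup: 1.08x
```

On this run, the NVFP4 checkpoint is therefore about 8% faster end to end than the full-precision parent.
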
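
Per-field results for this NVFP4 model on eval_hard_3500 (the Weight column gives each field's weight in the weighted score):

| Field | SBERT | NLI | Levenshtein | Token F1 | SBERT+NLI | Weight |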
|---|---|---|---|---|---|---|
| type | 78.5% | 65.4% | 70.6% | 61.2% | 68.3% | 2.5x |
| color | 84.8% | 61.8% | 71.0% | 54.3% | 76.1% | 1.0x |
| pattern | 70.2% | 65.2% | 67.8% | 55.3% | 61.9% | 1.0x |
| closure | 62.6% | 56.9% | 60.9% | 49.1% | 57.0% | 1.0x |
| sleeve | 85.3% | 86.8% | 84.4% | 71.4% | 86.1% | 1.0x |
| neckline | 77.1% | 74.8% | 75.2% | 68.8% | 71.1% | 1.0x |
| defect | 78.1% | 78.3% | 77.8% | 77.4% | 78.2% | 2.0x |
| brand | 79.6% | 79.5% | 80.2% | 78.0% | 79.1% | 1.5x |
| size | 89.6% | 89.5% | 89.5% | 89.4% | 89.4% | 1.5x |
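
The headline weighted score appears to be a weight-normalized average of the per-field SBERT+NLI column above, using the Weight column. Recomputing it from these rows reproduces the 74.6% weighted score listed for this model on the leaderboard, and the plain mean reproduces its 74.1% SBERT+NLI figure. This is a reconstruction from the published numbers, not the evaluation harness itself:

```python
# Per-field SBERT+NLI scores (%) and weights copied from the table above.
FIELD_SCORES = {
    "type": (68.3, 2.5),
    "color": (76.1, 1.0),
    "pattern": (61.9, 1.0),
    "closure": (57.0, 1.0),
    "sleeve": (86.1, 1.0),
    "neckline": (71.1, 1.0),
    "defect": (78.2, 2.0),
    "brand": (79.1, 1.5),
    "size": (89.4, 1.5),
}


def weighted_score(field_scores):
    """Weight-normalized mean: sum(w * s) / sum(w)."""
    total_weight = sum(w for _, w in field_scores.values())
    return sum(s * w for s, w in field_scores.values()) / total_weight


def unweighted_score(field_scores):
    """Plain mean over fields (every field counted once)."""
    return sum(s for s, _ in field_scores.values()) / len(field_scores)


print(f"weighted:   {weighted_score(FIELD_SCORES):.1f}%")    # weighted:   74.6%
print(f"unweighted: {unweighted_score(FIELD_SCORES):.1f}%")  # unweighted: 74.1%
```

Because the weights sum to 12.5, the type field alone accounts for 20% of the weighted score. Its relatively low 68.3% is the main reason the weighted score sits close to the unweighted one despite the strong size and brand fields.
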

eval_hard_3500 leaderboard (21 models):

| Rank | Model | Weighted | SBERT+NLI | JSON Parse | Throughput | Inference Time |
|---|---|---|---|---|---|---|
| 1 | qwen3-vl-8b-sft+grpo | 80.9% | 78.7% | 100% | 7.5/s | 464s |
| 2 | qwen3-vl-2b-sft-grpo-v9 | 79.9% | 78.5% | 100% | 15.9/s | 220s |
| 3 | qwen3-vl-8b-instruct-base | 78.1% | 75.6% | 100% | 5.5/s | 640s |
| 4 | qwen3-vl-8b-instruct-nvfp4 | 77.8% | 75.0% | 100% | 8.2/s | 424s |
| 5 | qwen35-2b-base | 76.2% | 73.0% | 100% | 6.6/s | 534s |
| 6 | **qwen3-vl-2b-sft-grpo-v9-nvfp4** (this model) | 74.6% | 74.1% | 100% | 17.2/s | 203s |
| 7 | qwen3-vl-2b-instruct-base | 68.0% | 66.7% | 100% | 15.1/s | 231s |
| 8 | internvl3-2b-grpo-gtpo-full | 67.5% | 64.3% | 100% | 11.8/s | 297s |
| 9 | internvl3-2b-grpo-gtpo-fp8 | 67.1% | 63.8% | 100% | 14.3/s | 244s |
| 10 | internvl3-2b-base | 66.8% | 63.7% | 100% | 11.8/s | 297s |
| 11 | moondream2-base | 63.8% | 61.8% | 100% | 1.4/s | 2416s |
| 12 | qwen35-2b-sft-grpo-gtpo-v8 | 60.7% | 60.1% | 100% | 11.3/s | 309s |
| 13 | qwen35-2b-sft-v7 | 58.6% | 58.9% | 100% | 11.6/s | 302s |
| 14 | qwen35-35b-a3b-gptq-int4 | 51.5% | 48.7% | 14% | 1.6/s | 2124s |
| 15 | qwen35-9b-nvfp4-v10 | 48.9% | 46.0% | 8% | 1.7/s | 2075s |
| 16 | qwen35-9b-sft-nvfp4-v11 | 48.3% | 45.5% | 8% | 1.7/s | 2023s |
| 17 | qwen35-2b-base-nvfp4-v10 | 45.9% | 42.9% | 0% | 4.0/s | 878s |
| 18 | qwen3.5-122b-a10b-nvfp4 | 45.9% | 42.9% | 0% | 1.2/s | 2893s |
| 19 | qwen35-2b-sft-nvfp4-v11 | 45.9% | 42.9% | 0% | 4.0/s | 876s |
| 20 | qwen35-2b-sft-grpo-gtpo-nvfp4 | 45.9% | 42.9% | 0% | 3.9/s | 907s |
| 21 | qwen3-vl-8b-sft-grpo | 0.0% | 0.0% | 100% | 0.0/s | 462s |
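
This model was created with NVIDIA ModelOpt post-training quantization and exported with `export_hf_checkpoint()`. The language-model head and the vision encoder are excluded from quantization:

```python
import copy

import torch
import modelopt.torch.quantization as mtq
from modelopt.torch.export import export_hf_checkpoint

# Assumes `model` (the loaded full-precision checkpoint), `calibration_loop`
# (a function taking `model` and running the 512 calibration samples through it)
# and `output_dir` are defined earlier.
nvfp4_cfg = copy.deepcopy(mtq.NVFP4_DEFAULT_CFG)  # deep copy: don't mutate the library default
nvfp4_cfg["quant_cfg"]["*lm_head*"] = {"enable": False}  # keep the LM head in high precision
nvfp4_cfg["quant_cfg"]["*model.visual*"] = {"enable": False}  # keep the vision encoder unquantized
model = mtq.quantize(model, nvfp4_cfg, forward_loop=calibration_loop)
export_hf_checkpoint(model, dtype=torch.bfloat16, export_dir=output_dir)
```
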
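NVFP4 stores values as 4-bit E2M1 floats with magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, and 6. Values are grouped in blocks of 16 elements, and each block shares a scale factor. The pure-Python sketch below fakes this quantize-then-dequantize round trip for a flat list of weights. It is a simplification for intuition, not ModelOpt's implementation. In particular, it keeps the block scale in full precision, whereas NVFP4 stores block scales in FP8 (E4M3) together with a per-tensor FP32 scale.

```python
# Representable magnitudes of the FP4 E2M1 format used by NVFP4.
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # NVFP4 shares one scale per 16 consecutive elements


def _round_to_grid(x):
    """Round |x| to the nearest E2M1 magnitude (ties go to the smaller one), keeping the sign."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return mag if x >= 0 else -mag


def fake_quantize_nvfp4(values, block_size=BLOCK_SIZE):
    """Quantize-dequantize a flat list block by block (simplified: unrounded scales)."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto the largest E2M1 value (6.0).
        scale = amax / 6.0 if amax > 0 else 1.0
        out.extend(_round_to_grid(v / scale) * scale for v in block)
    return out


weights = [0.02, -0.11, 0.37, 0.9, -1.2, 0.05]
print(fake_quantize_nvfp4(weights))
```

Because each 16-element block gets its own scale, a single large outlier only coarsens the resolution of its own block rather than the whole tensor.
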
Models are evaluated on the eval_hard_3500 benchmark using:
| Metric | Description |
|---|---|
| SBERT Cosine | Semantic similarity via sentence-transformers (all-MiniLM-L6-v2) |
| NLI Score | Natural language inference entailment scoring |
| Levenshtein Ratio | Fuzzy string matching |
| Token F1 | Token-level F1 (harmonic mean of token precision and recall) |
| Weighted Score | Field-weighted aggregate (type=2.5x, defect=2.0x, brand/size=1.5x, all other fields 1.0x) |
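
The two string-level metrics can be sketched in a few lines of pure Python. The card does not state the exact normalization the harness uses, so the definitions here are assumptions. The Levenshtein ratio is taken as 1 - distance / max(len), and Token F1 follows the common bag-of-tokens F1 over lowercased, whitespace-split tokens. The SBERT and NLI scores need model weights and are not reproduced here.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance (unit-cost insert/delete/substitute) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_ratio(a, b):
    """Similarity in [0, 1]; assumed normalization: 1 - dist / max(len)."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def token_f1(pred, gold):
    """Bag-of-tokens F1 between lowercased, whitespace-split strings."""
    p, g = pred.lower().split(), gold.lower().split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


print(levenshtein_ratio("crew neck", "crewneck"))  # 8/9, about 0.889
print(token_f1("long sleeve", "long sleeves"))     # 0.5
```

The example shows why both metrics are reported. Levenshtein rewards near-identical spellings such as "crew neck" vs "crewneck", while Token F1 gives no credit for "sleeve" vs "sleeves" because the tokens differ.
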

```bibtex
@misc{denali-ai-qwen3-vl-2b-sft-grpo-v9-nvfp4,
  title={Qwen3-VL-2B SFT+GRPO v9 NVFP4},
  author={Denali AI},
  year={2026},
  publisher={HuggingFace},
  url={https://huggingface.co/Denali-AI/qwen3-vl-2b-sft-grpo-v9-nvfp4}
}
```

This model is released under the Apache 2.0 License.