# Qwen Visual Design Judge
A fine-tuned Qwen3.5-0.8B model that judges which of two images has the better visual design.
## Performance
| Metric | Score |
|---|---|
| Overall Accuracy | 82% |
| High agreement pairs (≥80%) | 90.9% |
| Low agreement pairs (<80%) | 79.5% |
Matches GPT-4.1 performance while being ~1000x cheaper to run locally!
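The stratified numbers above can be reproduced from per-pair evaluation records. A minimal sketch, assuming each record carries a prediction, a reference label, and a human-agreement fraction (the field names `pred`, `label`, and `agreement` are hypothetical; the card does not specify an evaluation schema):

```python
def stratified_accuracy(records, threshold=0.80):
    """Compute overall accuracy plus accuracy split by annotator agreement.

    Each record is a dict with (hypothetical) fields:
      - "pred":      the judge's verdict, "A" or "B"
      - "label":     the reference verdict, "A" or "B"
      - "agreement": fraction of human raters agreeing on the label
    """
    def acc(rs):
        # Fraction of records where the judge matched the reference label.
        return sum(r["pred"] == r["label"] for r in rs) / len(rs) if rs else 0.0

    high = [r for r in records if r["agreement"] >= threshold]
    low = [r for r in records if r["agreement"] < threshold]
    return {"overall": acc(records), "high": acc(high), "low": acc(low)}
```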
## Training
- Base model: Qwen/Qwen3.5-0.8B
- Training data: 40K synthetic preference pairs labeled by GPT-4.1
- Domains: Landing pages, websites, mobile UI, graphics
- Epochs: 1
- Hardware: NVIDIA T4 GPU (~13 hours)
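A preference pair in this setup boils down to two images plus the teacher model's verdict. A sketch of what one training record might look like (the `make_preference_pair` helper and its field names are illustrative assumptions, not the actual dataset schema):

```python
def make_preference_pair(image_a: str, image_b: str, winner: str, domain: str) -> dict:
    """Build one synthetic preference record (hypothetical schema).

    winner: "A" or "B", as labeled by the teacher model (GPT-4.1).
    domain: e.g. "landing_page", "website", "mobile_ui", "graphics".
    """
    if winner not in ("A", "B"):
        raise ValueError("winner must be 'A' or 'B'")
    return {
        "image_a": image_a,
        "image_b": image_b,
        "winner": winner,
        "domain": domain,
    }
```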
## Usage
```python
import torch
from transformers import Qwen3_5ForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

# Load the fine-tuned judge in bfloat16 on the GPU.
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "DillonNys/qwen-visual-design-judge",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
processor = AutoProcessor.from_pretrained("DillonNys/qwen-visual-design-judge")


def judge_pair(img_a: str, img_b: str) -> str:
    """Return "A" or "B" for whichever image the model judges better designed."""
    prompt = """You are an expert visual design judge. Compare these two images and determine which has better visual design quality.
Consider: layout, typography, color harmony, visual hierarchy, spacing, and overall aesthetic appeal.
Respond with ONLY "A" or "B" to indicate the better design."""

    # Interleave the prompt text with the two labeled images.
    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "text", "text": "\n\nImage A:"},
            {"type": "image", "image": img_a},
            {"type": "text", "text": "\n\nImage B:"},
            {"type": "image", "image": img_b},
            {"type": "text", "text": "\n\nWhich is better? Answer A or B:"},
        ],
    }]
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to("cuda")

    # Greedy decoding; the answer is a single letter, so a few tokens suffice.
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    response = processor.decode(
        output_ids[0, inputs.input_ids.shape[1]:], skip_special_tokens=True
    ).strip()
    return "A" if "A" in response.upper() else "B"


# Example
winner = judge_pair("design_a.png", "design_b.png")
print(f"Better design: {winner}")
```
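Beyond single comparisons, a pairwise judge can rank a whole set of candidates. A minimal round-robin sketch; the `rank_designs` helper is not part of the model and only assumes a `judge(a, b) -> "A" | "B"` callable like `judge_pair` above:

```python
from itertools import combinations

def rank_designs(paths, judge):
    """Rank designs by round-robin win count using a pairwise judge.

    judge(a, b) must return "A" if the first argument wins, else "B".
    Note: this makes O(n^2) comparisons, and a judge can be order-sensitive,
    so a more careful version might average both orderings of each pair.
    """
    wins = {p: 0 for p in paths}
    for a, b in combinations(paths, 2):
        winner = a if judge(a, b) == "A" else b
        wins[winner] += 1
    # Highest win count first.
    return sorted(paths, key=lambda p: wins[p], reverse=True)
```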
## Citation
If you use this model, please cite:
```bibtex
@misc{qwen-visual-design-judge,
  author    = {Dillon Nys},
  title     = {Qwen Visual Design Judge},
  year      = {2026},
  publisher = {HuggingFace},
  url       = {https://huggingface.co/DillonNys/qwen-visual-design-judge}
}
```
## Acknowledgments
- Qwen team for the excellent base model
- OpenAI for GPT-4.1 used in synthetic labeling
- The Vibe Arena community for preference data