chardizard's picture
Add model card
6b60a20 verified
metadata
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
  - rubric-generation
  - sft
  - grubric

SFTQwen3-8B-OpenRubrics-v1

Qwen3-8B full fine-tuned on OpenRubrics v1 (~35.4k examples) for evaluation rubric generation.

Training

  • Base model: Qwen/Qwen3-8B
  • Dataset: OpenRubrics v1 (35,406 examples)
  • Epochs: 1
  • Learning rate: 8e-6 (cosine schedule)
  • Effective batch size: 128 (per-device=2, gradient accumulation=8, 8 GPUs)
  • Max sequence length: 3072

Task

Given a user prompt, the model generates a structured evaluation rubric in [Hard Rule] / [Principle] format. These rubrics are used to judge LLM response quality.

Evaluation

  • ~83.5% format validity on Chatbot Arena prompts
  • Used as the baseline rubric generator in the GRUBRIC pipeline