license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- rubric-generation
- sft
- grubric
SFTQwen3-8B-OpenRubrics-v1v2
Qwen3-8B, full-parameter fine-tuned on the combined OpenRubrics v1 and v2 datasets (~100k examples) to generate evaluation rubrics.
Training
- Base model: Qwen/Qwen3-8B
- Dataset: OpenRubrics v1 + v2 (~100k examples)
- Epochs: 1
- Learning rate: 8e-6 (cosine schedule)
- Effective batch size: 128 (2 per device × 8 gradient-accumulation steps × 8 GPUs)
- Max sequence length: 3072 tokens
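The effective batch size listed above is the product of the per-device batch, the gradient-accumulation steps, and the GPU count. The minimal sketch below works through that arithmetic, gives the approximate number of optimizer steps in one epoch, and shows a plain cosine decay of the learning rate. It rests on assumptions the card does not confirm: exactly 100,000 examples (the card says only ~100k), no warmup, and decay all the way to zero. The real schedule may include warmup or a learning-rate floor.

```python
import math

# Hyperparameters from the Training section of this card.
PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 8
NUM_GPUS = 8
PEAK_LR = 8e-6
# Assumption: the card gives only "~100k", so this count is approximate.
NUM_EXAMPLES = 100_000


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Number of sequences that contribute to a single optimizer step."""
    return per_device * grad_accum * num_gpus


def steps_per_epoch(num_examples: int, batch: int) -> int:
    """Optimizer steps for one pass over the data, counting a final partial batch."""
    return math.ceil(num_examples / batch)


def cosine_lr(step: int, total_steps: int, peak_lr: float) -> float:
    """Cosine decay from peak_lr to 0 (assumes no warmup and no minimum LR)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


batch = effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_GPUS)
total = steps_per_epoch(NUM_EXAMPLES, batch)
print(batch, total)  # 128 782
```

Under these assumptions, a single epoch comes to roughly 780 optimizer steps. The learning rate reaches half its peak at about step 391.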
Task
Given a user prompt, the model generates a structured evaluation rubric whose criteria are labeled [Hard Rule] or [Principle]. The rubric is then used to judge the quality of LLM responses.
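The card names the two criterion types but does not specify the exact text layout of a rubric. The minimal sketch below parses a generated rubric into hard rules and principles and flags any line that carries neither tag. It rests on a hypothetical layout: each criterion sits on its own line, may be numbered, and ends with a literal `[Hard Rule]` or `[Principle]` tag. The names `parse_rubric` and `Rubric`, and the validity rule itself, are illustrative only. They are not part of any released tooling for this model.

```python
import re
from dataclasses import dataclass, field

# Hypothetical criterion layout: "1. The response must ... [Hard Rule]".
# The real OpenRubrics layout may differ; adjust the pattern if so.
TAG_RE = re.compile(
    r"^\s*(?:\d+[.)]\s*)?(?P<text>.+?)\s*\[(?P<tag>Hard Rule|Principle)\]\s*$"
)


@dataclass
class Rubric:
    hard_rules: list[str] = field(default_factory=list)
    principles: list[str] = field(default_factory=list)
    invalid_lines: list[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        # Hypothetical rule: no untagged lines and at least one criterion.
        return not self.invalid_lines and bool(self.hard_rules or self.principles)


def parse_rubric(text: str) -> Rubric:
    """Split rubric text into tagged criteria, keeping untagged lines aside."""
    rubric = Rubric()
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        match = TAG_RE.match(line)
        if match is None:
            rubric.invalid_lines.append(line)
        elif match.group("tag") == "Hard Rule":
            rubric.hard_rules.append(match.group("text"))
        else:
            rubric.principles.append(match.group("text"))
    return rubric


sample = """1. The response must be written in French. [Hard Rule]
2. The response should explain each step clearly. [Principle]"""
r = parse_rubric(sample)
print(len(r.hard_rules), len(r.principles), r.is_valid)  # 1 1 True
```

A check of this kind, applied to many generated rubrics, gives one simple way to estimate format validity. The strictness of the rule (for example, whether at least one hard rule is required) is a design choice that this card does not specify.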
Evaluation
- Used to generate the SFTQwen8b-ChatbotArena-ARMJudge-Correct dataset
- Produces format-valid rubrics more often than training on OpenRubrics v1 alone