---
license: apache-2.0
base_model: Qwen/Qwen3-8B
tags:
- rubric-generation
- sft
- grubric
---
# SFTQwen3-8B-OpenRubrics-v1
Qwen3-8B, fully fine-tuned (SFT) on [OpenRubrics v1](https://huggingface.co/datasets/maxreciprocate/OpenRubrics) (~35.4k examples) to generate evaluation rubrics.
## Training
- **Base model:** Qwen/Qwen3-8B
- **Dataset:** OpenRubrics v1 (35,406 examples)
- **Epochs:** 1
- **Learning rate:** 8e-6 (cosine schedule)
- **Effective batch size:** 128 (per-device=2, gradient accumulation=8, 8 GPUs)
- **Max sequence length:** 3072
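
The effective batch size and the approximate number of optimizer steps follow from the settings above. The sketch below is only arithmetic on the listed values. The variable names are illustrative, and it assumes the final partial batch is kept and that no sequence packing is used, neither of which this card confirms.

```python
import math

# Values taken from the Training list above.
per_device_batch_size = 2
gradient_accumulation_steps = 8
num_gpus = 8
num_examples = 35_406
num_epochs = 1

# Effective batch size = per-device batch * accumulation steps * number of GPUs.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Assumption: the last partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
total_steps = steps_per_epoch * num_epochs

print(f"effective batch size: {effective_batch_size}")
print(f"approx. optimizer steps: {total_steps}")
```

Under these assumptions, one epoch comes to roughly 277 optimizer steps, which is the horizon over which the cosine schedule decays.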
## Task
Given a user prompt, the model generates a structured evaluation rubric in `[Hard Rule]` / `[Principle]` format. These rubrics are used to judge LLM response quality.
## Evaluation
- ~83.5% of rubrics generated for Chatbot Arena prompts are format-valid
- Used as the baseline rubric generator in the GRUBRIC pipeline
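
A format-validity rate like the one above could be computed with a check along these lines. This is a minimal sketch, not the GRUBRIC pipeline's actual validator: the rule used here (every non-empty line carries exactly one `[Hard Rule]` or `[Principle]` tag) and the helper names `is_valid_rubric` and `format_validity` are assumptions made for illustration, and the sample strings are invented.

```python
import re

# Assumption: a rubric item is a line tagged with one of these two labels.
TAG_PATTERN = re.compile(r"\[(Hard Rule|Principle)\]")


def is_valid_rubric(text: str) -> bool:
    """Return True if every non-empty line has exactly one recognized tag."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return False
    return all(len(TAG_PATTERN.findall(line)) == 1 for line in lines)


def format_validity(outputs: list[str]) -> float:
    """Fraction of generated rubrics that pass the (hypothetical) format check."""
    if not outputs:
        return 0.0
    return sum(is_valid_rubric(o) for o in outputs) / len(outputs)


# Invented example outputs, for illustration only.
samples = [
    "1. The response must be under 200 words. [Hard Rule]\n"
    "2. The response explains its reasoning clearly. [Principle]",
    "Here is a rubric without any tags.",
]
print(format_validity(samples))
```

A stricter check might also require numbered items or at least one tag of each kind; the right rule depends on how the downstream judge consumes the rubric.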