# SFTQwen3-8B-OpenRubrics-v1
Qwen3-8B, fully fine-tuned on OpenRubrics v1 (~35.4k examples) to generate evaluation rubrics.
## Training
- Base model: Qwen/Qwen3-8B
- Dataset: OpenRubrics v1 (35,406 examples)
- Epochs: 1
- Learning rate: 8e-6 (cosine schedule)
- Effective batch size: 128 (per-device=2, gradient accumulation=8, 8 GPUs)
- Max sequence length: 3072
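The effective batch size above is the product of the per-device batch, the gradient-accumulation steps, and the GPU count; a quick sanity check:

```python
# Effective batch size = per-device batch × grad-accum steps × number of GPUs
per_device_batch = 2
grad_accum_steps = 8
num_gpus = 8

effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # → 128
```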
## Task
Given a user prompt, the model generates a structured evaluation rubric in [Hard Rule] / [Principle] format. These rubrics are used to judge LLM response quality.
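A minimal sketch of how generated output in this format could be validated. The exact validity criterion behind the reported format-validity number is not specified here, so the regex and the one-tag-per-line assumption below are illustrative only:

```python
import re

# Assumed format: every non-empty line begins with a "[Hard Rule]" or
# "[Principle]" tag followed by the criterion text. The real check may differ.
RUBRIC_ITEM = re.compile(r"^\[(Hard Rule|Principle)\]\s+\S")

def is_valid_rubric(text: str) -> bool:
    """Return True if every non-empty line is a tagged rubric item."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    return bool(lines) and all(RUBRIC_ITEM.match(ln) for ln in lines)

rubric = (
    "[Hard Rule] The response must answer the user's question directly.\n"
    "[Principle] Prefer concise explanations over exhaustive detail."
)
print(is_valid_rubric(rubric))                  # → True
print(is_valid_rubric("A rubric without tags"))  # → False
```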
## Evaluation
- ~83.5% format validity on Chatbot Arena prompts
- Used as the baseline rubric generator in the GRUBRIC pipeline