| --- |
| license: apache-2.0 |
| base_model: Qwen/Qwen3-8B |
| tags: |
| - rubric-generation |
| - sft |
| - grubric |
| --- |
| |
| # SFTQwen3-8B-OpenRubrics-v1v2 |
|
|
| Qwen3-8B full fine-tuned on OpenRubrics v1 + v2 combined (~100k+ examples) for evaluation rubric generation. |
|
|
| ## Training |
|
|
| - **Base model:** Qwen/Qwen3-8B |
| - **Dataset:** OpenRubrics v1 + v2 (~100k examples) |
| - **Epochs:** 1 |
| - **Learning rate:** 8e-6 (cosine schedule) |
| - **Effective batch size:** 128 (per-device=2, gradient accumulation=8, 8 GPUs) |
| - **Max sequence length:** 3072 |
|
|
| ## Task |
|
|
| Given a user prompt, generates a structured evaluation rubric in `[Hard Rule]` / `[Principle]` format for judging LLM response quality. |
|
|
| ## Evaluation |
|
|
| - Used to generate the [SFTQwen8b-ChatbotArena-ARMJudge-Correct](https://huggingface.co/datasets/chardizard/SFTQwen8b-ChatbotArena-ARMJudge-Correct) dataset |
| - Improved format validity over v1-only training |
|
|