# CAP RLVR GRPO Model - Bluebook Task
This model is a Group Relative Policy Optimization (GRPO) fine-tuned version of Qwen3-14B for legal reasoning, optimized specifically for the bluebook (legal citation) task.
## Training Details
- Base Model: Qwen/Qwen3-14B (via SFT checkpoint)
- Task: bluebook (legal bluebook task)
- Training Method: GRPO (Group Relative Policy Optimization)
- Training Pairs: 2988
- Epochs: 3
- Learning Rate: 5e-06
- Final Loss: -0.00191936295795748
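GRPO scores each sampled completion against the other completions in its group rather than against a learned value model: rewards are normalized by the group's mean and standard deviation to form advantages. A minimal sketch of that group-relative advantage computation, assuming the standard GRPO formulation (this is illustrative, not this repository's training code):

```python
import statistics

def group_relative_advantages(rewards):
    """Normalize each reward against its group's mean and std (GRPO-style).

    A zero std (all rewards equal) falls back to 1.0 so advantages are zero
    rather than undefined.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]

# Example: three completions for one prompt, scored by a reward function.
advantages = group_relative_advantages([2.0, 4.0, 6.0])
# Advantages are centered on zero: completions above the group mean get
# positive advantage, those below get negative.
```

These advantages then weight the policy-gradient update for each token of the corresponding completion.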
## Model Architecture
This repository contains a LoRA (Low-Rank Adaptation) adapter, which must be loaded on top of the base Qwen3-14B model.
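The LoRA idea can be sketched in a few lines: instead of updating a frozen weight matrix W directly, the adapter trains two low-rank matrices A and B, and the effective weight is W + (alpha / r) · B·A. The dimensions, rank, and scaling below are hypothetical illustrations, not this adapter's actual configuration:

```python
# Illustrative LoRA update in pure Python (hypothetical shapes, not this
# repo's config). B is zero-initialized, so training starts from the base
# weights exactly.
def matmul(X, Y):
    """Naive matrix multiply for the sketch."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

d, r, alpha = 4, 2, 16                       # hidden size, LoRA rank, scaling (illustrative)
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base weight
A = [[0.1] * d for _ in range(r)]            # trained low-rank "A" (r x d)
B = [[0.0] * r for _ in range(d)]            # low-rank "B" (d x r), zero init
BA = matmul(B, A)
W_eff = [[W[i][j] + (alpha / r) * BA[i][j] for j in range(d)] for i in range(d)]
# With B still at its zero init, W_eff equals W: the adapter is a no-op
# until training moves B away from zero.
```

Because only A and B are stored, the adapter on disk is a small fraction of the 14B base model's size.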
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-14B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B")

# Load GRPO adapter
model = PeftModel.from_pretrained(base_model, "kylebrussell/cap-rlvr-grpo-bluebook")

# Use for bluebook tasks
inputs = tokenizer("Your bluebook query here", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
## Dataset
Trained on the CAP RLVR (Caselaw Access Project - Reinforcement Learning with Value-based Rewards) dataset, focusing on legal reasoning tasks.
## Citation

```bibtex
@misc{cap-rlvr-grpo-bluebook,
  title={CAP RLVR GRPO Model for Bluebook Tasks},
  author={CAP RLVR Team},
  year={2025},
  howpublished={\url{https://huggingface.co/kylebrussell/cap-rlvr-grpo-bluebook}}
}
```