
CAP RLVR GRPO Model - Bluebook Task

This model is a version of Qwen3-14B fine-tuned with Group Relative Policy Optimization (GRPO) for legal reasoning, optimized specifically for the bluebook (legal citation) task.

Training Details

  • Base Model: Qwen/Qwen3-14B (via SFT checkpoint)
  • Task: bluebook (legal citation formatting)
  • Training Method: GRPO (Group Relative Policy Optimization)
  • Training Pairs: 2988
  • Epochs: 3
  • Learning Rate: 5e-06
  • Final Loss: -0.00191936295795748
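GRPO's core idea can be sketched in a few lines: for each prompt, a group of completions is sampled and scored, and each completion's advantage is its reward normalized against the group's mean and standard deviation. The sketch below is a minimal illustration of that normalization step, not the actual training code used for this model:

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std (GRPO-style)."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Toy rewards for 4 sampled completions of one bluebook prompt
rewards = [1.0, 0.0, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])  # → [1.225, -1.225, 0.0, 0.0]
```

Because advantages are computed relative to the group rather than a learned value function, no separate critic model is needed.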

Model Architecture

This is a LoRA (Low-Rank Adaptation) model that can be loaded with the base Qwen3-14B model.
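The LoRA idea behind this adapter can be illustrated in plain Python: instead of storing a full d×d weight update, only two low-rank factors B (d×r) and A (r×d) are trained, and the effective weight is W + (alpha/r)·B·A. This is a toy illustration with made-up dimensions, not this model's actual shapes or adapter config:

```python
def matmul(X, Y):
    """Naive matrix multiply for small nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, alpha, r):
    """Effective weight W + (alpha / r) * (B @ A), as in LoRA."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

# Toy 4x4 layer with a rank-1 update: 2*4*1 = 8 trainable values vs 16 frozen ones
d, r, alpha = 4, 1, 2
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen base
B = [[0.1] for _ in range(d)]   # d x r factor
A = [[0.2, 0.0, 0.0, 0.0]]      # r x d factor
W_eff = lora_effective_weight(W, A, B, alpha, r)
# W_eff[0][0] is approximately 1.04: base 1.0 plus scaled rank-1 update 2 * 0.02
```

At these toy sizes the savings are trivial, but at 14B-parameter scale the same trick means only the small A and B matrices are trained and shipped, which is why this repository contains an adapter rather than full model weights.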

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model (bfloat16 + device_map="auto" keep the 14B weights manageable)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B")

# Load GRPO adapter
model = PeftModel.from_pretrained(base_model, "kylebrussell/cap-rlvr-grpo-bluebook")

# Use for bluebook tasks
inputs = tokenizer("Your bluebook query here", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

Dataset

Trained on the CAP RLVR (Caselaw Access Project - Reinforcement Learning with Verifiable Rewards) dataset, focusing on legal reasoning tasks.

Citation

@misc{cap-rlvr-grpo-bluebook,
  title={CAP RLVR GRPO Model for Bluebook Tasks},
  author={CAP RLVR Team},
  year={2025},
  howpublished={\url{https://huggingface.co/kylebrussell/cap-rlvr-grpo-bluebook}}
}