Model Card for LLaMA-3.1-8B GRPO Fine-Tune

We fine-tuned LLaMA-3.1-8B for 250 steps using GRPO (Group Relative Policy Optimization) to enhance its reasoning capabilities on the law_and_order dataset available on Hugging Face.

Model Details

This model was fine-tuned on the law_and_order dataset with GRPO to strengthen its reasoning abilities, enabling it to analyze legal and procedural scenarios more effectively.
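GRPO optimizes the policy against one or more scalar reward functions scored over groups of sampled completions (in trl's GRPOTrainer, for example, reward functions receive a batch of completions and return per-sample scores). The specific rewards used for this checkpoint are not documented in the card; the sketch below is a hypothetical format-checking reward of the kind commonly used in GRPO reasoning fine-tunes, assuming completions arrive as plain strings and a `<reasoning>…</reasoning><answer>…</answer>` output layout.

```python
import re

def format_reward(completions, **kwargs):
    """Hypothetical GRPO reward: 1.0 if a completion follows the expected
    <reasoning>...</reasoning><answer>...</answer> layout, else 0.0.
    The tag names and scoring are illustrative, not this model's actual
    training rewards."""
    pattern = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"
    return [1.0 if re.search(pattern, c, re.DOTALL) else 0.0 for c in completions]
```

A list of such functions can be passed as `reward_funcs` when configuring a GRPO trainer; the group-relative advantage is then computed from these scores.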

Usage

from vllm import SamplingParams

# `model` and `tokenizer` are assumed to be already loaded (e.g. via
# Unsloth's FastLanguageModel.from_pretrained), and SYSTEM_PROMPT is
# your system instruction string.
text = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Different Cyber law in India?"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=1024,
)

# fast_generate and load_lora are Unsloth extensions; the saved GRPO
# LoRA adapter is applied at generation time.
output = model.fast_generate(
    text,
    sampling_params=sampling_params,
    lora_request=model.load_lora("grpo_saved_lora"),
)[0].outputs[0].text
Format: Safetensors · Model size: 8B params · Tensor type: BF16