# Model Card for Model ID

We fine-tuned LLaMA-3.1-8B for 250 steps using GRPO (Group Relative Policy Optimization) to enhance its reasoning capabilities on the law_and_order dataset available on Hugging Face.
## Model Details

This model was trained on the law_and_order dataset and fine-tuned with GRPO to develop strong reasoning abilities. The fine-tuning process enables the model to analyze legal and procedural scenarios more effectively.
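At the core of GRPO is a group-relative advantage: for each prompt, several completions are sampled, each is scored by a reward function, and each reward is normalized against the mean and standard deviation of its group. The exact reward functions and hyperparameters used for this model are not shown here; the sketch below only illustrates the normalization step itself.

```python
def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: A_i = (r_i - mean(r)) / (std(r) + eps),
    computed within one group of completions for the same prompt."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Example: four sampled completions for one prompt, scored by a reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below the mean are penalized, so no separate value network is needed.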
## Model Description

Example inference with the saved GRPO LoRA adapter, using Unsloth's `fast_generate` (vLLM backend):
```python
from unsloth import FastLanguageModel
from vllm import SamplingParams

# Load the base model with vLLM fast inference enabled
# (model_name shown is an assumption; use the checkpoint for this card).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "meta-llama/Llama-3.1-8B-Instruct",
    fast_inference = True,
)

SYSTEM_PROMPT = "..."  # the system prompt used during GRPO training

# Build the prompt with the model's chat template.
text = tokenizer.apply_chat_template([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Different Cyber law in India?"},
], tokenize = False, add_generation_prompt = True)

sampling_params = SamplingParams(
    temperature = 0.8,
    top_p = 0.95,
    max_tokens = 1024,
)

# Generate with the saved GRPO LoRA adapter applied.
output = model.fast_generate(
    text,
    sampling_params = sampling_params,
    lora_request = model.load_lora("grpo_saved_lora"),
)[0].outputs[0].text
```