nbd22/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora
Transformers
Safetensors
Generated from Trainer
trl
grpo
arxiv:2402.03300
Need idea about reward function. (#1)
by davinders, opened Jan 29, 2025
Discussion
davinders, Jan 29, 2025:
This comment has been hidden
davinders changed discussion status to closed, Jan 29, 2025