GSM8k-GRPO
Collection
20 items • Updated
Merged model fine-tuned from deepseek-ai/deepseek-llm-7b-chat on GSM8K using GRPO.
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("rghosh8/gsm8k-deepseek-llm-7b-chat-rajat-seed-42-G-16_merged", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("rghosh8/gsm8k-deepseek-llm-7b-chat-rajat-seed-42-G-16_merged")
Base model
deepseek-ai/deepseek-llm-7b-chat