SQLStorm GRPO checkpoint (Llama 3.2 3B)
Reinforcement-learning checkpoint produced by GRPO (Group Relative Policy Optimization) training (rl_finetune_grpo.py) on StackOverflow-style SQL, initialized from abharadwaj123/llama3-sql2plan.
Contents: merged causal LM weights (model.safetensors), tokenizer, config. Optimizer state was omitted to save space.
Load:
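from transformers import AutoModelForCausalLM, AutoTokenizer
# Replace "REPO_ID" with this repository's Hub ID (or a local path to the checkpoint).
# torch_dtype="auto" uses the dtype stored in the checkpoint; device_map="auto" requires the accelerate package.
m = AutoModelForCausalLM.from_pretrained("REPO_ID", torch_dtype="auto", device_map="auto")
tok = AutoTokenizer.from_pretrained("REPO_ID")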
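Once the model and tokenizer are loaded, generation could look roughly like the sketch below. The helper name `generate`, the greedy decoding, and the `max_new_tokens` default are illustrative assumptions, not settings taken from the training script. The prompt format the checkpoint expects is not documented here, so the caller supplies the prompt string as-is.

```python
def generate(model, tok, prompt: str, max_new_tokens: int = 256) -> str:
    """Greedy-decode a continuation and return only the newly generated text.

    Hypothetical helper: `model` and `tok` are the objects returned by
    from_pretrained above; decoding settings are assumptions.
    """
    # Tokenize and move the tensors to the device that holds the model.
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    # do_sample=False selects greedy decoding for reproducible output.
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # The returned sequence starts with the prompt tokens; drop them.
    prompt_len = inputs["input_ids"].shape[1]
    new_tokens = out[0][prompt_len:]
    return tok.decode(new_tokens, skip_special_tokens=True)
```

With the objects from the load step, a call would be `print(generate(m, tok, prompt))`, where `prompt` holds the SQL-related input in whatever format the base model was trained on.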