Question concerning your use of Grok 3 in the name
#3
by Drafvan - opened
I don't see xAI listings for Grok 3 basically anywhere on this site. Grok 1 yes, but that's it. So why add that name to your model?
This is a distillation/fine-tuning of the Grok 3 model. That’s why it says gemma3-12B-distilled at the end. It means Grok 3 outputs were used as the dataset to fine-tune and distill into the gemma3-12B model.
So the Grok 3 model wasn’t just referenced, it was directly used as the source for the fine-tuning data. The result is a distilled version of Grok 3 behavior running on gemma3-12B.
This model-naming practice is standard, similar to distilled versions of DeepSeek R1.
For example: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. The model is not actually R1, it is Qwen-1.5B fine-tuned off of R1. Same practice here.
reedmayhew changed discussion status to closed