Question concerning your use of Grok 3 in the name

#3
by Drafvan - opened

I don't see xAI listings for Grok 3 basically anywhere on this site. Grok 1 yes, but that's it. So why add that name to your model?

This is a distillation/fine-tuning of the Grok 3 model. That’s why it says gemma3-12B-distilled at the end. It means Grok 3 outputs were used as the dataset to fine-tune and distill into the gemma3-12B model.

So the Grok 3 model wasn’t just referenced, it was directly used as the source for the fine-tuning data. The result is a distilled version of Grok 3 behavior running on gemma3-12B.

This model-naming practice is standard, similar to distilled versions of DeepSeek R1.
For example: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B. The model is not actually R1, it is Qwen-1.5B fine-tuned off of R1. Same practice here.

reedmayhew changed discussion status to closed

Sign up or log in to comment