Base model

#5
by Ethermich - opened

Dear team,

Could you please share more information regarding the base model training?
The use case we are evaluating is fine-tuning for writing tasks (fiction and non-fiction).
Would you consider Granite 4.1 a good base for such a use case?
Would a fine-tuning recipe developed on the 8B transfer relatively smoothly to the 30B?

Thanks in advance,
Michael

IBM Granite org

Hi Michael, you can find all the training details in the technical blog: https://huggingface.co/blog/ibm-granite/granite-4-1

So far it's going well. I wonder, though: why did you pivot back to dense models?
We haven't had issues fine-tuning MoEs for our use case. In terms of quality, they gave results as good as the dense variants while delivering much higher inference speeds…
