Really impressive
I have rarely seen a model card with so many claims that are all spot-on true, including, unfortunately, the memory usage. I've used this extensively with a 32k context. It doesn't seem to degrade over time, it follows orders very well, etc. etc.
The only issues I had are that it isn't too imaginative and seems kind of stuck in, let's say, "limited-length response mode": it's hard to get it to produce passages longer than a few paragraphs, and if it does, quality degrades badly. But other than that, I don't think I've been this impressed by a <70B model before.
Thanks for your work!
Good day, thank you for your positive feedback. I am very pleased to hear that someone of your standing in the community liked my model. I'll also take this opportunity to thank you for your work; your quants are always needed.
As for truthfulness, the point is that I make merges primarily to use them myself, so I see no point in embellishing the advantages or downplaying the disadvantages, since that would only deceive myself. I test the model in real use across different scenarios, and only then write the model card based on the impressions recorded during testing. Of course, this is not an objective method, but in my opinion it works better than numerous benchmarks.
Gemma 3 was a very good release; the model is extremely smart for its size, although the original is heavily censored, which finetunes have partially fixed. I hope that a future Gemma 4, other releases, and their finetunes (if there are any) will be even better, giving people with limited amounts of VRAM and RAM an experience comparable to using large models.
Thank you again for your feedback, I am glad that our opinions about this model coincide.
Have to agree: I'm using the quantized version from mradermacher and it's a VERY good model. Had to lower the context length a bit again, even on my 5090, but it's absolutely worth it over other models. Good job.
Thanks, I'm glad you liked my model, enjoy using it.
I just tried the model and it already failed to follow my instructions in the first response. (Part of the system prompt is that it should write in third-person present tense; however, it opted for first-person present tense.) Makes me wonder what else the model ignores in favor of its training data.
Also, what Instruct format am I supposed to use? Gemma 2 & 3? Llama 3/4 Chat? ChatML? Would be nice if you provided that information in the model card.
Would be nice if you provided that information in the model card.
As mentioned in the second line of the model card, this is a merge of pre-trained Gemma3 language models, so it uses the Gemma3 format and sampler parameters. Personally, I prefer the Gemma T4 preset (sleepdeprived3/Gemma3-T4) with minor modifications in SillyTavern.
To better understand your case, I need more information. Open a new discussion and post which quant you use, your parameters, and your sysprompt. I'll try to help you get the model working.
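For anyone unsure what "Gemma3 format" means in practice: Gemma's instruct template wraps each turn in `<start_of_turn>`/`<end_of_turn>` tags with `user` and `model` as the role names. A minimal sketch of building such a prompt by hand (normally your frontend's Gemma preset or the tokenizer's chat template does this for you; the `build_gemma_prompt` helper here is purely illustrative):

```python
# Illustrative sketch of the Gemma instruct format: each turn is wrapped in
# <start_of_turn>ROLE ... <end_of_turn>, and the prompt ends with an opened
# "model" turn so generation continues as the assistant's reply.
def build_gemma_prompt(turns):
    parts = ["<bos>"]  # Gemma's chat template prepends the BOS token
    for role, text in turns:  # role is "user" or "model"
        parts.append(f"<start_of_turn>{role}\n{text}<end_of_turn>\n")
    parts.append("<start_of_turn>model\n")  # open the assistant turn
    return "".join(parts)

prompt = build_gemma_prompt([("user", "Write in third-person present tense.")])
print(prompt)
```

Frontends like SillyTavern or KoboldAI Lite apply this automatically when you select the Gemma 2 & 3 instruct preset, so you only need this if you drive the model through a raw completion API.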
It's fine. I took a couple more rounds with the model and it's quite decent. Giving it a bit more third-person present tense makes the model adopt it instantly.
I use your Q4_K_S GGUF.
I use koboldcpp (KoboldAI Lite) and that only has Gemma 2 & 3 Instruct Format, but it works.
Thank you for your reply.