Kudos - a superb AI model
Having used this and tested it extensively in various (SFW) SillyTavern scenario role plays, I have to say it's one of the best!! It's fast, accurate, and creative for its size (12B Q8, 12.5 GB). Sadly, its core base model training data ends at Feb 2024 - a small price we have to pay for using open-source models, but one that gets increasingly frustrating.
Thanks so much!!! I appreciate the feedback and encouragement!
I've been experimenting with fine-tuning some recent events in; however, I know you typically need a massive amount of data and information to do so.
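For anyone following along, parameter-efficient methods like LoRA are one way to make small-data fine-tuning feasible - a minimal sketch, assuming the Hugging Face `peft` and `transformers` libraries; the base model name is a placeholder, not necessarily the model discussed here:

```python
# Minimal LoRA fine-tuning setup sketch (assumes `peft` + `transformers`
# are installed; model name below is a placeholder, not the actual setup).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = "some-org/some-12b-base"  # hypothetical base model id

model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank adapter matrices instead of all 12B weights,
# which is why a few thousand examples can be enough for narrow updates.
config = LoraConfig(
    r=16,                # adapter rank: bigger = more capacity, more VRAM
    lora_alpha=32,       # scaling applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the total
```

From there you'd train with the usual `Trainer` loop on your event dataset; the key point is that only the adapter weights update, so the data requirement drops dramatically compared to full fine-tuning.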
I typically have been using Z.ai GLM reasoning with my models, but surprisingly, Gemini 3 reasoning has been way more efficient and just as effective.
Learning and understanding fine-tuning mechanics is a critical core skill, as is experimenting with and creating your own unique working models. Whether you're merging various models or integrating new datasets, just being able to do it successfully puts you right at the forefront of development. The future demand for fine-tuned specialist models is potentially going to be huge. Not everyone wants to, or can, run full base models that need multiple servers for inference, and your 12B Q8 fits nicely on any 16 GB GPU and even runs smoothly on my GPU, an old Vega 64 with 8 GB of HBM2 memory. I'm in limbo right now, as I was waiting for the NVIDIA 5xxxx Super models, which have been cancelled, so I'm unsure what I should do: stick with what I have for now, since an upgrade to 16 GB is almost pointless? One of those 128 GB mini AI systems is an option. The thing is, with AI we are all learning new tricks all the time... makes life interesting and fun.
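On the "12B Q8 fits a 16 GB GPU" point, a quick back-of-the-envelope calculation shows why - a rough sketch, where the 1.2x overhead factor is my own guess for KV cache and runtime buffers, not a measured value:

```python
def vram_estimate_gb(params_billion: float, bits_per_weight: float,
                     overhead: float = 1.2) -> float:
    """Rough VRAM needed to run a quantized model.

    overhead is a guessed multiplier for KV cache, activations, and
    runtime buffers; real usage varies with context length and backend.
    """
    # At 8 bits/weight, 1B parameters ~= 1 GB of weights.
    weight_gb = params_billion * bits_per_weight / 8
    return weight_gb * overhead

# A 12B model at Q8: ~12 GB of weights, ~14-15 GB with overhead -
# tight but workable on 16 GB, and needing offload on an 8 GB card.
print(round(vram_estimate_gb(12, 8), 1))
```

The same arithmetic explains why a 4B model at Q8 (~4 GB of weights) is a plausible target for phones, as mentioned below.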
100%! And yes, that's what I've been targeting: end users like us who have 16-24 GB GPUs. I want the majority of my models to support those devices while still delivering high-quality results.
I'm also starting to experiment with 4B models for mobile use. I like the idea of having an offline solution available while on the go. It's hard to get them reliable with specific info yet, but I've been making decent progress with them regardless. One thing is that their safeguards are BAKED into them, and they're hard to pull apart because there are far fewer parameters. So I'm working on a chain process that strips them down until they're at least 90% gone.