Appears NSFW/uncensored and really good for role play and stories

#2
by uselessoldman - opened

From initial testing, this AI model is really quite special. Training stopped in Dec 2024, so it's also one of the most recent available in open source. I'm running it in LM Studio with Silly Tavern (Q8, 9.46GB) on my Vega 64 card (AM4 5600X, 64GB RAM) at 65% GPU with everything else offloaded, and it's surprisingly fast. It's really good at following prompts, although the generated narration can sometimes be a bit too long (nothing new there, then?), so you need to be extra specific with your instructions, which is to be expected. Thus far, "no refusals", which says a lot when you're messing about with Silly Tavern and some Chub AI cards.

From your YT comment || https://www.youtube.com/watch?v=RyGEn3qGuQU
Just downloaded this model and I have to say it's seriously impressive...

Quick question: can this model run on Ollama?

Specs:
GPU: NVIDIA RTX 4060 (8GB VRAM)
CPU: AMD Ryzen 5 8000 series
RAM: 16GB DDR5 @ 5600MT/s

It's a GGUF file, so you can run it in anything that will do inference. I use LM Studio, but Ollama also uses llama.cpp underneath, so there is no reason why not. The only comment I would make is that 16GB of RAM is going to seriously limit the size of AI models you can run, since you can offload the K/V cache to the 8GB GPU whilst running larger models split across RAM/GPU. It varies from model to model; some do not like running split across GPU/RAM, but they are the minority, and sadly it's trial and error, since contributors do not test their merged models on consumer hardware. There is a performance/quality balance when you have limited hardware: speed vs accuracy vs quality. Running my Vega 64 8GB, I can run many 11-16GB models in either Q8 or Q6, but you would need more RAM. That is why I upgraded to 64GB RAM, simply because I could not justify upgrading my GPU until the new cards are released, maybe the 5070 Ti Super (early 2026) or the next-gen AMD card (sometime in 2027). I do not see the point of buying a GPU with less than 24GB of VRAM. The AMD RX 7900 XTX? Too old, too slow, and no CUDA support.
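If you want to script the same partial-offload setup outside LM Studio or Ollama, a minimal sketch with the llama-cpp-python bindings looks something like this (the model path and layer count here are placeholders; tune `n_gpu_layers` to whatever fits your VRAM):

```python
# Minimal sketch: partial GPU offload of a GGUF model via llama-cpp-python.
# pip install llama-cpp-python (built with GPU support for your backend).
from llama_cpp import Llama

llm = Llama(
    model_path="./model-Q8_0.gguf",  # placeholder path to your GGUF file
    n_gpu_layers=20,   # layers to offload to VRAM; -1 offloads everything
    n_ctx=4096,        # context window; larger costs more K/V cache memory
)

out = llm("Write a one-sentence scene description.", max_tokens=64)
print(out["choices"][0]["text"])
```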

Yes, I already tested it in LM Studio. It seems that Ollama might not have updated to support this model architecture yet?

This model also struggles with certain prompts, maybe because of my limited hardware? For example, when I asked it to:

Spell out 999999999999999999999999999999

it took about 10 minutes to respond, running at ~9 tokens per second and generating around 5,600 tokens.
Midway through the process, it drifted off-topic, produced hallucinations, and even looped on unrelated content instead of focusing on the prompt.
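For scale, a correct answer to that prompt is only about 50 words, so ~5,600 tokens is almost entirely drift rather than work. A quick self-contained sketch (short-scale English, written just for this illustration) shows what the model should have produced:

```python
# Spell out a large integer in short-scale English to see how long a
# correct answer to that prompt should actually be.
UNITS = ["", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion", "trillion", "quadrillion",
          "quintillion", "sextillion", "septillion", "octillion"]

def under_1000(n: int) -> str:
    words = []
    if n >= 100:
        words += [UNITS[n // 100], "hundred"]
        n %= 100
    if n >= 20:
        words.append(TENS[n // 10])
        n %= 10
    if n:
        words.append(UNITS[n])
    return " ".join(words)

def spell(n: int) -> str:
    if n == 0:
        return "zero"
    groups = []                      # 3-digit groups, lowest first
    while n:
        groups.append(n % 1000)
        n //= 1000
    parts = [under_1000(g) + ((" " + SCALES[i]) if SCALES[i] else "")
             for i, g in reversed(list(enumerate(groups))) if g]
    return ", ".join(parts)

answer = spell(int("9" * 30))  # the 30-nines number from the prompt
print(answer)                  # "nine hundred ninety nine octillion, ..."
print(len(answer.split()))     # about 50 words, nowhere near 5,600 tokens
```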

It was my doctor who suggested I use AI for medical advice, as I had issues over the guidance/advice given to me by so-called specialists. Long story short, that got me into AI inference and is why I use Silly Tavern: I can create character cards to act as medical consultants and virtual specialists (I have had operations for ulcerative colitis and pancreatic cancer, one of only a few in the UK to have had both stomach/digestive operations). As I experimented with various models, I soon realised Q6 was the sweet spot; Q8 could be better, although not always, and Q4 worked OK on some models but on others it was crap.
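To put rough numbers on that trade-off: GGUF file size scales with bits per weight, so you can estimate what each quant level costs before downloading. A back-of-the-envelope sketch (the bits-per-weight figures are approximations commonly quoted for llama.cpp quants, not exact values):

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8 bytes.
# The bpw figures are approximations, and real files add metadata overhead.
APPROX_BPW = {"F16": 16.0, "Q8_0": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.8}

def est_size_gb(params_billions: float, quant: str) -> float:
    bits = params_billions * 1e9 * APPROX_BPW[quant]
    return bits / 8 / 1e9  # bits -> bytes -> GB (decimal)

for q in APPROX_BPW:
    print(f"9B model @ {q}: ~{est_size_gb(9, q):.1f} GB")
# 9B @ Q8_0 comes out near the 9.46GB file mentioned above.
```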

Anyway, I quickly fell in love with AI, and that started my journey to where I am today. Looking to the future, I started to upgrade my computer with a view to where I wanted to be. I have old servers (HP ML350), but they're now too old to run AI as they do not support AVX2, which is a pisser since they each have 2x Xeon and 128GB RAM, and a couple of FX8350 machines (for music production) are also too damn old, so that left me with the newer AM4 5600X.

It originally had 32GB, so I whacked in 64GB because it was the cheapest upgrade, then I upgraded the NVMe to 2x 2TB (Samsung 990 Pro) because it's so damn fast compared to SSD. My 5600X CPU is fine, as it rarely gets to 100%, but yeah, my GPU is screwed. I configured the system to share all 64GB with the GPU, which helped a lot, since it loads about 7.2GB into the GPU, leaving just enough for computation. I refuse to buy a GPU with less than 24GB for UHD video, so that limits me somewhat, and for video generation it must support CUDA (RTX 3090 (too slow), 4090 (maybe, but the price?), 5090 (seriously?) and some silly-money pro cards). So I am waiting for the 5070 Ti Super to make its appearance, which is the card I will buy. I just bought one of those expansion cards that takes 4x NVMe on a PCIe slot; so far it's full of 1TB drives, but I will increase to 2TB each when funds allow. My computers are already full of 8TB SATA drives for music production, mainly Kontakt libraries and Toontrack for drums and bass.

Creating character cards for Silly Tavern made me understand how important, even critical, it is to give the AI model the info it needs to generate the output you want/expect back. It's like walking into a library knowing the exact page in a specific book you want, but you are blind, or none of the books have any titles on them. Using Silly Tavern is not like having an online chat with ChatGPT or any other model; it's way more sophisticated. It's not even like having a chat using Ollama, AnythingLLM, Perplexity or LM Studio. In Silly Tavern you can set the context prompt, persona, system prompt and context style; it's so flexible. Sure, it can be a little confusing at first, but when you get it just right it's wickedly fun. Then you have the various AI models, and OMG, which one is best depends on what you want from it; they're all so damn different. Then you have merged models and others with added datasets; some work great split across RAM and an 8GB GPU, some not so much, especially once you get past models over 16GB. So with a single 24GB GPU, I reckon I should be able to run some models up to about 60GB, hence why I upgraded to 64GB RAM, because that means I should in theory be able to run 80GB models, which means I should be able to run F16! That means I can train/fine-tune my own models with my own custom datasets.
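For anyone who hasn't built one, a character card is just structured fields that the frontend assembles into the prompt. A minimal sketch in Python (the field names follow the common TavernAI/SillyTavern V1 card layout as I understand it; treat the exact schema, and the persona itself, as assumptions):

```python
import json

# Minimal sketch of a character card: structured fields that SillyTavern
# assembles into the prompt. Field names follow the common V1 card layout
# (an assumption; check the spec your frontend actually uses).
card = {
    "name": "Dr. Hartley",                        # hypothetical persona
    "description": "A blunt, evidence-focused gastroenterology consultant.",
    "personality": "Direct, methodical, asks clarifying questions first.",
    "scenario": "A follow-up consultation reviewing post-operative symptoms.",
    "first_mes": "Right, let's go through your notes. What changed this week?",
    "mes_example": "<START>\n{{user}}: I've had cramping.\n"
                   "{{char}}: How long, and on a scale of one to ten?",
}

with open("dr_hartley.json", "w") as f:
    json.dump(card, f, indent=2)
```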

64GB RAM + 24GB GDDR/GPU = 88GB, so I reckon 80GB models will be possible.
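That headline number is optimistic, though: the OS, the inference runtime and the K/V cache all take a slice before the model weights do. A rough feasibility check (the overhead figures below are guesses for illustration, not measurements):

```python
# Rough check: does a model fit in combined RAM + VRAM?
# Overhead numbers below are guesses for illustration, not measurements.
RAM_GB, VRAM_GB = 64, 24
OS_AND_APPS_GB = 4   # assumed OS / background usage
KV_CACHE_GB = 3      # assumed K/V cache at a modest context size

budget = RAM_GB + VRAM_GB - OS_AND_APPS_GB - KV_CACHE_GB
for model_gb in (60, 80):
    fits = "fits" if model_gb <= budget else "does not fit"
    print(f"{model_gb}GB model vs {budget}GB usable: {fits}")
# With these guesses an 80GB file only just squeezes in, and anything
# served from system RAM will run far slower than the layers in VRAM.
```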
