Failed To Load Model Within LM Studio

#1
by iiiTRONiii - opened

Is this compatible with LM Studio, or do I have to follow the llama.cpp instructions given? Before opening this discussion I searched around the web, and everything suggested that LM Studio already includes llama.cpp, so any GGUF file should work. I get this error message...

"🥲 Failed to load the model

Error loading model.

(Exit code: null). Please check settings and try loading the model again."

Sorry for the mass of junk models on my profile. Hugging Face recently started charging more for private storage, so to keep costs low I've been setting experimental repos to public.

If you're using LM Studio, try darkc0de/Agent.Xortron-Q5_K_M-GGUF

It's still very experimental and will likely be replaced within the next 12-72 hours, but it's probably along the lines of what you're looking for.

Keep in mind you need 24GB+ of VRAM to run this with a decent context size.
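As a rough back-of-envelope check (my own sketch, not something stated in this thread): a quantized GGUF's on-disk size is approximately parameter count × average bits per weight ÷ 8. The parameter count and the ~5.7 bits/weight figure for Q5_K_M below are illustrative assumptions:

```python
def gguf_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a quantized model, in decimal GB."""
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Assumptions: a 24B-parameter model, Q5_K_M averaging ~5.7 bits per weight.
print(f"~{gguf_size_gb(24, 5.7):.1f} GB")
```

The file size shown in the repo is the ground truth; this only illustrates why a Q5_K_M quant of a model this class ends up brushing against a 24GB VRAM budget once the KV cache and runtime overhead are added on top.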

I have an NVIDIA RTX 5090 D 32GB GPU and an Intel® Core™ Ultra 9 275HX × 24 with 64GB RAM, but LM Studio says my VRAM is 23.42GB. I don't have any issues running other models that are even larger in GB. Is there a way I can tweak the settings, since I'm very close to 24GB of VRAM? So far I really like your work as far as the chatbot running on Spaces goes. Also, I've tried three of your models and haven't been able to run any of them in LM Studio. I was aiming for the Q5_K_M model.

UPDATE: I managed to tweak the settings a little in LM Studio and got it to run, but it was being overloaded a bit, so I guess I can't risk it. Will you be making any models that can work with 24GB VRAM? It's funny my VRAM is 23.42GB... so close. Here is a screenshot of the settings I changed in LM Studio to get it to run, but as you can see the CPU was overloaded, of course.
SS-001

I'm downloading darkc0de/Agent.Xortron-Q5_K_M-GGUF now as you suggested; will let you know how it goes. What are the differences between your models? I was interested in this one. Is the agent one the same model but agent-focused? Do I lose anything compared to this model here? You don't really have descriptions to go off of other than the titles. Sorry for asking so many questions.

UPDATE: So it won't run with the default settings in LM Studio, but if I change them to the ones in the screenshot it runs, though with an overloaded CPU.

Something is very wrong with the way you're loading this model.

  1. You have 23.41GB of VRAM available, that's not your total VRAM, obviously.

  2. Based on your VRAM, you should be able to load this model entirely into VRAM, with the KV cache in VRAM too and a max context of at least 20,000 tokens, and still have room to spare. LM Studio's auto-detection of settings is terrible. Set guardrails to relaxed (or even OFF), put the KV cache in VRAM, load all the layers onto the GPU, and make sure you've set max context properly and manually. Check VRAM usage in your task manager to see occupancy; that way you can tweak max context up or down yourself. It's good policy to keep about 1GB of VRAM free after a model is loaded, to avoid slowdowns due to paging.
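The budget described above (whole model in VRAM, KV cache in VRAM, 20,000-token context, ~1GB kept free) can be sketched numerically. This is a hypothetical illustration: the layer count, KV-head count, and head dimension below are assumed values, not this model's actual architecture; the formula is the standard FP16 KV-cache estimate of 2 (K and V) × layers × tokens × kv_heads × head_dim × 2 bytes.

```python
def kv_cache_gb(context: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """FP16 KV cache size in GB: K and V tensors for every layer and token."""
    return 2 * n_layers * context * n_kv_heads * head_dim * bytes_per_elem / 1e9

def fits_in_vram(vram_gb: float, model_gb: float, context: int, **arch) -> bool:
    """Model weights + KV cache + ~1GB headroom must fit in available VRAM."""
    return model_gb + kv_cache_gb(context, **arch) + 1.0 <= vram_gb

# Assumed (illustrative) architecture: 40 layers, 8 KV heads, head_dim 128.
arch = dict(n_layers=40, n_kv_heads=8, head_dim=128)
print(f"KV cache at 20k tokens: {kv_cache_gb(20_000, **arch):.2f} GB")
print("Fits in 23.42 GB VRAM:", fits_in_vram(23.42, 17.0, 20_000, **arch))
```

With these assumed numbers, a ~17GB model plus ~3.3GB of KV cache plus 1GB of headroom stays under 23.42GB, which is consistent with the claim that there should be room to spare at 20,000 tokens.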

@SerialKicked
Can you tell from the screenshot what's wrong with how I'm loading it? Everything is default except for the changes I made in the screenshot. How do I know what my VRAM even is? When I changed the guardrails to balanced, that's when my CPU started overloading (in the screenshot). What are the proper settings I should have? I'm able to run a lot of other models just fine, up to 25GB in file size (if that matters). Here are some screenshots; maybe they will help. I thought my RAM was 64GB, but LM Studio confused me by showing 24GB, or technically 23.42GB.

SS-002
SS-003
SS-004
SS-005
SS-006
SS-007


I always limit GPU layers; with your 5090 that shouldn't be the issue, but it's still good practice.

What distro? Is LM Studio an AppImage or a .deb?

It's an AppImage... could that also be the issue? Appreciate the feedback. I'm new to running larger models so it's a learning curve.

UPDATE: I did manage to get it to run; it's just that the CPU gets up to 500% at times and runs a little hot, but I have the external fan all the way up. I'm grateful you have the chatbot running on Spaces here. I'll just have to use that until I get rich.

I use Parrot Security 7.1; I had issues with the AppImage for LM Studio.

Delete all traces of LM Studio's AppImage and the files it creates, reboot, install the .deb via the package manager, and reboot again. After that everything worked well for me. It also automatically puts LM Studio neatly in the app list, icon and all (KDE).

Thanks for the advice, I'll do that. Do you use Parrot Security as your daily driver? I was thinking about it. I use Fedora currently.

Daily drivers are Parrot Security 7.1 and Windows X-Lite Optimum 11 25H2 v2

I'll consider that option. I dual boot as well; I'll test it out in a VM. Thank you for your work on your AI, I've been testing it out. It helped me figure out what my Chinese competition was doing in the past. I knew they had a trick, and your AI exposed it. That's why uncensored AI is important: it's a learning tool, for learning what is hidden or hard to find.
