IQ4_K

#2
by SFPLM - opened

Hey @ubergarm, thanks again for the latest DeepSeek quants. May I ask if it would be possible to once more do an IQ4_K, or some "high 4 bpw" option between the high 5-bit and low 4-bit options you currently have?

Yeah, it's almost done quanting now. I'll validate it, update the README with exact size details, and it'll be next in the upload queue!

382.485 GiB (4.896 BPW)
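As a rough sanity check on those two figures, size and BPW together imply a weight count. Here is a minimal Python sketch, assuming GiB means 2^30 bytes and that the reported size is almost entirely quantized tensor data (metadata is ignored). The function name is made up for illustration.

```python
GIB = 2**30  # bytes per GiB


def implied_weight_count(size_gib: float, bpw: float) -> float:
    """Estimate the number of weights from a file size and bits per weight."""
    total_bits = size_gib * GIB * 8
    return total_bits / bpw


n = implied_weight_count(382.485, 4.896)
print(f"{n / 1e9:.1f} billion weights")
```

Under those assumptions it works out to roughly 671 billion weights.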

Thanks for your quants. I don't know how to express it, but the experience of these quants + ik_llama.cpp is truly amazing, and I really appreciate you doing this. Even if DRAM+VRAM inference isn't at API speeds or agentic-coding speeds, it's still usable for non-coding work. Feels great.

Has anyone encountered an issue where the machine sometimes crashes and reboots (I am on Ubuntu 24.04)? The only things I've noticed are that it seems to happen in the middle of prompt processing, and that the fans spin up very high right before it reboots. I'm not the most knowledgeable on inference details, but I wonder if it is kernel related, the KV cache getting mixed up, or something memory related. Sometimes I swap into a new conversation in OpenWebUI, and that sometimes triggers it.

@SFPLM

so your rig is hard-locking/rebooting during inference? hrm, could be memory instability possibly (not sure if you've overclocked RAM or what speeds you're running at)

you can check the output of dmesg -T while running, and there are some journalctl commands to see the dmesg from the previous boot, which might give a clue if you are somehow OOMing suddenly (though that shouldn't result in a reboot)
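For the previous-boot check, here is a minimal Python sketch. It assumes a systemd journal that persists across reboots and a user allowed to read it. The pattern list and function names are illustrative assumptions, not an exhaustive or official set.

```python
import re
import subprocess

# Illustrative patterns only (an assumption, not an exhaustive list):
# OOM killer activity, machine-check/hardware errors, thermal events,
# and NVIDIA driver Xid errors.
SUSPECT = re.compile(
    r"out of memory|oom-kill|machine check|hardware error|thermal|NVRM: Xid",
    re.IGNORECASE,
)


def suspect_lines(log_text: str) -> list[str]:
    """Return the log lines that match any of the patterns above."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


def previous_boot_kernel_log() -> str:
    """Kernel messages from the previous boot (journalctl -k -b -1)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

After a crash, something like `print("\n".join(suspect_lines(previous_boot_kernel_log())))` would list any matches. A hard reset can also leave nothing in the log at all, which would itself point more toward hardware or power than toward an OOM.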

i forget your exact OS, CPU, RAM, GPU(s), but at first impression it sounds like a hardware issue. if you see nothing in dmesg, you might want to use something like netdata or other tools to monitor temperatures, disk i/o, etc. to see if you notice a pattern before it reboots

hmm, will keep that in mind when it starts to act up. it has not acted up yet but may soon.
it's basically Ubuntu 24.04, Xeon QYFS engineering sample, 512 GB RAM, RTX 6000 Blackwell. Other than that, I know 2 other people here have the "QYFS + Sage SE" combo. I did set the C-state to C1/C0, I think.
