No logs, no context, no traceback. Which inference engine are you using (llama.cpp or vLLM)? How much VRAM do you have? If you're running this 31B model in NF4 on a toaster without offloading layers, expect burnt toast. Post your specs or I'm closing the issue.
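At minimum, post the output of something like the following. This is a sketch assuming an NVIDIA GPU and a standard llama.cpp or vLLM install; adjust for your setup:

```shell
# GPU model and total VRAM (prints a fallback message if nvidia-smi is missing)
command -v nvidia-smi >/dev/null \
  && nvidia-smi --query-gpu=name,memory.total --format=csv \
  || echo "nvidia-smi not found"

# Engine version -- run whichever applies:
# llama.cpp build:
#   ./llama-cli --version
# vLLM:
#   python -c "import vllm; print(vllm.__version__)"
```

Paste that output plus the exact command you launched the model with, and we can actually debug this.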