HyperNova-60B.IQ4_XS crashing on newest llama.cpp

#1 by Olafangensan - opened

Full command:

llama-server.exe -m "HyperNova-60B.IQ4_XS.gguf" --port 5000 --host 127.0.0.1 --n-cpu-moe 11 -ngl 99 -a "HyperNova 60B" --no-warmup --ctx-size 32768 -ub 2048 -b 2048 -fa on -ctk q4_0 -ctv q4_0 -fit off
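
If it helps triage: a stripped-down invocation that drops flash attention, the quantized KV cache, and the batching overrides would show whether one of those options is the trigger. This is an untested sketch, keeping only flags from the command above with the same model file and offload split:

llama-server.exe -m "HyperNova-60B.IQ4_XS.gguf" --port 5000 --host 127.0.0.1 --n-cpu-moe 11 -ngl 99 --ctx-size 32768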

Server log up to the crash:

main: model loaded
main: server is listening on http://127.0.0.1:5000
main: starting the main loop...
srv  update_slots: all slots are idle
srv  log_server_r: request: GET / 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  log_server_r: request: GET /props 127.0.0.1 200
srv  log_server_r: request: GET /v1/models 127.0.0.1 200
srv  params_from_: Chat format: GPT-OSS
slot get_availabl: id  3 | task -1 | selected slot by LRU, t_last = -1
slot launch_slot_: id  3 | task -1 | sampler chain: logits -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
slot launch_slot_: id  3 | task 0 | processing task
slot update_slots: id  3 | task 0 | new prompt, n_ctx_slot = 32768, n_keep = 0, task.n_tokens = 71
slot update_slots: id  3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
slot update_slots: id  3 | task 0 | prompt processing progress, n_tokens = 7, batch.n_tokens = 7, progress = 0.098592
D:\a\llama.cpp\llama.cpp\ggml\src\ggml-cuda\topk-moe.cu:226: GGML_ASSERT(false && "fatal error") failed
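
For context on the assert itself: in ggml's CUDA backend, a GGML_ASSERT(false && "fatal error") is typically the default branch of a dispatch whose fused kernel is only instantiated for specific configurations, so an unexpected value aborts rather than falling back to a generic path. The self-contained sketch below shows that pattern; the function name, the switch values, and the simplified macro are all illustrative, not the actual topk-moe.cu source.

// Sketch only: NOT the real topk-moe.cu code, just the common ggml-cuda
// dispatch shape that produces this kind of assert. Names are hypothetical.
#include <cstdio>
#include <cstdlib>

// Simplified stand-in for ggml's GGML_ASSERT (the real macro aborts via
// ggml's own error path rather than plain abort()).
#define GGML_ASSERT(x)                                              \
    do {                                                            \
        if (!(x)) {                                                 \
            std::fprintf(stderr, "%s:%d: GGML_ASSERT(%s) failed\n", \
                         __FILE__, __LINE__, #x);                   \
            std::abort();                                           \
        }                                                           \
    } while (0)

// A fused kernel is compiled only for a fixed set of template parameters,
// so any runtime configuration outside that set falls through.
static void launch_topk_moe(int n_expert_used) {
    switch (n_expert_used) {
        case 2: /* kernel<2><<<grid, block>>>(...) */ break;
        case 4: /* kernel<4><<<grid, block>>>(...) */ break;
        case 8: /* kernel<8><<<grid, block>>>(...) */ break;
        default:
            // A configuration the kernel was not built for lands here,
            // matching the log line above.
            GGML_ASSERT(false && "fatal error");
    }
}

int main() {
    launch_topk_moe(6); // hypothetical unsupported value -> assert fires
}

If that reading is right, the crash is a missing case in the fused top-k MoE kernel for this model's expert configuration rather than a problem with the GGUF file itself, which would make it a llama.cpp issue worth reporting upstream.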
