Repetition galore ...
llama.cpp launch script used:
@echo off
"D:\llama-b7951-bin-win-cuda-13.1-x64\llama-server.exe" ^
-hf "unsloth/Apertus-70B-Instruct-2509-GGUF:UD-Q3_K_XL" ^
--alias "Apertus-70B-Instruct:UD-Q3_K_XL" ^
--n-gpu-layers -1 ^
--flash-attn on ^
--cache-type-k q8_0 ^
--cache-type-v q8_0 ^
--ctx-size 60536 ^
--batch-size 1024 ^
--ubatch-size 512 ^
--threads 8 ^
--kv-offload ^
--op-offload ^
--fit off ^
--parallel 1 ^
--host 0.0.0.0 ^
--port 11434 ^
--seed 3407 ^
--temp 1.0 ^
--top-p 0.95 ^
--min-p 0.01 ^
--top-k 40 ^
--chat-template-file "C:\Users\Chris\Desktop\llm_scripts\cpp_llama\apertus_chat_template.jinja" ^
--jinja
pause
The chat template is the one given in https://huggingface.co/swiss-ai/Apertus-70B-Instruct-2509/blob/main/chat_template.jinja
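One sanity check worth doing first (a sketch, not from the original post): recent llama-server builds expose a GET `/props` endpoint that, per the llama.cpp server README, reports the chat template actually in use, so you can confirm the file passed via `--chat-template-file` was loaded rather than the model's built-in template. The port 11434 below is the one from the launch script above; the endpoint and field name are assumptions based on the llama.cpp documentation.

```python
import json
import urllib.request

def fetch_props(host: str = "http://127.0.0.1:11434") -> dict:
    # Assumed endpoint: llama-server's GET /props (see llama.cpp server docs).
    with urllib.request.urlopen(host + "/props") as resp:
        return json.loads(resp.read())

def chat_template_of(props: dict) -> str:
    # Pull the reported chat template out of the /props payload, if present.
    return props.get("chat_template", "<no chat_template field>")

if __name__ == "__main__":
    try:
        # If this prints the model's built-in template instead of the Apertus
        # file, the --chat-template-file argument was not picked up.
        print(chat_template_of(fetch_props()))
    except OSError as e:
        print("server not reachable:", e)
```

If the printed template does not match the Apertus Jinja file, a broken or unapplied template would explain both the repetition and the missing stop token.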
What is wrong? Why is the output repeating strangely?
Also, generation does not seem to stop ..
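A small diagnostic that may narrow down the no-stop behaviour (a sketch, assuming the server is reachable on the host/port configured above): ask the server's OpenAI-compatible chat endpoint for a short, capped completion and check `finish_reason`. A value of `"stop"` means an EOS/stop token was actually produced; `"length"` means generation only ended because the token limit was hit, pointing at a template or EOS-token problem. The model alias below is the one set via `--alias` in the script; everything else is standard OpenAI-style request shape.

```python
import json
import urllib.request

def build_request(prompt: str, max_tokens: int = 64) -> dict:
    # Request body for llama-server's OpenAI-compatible /v1/chat/completions.
    return {
        "model": "Apertus-70B-Instruct:UD-Q3_K_XL",  # alias from the launch script
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 1.0,
        "top_p": 0.95,
    }

def query(host: str = "http://127.0.0.1:11434") -> dict:
    req = urllib.request.Request(
        host + "/v1/chat/completions",
        data=json.dumps(build_request("Say hi and stop.")).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    try:
        out = query()
        # "stop" = EOS emitted; "length" = ran into max_tokens without stopping.
        print(out["choices"][0]["finish_reason"])
    except OSError as e:
        print("server not reachable:", e)
```

Seeing `"length"` on every short prompt would suggest the template never produces the end-of-turn token the server is watching for.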
