Repetition galore ...
llama.cpp launch script used:
@echo off
"D:\llama-b7951-bin-win-cuda-13.1-x64\llama-server.exe" ^
-hf "unsloth/Apertus-70B-Instruct-2509-GGUF:UD-Q3_K_XL" ^
--alias "Apertus-70B-Instruct:UD-Q3_K_XL" ^
--n-gpu-layers -1 ^
--flash-attn on ^
--cache-type-k q8_0 ^
--cache-type-v q8_0 ^
--ctx-size 60536 ^
--batch-size 1024 ^
--ubatch-size 512 ^
--threads 8 ^
--kv-offload ^
--op-offload ^
--fit off ^
--parallel 1 ^
--host 0.0.0.0 ^
--port 11434 ^
--seed 3407 ^
--temp 1.0 ^
--top-p 0.95 ^
--min-p 0.01 ^
--top-k 40 ^
--chat-template-file "C:\Users\Chris\Desktop\llm_scripts\cpp_llama\apertus_chat_template.jinja" ^
--jinja
pause
The chat template is the one given in https://huggingface.co/swiss-ai/Apertus-70B-Instruct-2509/blob/main/chat_template.jinja
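One sanity check worth doing first (a sketch, not from the original post): recent llama-server builds expose a GET `/props` endpoint that, per the llama.cpp server README, reports the chat template actually in use, so you can confirm the file passed via `--chat-template-file` was loaded rather than the model's built-in template. The port 11434 below is the one from the launch script above; the endpoint and field name are assumptions based on the llama.cpp documentation.

```python
import json
import urllib.request

def fetch_props(host: str = "http://127.0.0.1:11434") -> dict:
    # Assumed endpoint: llama-server's GET /props (see llama.cpp server docs).
    with urllib.request.urlopen(host + "/props") as resp:
        return json.loads(resp.read())

def chat_template_of(props: dict) -> str:
    # Pull the reported chat template out of the /props payload, if present.
    return props.get("chat_template", "<no chat_template field>")

if __name__ == "__main__":
    try:
        # If this prints the model's built-in template instead of the Apertus
        # file, the --chat-template-file argument was not picked up.
        print(chat_template_of(fetch_props()))
    except OSError as e:
        print("server not reachable:", e)
```

If the printed template does not match the Apertus Jinja file, a broken or unapplied template would explain both the repetition and the missing stop token.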
What is wrong? Why is the output repeating strangely?
Also, generation does not seem to stop ..
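A small diagnostic that may narrow down the no-stop behaviour (a sketch, assuming the server is reachable on the host/port configured above): ask the server's OpenAI-compatible chat endpoint for a short, capped completion and check `finish_reason`. A value of `"stop"` means an EOS/stop token was actually produced; `"length"` means generation only ended because the token limit was hit, pointing at a template or EOS-token problem. The model alias below is the one set via `--alias` in the script; everything else is standard OpenAI-style request shape.

```python
import json
import urllib.request

def build_request(prompt: str, max_tokens: int = 64) -> dict:
    # Request body for llama-server's OpenAI-compatible /v1/chat/completions.
    return {
        "model": "Apertus-70B-Instruct:UD-Q3_K_XL",  # alias from the launch script
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 1.0,
        "top_p": 0.95,
    }

def query(host: str = "http://127.0.0.1:11434") -> dict:
    req = urllib.request.Request(
        host + "/v1/chat/completions",
        data=json.dumps(build_request("Say hi and stop.")).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    try:
        out = query()
        # "stop" = EOS emitted; "length" = ran into max_tokens without stopping.
        print(out["choices"][0]["finish_reason"])
    except OSError as e:
        print("server not reachable:", e)
```

Seeing `"length"` on every short prompt would suggest the template never produces the end-of-turn token the server is watching for.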
