32k?
#1
by sau-mil - opened
First of all, thank you for this. I wanted to ask: would the maximum context window be 32k? I'm asking because I'm using 128k. Also, sorry for the newbie question, but where can I get a Q8 file for this so I can use it with llama.cpp on AMD?
The max context window is still the same as the original/native model's; the 32k is the recommended output length.
I'm still uploading the quantization variants.