32k?

#1
by sau-mil - opened

First of all, thank you for this. I wanted to ask whether the maximum context window would be 32k, since I'm using 128k. Also, sorry for a newbie question, but where can I get a Q8 file for this so I can use it with llama.cpp on AMD?

The max context window is still the same as the original/native model; 32k is the recommended maximum for the output.
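For reference, the split between context size and output length above can be expressed with llama.cpp's standard flags: `-c` sets the context window and `-n` caps the number of generated tokens. This is a minimal sketch, not a definitive command; the model filename is a placeholder, and the actual native context length depends on the model.

```shell
# Sketch: run a Q8_0 GGUF quant with llama.cpp,
# keeping the native context (here assumed 128k)
# while capping generation at the recommended 32k tokens.
# "model-Q8_0.gguf" is a hypothetical filename.
llama-cli -m model-Q8_0.gguf \
  -c 131072 \          # context window (native size, in tokens)
  -n 32768 \           # max tokens to generate (recommended output cap)
  -p "Your prompt here"
```

On AMD GPUs, a llama.cpp build with ROCm/HIP or Vulkan backend support would be needed for GPU offload; a plain CPU build works without it.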

Owner

I'm still uploading the quantization variants.

Owner

Hey @sau-mil, I can't provide an 8-bit variant in bnb (non-GGUF) because it's bugged.
