32k?

by sau-mil - opened Mar 4

Mar 4

First of all, thank you for this. I wanted to ask: is the maximum context window 32k? I'm currently using 128k. Also, sorry for the newbie question, but where can I get a Q8 file for this so I can use it with llama.cpp on AMD?

khtsly

Owner Mar 4

edited Mar 4

The max context window is still the same as the original/native model's. The 32k figure is the (recommended) output length.
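To make the distinction concrete, here is a rough sketch of how the two numbers relate: the output allowance is carved out of the context window, not added on top of it. The 128k window is taken from the question above and is only an assumption about the native window; `prompt_budget` is a hypothetical helper, and "k" is assumed to mean 1024 tokens.

```python
def prompt_budget(context_window: int, max_output_tokens: int) -> int:
    """Tokens left for the prompt once the output allowance is reserved."""
    if max_output_tokens > context_window:
        raise ValueError("output allowance exceeds the context window")
    return context_window - max_output_tokens


# Assumed values: 128k window (from the question), 32k recommended output.
CONTEXT_WINDOW = 128 * 1024
MAX_OUTPUT = 32 * 1024

print(prompt_budget(CONTEXT_WINDOW, MAX_OUTPUT))  # 98304
```

So with those assumed numbers, roughly 96k tokens would remain for the prompt and conversation history.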

khtsly

Owner Mar 4

I'm still uploading the quantization variants.

khtsly

Owner Mar 4

Hey @sau-mil, I can't provide an 8-bit variant in bnb (non-GGUF) because it's bugged.
