Any plans on trying to make Q1 or Q2 models?
Hey, your quantized models are really good, and I was wondering if you could try reducing the model to Q1 and Q2 sizes. If the quality holds up, the model becomes accessible to individuals with low VRAM availability.
Sure, if you need it I could make it. But expect pretty severe perplexity degradation; it's unavoidable. I heard 1-bit Bonsai and 2-bit TurboQuant are good, but... I have no idea how to make them. 1-bit Bonsai needs retraining, while the 2-bit TurboQuant infrastructure isn't there yet... so yeah, maybe I could upload regular and imatrix quants.
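(For anyone curious, the imatrix route with mainline llama.cpp usually looks roughly like the sketch below, here driven from Python. All file names are placeholders, and exact flags can vary between llama.cpp versions.)

```python
# Minimal sketch of the usual llama.cpp imatrix-quant flow.
# Tool names are from mainline llama.cpp; all paths are placeholders.
import subprocess

MODEL_F16 = "model-f16.gguf"    # full-precision GGUF (placeholder)
CALIB_TXT = "calibration.txt"   # representative text corpus (placeholder)
IMATRIX = "imatrix.dat"
OUT_GGUF = "model-IQ2_XS.gguf"

# 1) Collect activation statistics over the calibration text.
subprocess.run(
    ["llama-imatrix", "-m", MODEL_F16, "-f", CALIB_TXT, "-o", IMATRIX],
    check=True,
)

# 2) Quantize to a 2-bit i-quant, guided by the importance matrix.
subprocess.run(
    ["llama-quantize", "--imatrix", IMATRIX, MODEL_F16, OUT_GGUF, "IQ2_XS"],
    check=True,
)
```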
That would be great. I'm curious about the performance, since your quantizations so far have been better than most I've seen, and I wonder whether that could carry over to the smaller quants. I'm aware you use the ik_llama fork for the IQ models; however, I don't think TurboQuant has been integrated into that fork yet. I'm quite new to this, so I'm not sure if it could be at all.
Regardless, an IQ1/2 would be great if you could try to make that happen.
Thank you for your work.
So, after some tests, 1 bit and 2 bits simply weren't enough to fit the weights. This is the result of a simple "hi" test with an f16 KV cache.
> hi formatted: <|im_start|>user hi <|im_end|> <|im_start|>assistant <think> The</think></think></think></think></think></think></think></think></think></think></think></think></think></think></think> ##HiHelloHelloIHelloHello</think>HelloHello</think>#HiHelloHihiformatted: The</think></think></think></think></think></think></think></think></think></think></think></think></think></think></think> ##HiHelloHelloIHelloHello</think>HelloHello</think>#HiHelloHihi<|im_end|>
Quant: IQ2_XS
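(If anyone wants to reproduce this kind of smoke test, something like the following via llama-cpp-python should work. The model path is a placeholder, and f16 is the default KV cache type, so it isn't set explicitly.)

```python
# Quick "hi" smoke test against a quantized GGUF via llama-cpp-python.
# Sketch only; the model path is a placeholder.
from llama_cpp import Llama

llm = Llama(model_path="model-IQ2_XS.gguf", n_ctx=2048, verbose=False)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "hi"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
# A healthy quant replies with a short greeting; at 1-2 bits you tend
# to get token soup like the transcript above.
```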
Ahh, so no IQ1 or IQ2. That's unfortunate, but thank you for attempting it; I really appreciate the trial and the very quick response.
Yeah, unfortunately. That stuff is too technical for me to handle. Thanks for swinging by.