gguf

#2
by KnutJaegersberg - opened

It would help distribution to make a GGUF version as well. I tried with the GGUF-my-repo tool, but it throws an exception. I thought it would just work, since it's a Llama.

Since this fine-tune is a classic one - no new added layers etc., it's 1:1 with the original (except the weights and tokenizer content + embeddings) - I need to look at how Unsloth did it, or maybe we can ask them politely to convert this fine-tune. Also, I have a new RAG-finetuned version of this model; I'll upload it when I have spare time. It's way better at following instructions.

I doubt they will add this to their catalog, but they can assist with converting.

I'm still waiting for something smaller; running a 30B dense model on consumer hardware is not an easy task. A MoE, sure.

Owner

I run this model in a workflow/agentic environment where humans are waiting for output, on a vLLM 4xA40 GPU node. So that's not consumer hardware.

Yes, pushing consumer hardware to its limits with these models is quite an adventure, alas.

I have found that conversion to GGUF using llama.cpp's convert_hf_to_gguf.py is failing due to an unrecognized 'no' in README.md:

language: [en, de, fr, es, it, pt, nl, pl, lv, et, lt, cs, sk, ro, bg, sl, hr, sv, da, fi, hu, uk, ru, zh, hi, ja, ko, el, no]

Probably should be:

language: [en, de, fr, es, it, pt, nl, pl, lv, et, lt, cs, sk, ro, bg, sl, hr, sv, da, fi, hu, uk, ru, zh, hi, ja, ko, el, nb]
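For what it's worth, a plausible explanation (an assumption on my part, not confirmed against the converter's code) is that the README front matter is read by a YAML 1.1 parser, and YAML 1.1 resolves the unquoted scalars `yes`/`no`/`on`/`off` as booleans, so the bare `no` arrives as `False` instead of the string "no". A minimal sketch of that resolution rule (the `parse_scalar` helper below is illustrative only, not llama.cpp code):

```python
# YAML 1.1 resolves these bare scalars to booleans instead of strings.
YAML11_BOOLEANS = {
    "yes": True, "no": False,
    "true": True, "false": False,
    "on": True, "off": False,
}

def parse_scalar(token: str):
    """Mimic how a YAML 1.1 loader resolves an unquoted scalar."""
    lowered = token.lower()
    if lowered in YAML11_BOOLEANS:
        return YAML11_BOOLEANS[lowered]
    return token

# The Norwegian code 'no' collapses to a boolean; 'nb' survives as a string.
codes = ["en", "no", "nb"]
print([parse_scalar(c) for c in codes])  # → ['en', False, 'nb']
```

If that is indeed the cause, quoting the entry in the front matter (`"no"`) would be an alternative fix to switching to `nb`.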


Thank you! Easy fix!

KnutJaegersberg changed discussion status to closed

I will perhaps check how the tokenizer behaves in this conversion - whether it uses the slow one or something else - since even when it's broken, it still predicts somewhat plausible, but degraded, output tokens.
