How do I use the gguf models instead of the default model, which starts downloading when I run the command ggc h2?
I installed all the required packages from their repo, but when I used the command you mentioned, it started downloading the original model without giving me any option to select a gguf model. Can you please guide me on this, calcius?
that's not the original model; it's f16, not bf16; make sure you can run that first; those gguf files work if you dequantize them or convert them back to safetensors or pt/pth; at this stage, the gguf files in this repo are for developers who are willing to work on the engine; just be patient and wait
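As a rough illustration of what "dequantize" means here, below is a minimal numpy sketch for the Q8_0 format (each block stores one float16 scale followed by 32 int8 quants). This is a simplified stand-in under those layout assumptions, not the actual engine code:

```python
import numpy as np

BLOCK = 32  # Q8_0 packs 32 weights per block

def dequantize_q8_0(raw: np.ndarray) -> np.ndarray:
    """Dequantize Q8_0 data: each 34-byte block is a float16 scale d
    followed by 32 int8 quants q; each weight is d * q."""
    blocks = raw.reshape(-1, 2 + BLOCK)                           # 2 scale bytes + 32 quant bytes
    d = blocks[:, :2].copy().view(np.float16).astype(np.float32)  # per-block scale
    q = blocks[:, 2:].view(np.int8).astype(np.float32)            # int8 quants
    return (d * q).reshape(-1)

# round-trip check on one synthetic block: scale 0.5, quants -16..15
d = np.float16(0.5)
q = np.arange(-16, 16, dtype=np.int8)
raw = np.concatenate([np.array([d]).view(np.uint8), q.view(np.uint8)])
w = dequantize_q8_0(raw)
```

The real converter goes one step further and writes the dequantized tensors back out as safetensors, which is why the result loads like a full-precision model.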
use the new ggc h6 command; you can use gguf files now; see the updated model card (readme)
hello and thank you for your work!
I've tried out the quants, but they all use the same amount of VRAM (I think due to converting back to safetensors). Is it possible to change that behavior?
for small models, the converter engine should be sufficient, and it is the fastest engine; higher-tier engines might need proper code support and will certainly sacrifice loading speed in return
that's the only drawback, and we think it is totally bearable in this case
Tried to install everything (ggc h6 gave many errors about missing packages before it finally started), but it fails in the end anyway:
ggufconnector@precision:~$ ggc h6
Using device: cuda
GGUF file(s) available. Select which one to use:
1. higgs-audio-3b-q8_0.gguf
2. Lexi-Llama-3-8B-Uncensored_Q5_K_M.gguf
Enter your choice (1 to 2): 1
Model file: higgs-audio-3b-q8_0.gguf is selected!
Prepare to dequantize: higgs-audio-3b-q8_0.gguf
Extracted 397 tensors from GGUF file
Dequantizing tensors: 100%|███████████████| 397/397 [00:47<00:00, 8.32tensor/s]
Traceback (most recent call last):
  File "/home/ggufconnector/./.local/bin/ggc", line 8, in <module>
    sys.exit(init())
  File "/home/ggufconnector/.local/lib/python3.10/site-packages/gguf_connector/__init__.py", line 276, in init
    from gguf_connector import h6
  File "/home/ggufconnector/.local/lib/python3.10/site-packages/gguf_connector/h6.py", line 94, in <module>
    convert_gguf_to_safetensors(selected_file_path, model_path, use_bf16)
  File "/home/ggufconnector/.local/lib/python3.10/site-packages/gguf_connector/quant3.py", line 46, in convert_gguf_to_safetensors
    save_file(tensors_dict, output_path, metadata=metadata)
  File "/home/ggufconnector/.local/lib/python3.10/site-packages/safetensors/torch.py", line 307, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
safetensors_rust.SafetensorError: Error while serializing: I/O error: No such file or directory (os error 2)
Suggestions?
A requirements file would be a first step.
Anyone willing to create one?
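Not an official list, but judging purely from the traceback and the progress bar above, a guessed starting point might look like this (package names inferred, unverified against the project):

```text
# guessed from the traceback; not verified against gguf-connector itself
torch
safetensors
gguf
tqdm
```

pip install -r requirements.txt would then pull everything in one step instead of failing package by package.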
I guess that line 307 is trying to save something in a folder that has not been created. I might be able to take another look at this later and see what folder needs to be created for this step to work.
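If that is the cause, the guard is a one-liner before the save. A minimal sketch (the save_with_dirs name and the plain file write are stand-ins for illustration; the real code calls safetensors' save_file):

```python
import os

def save_with_dirs(path: str, data: bytes) -> None:
    # "No such file or directory (os error 2)" on save usually means the
    # parent folder does not exist, so create it first
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:  # stand-in for safetensors.torch.save_file
        f.write(data)

save_with_dirs("models/out/model.safetensors", b"\x00")
```

With exist_ok=True the call is harmless when the folder is already there, so it is safe to run on every save.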
check whether you folks have the safetensors package or not; if not, pip install safetensors