This model cannot be converted using llama.cpp quantization
#8
by jab0688 - opened
This model cannot be converted to GGUF for llama.cpp quantization: the conversion script reports that the corresponding tokenizer.model file cannot be found.
python.exe convert-hf-to-gguf.py ./mymodel --outfile ll33.gguf --outtype f16
Loading model: mymodel
gguf: This GGUF file is for Little Endian only
Set model parameters
gguf: context length = 8192
gguf: embedding length = 4096
gguf: feed forward length = 14336
gguf: head count = 32
gguf: key-value head count = 8
gguf: rope theta = 500000.0
gguf: rms norm epsilon = 1e-05
gguf: file type = 1
Set model tokenizer
Error: Missing mymodel\tokenizer.model
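For context on the error above: the older hyphenated convert-hf-to-gguf.py looks for a SentencePiece tokenizer.model, but Llama 3 family checkpoints ship a BPE tokenizer in tokenizer.json instead (the newer underscore-named convert_hf_to_gguf.py, used in the log further down, handles that). A quick sketch to check which tokenizer artifacts a local model directory actually contains — the directory name is just the one from the command above, and the helper itself is hypothetical:

```python
from pathlib import Path

def tokenizer_files(model_dir: str) -> dict:
    """Report which tokenizer artifacts a local HF model directory ships.

    Llama 3 checkpoints typically ship a BPE tokenizer.json but no
    SentencePiece tokenizer.model, which is what the older converter
    script looks for -- hence the 'Missing tokenizer.model' error.
    """
    d = Path(model_dir)
    return {
        "tokenizer.model": (d / "tokenizer.model").exists(),        # SentencePiece
        "tokenizer.json": (d / "tokenizer.json").exists(),          # BPE (Llama 3)
        "tokenizer_config.json": (d / "tokenizer_config.json").exists(),
    }

# e.g. tokenizer_files("./mymodel")
```

If tokenizer.json is present and tokenizer.model is absent, the fix is to use a current llama.cpp checkout and its convert_hf_to_gguf.py rather than supplying a tokenizer.model.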
(.env) sera@AURORA-R11-GH44:~> python3 ~/llama.cpp_3/convert_hf_to_gguf.py /run/media/hf-models/Llama-3.3-8B-Base --outfile ~/LLM-Quants/Llama-3.3-8B-Instruct-F16.gguf --outtype f16
INFO:hf-to-gguf:Loading model: Llama-3.3-8B-Base
INFO:hf-to-gguf:Model architecture: LlamaForCausalLM
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-00004.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00002-of-00004.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00003-of-00004.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00004-of-00004.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {4096, 128256}
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {4096}
[stripped the remaining per-tensor lines to save room]
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {4096}
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 8192
INFO:hf-to-gguf:gguf: embedding length = 4096
INFO:hf-to-gguf:gguf: feed forward length = 14336
INFO:hf-to-gguf:gguf: head count = 32
INFO:hf-to-gguf:gguf: key-value head count = 8
WARNING:hf-to-gguf:Unknown RoPE type: default
INFO:hf-to-gguf:gguf: rope scaling type = NONE
INFO:hf-to-gguf:gguf: rope theta = 500000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: file type = 1
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
WARNING:gguf.vocab:Unknown separator token '<|begin_of_text|>' in TemplateProcessing<pair>
INFO:gguf.vocab:Adding 280147 merge(s).
INFO:gguf.vocab:Setting special token type bos to 128000
INFO:gguf.vocab:Setting special token type eos to 128009
INFO:gguf.vocab:Setting add_bos_token to True
INFO:gguf.vocab:Setting add_sep_token to False
INFO:gguf.vocab:Setting chat_template to {% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>
'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>
' }}{% endif %}
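The chat_template the converter embeds above is a Jinja template. As a sanity check of what prompt string it produces, here is a minimal pure-Python re-rendering of it — not the tokenizer library's own renderer, and assuming the standard Llama 3 spacing of two newlines after each header:

```python
def render_llama3_prompt(messages, bos_token="<|begin_of_text|>",
                         add_generation_prompt=True):
    """Mirror the Jinja chat template stored in the GGUF metadata.

    Each message becomes:
      <|start_header_id|>role<|end_header_id|>\n\ncontent<|eot_id|>
    with the BOS token prepended to the first message, and an open
    assistant header appended when a generation prompt is requested.
    """
    out = []
    for i, m in enumerate(messages):
        chunk = ("<|start_header_id|>" + m["role"] + "<|end_header_id|>\n\n"
                 + m["content"].strip() + "<|eot_id|>")
        if i == 0:
            chunk = bos_token + chunk
        out.append(chunk)
    if add_generation_prompt:
        out.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(out)

# e.g. render_llama3_prompt([{"role": "user", "content": "Hello"}])
```

Note the BOS (128000) and EOS (128009) special-token IDs set just above are the token forms of <|begin_of_text|> and <|eot_id|> in this template.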
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:~/LLM-Quants/Llama-3.3-8B-Instruct-F16.gguf: n_tensors = 291, total_size = 16.1G
Writing:  31%|███████████                          | 4.97G/16.1G [01:48<03:37, 51.1Mbyte/s]
Hmm, it converted just fine for me. I can supply my config files, but I swear I downloaded them from here.
Let me know — it is an AWESOME model!
