This is a hybrid quantization of the language model based on Q5_K_M. The token embeddings and output tensor are quantized to q8_0, the first three and last three layers are in q8_0, and the fourth layer from each end is in q6_K. Since the embeddings and the first and last layers are the most sensitive to quantization, I kept them at higher precision. This yields higher-quality responses from the model at the cost of a slight increase in file size. The quantization was done using llama.cpp version b8591 with this config:

```
^token_embd\.weight=q8_0
^output\.weight=q8_0
blk\.[0-2]\..*=q8_0
blk\.6[1-3]\..*=q8_0
blk\.3\..*=q6_K
blk\.60\..*=q6_K
```
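To make the override logic above concrete, here is a small sketch that applies the same regex patterns to GGUF-style tensor names and reports which quant type each tensor would get. The `quant_for` helper and the `q5_K_M` fallback label are illustrative assumptions, not part of llama.cpp; the patterns themselves are copied verbatim from the config.

```python
import re

# Per-tensor overrides from the config above (first match wins).
# Assumes a 64-layer model, so blocks run from blk.0 through blk.63.
OVERRIDES = [
    (r"^token_embd\.weight", "q8_0"),
    (r"^output\.weight", "q8_0"),
    (r"blk\.[0-2]\.", "q8_0"),   # first three layers
    (r"blk\.6[1-3]\.", "q8_0"),  # last three layers
    (r"blk\.3\.", "q6_K"),       # fourth layer from the start
    (r"blk\.60\.", "q6_K"),      # fourth layer from the end
]

def quant_for(tensor_name: str, default: str = "q5_K_M") -> str:
    """Return the quant type for a tensor name; unmatched tensors fall back to the base mix."""
    for pattern, qtype in OVERRIDES:
        if re.search(pattern, tensor_name):
            return qtype
    return default

# Sensitive tensors stay at q8_0 ...
assert quant_for("token_embd.weight") == "q8_0"
assert quant_for("blk.0.attn_q.weight") == "q8_0"
assert quant_for("blk.62.ffn_down.weight") == "q8_0"
# ... the fourth block from each end drops to q6_K ...
assert quant_for("blk.3.attn_q.weight") == "q6_K"
assert quant_for("blk.60.ffn_up.weight") == "q6_K"
# ... and everything else keeps the Q5_K_M base mix.
assert quant_for("blk.31.ffn_gate.weight") == "q5_K_M"
```

Note that `blk\.3\.` does not match `blk.31` or `blk.63` because the trailing `\.` requires a literal dot right after the digit, which is why the single-digit patterns are safe without anchors.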
Downloads last month: 822
Format: GGUF (5-bit hybrid)
Model size: 27B params
Architecture: qwen35
Model tree for FrescoHF/TQ3.5-27B-Musica-v1-Hybrid-GGUF

Base model: Qwen/Qwen3.5-27B (this model is one of its quantizations)