This is a hybrid quantization of the language model based on Q5_K_M. The token embeddings and output tensor are quantized to q8_0, the first three and last three layers are in q8_0, and the fourth layer from each end is in q6_K. Since the embeddings and the first and last layers are the most sensitive to quantization, I kept them at higher precision. This yields higher-quality responses from the model at the cost of a slight increase in file size. The quantization was done using llama.cpp version b8591 with this config:

```
^token_embd\.weight=q8_0
^output\.weight=q8_0
blk\.[0-2]\..*=q8_0
blk\.6[1-3]\..*=q8_0
blk\.3\..*=q6_K
blk\.60\..*=q6_K
```
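To make the override logic above concrete, here is a small sketch that applies the same regex patterns to GGUF-style tensor names and reports which quant type each tensor would get. The `quant_for` helper and the `q5_K_M` fallback label are illustrative assumptions, not part of llama.cpp; the patterns themselves are copied verbatim from the config.

```python
import re

# Per-tensor overrides from the config above (first match wins).
# Assumes a 64-layer model, so blocks run from blk.0 through blk.63.
OVERRIDES = [
    (r"^token_embd\.weight", "q8_0"),
    (r"^output\.weight", "q8_0"),
    (r"blk\.[0-2]\.", "q8_0"),   # first three layers
    (r"blk\.6[1-3]\.", "q8_0"),  # last three layers
    (r"blk\.3\.", "q6_K"),       # fourth layer from the start
    (r"blk\.60\.", "q6_K"),      # fourth layer from the end
]

def quant_for(tensor_name: str, default: str = "q5_K_M") -> str:
    """Return the quant type for a tensor name; unmatched tensors fall back to the base mix."""
    for pattern, qtype in OVERRIDES:
        if re.search(pattern, tensor_name):
            return qtype
    return default

# Sensitive tensors stay at q8_0 ...
assert quant_for("token_embd.weight") == "q8_0"
assert quant_for("blk.0.attn_q.weight") == "q8_0"
assert quant_for("blk.62.ffn_down.weight") == "q8_0"
# ... the fourth block from each end drops to q6_K ...
assert quant_for("blk.3.attn_q.weight") == "q6_K"
assert quant_for("blk.60.ffn_up.weight") == "q6_K"
# ... and everything else keeps the Q5_K_M base mix.
assert quant_for("blk.31.ffn_gate.weight") == "q5_K_M"
```

Note that `blk\.3\.` does not match `blk.31` or `blk.63` because the trailing `\.` requires a literal dot right after the digit, which is why the single-digit patterns are safe without anchors.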
Downloads last month: 822
Format: GGUF (5-bit hybrid)
Model size: 27B params
Architecture: qwen35
Model tree for FrescoHF/TQ3.5-27B-Musica-v1-Hybrid-GGUF

Base model: Qwen/Qwen3.5-27B (this model is one of its quantizations)