This is a hybrid Q5_K_M quantization of the language model. The token embeddings and the output head are quantized to q8_0, the first three and last three layers are in q8_0, and the fourth layer from each end (blk.3 and blk.60) is in q6_K. Since the embeddings and the first and last layers are the most sensitive to quantization, I kept them at higher precision. This yields higher-quality responses from the model at the cost of a slight increase in file size. The quantization was done with llama.cpp build b8591 using this config:
```
^token_embd\.weight=q8_0
^output\.weight=q8_0
blk\.[0-2]\..*=q8_0
blk\.6[1-3]\..*=q8_0
blk\.3\..*=q6_K
blk\.60\..*=q6_K
```
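To sanity-check which tensors each pattern captures, here is a small sketch that applies the overrides in order and falls back to the default type. It assumes the patterns are matched with an unanchored regex search against standard GGUF tensor names (`token_embd.weight`, `blk.N.*`, `output.weight`); the exact matching behavior inside llama.cpp may differ.

```python
import re

# Override patterns in the order they appear in the config above.
OVERRIDES = [
    (r"^token_embd\.weight", "q8_0"),
    (r"^output\.weight", "q8_0"),
    (r"blk\.[0-2]\..*", "q8_0"),
    (r"blk\.6[1-3]\..*", "q8_0"),
    (r"blk\.3\..*", "q6_K"),
    (r"blk\.60\..*", "q6_K"),
]

def quant_type(tensor_name: str, default: str = "q5_K_M") -> str:
    """Return the quant type of the first matching override, else the default."""
    for pattern, qtype in OVERRIDES:
        if re.search(pattern, tensor_name):
            return qtype
    return default

# Note that blk\.3\. requires a literal dot after the 3, so blk.30
# falls through to the Q5_K_M default rather than q6_K.
for name in ["token_embd.weight", "blk.0.attn_q.weight",
             "blk.3.ffn_down.weight", "blk.30.attn_k.weight",
             "blk.60.ffn_up.weight", "output.weight"]:
    print(f"{name} -> {quant_type(name)}")
```

The escaped dots matter: without them, `blk.3.` would also match names like `blk.30.`, silently over-quantizing the middle layers.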
Model tree for FrescoHF/TQ3.5-27B-Musica-v1-Hybrid-GGUF:
- Base model: Qwen/Qwen3.5-27B
- Finetuned: ArliAI/Qwen3.5-27B-Derestricted
- Finetuned: AuriAetherwiing/TQ3.5-27B-Musica-v1