# Qwen3.5_2B_Abiliterate_All_Layers_Baked_GGUF

GGUF companion release for Qwen3.5_2B_Abiliterate_All_Layers_Baked.
## Files

- qwen3.5_2b_abiliterate_all_layers_baked.f16.gguf
- qwen3.5_2b_abiliterate_all_layers_baked.Q2_K.gguf
- qwen3.5_2b_abiliterate_all_layers_baked.Q3_K_S.gguf
- qwen3.5_2b_abiliterate_all_layers_baked.Q3_K_M.gguf
- qwen3.5_2b_abiliterate_all_layers_baked.Q3_K_L.gguf
- qwen3.5_2b_abiliterate_all_layers_baked.Q4_K_S.gguf
- qwen3.5_2b_abiliterate_all_layers_baked.Q4_K_M.gguf
- qwen3.5_2b_abiliterate_all_layers_baked.Q5_K_S.gguf
- qwen3.5_2b_abiliterate_all_layers_baked.Q5_K_M.gguf
- qwen3.5_2b_abiliterate_all_layers_baked.Q6_K.gguf
- qwen3.5_2b_abiliterate_all_layers_baked.Q8_0.gguf
## Quant guide

| Quant | Size | Use when | Tradeoff |
|---|---|---|---|
| Q2_K | 0.85 GB | You need the smallest possible file and can tolerate a clear quality drop | Lowest memory use, weakest output quality |
| Q3_K_S | 0.95 GB | You are below 8 GB RAM/VRAM and want a small step up from Q2_K | Still a noticeable quality hit |
| Q3_K_M | 1.02 GB | You want the best low-end compromise for constrained devices | Better than Q3_K_S, still compressed hard |
| Q3_K_L | 1.06 GB | You want a slightly safer Q3 choice without moving into Q4 | Marginally larger for a modest gain |
| Q4_K_S | 1.12 GB | You want a compact everyday quant and are optimizing for size first | Good balance, a bit weaker than Q4_K_M |
| Q4_K_M | 1.18 GB | You want the standard balanced option for general local use | Best default size/quality compromise for many setups |
| Q5_K_S | 1.28 GB | You have more headroom and want to preserve quality better than Q4 | Larger file for a smaller quality jump |
| Q5_K_M | 1.33 GB | You want a strong general-purpose quant without going near full precision | Best practical choice if RAM/VRAM is not very tight |
| Q6_K | 1.45 GB | You want near-high-quality local inference and can afford the extra memory | Larger and slower than Q5_K_M |
| Q8_0 | 1.87 GB | You want to stay as close as possible to f16 while still using GGUF quantization | Highest quality quant here, but much heavier |
| f16 | 3.52 GB | You want the least quantization loss and have plenty of memory/storage | Largest file by a wide margin |
## Local usage

Run `local.bat`.

To run a specific quant without editing the script:

```bat
set MODEL=qwen3.5_2b_abiliterate_all_layers_baked.Q5_K_M.gguf
local.bat
```
## Notes

- Source HF: https://huggingface.co/amkkk/qwen3.5-2B-abliterated-alllayers
- These quants come from the checkpoint build, which performed better than the runtime-hook ablated variant also uploaded here.