# Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic-GGUF
GGUF quantizations of ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic.
Qwen3.5-27B with Claude Opus 4.6 reasoning distillation, abliterated via Heretic.
## Available Quants
| Filename | Quant | Size | BPW | Notes |
|---|---|---|---|---|
| heretic-27b-Q4_K_M.gguf | Q4_K_M | ~15.4 GB | 4.92 | Recommended: best balance of quality and VRAM |
## VRAM Requirements (Q4_K_M)
| Context | VRAM | Fits on |
|---|---|---|
| 16K | ~18 GB | RTX 3090, 4090, A5000 |
| 32K | ~19 GB | RTX 3090 Ti, A6000 |
| 65K | ~21 GB | RTX 3090 Ti, A6000 |
Qwen3.5's hybrid DeltaNet architecture means only ~25% of layers (the conventional-attention ones) carry a KV cache, so context scaling is far more VRAM-efficient than in pure transformer models.
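The VRAM figures above can be sanity-checked with a back-of-the-envelope KV-cache estimate. The sketch below assumes 16 of the 64 layers use conventional attention (the 3:1 pattern) and uses hypothetical GQA dimensions of 8 KV heads × 128 head dim with an fp16 cache; the real head counts may differ, so treat the numbers as illustrative only.

```python
# Rough KV-cache size for a hybrid-attention model.
# ASSUMPTIONS (not confirmed from the model config): 16 attention
# layers out of 64, 8 KV heads, head dim 128, fp16 (2-byte) cache.
def kv_cache_bytes(ctx_tokens: int,
                   attn_layers: int = 16,
                   kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    # 2 tensors (K and V) per attention layer. DeltaNet layers keep
    # a small fixed-size recurrent state instead, ignored here.
    return 2 * bytes_per_elem * attn_layers * kv_heads * head_dim * ctx_tokens

print(kv_cache_bytes(16384) / 2**30)  # → 1.0 (GiB at 16K context)
```

Under these assumptions the cache is ~1 GiB at 16K context and only ~4 GiB at 65K, which is consistent with the shallow VRAM growth in the table above.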
## Usage with llama.cpp

```bash
llama-server \
  -m heretic-27b-Q4_K_M.gguf \
  -ngl 99 \
  --ctx-size 16384 \
  --flash-attn on \
  --jinja
```
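Once the server is running, it exposes an OpenAI-compatible chat endpoint. A minimal standard-library Python sketch (the port is llama-server's default; adjust if you started it with `--port`):

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       url: str = "http://localhost:8080/v1/chat/completions"):
    # llama-server speaks the OpenAI chat-completions format.
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return req, payload

req, payload = build_chat_request("Hello!")
# With a running server:
# print(json.load(urllib.request.urlopen(req))["choices"][0]["message"])
```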
### With vision (mmproj)

Build the mmproj from the base model weights:

```bash
python convert_hf_to_gguf.py \
  ghost-actual/Qwen3.5-27B-Claude-Opus-4.6-Distilled-heretic \
  --mmproj --outfile heretic-27b-mmproj-F16.gguf
```

Then serve with both files:

```bash
llama-server \
  -m heretic-27b-Q4_K_M.gguf \
  --mmproj heretic-27b-mmproj-F16.gguf \
  -ngl 99 --ctx-size 16384 --flash-attn on --jinja
```
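With the mmproj loaded, images can be sent through the same chat endpoint using OpenAI-style multimodal content parts (a base64 data URI plus a text part). This is a sketch of the message shape only; it assumes your llama-server build accepts this format, and the PNG bytes here are a placeholder:

```python
import base64

def build_vision_message(image_bytes: bytes, question: str) -> dict:
    # OpenAI-style multimodal content parts: one image_url part
    # (base64 data URI) plus one text part.
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
            {"type": "text", "text": question},
        ],
    }

# Placeholder bytes; in practice, read a real image file.
msg = build_vision_message(b"\x89PNG...", "What is in this image?")
# POST {"messages": [msg], "max_tokens": 128} to /v1/chat/completions.
```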
## Recommended settings

- temperature: 0.6
- top_p: 0.95
- top_k: 20
- presence_penalty: 1.5
- repetition_penalty: 1.05
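These samplers map onto llama-server command-line flags, so they can be baked in at launch instead of sent per-request. Flag names below follow current llama.cpp; check `llama-server --help` for your build:

```bash
llama-server \
  -m heretic-27b-Q4_K_M.gguf \
  -ngl 99 --ctx-size 16384 --flash-attn on --jinja \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --presence-penalty 1.5 \
  --repeat-penalty 1.05
```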
## Abliteration Stats
- Tool: Heretic v1.2.0
- Refusals: 13/100
- KL Divergence: 1264
## Architecture

Qwen3.5 uses a hybrid design combining Gated DeltaNet (linear-attention) layers with conventional attention: 64 layers in a 3:1 pattern, 262K native context, and native multimodal vision. See the full model card for details.
## Made by

Ghost (ghost-actual)