# Huihui-Qwen3.5-27B-abliterated – CT-W4A16 (text-only + MTP)

A 4-bit weight-only quantization of huihui-ai/Huihui-Qwen3.5-27B-abliterated.

- **Text-only:** the vision encoder is removed, freeing roughly 0.9 GB of VRAM.
- **MTP preserved:** the multi-token prediction (MTP) head is kept, so speculative decoding works out of the box.
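Stripping a vision tower amounts to dropping every tensor under the encoder's key prefix from the checkpoint. A minimal sketch; the `visual.` prefix is an assumption modeled on Qwen-VL-style checkpoints, so check the actual `state_dict` keys of the model you are converting.

```python
# Sketch: dropping vision-encoder weights from a checkpoint to produce a
# text-only model. The "visual." prefix is an ASSUMPTION (Qwen-VL-style
# naming); verify against the real state_dict before relying on it.

def strip_vision_tower(state_dict, prefix="visual."):
    """Return a copy of the state dict without vision-encoder tensors."""
    return {k: v for k, v in state_dict.items() if not k.startswith(prefix)}

# Toy example with plain floats standing in for real weight tensors:
ckpt = {
    "model.layers.0.self_attn.q_proj.weight": 0.1,
    "visual.patch_embed.proj.weight": 0.2,
    "mtp.head.weight": 0.3,
}
text_only = strip_vision_tower(ckpt)
print(sorted(text_only))  # vision keys are gone, the MTP head survives
```

The MTP head deliberately does not match the prefix, which is how it stays in the exported checkpoint.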

## Specs

| Property | Value |
|---|---|
| Base model | Qwen3.5-27B (abliterated) |
| Quantization | W4A16 (compressed-tensors, group_size=128) |
| Size on disk | 18.5 GB |
| vs. GPTQ W4A16 | −1.0 GB (no vision encoder) |
| MTP head | Included (BF16, 0.85 GB) |
| Vision encoder | Removed |
| Calibration | 128 samples from the Pile validation set |
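To make the W4A16 scheme concrete: weights are split into groups of 128, each group gets its own 16-bit scale, and values are rounded to signed 4-bit integers while activations stay in 16-bit. A simplified symmetric-quantization sketch, not the actual compressed-tensors kernel:

```python
import numpy as np

# Illustration of group-wise 4-bit weight quantization (group_size=128).
# Each group of 128 weights shares one FP16 scale; values are rounded
# into the signed int4 range [-8, 7]. This is a sketch of the idea,
# not the compressed-tensors implementation.

GROUP_SIZE = 128

def quantize_w4(weights: np.ndarray):
    groups = weights.reshape(-1, GROUP_SIZE)
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize_w4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales.astype(np.float32)).reshape(-1)

rng = np.random.default_rng(0)
w = rng.standard_normal(4 * GROUP_SIZE).astype(np.float32)
q, s = quantize_w4(w)
w_hat = dequantize_w4(q, s)
print(f"max abs reconstruction error: {np.abs(w - w_hat).max():.4f}")
```

Per-group scales are why group_size matters: smaller groups track local weight magnitudes more closely (lower error) at the cost of more scale storage.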

## Usage with vLLM (RTX 5090 / 32 GB)

```shell
python -m vllm.entrypoints.openai.api_server \
    --model j-a-a-a-y/Huihui-Qwen3.5-27B-abliterated-CT-W4A16 \
    --served-model-name qwen3.5-27b \
    --dtype float16 \
    --max-model-len 1024 \
    --max-num-seqs 4 \
    --gpu-memory-utilization 0.97 \
    --speculative-config '{"method": "mtp", "num_speculative_tokens": 5}' \
    --performance-mode interactivity
```
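Once the server is up, any OpenAI-compatible client can talk to it. A minimal standard-library sketch, assuming vLLM's default endpoint `http://localhost:8000/v1/chat/completions`; the snippet only builds and prints the request body, and the commented lines show how to actually send it:

```python
import json
from urllib import request

# Build a chat-completions request for the server launched above.
# "qwen3.5-27b" matches --served-model-name; the endpoint path is the
# standard OpenAI-compatible route vLLM exposes.
payload = {
    "model": "qwen3.5-27b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
}
body = json.dumps(payload).encode()
print(body.decode())

# Uncomment to send the request to a running server:
# req = request.Request(
#     "http://localhost:8000/v1/chat/completions",
#     data=body,
#     headers={"Content-Type": "application/json"},
# )
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```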

## Benchmarks (RTX 5090, MTP=5 + interactivity)

| Metric | GPTQ W4A16 | This model |
|---|---|---|
| Single stream, 256 tok | 149 tok/s | TBD |
| Single stream, 512 tok | 131 tok/s | TBD |
| Batch=4, aggregate | 410 tok/s | TBD |
| MTP acceptance rate | 50% | TBD |
| Available KV cache | 6.35 GiB | ~7.3 GiB (est.) |
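To see what an acceptance rate buys: with `num_speculative_tokens = k` drafts and per-token acceptance probability `p`, a decoding step emits `1 + p + p² + … + pᵏ` tokens in expectation (the target model always contributes at least one). A back-of-the-envelope sketch, assuming independent per-token acceptances and reading the table's 50% as that per-token probability:

```python
# Expected tokens emitted per decoding step with k draft tokens and
# per-token acceptance probability p, assuming independent acceptances:
# 1 + p + p^2 + ... + p^k = (1 - p^(k+1)) / (1 - p).

def expected_tokens_per_step(p: float, k: int) -> float:
    return sum(p**i for i in range(k + 1))

# The table's 50% acceptance with num_speculative_tokens=5:
print(round(expected_tokens_per_step(0.5, 5), 3))  # → 1.969 tokens/step
```

So at 50% acceptance, MTP=5 roughly doubles tokens per forward pass of the target model, ignoring draft-head overhead; most of the gain comes from the first couple of drafts, since the series decays geometrically.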