# Huihui-Qwen3.5-27B-abliterated - CT-W4A16 (text-only + MTP)

4-bit weight-only quantization of huihui-ai/Huihui-Qwen3.5-27B-abliterated.

- **Text-only:** the vision encoder is removed, saving ~0.9 GB of VRAM.
- **MTP preserved:** speculative decoding works out of the box.
## Specs
| Property | Value |
|---|---|
| Base model | Qwen3.5-27B (abliterated) |
| Quantization | W4A16 (compressed-tensors, group_size=128) |
| Size on disk | 18.5 GB |
| Size vs. GPTQ W4A16 | -1.0 GB (vision encoder removed) |
| MTP head | Included (BF16, 0.85 GB) |
| Vision encoder | Removed |
| Calibration | 128 samples from the Pile validation set |
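A quick sanity check on the table above: a back-of-the-envelope estimate of the quantized weight size, assuming roughly 27B parameters in the quantized linear layers, 4 bits per weight, and one FP16 scale per group of 128 (the parameter count and storage layout are assumptions, not read from the checkpoint). The BF16 pieces that stay unquantized (embeddings, lm_head, MTP head) account for most of the gap to the 18.5 GB on-disk size.

```python
def w4a16_size_gb(n_params: float, group_size: int = 128) -> float:
    """Approximate on-disk size of W4A16 weights, in GB (1e9 bytes)."""
    packed = n_params * 4 / 8           # 4 bits per weight, bit-packed
    scales = n_params / group_size * 2  # one FP16 scale (2 bytes) per group
    return (packed + scales) / 1e9

# Quantized linear layers alone, before the BF16 parts:
print(f"{w4a16_size_gb(27e9):.1f} GB")  # -> 13.9 GB
```

The remaining ~4.6 GB is consistent with BF16 embeddings, output head, and the 0.85 GB MTP head listed above.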
## Usage with vLLM (RTX 5090, 32 GB)
```bash
python -m vllm.entrypoints.openai.api_server \
  --model j-a-a-a-y/Huihui-Qwen3.5-27B-abliterated-CT-W4A16 \
  --served-model-name qwen3.5-27b \
  --dtype float16 \
  --max-model-len 1024 \
  --max-num-seqs 4 \
  --gpu-memory-utilization 0.97 \
  --speculative-config '{"method": "mtp", "num_speculative_tokens": 5}' \
  --performance-mode interactivity
```
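Once the server is up, it exposes the standard OpenAI-compatible `/v1/chat/completions` endpoint. A minimal stdlib-only client sketch, assuming the vLLM default of `localhost:8000` and the `qwen3.5-27b` served name from the command above:

```python
import json
import urllib.request

def build_chat_request(prompt: str, model: str = "qwen3.5-27b") -> dict:
    """Assemble the JSON body for a /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.7,
    }

def chat(prompt: str,
         url: str = "http://localhost:8000/v1/chat/completions") -> str:
    """POST the request and return the assistant's reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Any OpenAI SDK pointed at `base_url="http://localhost:8000/v1"` works the same way; speculative decoding is transparent to the client.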
## Benchmarks (RTX 5090, MTP=5 + interactivity)
| Metric | GPTQ W4A16 | This model |
|---|---|---|
| Single request, 256 tok | 149 tok/s | TBD |
| Single request, 512 tok | 131 tok/s | TBD |
| Batch=4 aggregate | 410 tok/s | TBD |
| MTP acceptance rate | 50% | TBD |
| Available KV cache | 6.35 GiB | ~7.3 GiB (est.) |
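To see what the 50% acceptance rate buys, here is a rough model of speculative decoding throughput: if each of the `n` draft tokens is accepted independently with probability `p` (an i.i.d. simplification, not how acceptance actually correlates in practice), the expected number of tokens committed per verification step is the geometric partial sum below.

```python
def expected_tokens_per_step(p: float, n: int) -> float:
    """Expected tokens committed per verify step with n speculative
    tokens and i.i.d. per-token acceptance probability p.
    Includes the one token the target model always produces."""
    return sum(p ** i for i in range(n + 1))

# The table's 50% acceptance with num_speculative_tokens=5:
print(expected_tokens_per_step(0.5, 5))  # -> 1.96875
```

So under this toy model, MTP=5 at 50% acceptance yields roughly 2 tokens per forward pass of the target model, which is the mechanism behind the single-request tok/s numbers above.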
## Model tree

Qwen/Qwen3.5-27B (base) → huihui-ai/Huihui-Qwen3.5-27B-abliterated (finetune) → j-a-a-a-y/Huihui-Qwen3.5-27B-abliterated-CT-W4A16 (this quantization)