```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Minachist/Qwen3.6-27B-Mixed-AutoRound")
model = AutoModelForImageTextToText.from_pretrained("Minachist/Qwen3.6-27B-Mixed-AutoRound")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
# Qwen3.6-27B Mixed AutoRound
This is an unofficial quantized version of Qwen3.6-27B, created with AutoRound using a custom mixed-precision recipe.
## Quantization details
- This model uses mixed-precision quantization to balance performance and model size.
- The `self_attn` layers are quantized to 8-bit.
- The MLP layers are generally quantized to 4-bit, but the first 3 and last 3 layers are kept at 8-bit.
- The `lm_head`, `linear_attn`, `visual`, and `mtp.fc` layers are kept unquantized in FP16.
| Field | Custom Mixed Recipe |
|---|---|
| Base | Qwen/Qwen3.6-27B |
| Method | AutoRound (intel/auto-round), custom recipe |
| Scheme | Mixed (W4A16 / W8A16) |
| Bits | 4 & 8 |
| Group size | 128 |
| Symmetric | yes |
| Unquantized layers | lm_head, linear_attn, visual, mtp.fc |
| Calibration dataset | NeelNanda/pile-10k |
| Calibration samples | 512 |
| Sequence length | 2048 |
| Iterations | 1000 |
| Batch size | 8 |
| torch.compile | enabled |
- For more information, please check `quantize.py`.
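The per-layer recipe described above can be expressed programmatically. Below is a minimal sketch in the spirit of auto-round's per-layer configuration; the layer-name pattern (`model.layers.N.self_attn` / `.mlp`) and the layer count are assumptions for illustration, and the actual recipe lives in `quantize.py`:

```python
# Sketch of the mixed-precision recipe: attention at 8-bit,
# MLP at 4-bit except the first 3 and last 3 layers.
# NUM_LAYERS is a hypothetical value for illustration only.
NUM_LAYERS = 48

layer_config = {}
for i in range(NUM_LAYERS):
    # All attention blocks at 8-bit.
    layer_config[f"model.layers.{i}.self_attn"] = {"bits": 8}
    # MLP: first 3 and last 3 layers at 8-bit, the rest at 4-bit.
    mlp_bits = 8 if i < 3 or i >= NUM_LAYERS - 3 else 4
    layer_config[f"model.layers.{i}.mlp"] = {"bits": mlp_bits}

# lm_head, linear_attn, visual and mtp.fc stay unquantized (FP16),
# so they are simply excluded from the quantization config.
```

A dictionary like this would then be handed to the AutoRound tuner along with the table's hyperparameters (group size 128, symmetric, 512 calibration samples, 1000 iterations); consult `quantize.py` for the exact call.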
## KLD Metrics
| Metric | Value | Description |
|---|---|---|
| Median KLD | 0.005592 | Median divergence |
| P90 KLD | 0.034514 | Divergence at the 90th percentile |
| Mean KLD | 0.046941 | Average divergence |
| Mean Coverage | 0.994750 | - |
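These metrics compare the quantized model's next-token distribution against the FP16 reference at each position. A minimal, illustrative sketch of the per-position computation (pure Python; the evaluation script's exact top-K handling is not shown here):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kld(p_logits, q_logits):
    """KL(P || Q) between reference (P) and quantized (Q) next-token logits."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits diverge by zero; any perturbation gives positive divergence.
ref = [2.0, 1.0, 0.1]
quant = [2.1, 0.9, 0.1]
```

The reported median/P90/mean are then aggregates of this per-position value over all evaluated positions.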
## Evaluation Configuration
| Parameter | Value |
|---|---|
| Evaluation Dataset | wikitext-2-raw-v1 (test) |
| Sequence Length | 2048 |
| Num Samples | 64 |
| Total Positions | 131,008 |
| Top-K Reference | 1000 |
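The "Total Positions" row is consistent with the other rows under the assumption of next-token evaluation: each 2048-token sample yields one fewer scored position than tokens, so 64 samples give 64 × 2047 = 131,008 positions:

```python
num_samples = 64
seq_len = 2048

# Next-token prediction loses one scored position per sample.
total_positions = num_samples * (seq_len - 1)
print(total_positions)  # 131008
```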
## How to use
This model was tested on the latest `docker.io/vllm/vllm-openai:cu130-nightly` image; vLLM is the recommended inference engine.

⚠️ Important note: do NOT use `FLASHINFER` as the attention backend (`--attention-backend FLASHINFER`), as it may cause compatibility issues on some setups.

Example args (for 2x 3090 users):
```shell
vllm serve ./Qwen3.6-27B-mixed-autoround \
  --tensor-parallel-size 2 \
  --attention-backend FLASH_ATTN \
  --performance-mode interactivity \
  --max-model-len auto \
  --max-num-batched-tokens 2048 \
  --max-num-seqs 1 \
  --gpu-memory-utilization 0.96 \
  --compilation-config '{"mode":"VLLM_COMPILE","cudagraph_capture_sizes":[4]}' \
  -O3 \
  --async-scheduling \
  --language-model-only \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --speculative-config '{"method":"mtp","num_speculative_tokens":3}' \
  --default-chat-template-kwargs.preserve_thinking true \
  --mamba-cache-mode all \
  --mamba-block-size 8 \
  --enable-prefix-caching \
  --enable-chunked-prefill
```
- With these settings, the model's full context length fits in memory.
- Note: This information is based on current understanding and testing. Optimal configurations may vary depending on your specific hardware setup. For further details, please refer to the official vLLM documentation.
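Once the server is up, it exposes the OpenAI-compatible chat API. A minimal sketch of a request body (the model name mirrors the path passed to `vllm serve` above; the default endpoint `http://localhost:8000/v1/chat/completions` is an assumption and depends on your launch flags):

```python
import json

# Hypothetical request body for the vLLM OpenAI-compatible endpoint,
# e.g. POST http://localhost:8000/v1/chat/completions
payload = {
    "model": "./Qwen3.6-27B-mixed-autoround",  # matches the path given to `vllm serve`
    "messages": [
        {"role": "user", "content": "What animal is on the candy?"}
    ],
    "max_tokens": 40,
}
body = json.dumps(payload)
```

Send `body` with any HTTP client (or use the `openai` SDK pointed at the server's base URL).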
## Acknowledgements
- Lorbus for the README.md format
- Alibaba / Qwen team for the base Qwen3.6-27B model
- Intel AutoRound team for the quantization framework
- vLLM project for the inference engine and Qwen3_5 MTP support
Alternatively, load the model through a pipeline:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Minachist/Qwen3.6-27B-Mixed-AutoRound")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```