How to use from the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Minachist/Qwen3.6-27B-Mixed-AutoRound")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Minachist/Qwen3.6-27B-Mixed-AutoRound")
model = AutoModelForImageTextToText.from_pretrained("Minachist/Qwen3.6-27B-Mixed-AutoRound")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Qwen3.6-27B Mixed AutoRound

This is an unofficial quantized version of Qwen3.6-27B. It was created using AutoRound with a custom mixed-precision recipe.

Quantization details

  • This model uses mixed-precision quantization to balance quality and model size.
  • The self_attn layers are quantized to 8-bit.
  • The MLP layers are generally quantized to 4-bit, but those in the first 3 and last 3 layers are kept at 8-bit.
  • The lm_head, linear_attn, visual, and mtp.fc layers are kept unquantized in FP16.
Field                 Custom Mixed Recipe
Base                  Qwen/Qwen3.6-27B
Method                AutoRound (intel/auto-round), custom recipe
Scheme                Mixed (W4A16 / W8A16)
Bits                  4 & 8
Group size            128
Symmetric             yes
Unquantized layers    lm_head, linear_attn, visual, mtp.fc
Calibration dataset   NeelNanda/pile-10k
Calibration samples   512
Sequence length       2048
Iterations            1000
Batch size            8
torch.compile         enabled
  • For more information, please check quantize.py; a hypothetical sketch of how such a recipe can be expressed is shown below.
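
As an illustration only, here is a minimal, hypothetical sketch of how a recipe like this could be expressed through AutoRound's layer_config. The model class and module paths (model.layers.{i}.self_attn.q_proj and so on) are assumptions about this architecture, not taken from quantize.py; the authoritative recipe is quantize.py itself.

# Hypothetical sketch of the mixed recipe with AutoRound (not quantize.py itself).
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3.6-27B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Per-layer overrides: 8-bit self-attention everywhere, 8-bit MLPs in the
# first 3 and last 3 decoder layers. Module paths are assumptions.
num_layers = model.config.num_hidden_layers
layer_config = {}
for i in range(num_layers):
    for proj in ("q_proj", "k_proj", "v_proj", "o_proj"):
        layer_config[f"model.layers.{i}.self_attn.{proj}"] = {"bits": 8}
    if i < 3 or i >= num_layers - 3:
        for proj in ("gate_proj", "up_proj", "down_proj"):
            layer_config[f"model.layers.{i}.mlp.{proj}"] = {"bits": 8}
# lm_head is left unquantized by AutoRound's defaults; the real recipe also
# keeps linear_attn, visual, and mtp.fc in FP16 (see quantize.py).

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,                       # default for all layers not overridden above
    group_size=128,
    sym=True,
    dataset="NeelNanda/pile-10k",
    nsamples=512,
    seqlen=2048,
    iters=1000,
    batch_size=8,
    layer_config=layer_config,
)
autoround.quantize_and_save("./Qwen3.6-27B-mixed-autoround")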

KLD Metrics

Metric          Value      Description
Median KLD      0.005592   Median divergence
P90 KLD         0.034514   Divergence at the 90th percentile
Mean KLD        0.046941   Average divergence
Mean Coverage   0.994750   -

Evaluation Configuration

Parameter             Value
Calibration Dataset   wikitext-2-raw-v1 (test)
Sequence Length       2048
Num Samples           64
Total Positions       131,008
Top-K Reference       1000
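
For reference, the sketch below shows one way metrics like these can be computed from reference (unquantized) versus quantized logits over the same evaluation positions. It is an illustrative assumption, not the actual evaluation script: in particular, "coverage" is read here as the reference probability mass falling inside the reference top-k (1000) tokens.

# Hypothetical KLD/coverage computation; not the script behind the table above.
import torch
import torch.nn.functional as F

def kld_stats(ref_logits: torch.Tensor, q_logits: torch.Tensor, top_k: int = 1000):
    """ref_logits, q_logits: [num_positions, vocab_size] from the two models."""
    ref_logp = F.log_softmax(ref_logits.float(), dim=-1)
    q_logp = F.log_softmax(q_logits.float(), dim=-1)
    ref_p = ref_logp.exp()
    # KL(reference || quantized) at every position
    kld = (ref_p * (ref_logp - q_logp)).sum(dim=-1)
    # "coverage": reference probability mass inside the reference top-k
    coverage = ref_p.topk(top_k, dim=-1).values.sum(dim=-1)
    return {
        "median_kld": kld.median().item(),
        "p90_kld": kld.quantile(0.9).item(),
        "mean_kld": kld.mean().item(),
        "mean_coverage": coverage.mean().item(),
    }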

How to use

  • This model was tested with the latest docker.io/vllm/vllm-openai:cu130-nightly image.

  • vLLM is recommended.

  • ⚠️ Important Note: Do NOT use FLASHINFER as the attention backend (--attention-backend FLASHINFER), as it may cause compatibility issues on some setups!

  • Example args (for 2× RTX 3090 users):

vllm serve ./Qwen3.6-27B-mixed-autoround \
  --tensor-parallel-size 2 \
  --attention-backend FLASH_ATTN \
  --performance-mode interactivity \
  --max-model-len auto \
  --max-num-batched-tokens 2048 \
  --max-num-seqs 1 \
  --gpu-memory-utilization 0.96 \
  --compilation-config '{"mode":"VLLM_COMPILE","cudagraph_capture_sizes":[4]}' \
  -O3 \
  --async-scheduling \
  --language-model-only \
  --tool-call-parser qwen3_coder \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --speculative-config '{"method":"mtp","num_speculative_tokens":3}' \
  --default-chat-template-kwargs.preserve_thinking true \
  --mamba-cache-mode all \
  --mamba-block-size 8 \
  --enable-prefix-caching \
  --enable-chunked-prefill
  • With these settings, you get the full context length.
  • Note: This information is based on current understanding and testing. Optimal configurations may vary depending on your specific hardware; for further details, please refer to the official vLLM documentation.
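
Once the server is up, it exposes an OpenAI-compatible API. A minimal client call might look like the sketch below; the base URL and port assume vLLM's defaults, and the model name must match the path passed to vllm serve.

# Minimal client example against the OpenAI-compatible endpoint served above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./Qwen3.6-27B-mixed-autoround",  # must match the served path
    messages=[{"role": "user", "content": "Summarize what AutoRound does in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)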
