---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3.6-27B/blob/main/LICENSE
pipeline_tag: image-text-to-text
base_model: Qwen/Qwen3.6-27B
base_model_relation: quantized
tags:
- compressed-tensors
- qwen3_6
- int4
- int8
- mixed
- autoround
---
# Qwen3.6-27B Mixed AutoRound
This is an unofficial quantized version of Qwen3.6-27B.
It was created using [AutoRound](https://github.com/intel/auto-round) with a custom mixed-precision recipe.
## Quantization details
* This model uses a mixed-precision quantization recipe to balance quality and model size.
* The `self_attn` layers are quantized to 8-bit.
* The MLP layers are generally quantized to 4-bit, but the first 3 and last 3 layers are kept at 8-bit.
* The `lm_head`, `linear_attn`, `visual`, and `mtp.fc` layers are kept unquantized in FP16.
| Field | Custom Mixed Recipe |
|------|------|
| Base | `Qwen/Qwen3.6-27B` |
| Method | AutoRound (`intel/auto-round`), **custom** recipe |
| Scheme | Mixed (W4A16 / W8A16) |
| Bits | 4 & 8 |
| Group size | 128 |
| Symmetric | yes |
| Unquantized layers | `lm_head`, `linear_attn`, `visual`, `mtp.fc` |
| Calibration dataset | `NeelNanda/pile-10k` |
| Calibration samples | 512 |
| Sequence length | 2048 |
| Iterations | 1000 |
| Batch size | 8 |
| torch.compile | enabled |
* For more information, please check `quantize.py`.
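The actual `quantize.py` is not reproduced here, but the sketch below shows roughly what the recipe above could look like with the AutoRound Python API. The model class, the layer-name patterns, and the export format are assumptions and may need adjusting to the real model structure and auto-round version.

```python
# Rough sketch only: NOT the actual quantize.py. Layer-name patterns, the model class,
# and the export format are assumptions and may need adjustment.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen3.6-27B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

num_layers = model.config.num_hidden_layers
layer_config = {}
for i in range(num_layers):
    # Attention projections: 8-bit everywhere.
    layer_config[f"model.layers.{i}.self_attn"] = {"bits": 8}
    # MLP: 8-bit for the first 3 and last 3 layers, 4-bit elsewhere.
    layer_config[f"model.layers.{i}.mlp"] = {"bits": 8 if i < 3 or i >= num_layers - 3 else 4}
# Layers kept unquantized in FP16; 16 bits tells AutoRound to skip them.
for name in ("lm_head", "linear_attn", "visual", "mtp.fc"):
    layer_config[name] = {"bits": 16}

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,                         # default for layers not listed in layer_config
    group_size=128,
    sym=True,
    dataset="NeelNanda/pile-10k",
    nsamples=512,
    seqlen=2048,
    iters=1000,
    batch_size=8,
    layer_config=layer_config,
)
autoround.quantize()
autoround.save_quantized("./Qwen3.6-27B-mixed-autoround", format="auto_round")
```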
### KLD Metrics
| Metric | Value | Description |
| :--- | :--- | :--- |
| **Median KLD** | 0.005592 | Median divergence |
| **P90 KLD** | 0.034514 | Divergence at the 90th percentile |
| **Mean KLD** | 0.046941 | Average divergence |
| **Mean Coverage** | 0.994750 | - |
### Evaluation Configuration
| Parameter | Value |
| :--- | :--- |
| **Evaluation Dataset** | wikitext-2-raw-v1 (test) |
| **Sequence Length** | 2048 |
| **Num Samples** | 64 |
| **Total Positions** | 131,008 |
| **Top-K Reference** | 1000 |
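The metrics above compare the quantized model's next-token distributions against the full-precision reference at the evaluation positions. Below is a minimal sketch of how per-position KL divergence and top-K coverage could be computed from paired logits; the exact definitions used for this card (in particular "coverage") are assumptions, not the actual evaluation script.

```python
# Sketch of per-position KL divergence vs. a full-precision reference.
# The restriction to the reference top-K and the coverage definition are assumptions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def kld_stats(ref_logits, quant_logits, top_k=1000):
    """ref_logits, quant_logits: [num_positions, vocab_size] logits at the same positions."""
    ref_logp = F.log_softmax(ref_logits.float(), dim=-1)
    quant_logp = F.log_softmax(quant_logits.float(), dim=-1)

    # Restrict to the reference model's top-K tokens at each position.
    topk_logp, topk_idx = ref_logp.topk(top_k, dim=-1)
    ref_p = topk_logp.exp()
    quant_topk_logp = quant_logp.gather(-1, topk_idx)

    # KL(ref || quant) summed over the top-K support (assumed definition).
    kld = (ref_p * (topk_logp - quant_topk_logp)).sum(dim=-1)
    # Coverage: reference probability mass captured by its own top-K (assumed definition).
    coverage = ref_p.sum(dim=-1)

    return {
        "median_kld": kld.median().item(),
        "p90_kld": kld.quantile(0.90).item(),
        "mean_kld": kld.mean().item(),
        "mean_coverage": coverage.mean().item(),
    }
```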
## How to use
* This model was tested with the latest `docker.io/vllm/vllm-openai:cu130-nightly` image.
* vLLM is recommended.
* **⚠️ Important Note:** Do NOT use `FLASHINFER` as the attention backend (`--attention-backend FLASHINFER`), as it may cause compatibility issues on some setups.
* Example args (for 2× RTX 3090 users):
```bash
vllm serve ./Qwen3.6-27B-mixed-autoround \
--tensor-parallel-size 2 \
--attention-backend FLASH_ATTN \
--performance-mode interactivity \
--max-model-len auto \
--max-num-batched-tokens 2048 \
--max-num-seqs 1 \
--gpu-memory-utilization 0.96 \
--compilation-config '{"mode":"VLLM_COMPILE","cudagraph_capture_sizes":[4]}' \
-O3 \
--async-scheduling \
--language-model-only \
--tool-call-parser qwen3_coder \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--speculative-config '{"method":"mtp","num_speculative_tokens":3}' \
--default-chat-template-kwargs.preserve_thinking true \
--mamba-cache-mode all \
--mamba-block-size 8 \
--enable-prefix-caching \
--enable-chunked-prefill
```
* With these settings, the full context length is available.
* Note: This information is based on current understanding and testing. Optimal configurations may vary depending on your specific hardware setup. For further details, please refer to the official vLLM documentation.
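Once the server is running, any OpenAI-compatible client can talk to it. A minimal example with the `openai` Python package is shown below; the port, model name, and prompt are placeholders.

```python
# Minimal OpenAI-compatible client call against the local vLLM server.
# The base_url port and the model name are placeholders; the model name must
# match the path passed to `vllm serve`.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="./Qwen3.6-27B-mixed-autoround",
    messages=[{"role": "user", "content": "Explain mixed-precision quantization in one paragraph."}],
    max_tokens=256,
    temperature=0.7,
)
print(response.choices[0].message.content)
```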
## Acknowledgements
- [Lorbus](https://huggingface.co/Lorbus) for the README.md format
- [Alibaba / Qwen team](https://huggingface.co/Qwen) for the base [Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B) model
- [Intel AutoRound](https://github.com/intel/auto-round) team for the quantization framework
- [vLLM project](https://github.com/vllm-project/vllm) for the inference engine and Qwen3_5 MTP support