Quick question: I noticed your MLX 3-bit variant sits at ~24GB, while GGUF’s Q3_K_S is only ~12.4GB?

#1
by realperson1234 - opened

Hi Daniel and the Unsloth team,

Thank you for your amazing efforts and consistent work on Unsloth. The recent 2-bit Qwen3.6-27B showcase (the one that made 26 tool calls, Reddit post) really highlights the potential of everything you're building. Kudos to the entire team! 👏

Quick question: I noticed your MLX 3-bit variant sits at ~24GB, while GGUF’s Q3_K_S is only ~12.4GB (UD-Q2_K_XL is ~11.8GB). I also see that some GGUF/MLX variants already match that footprint (e.g., Qwen3.6-27B-oQ2 mlx at 11.4GB, Qwen3.6-27B-oQ4 mlx at 16.7GB). I’m curious:

  1. What’s driving the ~2x size difference in Unsloth’s current MLX quantization pipeline compared to GGUF’s compression strategy?
  2. With native MLX still pending in Studio, what’s the expected timeline for tighter size/accuracy parity, and will Unsloth align its MLX quantization naming/strategy with GGUF’s more compressed families?

Looking forward to your insights. Thanks again for the incredible work pushing local AI forward!

So for the MLX quants, not every layer is in 3-bit. Some layers are sensitive to quantization, so we keep them in 8-bit or 16-bit to maintain accuracy. That is why the 3-bit model is ~24GB.
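To make that concrete, here is a minimal back-of-the-envelope sketch. The layer split and the 27B total parameter count below are illustrative assumptions, not Unsloth's actual recipe; the point is just that keeping a fraction of the weights at 8-bit/16-bit pushes the average bits per weight (and the file size) well above a pure 3-bit quant:

```python
# Hypothetical mixed-precision split for a 27B-parameter model.
# (Illustrative numbers only; the real recipe keeps whichever layers
# are most quantization-sensitive at higher precision.)
params_3bit  = 18e9   # bulk of the weights stored at 3-bit
params_8bit  = 6e9    # quantization-sensitive layers kept at 8-bit
params_16bit = 3e9    # e.g. embeddings / norms kept at 16-bit

total_params = params_3bit + params_8bit + params_16bit
total_bits   = params_3bit * 3 + params_8bit * 8 + params_16bit * 16

print(f"size: ~{total_bits / 8 / 1e9:.1f} GB")           # ~18.8 GB
print(f"avg:  ~{total_bits / total_params:.1f} bits/w")   # ~5.6 bits/weight
print(f"pure 3-bit would be ~{total_params * 3 / 8 / 1e9:.1f} GB")  # ~10.1 GB
```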

This is probably an effective ~4 bits per weight or so. I don't think there's any way I can run this on my Mac mini M4 Pro with just 24GB. ☹️

You can still try out the GGUFs using Unsloth Studio or llama.cpp.

So for the MLX quants, not every layer is in 3-bit. Some layers are sensitive to quantization, so we keep them in 8-bit or 16-bit

Yep, I understand that, but here's the thing: your 35B UD 3-bit MLX model here https://huggingface.co/unsloth/Qwen3.6-35B-A3B-UD-MLX-3bit is only 17.4GB, so why is the 27B UD MLX 3-bit bigger than that? Either it isn't really 3-bit and is more like 6-bit, or something seems off, is all I'm calling out.
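For reference, here is a quick effective bits-per-weight estimate from the sizes quoted in this thread. It assumes the 27B/35B in the model names are total parameter counts and that the sizes are decimal GB; both are assumptions on my part:

```python
def effective_bpw(file_size_gb: float, total_params_b: float) -> float:
    """Rough effective bits per weight: file size in bits over parameter count."""
    return file_size_gb * 1e9 * 8 / (total_params_b * 1e9)

# Sizes as quoted in the thread; parameter counts read off the model names (assumed).
print(effective_bpw(17.4, 35))  # Qwen3.6-35B-A3B UD-MLX-3bit -> ~4.0 bits/weight
print(effective_bpw(24.0, 27))  # Qwen3.6-27B MLX 3-bit       -> ~7.1 bits/weight
```

Under those assumptions the two "3-bit" quants land at very different effective precisions, which is essentially the discrepancy being raised here.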
