Instructions to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"
        }
      ]
    }
  }
}
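If ~/.pi/agent/models.json already lists other providers, pasting the JSON above over the file would drop them. The following is a minimal sketch, assuming the file layout shown above, of merging the entry instead. The helper name add_mlx_provider is hypothetical and is not part of Pi or mlx-lm.

```python
import json
from pathlib import Path

MODEL_ID = "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"


def add_mlx_provider(config_path, model_id=MODEL_ID,
                     base_url="http://localhost:8080/v1"):
    """Merge an "mlx-lm" provider entry into a Pi models.json file.

    Hypothetical helper: existing providers are kept, and only the
    "mlx-lm" key is created or overwritten.
    """
    path = Path(config_path).expanduser()
    # Start from the existing config if there is one, else from scratch.
    config = json.loads(path.read_text()) if path.exists() else {}
    providers = config.setdefault("providers", {})
    providers["mlx-lm"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling add_mlx_provider("~/.pi/agent/models.json") would write the same provider block as the manual step.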

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed

Run Hermes

hermes

MLX LM

How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"
# Calling the OpenAI-compatible server with curl
# (mlx_lm.server listens on port 8080 by default, matching the base URLs used above)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
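The same request can be sent from Python using only the standard library. This is a minimal sketch, assuming the server was started as above and is reachable on port 8080 (the base URL used in the Pi and Hermes configurations). The function names are illustrative, and the temperature default of 0.6 is only an assumption taken from the value mentioned in the discussion further down.

```python
import json
import urllib.request

MODEL_ID = "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"
BASE_URL = "http://localhost:8080/v1"  # assumed local mlx_lm.server address


def build_chat_request(prompt, model=MODEL_ID, base_url=BASE_URL,
                       temperature=0.6):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return urllib.request.Request(
        base_url + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Usage, with the server running:
#   print(chat("Hello"))
```

Keeping request construction separate from sending makes it easy to inspect the payload without a running server.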

How to get mtplx binary?

by calvarado2004 - opened 20 days ago

Discussion

calvarado2004

20 days ago

I've found

https://github.com/mlx-community/speculative-decoding

but I'm not sure whether it is equivalent to the mtplx binary tool.

Youssofal

Owner 19 days ago

Releasing later today! I have not released it yet.

When it is out, expect a 2-2.5x speed increase at temperature 0.6.

calvarado2004

14 days ago

edited 14 days ago

I'm using this test:

Given this PGN string of a chess game:

1. b3 e5 2. Nf3 h5 3. d4 exd4 4. Nxd4 Nf6 5. f4 Ke7 6. Qd3 d5 7. h4 *

Figure out the current state of the chessboard, create an image in SVG code, also highlight the last move.

This test makes the model draw, as SVG code, the position reached in the chess game described. It is harder than it sounds, and it shows whether a model starts to drift or forget details.

It takes around 13k context tokens for a model to produce an answer. Believe me, quantization erosion is quite real; this test made me stick with Q8 on MoE models, or at least Q6 on dense models.
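To grade an answer to this test, it helps to have the expected position computed mechanically. The following is a minimal sketch of a PGN replayer, not a full chess engine: it assumes plain SAN moves and does not handle castling, en passant, promotion, or pins, none of which occur in this game. All function names are illustrative.

```python
import re

FILES = "abcdefgh"


def start_position():
    """Standard starting position as {square: piece}; white is uppercase."""
    board = {}
    for f, piece in zip(FILES, "RNBQKBNR"):
        board[f + "1"], board[f + "2"] = piece, "P"
        board[f + "7"], board[f + "8"] = "p", piece.lower()
    return board


def path_clear(board, src, dst):
    """True if no piece sits strictly between src and dst (same line)."""
    f0, r0, f1, r1 = FILES.index(src[0]), int(src[1]), FILES.index(dst[0]), int(dst[1])
    df, dr = (f1 > f0) - (f1 < f0), (r1 > r0) - (r1 < r0)
    f, r = f0 + df, r0 + dr
    while (f, r) != (f1, r1):
        if FILES[f] + str(r) in board:
            return False
        f, r = f + df, r + dr
    return True


def can_reach(board, piece, src, dst):
    """Geometric reachability for N, K, R, B, Q (ignores checks and pins)."""
    af = abs(FILES.index(src[0]) - FILES.index(dst[0]))
    ar = abs(int(src[1]) - int(dst[1]))
    kind = piece.upper()
    if kind == "N":
        return {af, ar} == {1, 2}
    if kind == "K":
        return max(af, ar) == 1
    straight = (af == 0) != (ar == 0)
    diagonal = af == ar and af > 0
    ok = {"R": straight, "B": diagonal}.get(kind, straight or diagonal)
    return ok and path_clear(board, src, dst)


def apply_san(board, san, white):
    """Apply one SAN move in place and return (source, destination)."""
    san = san.rstrip("+#")
    dst = san[-2:]
    if san[0] in "NBRQK":
        piece = san[0] if white else san[0].lower()
        hint = san[1:-2].replace("x", "")  # optional disambiguation chars
        sources = [sq for sq, p in board.items()
                   if p == piece and all(h in sq for h in hint)
                   and can_reach(board, piece, sq, dst)]
    else:
        pawn, back = ("P", -1) if white else ("p", 1)
        rank = int(dst[1])
        if "x" in san:  # pawn capture: origin file is the first character
            sources = [san[0] + str(rank + back)]
        else:  # single step if a pawn is there, otherwise a double step
            one = dst[0] + str(rank + back)
            sources = [one if board.get(one) == pawn else dst[0] + str(rank + 2 * back)]
    if len(sources) != 1 or sources[0] not in board:
        raise ValueError(f"cannot resolve move {san!r}")
    board[dst] = board.pop(sources[0])
    return sources[0], dst


def replay(pgn):
    """Replay a movetext string; return (board, last_move)."""
    board, last, white = start_position(), None, True
    for token in pgn.split():
        if re.fullmatch(r"\d+\.+", token) or token in {"*", "1-0", "0-1", "1/2-1/2"}:
            continue
        last = apply_san(board, token, white)
        white = not white
    return board, last


def to_fen(board):
    """Piece-placement field of a FEN string."""
    rows = []
    for r in range(8, 0, -1):
        row, empty = "", 0
        for f in FILES:
            p = board.get(f + str(r))
            if p is None:
                empty += 1
            else:
                row, empty = row + (str(empty) if empty else "") + p, 0
        rows.append(row + (str(empty) if empty else ""))
    return "/".join(rows)


PGN = "1. b3 e5 2. Nf3 h5 3. d4 exd4 4. Nxd4 Nf6 5. f4 Ke7 6. Qd3 d5 7. h4 *"
board, last_move = replay(PGN)
print(to_fen(board))   # rnbq1b1r/ppp1kpp1/5n2/3p3p/3N1P1P/1P1Q4/P1P1P1P1/RNB1KB1R
print(last_move)       # ('h2', 'h4')
```

The last move to highlight is h2 to h4, and both white rooks should still be on a1 and h1, which gives a quick check for a dropped piece in a generated SVG.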

Your custom Qwen 3.6 27B model did a good job! Almost correct: it forgot to generate a white rook.

mtplx serve --model Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed --port 8081 --max-tokens 262411 --mtp --depth 3

But you don't mention which quantization this variant has. Is it Q4?

After a second attempt, the model started to show quantization erosion issues:

That's why I believe this is Q4.

And this is the right position, generated with a frontier model:

Youssofal

Owner 14 days ago

Haha, I use a similar test: building an HTML chess game with an AI opponent, to see whether the model correctly implements the stalemate and checkmate rules.

This model is 4-bit with 16-bit MTP heads. I have another variant available at 4.75 bits, and I am releasing 6-bit and 8-bit variants soon.
