Instructions to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

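# Download the weights on first use, then load the model and tokenizer
model, tokenizer = load("Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed")

# Wrap the user prompt in the model's chat template
prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

# Print the completion as it is generated (verbose=True) and return it as a string
text = generate(model, tokenizer, prompt=prompt, verbose=True)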

Notebooks
Google Colab
Kaggle
Local Apps
LM Studio

Pi

How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"
        }
      ]
    }
  }
}
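
If `~/.pi/agent/models.json` already lists other providers, pasting the block above over the file would drop them. The following is a minimal sketch of merging the `mlx-lm` provider into an existing file instead. The helper name `add_mlx_provider` is hypothetical, and the schema is taken only from the JSON shown above.

```python
import json
from pathlib import Path


def add_mlx_provider(models_path: Path, model_id: str,
                     base_url: str = "http://localhost:8080/v1") -> dict:
    """Add or replace the "mlx-lm" provider, keeping any other providers."""
    # Start from the existing file if there is one, otherwise from an empty config.
    config = json.loads(models_path.read_text()) if models_path.exists() else {}
    providers = config.setdefault("providers", {})
    providers["mlx-lm"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    models_path.parent.mkdir(parents=True, exist_ok=True)
    models_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (writes to the real Pi config path, so it is left commented out):
# add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json",
#                  "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed")
```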

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed

Run Hermes

hermes

MLX LM

How to use Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"
# Call the OpenAI-compatible server with curl (mlx_lm.server listens on port 8080 by default)
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
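
The same request can be sent from Python using only the standard library. This is a sketch that assumes the server is listening on port 8080, as in the Pi and Hermes configurations above, and that the response follows the usual OpenAI chat-completions shape. The helper names `build_chat_request` and `chat` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/v1"  # assumed host and port of the local server
MODEL_ID = "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL_ID):
    """Build the POST request for /chat/completions without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt):
    """Send the prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    # Assumes the OpenAI-style layout: choices[0].message.content
    return body["choices"][0]["message"]["content"]


# With the server running:
# print(chat("Hello"))
```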

Youssofal committed 20 days ago

Commit 995f754 (verified)

Parent: d85dd52

Recommend 3-bit draft head for speed lane

Files changed (3)

MTPLX_PUBLISH_MANIFEST.json +3 -3
README.md +6 -4
mtplx_runtime.json +5 -0

MTPLX_PUBLISH_MANIFEST.json CHANGED

@@ -78,7 +78,7 @@
       "name": "mtplx_runtime.json",
       "resolved_source_path": "/Users/youssof/Documents/MTPLX/models/Qwen3.6-27B-MTPLX-Flat4-CyanKiwiMTP/mtplx_runtime.json",
       "same_inode_as_source": false,
-      "size_bytes": 589,
+      "size_bytes": 1025,
       "source_path": "/Users/youssof/Documents/MTPLX/models/Qwen3.6-27B-MTPLX-Flat4-CyanKiwiMTP/mtplx_runtime.json"
     },
     {
@@ -142,7 +142,7 @@
       "name": "README.md",
       "resolved_source_path": null,
       "same_inode_as_source": false,
-      "size_bytes": 2651,
+      "size_bytes": 2797,
       "source_path": null
     },
     {
@@ -156,7 +156,7 @@
   ],
   "include_hashes": false,
   "repo_id": "Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed",
-  "size_bytes": 16419069596,
+  "size_bytes": 16419070178,
   "source_provenance": {
     "base_model": "Qwen/Qwen3.6-27B",
     "base_model_revision": "6a9e13bd6fc8f0983b9b99948120bc37f49c13e9",

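The three `size_bytes` changes in the manifest can be cross-checked against each other: the growth of the repository total should equal the combined growth of the two files whose sizes changed. A quick sketch using only the figures from the manifest diff above:

```python
# (old, new) size_bytes values copied from the manifest diff
file_sizes = {
    "mtplx_runtime.json": (589, 1025),
    "README.md": (2651, 2797),
}
repo_total = (16419069596, 16419070178)

# Per-file growth in bytes
per_file_delta = {name: new - old for name, (old, new) in file_sizes.items()}
# Growth of the repository-level total in bytes
total_delta = repo_total[1] - repo_total[0]

for name, delta in per_file_delta.items():
    print(f"{name}: +{delta} bytes")
print(f"repo total: +{total_delta} bytes")
print("consistent:", sum(per_file_delta.values()) == total_delta)
```

The per-file deltas (436 and 146 bytes) sum to the 582-byte change in the repository total, so the manifest is internally consistent for this commit.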
README.md CHANGED

@@ -55,6 +55,7 @@ compressed-tensors checkpoint at the revision listed above.
 - architecture: `qwen3-next-mtp`
 - maximum MTP depth: `3`
 - recommended profile: `performance-cold`
+- recommended draft-only LM head: `3-bit affine, group_size=64`
 - exactness gate: `Phase 0H paged-verifier smoke`
 - exactness max absolute diff: `0.0`
 - verified hardware: `Apple M5 Max, 128 GB unified memory`
@@ -66,10 +67,11 @@ top-p `0.95`, top-k `20`.
 ## Performance Honesty
 This is the speed lane. On the local Apple M5 Max fanmax performance-cold
-benchmark, this artifact reached 57.668 tok/s at depth 3 on the long-code
-192-token prompt, with acceptance [94.3%, 90.6%, 77.4%]. It is faster than the
-current GDN8+CyanKiwi quality/default artifact on the same lane, while GDN8
-remains the conservative quality/default checkpoint.
+benchmark, this artifact reached 60.061 tok/s at depth 3 on the long-code
+192-token prompt when using its contract-recommended 3-bit draft-only LM head,
+with acceptance [100.0%, 98.0%, 87.8%]. The same flat4+CyanKiwi artifact with
+the older 4-bit draft-only LM head measured 57.668 tok/s on the same lane.
+GDN8 remains the conservative quality/default checkpoint.
 ## Files
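
To put the two README figures side by side, the short calculation below derives the relative throughput gain and the per-depth change in acceptance from the numbers quoted in the diff. It only restates the reported measurements and makes no claim about how the acceptance percentages are defined.

```python
# Figures quoted in the README diff (tok/s and acceptance per MTP depth)
tok_s_3bit, tok_s_4bit = 60.061, 57.668
acceptance_3bit = [100.0, 98.0, 87.8]
acceptance_4bit = [94.3, 90.6, 77.4]

# Relative throughput gain of the 3-bit draft head over the 4-bit one, in percent
speedup_pct = (tok_s_3bit / tok_s_4bit - 1.0) * 100.0

# Change in acceptance at each depth, in percentage points
acceptance_gain_pp = [
    round(new - old, 1) for new, old in zip(acceptance_3bit, acceptance_4bit)
]

print(f"throughput gain: {speedup_pct:.2f}%")
print("acceptance gain (pp) by depth:", acceptance_gain_pp)
```

On these figures the 3-bit draft head is roughly 4% faster, and the acceptance gain grows with depth.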

mtplx_runtime.json CHANGED

@@ -6,6 +6,11 @@
   "mtp_sidecar": "Qwen3.6-27B-MTPLX-CyanKiwi-Packed-BF16-INT4-v3",
   "mtp_depth_max": 3,
   "recommended_profile": "performance-cold",
+  "recommended_draft_lm_head": {
+    "bits": 3,
+    "group_size": 64,
+    "mode": "affine"
+  },
   "sampler": {
     "temperature": 0.6,
     "top_p": 0.95,