Youssofal committed on
Commit 01d5989 · verified · 1 Parent(s): 399c137

Tighten MTPLX section, drop self-link, evergreen heading

Files changed (1)
  1. README.md +3 -4
README.md CHANGED
@@ -20,9 +20,9 @@ tags:
 
  # Qwen3.5-4B MTPLX Optimized Speed (Q4 trunk)
 
- ## MTPLX is released
+ ## Run this with MTPLX
 
- This checkpoint runs on **MTPLX** — an MLX-native runtime for native Multi-Token-Prediction speculative decoding on Apple Silicon. Up to **2.24× faster decode** at real coding temperatures (`temp=0.6 / top_p=0.95 / top_k=20`), using the model's own built-in MTP heads. No external drafter, no greedy hack, no distribution drift.
+ **MTPLX** is an MLX-native runtime for native Multi-Token-Prediction speculative decoding on Apple Silicon. Up to **2.24× faster decode** at real coding temperatures (`temp=0.6 / top_p=0.95 / top_k=20`) using the model's own built-in MTP heads — no external drafter, no greedy hack.
 
  ```bash
  pip install mtplx
@@ -31,11 +31,10 @@ mtplx start
 
  **Project:** [github.com/youssofal/MTPLX](https://github.com/youssofal/MTPLX)
 
- **MTPLX model fleet on Hugging Face:**
+ **Other MTPLX checkpoints:**
 
  - [Qwen3.6-27B-MTPLX-Optimized-Speed](https://huggingface.co/Youssofal/Qwen3.6-27B-MTPLX-Optimized-Speed) — 4-bit flagship speed (63 TPS on M5 Max)
  - [Qwen3.6-27B-MTPLX-Optimized](https://huggingface.co/Youssofal/Qwen3.6-27B-MTPLX-Optimized) — verified default (GDN8-Speed4 trunk + CyanKiwi INT4 MTP)
- - [Qwen3.5-4B-MTPLX-Optimized-Speed](https://huggingface.co/Youssofal/Qwen3.5-4B-MTPLX-Optimized-Speed) — small 4-bit speed-test
  - [Qwen3.5-4B-Optimized-MTPLX](https://huggingface.co/Youssofal/Qwen3.5-4B-Optimized-MTPLX) — small 8-bit
 
  ---
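
For readers unfamiliar with the sampling settings quoted in the README text (`temp=0.6 / top_p=0.95 / top_k=20`), the following is a minimal sketch of how those three parameters are commonly combined to shape a next-token distribution. It is not taken from MTPLX: the function name `filter_distribution` is hypothetical, and the ordering used here (temperature, then top-k, then top-p) is an assumption, since runtimes differ on the exact order and renormalisation.

```python
import math


def filter_distribution(logits, temp=0.6, top_k=20, top_p=0.95):
    """Return {token_id: probability} after temperature, top-k and top-p filtering.

    Hypothetical helper for illustration only; not part of MTPLX.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top-k: keep the k most likely token ids, most likely first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]

    # Top-p: keep the shortest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalise over the surviving tokens so the result sums to 1.
    return {i: probs[i] / mass for i in kept}


if __name__ == "__main__":
    # Toy four-token vocabulary; real vocabularies are far larger.
    print(filter_distribution([2.0, 1.0, 0.0, -1.0]))
```

A lower temperature sharpens the distribution before the cut-offs apply, so fewer tokens survive top-p than would at `temp=1.0`.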