Youssofal committed on
Commit aed68a8 · verified · 1 Parent(s): ac8e4d3

Drop remaining 'coming soon' references; MTPLX is released

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -40,9 +40,9 @@ mtplx start

  This artifact pairs the Qwen3.6-27B trunk — MLX-quantized with MTPLX's `gdn8-speed4` policy (8-bit Gated Delta Network linears, 4-bit MLP, BF16 norms) — with a **calibrated INT4 Multi-Token-Prediction sidecar** grafted onto the trunk. The MTP head is what enables *native* speculative decoding: the model drafts its own tokens, with no external draft model required.

- When MTPLX is released, it will accept those draft tokens with **mathematically exact** probability-ratio acceptance and residual correction, so the speculative path stays distribution-preserving at realistic coding settings (`temperature=0.6`, `top_p=0.95`, `top_k=20`) — not just greedy.
+ MTPLX accepts those draft tokens with **mathematically exact** probability-ratio acceptance and residual correction, so the speculative path stays distribution-preserving at realistic coding settings (`temperature=0.6`, `top_p=0.95`, `top_k=20`) — not just greedy.

- Until then you can still:
+ You can also:

  - Inspect the architecture and MTP tensors with any `safetensors` reader.
  - Use the trunk weights with [`mlx-lm`](https://github.com/ml-explore/mlx-lm) for ordinary autoregressive decoding (the MTP head is sidecar-only and ignored by `mlx-lm`).
@@ -104,5 +104,5 @@ This checkpoint is released under the **Apache License 2.0**, matching the Qwen3

  ## Links

- - **Runtime (coming soon)**: [github.com/youssofal/mtplx](https://github.com/youssofal/mtplx)
+ - **Runtime**: [github.com/youssofal/MTPLX](https://github.com/youssofal/MTPLX) · `pip install mtplx`
  - **Base model**: [Qwen/Qwen3.6-27B](https://huggingface.co/Qwen/Qwen3.6-27B)
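
For readers unfamiliar with the "probability-ratio acceptance and residual correction" wording in the changed README line, the following is a minimal sketch of the generic speculative-sampling rule that phrase usually refers to. It is not taken from MTPLX; the function name `accept_or_resample` and its arguments are hypothetical, and the sketch assumes `p_target` and `q_draft` are the trunk's and the draft head's next-token distributions after any temperature, `top_p`, and `top_k` filtering has already been applied.

```python
import random


def accept_or_resample(token, p_target, q_draft, rng):
    """Apply the speculative-sampling rule to one drafted token.

    token    : index drafted by sampling from q_draft (so q_draft[token] > 0)
    p_target : target-model probabilities over the vocabulary (hypothetical input)
    q_draft  : draft-head probabilities over the same vocabulary (hypothetical input)
    rng      : a random.Random instance
    Returns (final_token, accepted).
    """
    # Probability-ratio acceptance: keep the draft with probability min(1, p/q).
    ratio = p_target[token] / q_draft[token]
    if rng.random() < min(1.0, ratio):
        return token, True

    # Residual correction: on rejection, resample from max(0, p - q), renormalized.
    # A rejection implies p[token] < q[token], so some other entry has p > q
    # and the residual mass is strictly positive.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    total = sum(residual)
    weights = [r / total for r in residual]
    new_token = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return new_token, False
```

Under these assumptions the returned token is distributed according to `p_target` regardless of how far `q_draft` is from it, which is the sense in which such a scheme is distribution-preserving for sampled (non-greedy) decoding.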