Upload README.md with huggingface_hub
README.md
CHANGED
@@ -25,7 +25,7 @@ tags:
 </p>
 
 <h3 align="center">MiniMax M2.7 Small — 138B-A10B — JANGTQ (MLX)</h3>
 
-<p align="center"><b>This is now a ~138B-A10B MoE</b> (down from MiniMax M2's 230B base) — 40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from MiniMax M2 via REAP saliency + JANGTQ2 codebook quantization — routed experts at 2-bit via Lloyd-Max codebooks + Hadamard rotation, attention / embed / lm_head / dense MLP at 8-bit affine, norms and router at 16-bit.</p>
+<p align="center"><b>This is now a ~138B-A10B MoE — 38 GB on disk</b> (down from MiniMax M2's ~460 GB / 230B base) — 40% routed-expert prune + 2-bit JANGTQ quantization. Distilled from MiniMax M2 via REAP saliency + JANGTQ2 codebook quantization — routed experts at 2-bit via Lloyd-Max codebooks + Hadamard rotation, attention / embed / lm_head / dense MLP at 8-bit affine, norms and router at 16-bit.</p>
 
 <p align="center">
 <a href="https://osaurus.ai"><img src="https://img.shields.io/badge/Web-osaurus.ai-blue" alt="Website"></a>
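The 2-bit routed-expert recipe named in the changed paragraph (Lloyd-Max codebooks applied after a Hadamard rotation) can be sketched in NumPy. This is a toy illustration only, not the actual JANGTQ2 implementation: the matrix size, iteration count, and quantile initialization below are assumptions, and real pipelines quantize per-group with packed indices.

```python
import numpy as np

def hadamard(n):
    """Build an orthonormal n x n Hadamard matrix (n must be a power of two)."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # orthonormal, so the rotation is exactly invertible

def lloyd_max(x, bits=2, iters=50):
    """Fit a 2**bits-level scalar Lloyd-Max codebook (1-D k-means) to x."""
    k = 2 ** bits
    code = np.quantile(x, (np.arange(k) + 0.5) / k)  # quantile initialization
    for _ in range(iters):
        # assign each value to its nearest codeword
        idx = np.argmin(np.abs(x[:, None] - code[None, :]), axis=1)
        # move each codeword to the centroid of its cell
        for j in range(k):
            if np.any(idx == j):
                code[j] = x[idx == j].mean()
    return code, idx

rng = np.random.default_rng(0)
W = rng.normal(size=(128, 128))     # stand-in for one routed-expert weight matrix
H = hadamard(128)
W_rot = W @ H                       # rotation spreads outliers before quantizing
code, idx = lloyd_max(W_rot.ravel(), bits=2)
W_hat = code[idx].reshape(W.shape) @ H.T     # dequantize, then undo the rotation
err = np.mean((W - W_hat) ** 2) / np.var(W)  # relative reconstruction error
```

Each weight is stored as one of four codebook entries (2 bits per value), which is where the ~38 GB disk footprint claimed above comes from; attention, embeddings, and dense MLP layers stay at 8-bit affine per the paragraph.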