Upload README.md with huggingface_hub
Made possible by [Lambda.ai](https://huggingface.co/lambda) ❤️
DeepSeek-V4-Flash-2bit-DQ uses a dynamic mixed-precision quantization policy: most routed MoE expert weights are packed to 2-bit, while sensitive layers and projections are kept at higher-precision 4-bit, 6-bit, or 8-bit. This keeps memory use well below that of the baseline 4-bit checkpoint.
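The policy above can be sketched as a per-layer bit-width rule. The following is a minimal illustration only — the layer-name patterns and bit assignments are assumptions for the sake of example, not the exact rules used to produce this checkpoint:

```python
def quant_bits(layer_path: str) -> int:
    """Pick a quantization bit width from a weight's layer path (illustrative)."""
    # Embeddings and the output head are quality-sensitive: keep 8-bit.
    if "embed" in layer_path or "lm_head" in layer_path:
        return 8
    # Attention projections stay at higher-quality 6-bit.
    if "self_attn" in layer_path:
        return 6
    # Shared experts keep the 4-bit baseline (checked before routed experts,
    # since "shared_experts" also contains the substring "experts").
    if "shared_expert" in layer_path:
        return 4
    # Routed MoE expert weights are packed down to 2-bit.
    if "experts" in layer_path:
        return 2
    # Everything else falls back to the 4-bit baseline.
    return 4

for p in ["model.embed_tokens",
          "layers.3.self_attn.q_proj",
          "layers.7.mlp.experts.12.up_proj"]:
    print(p, "->", quant_bits(p), "bits")
```

In mlx-lm, a predicate of roughly this shape can be supplied when converting a model (via the `quant_predicate` hook of `mlx_lm.convert`, assuming the current API), so different layers end up at different bit widths in one pass.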
## Use with mlx
```bash