prince-canuma committed
Commit 722bf55 · verified · 1 Parent(s): f77170f

Upload README.md with huggingface_hub

Files changed (1)
README.md +2 -0
README.md CHANGED
@@ -10,6 +10,8 @@ pipeline_tag: text-generation
 
 Made possible by [Lambda.ai](https://huggingface.co/lambda) ❤️
 
+DeepSeek-V4-Flash-2bit-DQ uses a dynamic mixed-precision quantization policy. Most routed MoE expert weights are packed to 2-bit, while sensitive layers and projections remain in higher-quality 4-bit, 6-bit or 8-bit quantization. This keeps memory use much lower than the baseline 4-bit checkpoint.
+
 ## Use with mlx
 
 ```bash
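For context on how a dynamic mixed-precision layout like the one described in the added paragraph can be produced, here is a minimal sketch using mlx-lm's `convert()` API and its `quant_predicate` hook (present in recent mlx-lm releases). The source repo path, the layer-name patterns, and the per-layer bit widths below are illustrative assumptions, not the actual recipe behind this checkpoint.

```python
# Minimal sketch (not the actual recipe for this checkpoint): mixed-precision
# quantization with mlx-lm, where a per-layer predicate chooses the bit width.
from mlx_lm import convert

def mixed_precision_predicate(path, module, config):
    # Keep embeddings and the output head at 8-bit (quality-sensitive).
    if "embed" in path or "lm_head" in path:
        return {"bits": 8, "group_size": 64}
    # Keep attention projections at 6-bit (hypothetical choice).
    if any(p in path for p in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return {"bits": 6, "group_size": 64}
    # Pack routed MoE expert weights down to 2-bit; these dominate the
    # parameter count, so this is where the memory savings come from.
    if "experts" in path or "switch_mlp" in path:
        return {"bits": 2, "group_size": 64}
    # Everything else falls back to 4-bit.
    return {"bits": 4, "group_size": 64}

convert(
    "deepseek-ai/DeepSeek-V4-Flash",       # hypothetical source repo
    mlx_path="DeepSeek-V4-Flash-2bit-DQ",  # local output directory
    quantize=True,
    quant_predicate=mixed_precision_predicate,
)
```

Returning a per-layer dict from the predicate is what makes the policy "dynamic": one function can pin sensitive projections at higher precision while the bulk of the expert weights take the 2-bit packing.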