Files changed (1)
1. README.md +8 -7

README.md CHANGED

@@ -16,8 +16,9 @@ base_model:
 - **Operating System(s):** Linux
 - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
 - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.11.1)
-- **Weight quantization:** MOE-only, OCP MXFP4, Static
-- **Activation quantization:** MOE-only, OCP MXFP4, Dynamic
+- **Quantized layers:** Experts, Shared_experts
+- **Weight quantization:** OCP MXFP4, Static
+- **Activation quantization:** OCP MXFP4, Dynamic
 - **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)
 
 This model was built with Kimi-K2-Instruct model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for MXFP4 quantization.
@@ -30,7 +31,7 @@ The model was quantized from [unsloth/Kimi-K2-Instruct-0905-BF16](https://huggin
 **Quantization scripts:**
 ```
 cd Quark/examples/torch/language_modeling/llm_ptq/
-exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj *shared_experts*"
+exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj"
 
 python quantize_quark.py \
 --model_dir unsloth/Kimi-K2-Instruct-0905-BF16 \
@@ -62,13 +63,13 @@ The model was evaluated on GSM8K benchmarks.
 </td>
 </tr>
 <tr>
-<td>GSM8K (strict-match)
+<td>GSM8K (flexible-extract)
 </td>
-<td>95.53
+<td>95.45
 </td>
-<td>94.47
+<td>93.78
 </td>
-<td>98.89%
+<td>98.25%
 </td>
 </tr>
 </table>
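The `exclude_layers` change drops `*shared_experts*`, which is what brings shared experts into quantization and matches the new "Quantized layers: Experts, Shared_experts" entry. A minimal sketch of how such glob patterns select layers, assuming `fnmatch`-style matching semantics — the layer names below are illustrative (DeepSeek-style naming), not taken from the checkpoint:

```python
from fnmatch import fnmatchcase

# Updated exclusion list from the quantization script in this diff.
# '*shared_experts*' was removed, so shared-expert projections are now quantized.
EXCLUDE = [
    "*self_attn*", "*mlp.gate", "*lm_head",
    "*mlp.gate_proj", "*mlp.up_proj", "*mlp.down_proj",
]

def is_quantized(layer_name: str) -> bool:
    """A layer is quantized unless it matches any exclusion glob."""
    return not any(fnmatchcase(layer_name, pat) for pat in EXCLUDE)

# Hypothetical layer names for illustration only:
print(is_quantized("model.layers.4.mlp.experts.7.down_proj"))       # True: routed expert
print(is_quantized("model.layers.4.mlp.shared_experts.down_proj"))  # True: shared expert
print(is_quantized("model.layers.4.self_attn.q_proj"))              # False: attention excluded
print(is_quantized("model.layers.4.mlp.gate"))                      # False: MoE router excluded
```

Note that expert projections survive the `*mlp.down_proj` pattern because the glob must match the full name: `mlp.experts.7.down_proj` does not end in `mlp.down_proj`, so only the dense MLP layers are excluded.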
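The percentage in the last table column is consistent with score recovery, i.e. quantized score over baseline score; a quick arithmetic check on the updated flexible-extract row (the column roles are inferred from the numbers, not stated in the diff):

```python
# Recovery appears to be (quantized score / baseline score) * 100.
# Values from the updated GSM8K (flexible-extract) row.
baseline, quantized = 95.45, 93.78
recovery = round(quantized / baseline * 100, 2)
print(recovery)  # 98.25, matching the table
```

The replaced strict-match row checks out the same way: 94.47 / 95.53 ≈ 98.89%.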