Neooooo committed on
Commit 24027ea · verified · 1 Parent(s): 5ac4fa9

Upload quant-forge artifacts
README.md CHANGED
@@ -20,23 +20,22 @@ tags:
   - Calibration dataset: `HuggingFaceH4/ultrachat_200k`
   - Calibration samples: `32`
   - Max sequence length: `512`
- - Ignored layers: `lm_head, re:.*gate.*, re:.*router.*`
+ - Ignored layers: `lm_head, re:.*\.mlp\.gate$, re:.*\.mlp\.router$`
 
   ## Accuracy (BF16 vs NVFP4)
 
- _Recovery status: partial_
- _Details: Quantized evaluation unavailable due to current vLLM MoE quantization compatibility in this environment._
-
   | Task | Metric | BF16 | NVFP4 | Recovery |
   |---|---:|---:|---:|---:|
- | arc_challenge | acc,none | 0.5000 | n/a | n/a |
- | hellaswag | acc,none | 0.4000 | n/a | n/a |
+ | arc_challenge | acc,none | 0.4000 | 0.3000 | 0.750 |
+ | hellaswag | acc,none | 0.4000 | 0.4000 | 1.000 |
+
+ Aggregate macro recovery: **0.875**
 
   > **Note:** Scores estimated from subset.
 
   ## Performance
 
- _Performance benchmark unavailable: perf skipped because quantized evaluation did not complete successfully_
+ _Performance benchmark unavailable: evaluate.skip_perf=true_
 
   ## Usage (vLLM)

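The per-task Recovery column and the aggregate added in this commit appear to follow the usual convention (quantized score divided by the BF16 baseline, then an unweighted mean across tasks); a minimal sketch reproducing the table's numbers under that assumption:

```python
# Scores taken from the updated README table (subset estimates).
bf16 = {"arc_challenge": 0.4000, "hellaswag": 0.4000}
nvfp4 = {"arc_challenge": 0.3000, "hellaswag": 0.4000}

# Per-task recovery: quantized accuracy as a fraction of the BF16 baseline.
recovery = {task: nvfp4[task] / bf16[task] for task in bf16}

# Aggregate macro recovery: unweighted mean over tasks.
macro = sum(recovery.values()) / len(recovery)

for task in bf16:
    print(f"{task}: {recovery[task]:.3f}")  # arc_challenge: 0.750, hellaswag: 1.000
print(f"macro: {macro:.3f}")  # macro: 0.875
```

This matches the committed values: 0.750 and 1.000 per task, 0.875 aggregate.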
config.json CHANGED
The diff for this file is too large to render. See raw diff
 
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3fafee4328ec020dc01fd4587b5f04929c6fc740564dfe7b6ffd1711a543a9fc
+ size 5002279496

model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0e0506d4eb7d9111e3878f17f77fbe597efcbb18ca03307efd56a84600ab44e
+ size 5002723840

model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b5efb70a71e6b21b194af3d5ae68d733fe4f2d998d9638dac52b91f6f2a6ce3a
+ size 5002036280

model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a15068f737814d410a094b7913ae8707341023689428f0a8be31594ceb0d06be
+ size 3089670712
model.safetensors.index.json CHANGED
The diff for this file is too large to render. See raw diff
 
recipe.yaml CHANGED
@@ -2,5 +2,5 @@ default_stage:
   default_modifiers:
     QuantizationModifier:
       targets: [Linear]
-      ignore: [lm_head, 're:.*gate.*', 're:.*router.*']
+      ignore: [lm_head, 're:.*\.mlp\.gate$', 're:.*\.mlp\.router$']
      scheme: NVFP4
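The tightened `ignore` patterns are the substance of this commit: the old `re:.*gate.*` matches any module name containing "gate", which in typical MoE checkpoints also catches quantizable projections such as `gate_proj`, while the anchored `re:.*\.mlp\.gate$` matches only the router/gate module itself. A minimal sketch of that difference using Python's `re` (the module names below are illustrative, and full-match semantics are an assumption about how llm-compressor applies these patterns):

```python
import re

# Hypothetical module names in the style of MoE transformer checkpoints.
names = [
    "model.layers.0.mlp.gate",       # MoE router/gate: should stay unquantized
    "model.layers.0.mlp.gate_proj",  # ordinary MLP projection: should be quantized
    "lm_head",
]

old = re.compile(r".*gate.*")        # broad pattern from the old recipe
new = re.compile(r".*\.mlp\.gate$")  # anchored pattern from the new recipe

for name in names:
    print(name, bool(old.fullmatch(name)), bool(new.fullmatch(name)))
```

The broad pattern excludes `gate_proj` from quantization as a side effect; the anchored pattern does not, which is why the updated README reports quantized scores where the earlier revision reported `n/a`.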