Upload quant-forge artifacts
- README.md +6 -7
- config.json +0 -0
- model-00001-of-00004.safetensors +3 -0
- model-00002-of-00004.safetensors +3 -0
- model-00003-of-00004.safetensors +3 -0
- model-00004-of-00004.safetensors +3 -0
- model.safetensors.index.json +0 -0
- recipe.yaml +1 -1
README.md
CHANGED
@@ -20,23 +20,22 @@ tags:
 - Calibration dataset: `HuggingFaceH4/ultrachat_200k`
 - Calibration samples: `32`
 - Max sequence length: `512`
-- Ignored layers: `lm_head, re:.*
+- Ignored layers: `lm_head, re:.*\.mlp\.gate$, re:.*\.mlp\.router$`
 
 ## Accuracy (BF16 vs NVFP4)
 
-_Recovery status: partial_
-_Details: Quantized evaluation unavailable due to current vLLM MoE quantization compatibility in this environment._
-
 | Task | Metric | BF16 | NVFP4 | Recovery |
 |---|---:|---:|---:|---:|
-| arc_challenge | acc,none | 0.
-| hellaswag | acc,none | 0.4000 |
+| arc_challenge | acc,none | 0.4000 | 0.3000 | 0.750 |
+| hellaswag | acc,none | 0.4000 | 0.4000 | 1.000 |
+
+Aggregate macro recovery: **0.875**
 
 > **Note:** Scores estimated from subset.
 
 ## Performance
 
-_Performance benchmark unavailable:
+_Performance benchmark unavailable: evaluate.skip_perf=true_
 
 ## Usage (vLLM)
 
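The Recovery column and the aggregate figure in the updated table are consistent with a simple ratio followed by an unweighted mean. The README does not state the formula, so the sketch below assumes that recovery is the NVFP4 score divided by the BF16 score and that "macro" means the plain mean over tasks. The `recovery` helper is a hypothetical name used only for illustration.

```python
from statistics import mean

# Scores copied from the updated README table: task -> (BF16, NVFP4).
scores = {
    "arc_challenge": (0.4000, 0.3000),
    "hellaswag": (0.4000, 0.4000),
}


def recovery(bf16: float, nvfp4: float) -> float:
    """Assumed definition: quantized score as a fraction of the baseline."""
    return nvfp4 / bf16


# Per-task recovery, then the unweighted (macro) average across tasks.
per_task = {task: recovery(b, q) for task, (b, q) in scores.items()}
macro_recovery = mean(per_task.values())

for task, value in per_task.items():
    print(f"{task}: {value:.3f}")
print(f"macro: {macro_recovery:.3f}")
```

Under these assumptions the two per-task ratios and their mean reproduce the values shown in the table. Because the scores are estimated from a subset, a single changed answer can move a ratio noticeably.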
config.json
CHANGED
The diff for this file is too large to render.

model-00001-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3fafee4328ec020dc01fd4587b5f04929c6fc740564dfe7b6ffd1711a543a9fc
+size 5002279496

model-00002-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f0e0506d4eb7d9111e3878f17f77fbe597efcbb18ca03307efd56a84600ab44e
+size 5002723840

model-00003-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b5efb70a71e6b21b194af3d5ae68d733fe4f2d998d9638dac52b91f6f2a6ce3a
+size 5002036280

model-00004-of-00004.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a15068f737814d410a094b7913ae8707341023689428f0a8be31594ceb0d06be
+size 3089670712

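Each shard above is committed as a Git LFS pointer: three `key value` lines giving the spec version, the object id, and the payload size in bytes. A minimal sketch of reading such a pointer and totalling the four shard sizes from this commit follows. `parse_lfs_pointer` is a hypothetical helper written for illustration, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from model-00004-of-00004.safetensors in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a15068f737814d410a094b7913ae8707341023689428f0a8be31594ceb0d06be
size 3089670712
"""

info = parse_lfs_pointer(pointer)

# Sizes of the four shards, in bytes, as listed in their pointer files.
shard_sizes = [5002279496, 5002723840, 5002036280, 3089670712]
total = sum(shard_sizes)

print(info["algo"], info["size"])
print(f"total: {total} bytes (~{total / 1e9:.1f} GB)")
```

The pointer records only metadata. The tensor data itself lives in LFS storage and is addressed by the SHA-256 digest.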
model.safetensors.index.json
CHANGED
The diff for this file is too large to render.

recipe.yaml
CHANGED
@@ -2,5 +2,5 @@ default_stage:
   default_modifiers:
     QuantizationModifier:
       targets: [Linear]
-      ignore: [lm_head, 're:.*
+      ignore: [lm_head, 're:.*\.mlp\.gate$', 're:.*\.mlp\.router$']
       scheme: NVFP4
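The new `ignore` list mixes one literal name with two `re:` patterns. The sketch below shows how such a list could select modules, assuming that entries prefixed with `re:` are regular expressions matched against the full module name and that other entries are compared literally. `is_ignored` is an illustrative helper rather than the quantization library's own matching code, and the module names are hypothetical.

```python
import re

# Ignore list copied from the updated recipe.yaml.
IGNORE = ["lm_head", r"re:.*\.mlp\.gate$", r"re:.*\.mlp\.router$"]


def is_ignored(module_name: str, patterns=IGNORE) -> bool:
    """Return True if the module name matches any literal or regex entry."""
    for pattern in patterns:
        if pattern.startswith("re:"):
            # Strip the "re:" prefix and match the remainder as a regex.
            if re.match(pattern[3:], module_name):
                return True
        elif pattern == module_name:
            return True
    return False


# Hypothetical module names for a mixture-of-experts style model.
names = [
    "lm_head",
    "model.layers.0.mlp.gate",
    "model.layers.0.mlp.router",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.self_attn.q_proj",
]
for name in names:
    print(name, is_ignored(name))
```

In this sketch the trailing `$` is what keeps a projection such as `gate_proj` out of the ignore set, so only the routing layers and `lm_head` stay unquantized.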