Fix broken folder links in README
README.md
CHANGED
@@ -19,14 +19,14 @@ Eight variants, one per (sparsity × calibration × allocation):
 
 | Sparsity | Calibration | Allocation | Folder |
 |---|---|---|---|
-| 25% | coding | uniform | [`coding-25-uniform/`](./coding-25-uniform) |
-| 25% | coding | nonuniform | [`coding-25-nonuniform/`](./coding-25-nonuniform) |
-| 50% | coding | uniform | [`coding-50-uniform/`](./coding-50-uniform) |
-| 50% | coding | nonuniform | [`coding-50-nonuniform/`](./coding-50-nonuniform) |
-| 25% | general | uniform | [`general-25-uniform/`](./general-25-uniform) |
-| 25% | general | nonuniform | [`general-25-nonuniform/`](./general-25-nonuniform) |
-| 50% | general | uniform | [`general-50-uniform/`](./general-50-uniform) |
-| 50% | general | nonuniform | [`general-50-nonuniform/`](./general-50-nonuniform) |
+| 25% | coding | uniform | [`coding-25-uniform/`](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-25-uniform) |
+| 25% | coding | nonuniform | [`coding-25-nonuniform/`](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-25-nonuniform) |
+| 50% | coding | uniform | [`coding-50-uniform/`](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-50-uniform) |
+| 50% | coding | nonuniform | [`coding-50-nonuniform/`](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-50-nonuniform) |
+| 25% | general | uniform | [`general-25-uniform/`](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-25-uniform) |
+| 25% | general | nonuniform | [`general-25-nonuniform/`](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-25-nonuniform) |
+| 50% | general | uniform | [`general-50-uniform/`](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-50-uniform) |
+| 50% | general | nonuniform | [`general-50-nonuniform/`](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-50-nonuniform) |
 
 
 ## What is expert pruning?
@@ -159,28 +159,28 @@ All evaluations run with vLLM (bf16, greedy decoding). Coding benchmarks: HumanE
 | Variant | Size | HumanEval | rec. | MBPP | rec. |
 |---|---|---|---|---|---|
 | **Full model** | **159 GB** | **0.744** | n/a | **0.764** | n/a |
-| [coding-25-uniform](./coding-25-uniform) | 121 GB | 0.683 | 92% | **0.688** | **90%** |
-| [coding-25-nonuniform](./coding-25-nonuniform) | 121 GB | **0.744** | **100%** | 0.678 | 89% |
-| [coding-50-uniform](./coding-50-uniform) | 82 GB | 0.409 | 55% | 0.534 | 70% |
-| [coding-50-nonuniform](./coding-50-nonuniform) | 82 GB | **0.720** | **97%** | **0.690** | **90%** |
-| [general-25-uniform](./general-25-uniform) | 121 GB | 0.043 | 6% | 0.046 | 6% |
-| [general-25-nonuniform](./general-25-nonuniform) | 121 GB | 0.061 | 8% | 0.058 | 8% |
-| [general-50-uniform](./general-50-uniform) | 82 GB | 0.000 | 0% | 0.018 | 2% |
-| [general-50-nonuniform](./general-50-nonuniform) | 82 GB | 0.012 | 2% | 0.010 | 1% |
+| [coding-25-uniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-25-uniform) | 121 GB | 0.683 | 92% | **0.688** | **90%** |
+| [coding-25-nonuniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-25-nonuniform) | 121 GB | **0.744** | **100%** | 0.678 | 89% |
+| [coding-50-uniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-50-uniform) | 82 GB | 0.409 | 55% | 0.534 | 70% |
+| [coding-50-nonuniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-50-nonuniform) | 82 GB | **0.720** | **97%** | **0.690** | **90%** |
+| [general-25-uniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-25-uniform) | 121 GB | 0.043 | 6% | 0.046 | 6% |
+| [general-25-nonuniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-25-nonuniform) | 121 GB | 0.061 | 8% | 0.058 | 8% |
+| [general-50-uniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-50-uniform) | 82 GB | 0.000 | 0% | 0.018 | 2% |
+| [general-50-nonuniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-50-nonuniform) | 82 GB | 0.012 | 2% | 0.010 | 1% |
 
 ### General benchmarks (MC-8)
 
 | Variant | MC-8 avg | rec. | ARC-C | ARC-E | BoolQ | HSwag | MMLU | OBQA | RTE | WinoG |
 |---|---|---|---|---|---|---|---|---|---|---|
 | **Full model** | **0.714** | n/a | 0.606 | 0.821 | 0.885 | 0.775 | 0.767 | 0.430 | 0.765 | 0.666 |
-| [coding-25-uniform](./coding-25-uniform) | 0.656 | 92% | 0.501 | 0.722 | 0.864 | 0.690 | 0.710 | 0.380 | 0.729 | 0.655 |
-| [coding-25-nonuniform](./coding-25-nonuniform) | 0.638 | 89% | 0.462 | 0.662 | 0.851 | 0.665 | 0.680 | 0.362 | **0.776** | 0.642 |
-| [coding-50-uniform](./coding-50-uniform) | 0.577 | 81% | 0.403 | 0.641 | 0.789 | 0.578 | 0.564 | 0.350 | 0.671 | 0.616 |
-| [coding-50-nonuniform](./coding-50-nonuniform) | 0.546 | 76% | 0.356 | 0.555 | 0.776 | 0.548 | 0.543 | 0.340 | 0.646 | 0.603 |
-| [general-25-uniform](./general-25-uniform) | 0.707 | 99% | 0.600 | 0.807 | 0.876 | **0.785** | 0.704 | **0.452** | 0.751 | 0.677 |
-| [general-25-nonuniform](./general-25-nonuniform) | **0.714** | **100%** | **0.618** | **0.822** | **0.882** | 0.776 | **0.712** | 0.442 | 0.762 | **0.699** |
-| [general-50-uniform](./general-50-uniform) | **0.654** | **92%** | **0.541** | **0.771** | 0.839 | **0.709** | **0.610** | **0.428** | **0.675** | **0.658** |
-| [general-50-nonuniform](./general-50-nonuniform) | 0.644 | 90% | 0.526 | 0.762 | **0.842** | 0.708 | 0.595 | 0.414 | **0.675** | 0.635 |
+| [coding-25-uniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-25-uniform) | 0.656 | 92% | 0.501 | 0.722 | 0.864 | 0.690 | 0.710 | 0.380 | 0.729 | 0.655 |
+| [coding-25-nonuniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-25-nonuniform) | 0.638 | 89% | 0.462 | 0.662 | 0.851 | 0.665 | 0.680 | 0.362 | **0.776** | 0.642 |
+| [coding-50-uniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-50-uniform) | 0.577 | 81% | 0.403 | 0.641 | 0.789 | 0.578 | 0.564 | 0.350 | 0.671 | 0.616 |
+| [coding-50-nonuniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/coding-50-nonuniform) | 0.546 | 76% | 0.356 | 0.555 | 0.776 | 0.548 | 0.543 | 0.340 | 0.646 | 0.603 |
+| [general-25-uniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-25-uniform) | 0.707 | 99% | 0.600 | 0.807 | 0.876 | **0.785** | 0.704 | **0.452** | 0.751 | 0.677 |
+| [general-25-nonuniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-25-nonuniform) | **0.714** | **100%** | **0.618** | **0.822** | **0.882** | 0.776 | **0.712** | 0.442 | 0.762 | **0.699** |
+| [general-50-uniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-50-uniform) | **0.654** | **92%** | **0.541** | **0.771** | 0.839 | **0.709** | **0.610** | **0.428** | **0.675** | **0.658** |
+| [general-50-nonuniform](https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main/general-50-nonuniform) | 0.644 | 90% | 0.526 | 0.762 | **0.842** | 0.708 | 0.595 | 0.414 | **0.675** | 0.635 |
 
 ### Key takeaways
 
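
The change in this commit is mechanical: every relative `./<folder>` link target becomes an absolute URL under the repository's `tree/main` path. A minimal sketch of applying the same rewrite with a regular expression is shown here. The helper name `absolutize_links` and the constant `BASE` are hypothetical and not part of the repository; the base URL is copied from the added lines of this commit.

```python
import re

# Assumption: base URL copied from the "+" lines of this commit.
BASE = "https://huggingface.co/ISTA-DASLab/Qwen3-Coder-Next-RCO-pruned/tree/main"

# Matches a Markdown link target that starts with "./", e.g. "](./coding-25-uniform)".
RELATIVE_LINK = re.compile(r"\]\(\./([^)\s]+)\)")


def absolutize_links(markdown: str, base: str = BASE) -> str:
    """Rewrite relative ./folder link targets to absolute URLs under `base`.

    Hypothetical helper for illustration; links that are already absolute
    do not match the pattern and are left untouched.
    """
    return RELATIVE_LINK.sub(lambda m: f"]({base}/{m.group(1)})", markdown)


old_row = "| 25% | coding | uniform | [`coding-25-uniform/`](./coding-25-uniform) |"
new_row = absolutize_links(old_row)
print(new_row)
```

Running it over the whole README text (for example, the contents of `README.md` read into a string) should reproduce the 24 link changes in the two hunks, since only the link targets differ between the removed and added rows.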