plawanrath committed on
Commit 815fbf0 · verified · 1 Parent(s): 7f8ad50

Initial upload (research artifact for IEEE AIIoT 2026 paper)

README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ license: mit
+ base_model: microsoft/Phi-3.5-mini-instruct
+ library_name: transformers
+ language:
+ - en
+ tags:
+ - pruning
+ - wanda
+ - bias-evaluation
+ - llm-compression
+ - research-only
+ ---
+
+ # Phi-3.5-mini-instruct — Wanda pruning at 10% target sparsity
+
+ > ⚠️ **Research artifact only — not for production use.**
+ > This model was created to *study* fairness degradation under weight pruning. The companion paper (IEEE AIIoT 2026) demonstrates that Wanda pruning at this sparsity level induces measurable bias amplification on the BBQ benchmark. Do not deploy this model in any user-facing or decision-making system.
+
+ ## Paper
+
+ **Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI**
+ Plawan Kumar Rath, Rahul Maliakkal. *IEEE AIIoT 2026.*
+
+ - Code: <https://github.com/plawanrath/pruning-impact-analysis>
+ - Base model: [microsoft/Phi-3.5-mini-instruct](https://huggingface.co/microsoft/Phi-3.5-mini-instruct)
+ - License: `mit` (inherited from the base model — see [terms](https://opensource.org/licenses/MIT))
+
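As a quick-start illustration, a minimal loading sketch, assuming a recent `transformers` with native Phi-3 support; the repo id is a placeholder for this upload's actual Hub path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<this-repo-id>"  # placeholder: substitute this repository's Hub path
tok = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Phi-3.5 is an instruct model, so format the prompt with its chat template.
messages = [{"role": "user", "content": "Summarize the Wanda pruning method in one sentence."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```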
+ ## Pruning configuration
+
+ - **Method**: `wanda`
+ - **Target sparsity**: 10%
+ - **Actual sparsity achieved**: 6.94%
+ - **Zeroed parameters**: 251,494,400 of 3,623,878,656 prunable (6.94%)
+ - **Prune wall time**: 138.8s
+ - **Pruning scope**: linear layers in transformer blocks (attention projections + MLP). Embeddings, LM head, and layer norms are untouched.
+ - **Calibration set** (Wanda only): 128 samples from C4, sequence length 2048.
+
+ **Method description.** Activation-aware unstructured pruning (Wanda, ICLR 2024). The importance score is `|W_ij| * ||X_j||_2`, computed from 128 C4 calibration samples at sequence length 2048. The paper reports it as the most dangerous method from a fairness standpoint, despite preserving perplexity best.
+
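To make the scoring rule concrete, here is a minimal sketch of Wanda-style pruning for a single linear layer (illustrative shapes and names, not the repository's actual pruning code; see the GitHub link above):

```python
import torch

def wanda_prune_linear(weight: torch.Tensor, act_norm: torch.Tensor, sparsity: float) -> torch.Tensor:
    """weight: (out_features, in_features); act_norm: (in_features,) L2 norm of each input
    feature over the calibration activations; sparsity: fraction of weights to zero (e.g. 0.10)."""
    # Importance score |W_ij| * ||X_j||_2, broadcast across output rows.
    score = weight.abs() * act_norm.unsqueeze(0)
    k = int(weight.shape[1] * sparsity)  # weights to drop per output row
    if k == 0:
        return weight
    # Wanda compares weights within each output row and zeroes the k lowest-scoring ones.
    _, prune_idx = torch.topk(score, k, dim=1, largest=False)
    mask = torch.ones_like(weight)
    mask.scatter_(1, prune_idx, 0.0)
    return weight * mask  # zeros stay in place; the tensor remains dense
```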
+ ## Reported metrics (from the paper)
+
+ | Metric | Value | Definition / reference |
+ |---|---|---|
+ | New-bias-emergence rate | 1.11% | Share of items with per-item SRS = 0 on the dense model that develop SRS > 0 after pruning, across 5 seeds (Table III in the paper; see the sketch below) |
+ | Mean per-item inference latency (Apple Silicon, MLX) | 0.158s | **identical to the dense baseline** — unstructured pruning provides no latency benefit on dense GEMM kernels (paper §V.B) |
+
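For clarity, a small sketch of how such a rate can be computed from per-item Stereotype Reliance Scores; the exact aggregation over seeds is defined in Table III of the paper, so the array names and the denominator convention here are assumptions:

```python
import numpy as np

def new_bias_emergence_rate(srs_dense: np.ndarray, srs_pruned: np.ndarray) -> float:
    """Among items that are bias-free on the dense model (SRS = 0), return the
    percentage that show SRS > 0 after pruning. Inputs are per-item scores."""
    was_clean = srs_dense == 0
    if not was_clean.any():
        return 0.0
    return float(np.mean(srs_pruned[was_clean] > 0)) * 100.0
```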
+ ## Important caveats for IoT / edge deployment
+
+ - **No storage savings.** Unstructured pruning zeroes individual weights but keeps them in the dense float tensor. SafeTensors and GGUF do not exploit unstructured sparsity, so the on-disk size of this checkpoint is **identical** to the dense base model (see the sketch after this list).
+ - **No latency savings.** Dense GEMM kernels do not skip zero entries. Inference latency on Apple Silicon (MLX) and on most consumer GPUs / mobile NPUs is **identical** to the dense baseline.
+ - **Bias amplification may be invisible to perplexity-based evaluation.** The paper's headline finding (the *Smart Pruning Paradox*): Wanda at 50% sparsity on Mistral-7B raises perplexity by 3.5% but raises the Stereotype Reliance Score by 83.7% — a 24× disparity. Standard deployment validation based on perplexity alone provides false assurance.
+
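A minimal sketch of checking the first caveat directly on the shipped shards; the shard path and tensor name are taken from this repo's weight index, and a local download plus the `safetensors` package are assumed:

```python
from safetensors.torch import load_file

# Load the first shard and inspect one pruned projection matrix.
shard = load_file("model-00001-of-00002.safetensors")
w = shard["model.layers.0.mlp.down_proj.weight"]

zero_frac = (w == 0).float().mean().item()
print(f"dtype={w.dtype}, shape={tuple(w.shape)}, zero fraction={zero_frac:.2%}")
# Every zeroed weight still occupies a full bfloat16 element, so the shard's
# on-disk size (5,356,281,360 bytes) is unchanged from the dense base model.
```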
+ ## Citation
+
+ ```bibtex
+ @inproceedings{rath2026pruning,
+   title     = {Weight Pruning Amplifies Bias: A Multi-Method Study of Compressed LLMs for Edge AI},
+   author    = {Rath, Plawan Kumar and Maliakkal, Rahul},
+   booktitle = {Proc. IEEE AIIoT 2026},
+   year      = {2026},
+   url       = {https://github.com/plawanrath/pruning-impact-analysis}
+ }
+ ```
+
+ ## Reproducibility
+
+ - All pruning scripts, evaluation pipelines, and aggregated results: <https://github.com/plawanrath/pruning-impact-analysis>
+ - BBQ benchmark (ambiguous condition only): [`Elfsong/BBQ`](https://huggingface.co/datasets/Elfsong/BBQ)
+ - The pruning statistics above are generated from `pruning_meta.json`, shipped in this repo (`actual_sparsity`, prune time, etc.).
+
chat_template.jinja ADDED
@@ -0,0 +1,8 @@
+ {% for message in messages %}{% if message['role'] == 'system' and message['content'] %}{{'<|system|>
+ ' + message['content'] + '<|end|>
+ '}}{% elif message['role'] == 'user' %}{{'<|user|>
+ ' + message['content'] + '<|end|>
+ '}}{% elif message['role'] == 'assistant' %}{{'<|assistant|>
+ ' + message['content'] + '<|end|>
+ '}}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
+ ' }}{% else %}{{ eos_token }}{% endif %}
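For orientation, a minimal sketch of what this template produces through `tokenizer.apply_chat_template`; the repo id is a placeholder, and picking up `chat_template.jinja` automatically assumes a recent transformers version:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("<this-repo-id>")  # placeholder: this repo's Hub path
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# Per the template above, this renders roughly as:
# <|system|>
# You are a helpful assistant.<|end|>
# <|user|>
# Hello!<|end|>
# <|assistant|>
```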
config.json ADDED
@@ -0,0 +1,141 @@
+ {
+ "architectures": [
+ "Phi3ForCausalLM"
+ ],
+ "attention_bias": false,
+ "attention_dropout": 0.0,
+ "auto_map": {
+ "AutoConfig": "configuration_phi3.Phi3Config",
+ "AutoModelForCausalLM": "modeling_phi3.Phi3ForCausalLM"
+ },
+ "bos_token_id": 1,
+ "embd_pdrop": 0.0,
+ "eos_token_id": [
+ 32007,
+ 32001,
+ 32000
+ ],
+ "hidden_act": "silu",
+ "hidden_size": 3072,
+ "initializer_range": 0.02,
+ "intermediate_size": 8192,
+ "max_position_embeddings": 131072,
+ "model_type": "phi3",
+ "num_attention_heads": 32,
+ "num_hidden_layers": 32,
+ "num_key_value_heads": 32,
+ "original_max_position_embeddings": 4096,
+ "pad_token_id": 32000,
+ "resid_pdrop": 0.0,
+ "rms_norm_eps": 1e-05,
+ "rope_scaling": {
+ "long_factor": [
+ 1.0800000429153442,
+ 1.1100000143051147,
+ 1.1399999856948853,
+ 1.340000033378601,
+ 1.5899999141693115,
+ 1.600000023841858,
+ 1.6200000047683716,
+ 2.620000123977661,
+ 3.2300000190734863,
+ 3.2300000190734863,
+ 4.789999961853027,
+ 7.400000095367432,
+ 7.700000286102295,
+ 9.09000015258789,
+ 12.199999809265137,
+ 17.670000076293945,
+ 24.46000099182129,
+ 28.57000160217285,
+ 30.420001983642578,
+ 30.840002059936523,
+ 32.590003967285156,
+ 32.93000411987305,
+ 42.320003509521484,
+ 44.96000289916992,
+ 50.340003967285156,
+ 50.45000457763672,
+ 57.55000305175781,
+ 57.93000411987305,
+ 58.21000289916992,
+ 60.1400032043457,
+ 62.61000442504883,
+ 62.62000274658203,
+ 62.71000289916992,
+ 63.1400032043457,
+ 63.1400032043457,
+ 63.77000427246094,
+ 63.93000411987305,
+ 63.96000289916992,
+ 63.970001220703125,
+ 64.02999877929688,
+ 64.06999969482422,
+ 64.08000183105469,
+ 64.12000274658203,
+ 64.41000366210938,
+ 64.4800033569336,
+ 64.51000213623047,
+ 64.52999877929688,
+ 64.83999633789062
+ ],
+ "short_factor": [
+ 1.0,
+ 1.0199999809265137,
+ 1.0299999713897705,
+ 1.0299999713897705,
+ 1.0499999523162842,
+ 1.0499999523162842,
+ 1.0499999523162842,
+ 1.0499999523162842,
+ 1.0499999523162842,
+ 1.0699999332427979,
+ 1.0999999046325684,
+ 1.1099998950958252,
+ 1.1599998474121094,
+ 1.1599998474121094,
+ 1.1699998378753662,
+ 1.2899998426437378,
+ 1.339999794960022,
+ 1.679999828338623,
+ 1.7899998426437378,
+ 1.8199998140335083,
+ 1.8499997854232788,
+ 1.8799997568130493,
+ 1.9099997282028198,
+ 1.9399996995925903,
+ 1.9899996519088745,
+ 2.0199997425079346,
+ 2.0199997425079346,
+ 2.0199997425079346,
+ 2.0199997425079346,
+ 2.0199997425079346,
+ 2.0199997425079346,
+ 2.0299997329711914,
+ 2.0299997329711914,
+ 2.0299997329711914,
+ 2.0299997329711914,
+ 2.0299997329711914,
+ 2.0299997329711914,
+ 2.0299997329711914,
+ 2.0299997329711914,
+ 2.0299997329711914,
+ 2.0799996852874756,
+ 2.0899996757507324,
+ 2.189999580383301,
+ 2.2199995517730713,
+ 2.5899994373321533,
+ 2.729999542236328,
+ 2.749999523162842,
+ 2.8399994373321533
+ ],
+ "type": "longrope"
+ },
+ "rope_theta": 10000.0,
+ "sliding_window": 262144,
+ "tie_word_embeddings": false,
+ "torch_dtype": "bfloat16",
+ "transformers_version": "4.43.3",
+ "use_cache": true,
+ "vocab_size": 32064
+ }
model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3925e2b9e5dde21c8ae9df0401570cb381ee80cbafd2b3082df252f0095f04f5
+ size 5356281360
model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69c8cc8c42d8506818de03671e77426c67f0358520b0cec2eb2caadb74445a16
+ size 2285900500
model.safetensors.index.json ADDED
@@ -0,0 +1,203 @@
+ {
+ "metadata": {
+ "total_size": 7642159104,
+ "total_parameters": 3821079552
+ },
+ "weight_map": {
+ "lm_head.weight": "model-00002-of-00002.safetensors",
+ "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.0.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.1.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.10.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.11.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.12.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.13.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.14.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.15.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.16.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.17.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.18.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.19.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.2.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.20.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.21.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.22.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.22.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.22.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.22.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.23.mlp.gate_up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.23.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.23.self_attn.qkv_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.mlp.gate_up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.24.self_attn.qkv_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.mlp.gate_up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.25.self_attn.qkv_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.mlp.gate_up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.26.self_attn.qkv_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.mlp.gate_up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.27.self_attn.qkv_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.mlp.gate_up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.28.self_attn.qkv_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.mlp.gate_up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.29.self_attn.qkv_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.3.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.30.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.mlp.gate_up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.30.self_attn.qkv_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.input_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.mlp.gate_up_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.o_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.31.self_attn.qkv_proj.weight": "model-00002-of-00002.safetensors",
+ "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.4.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.5.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.6.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.7.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.8.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.mlp.gate_up_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+ "model.layers.9.self_attn.qkv_proj.weight": "model-00001-of-00002.safetensors",
+ "model.norm.weight": "model-00002-of-00002.safetensors"
+ }
+ }
pruning_meta.json ADDED
@@ -0,0 +1,9 @@
+ {
+ "model": "phi-3.5-mini-instruct",
+ "method": "wanda",
+ "target_sparsity": 10,
+ "actual_sparsity": 6.9399,
+ "total_prunable_params": 3623878656,
+ "zero_params": 251494400,
+ "prune_time_s": 138.8
+ }
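As a sanity check, the sparsity figures in the README follow directly from these fields; a tiny sketch, assuming a local copy of the file:

```python
import json

with open("pruning_meta.json") as f:
    meta = json.load(f)

sparsity_pct = 100 * meta["zero_params"] / meta["total_prunable_params"]
print(f"{sparsity_pct:.4f}%")  # ~6.9399%, matching actual_sparsity and the 6.94% in the README
```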
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+ "backend": "tokenizers",
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "is_local": true,
+ "legacy": false,
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "padding_side": "left",
+ "sp_model_kwargs": {},
+ "tokenizer_class": "TokenizersBackend",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }