drawais committed
Commit 97e305e · verified · 1 Parent(s): 36ba742

Initial upload of Qwen2.5-Math-7B-Instruct-AWQ-INT4

Files changed (8)
  1. LICENSE +202 -0
  2. NOTICE +6 -0
  3. README.md +29 -17
  4. config.json +38 -9
  5. generation_config.json +2 -2
  6. model.safetensors +3 -0
  7. recipe.yaml +23 -0
  8. tokenizer.json +2 -2
LICENSE ADDED
@@ -0,0 +1,202 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
NOTICE ADDED
@@ -0,0 +1,6 @@
+ NOTICE
+
+ This artifact is a derivative work of Qwen/Qwen2.5-Math-7B-Instruct, distributed under the Apache License, Version 2.0.
+ The full license text is in the LICENSE file at the root of this repository.
+
+ Source model: https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct
README.md CHANGED
@@ -1,45 +1,57 @@
1
  ---
2
  license: apache-2.0
 
3
  base_model: Qwen/Qwen2.5-Math-7B-Instruct
4
  tags:
5
  - quantized
6
  - 4-bit
7
  - int4
8
- - qwen2.5
9
- - math
10
  language:
11
  - en
 
12
  pipeline_tag: text-generation
13
  ---
14
 
15
  # Qwen2.5-Math-7B-Instruct-AWQ-INT4
16
 
17
- INT4 quantization of [`Qwen/Qwen2.5-Math-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct). Math-specialized 7B model, runs on a single 8 GB+ consumer GPU.
18
 
19
- ## Footprint
20
 
21
- | | |
22
  |---|---|
23
- | Source params | 7B (math-specialized) |
24
- | Quantized weights | ~5.2 GB on disk |
25
- | Inference VRAM (incl. KV cache @ 32K context) | ~10 GB |
 
 
26
 
27
- ## Quick start
28
 
29
  ```bash
30
- vllm serve drawais/Qwen2.5-Math-7B-Instruct-AWQ-INT4 --quantization awq_marlin --max-model-len 32768
 
 
31
  ```
32
 
33
  ```python
34
- from transformers import AutoTokenizer, AutoModelForCausalLM
35
- tok = AutoTokenizer.from_pretrained("drawais/Qwen2.5-Math-7B-Instruct-AWQ-INT4")
36
- model = AutoModelForCausalLM.from_pretrained("drawais/Qwen2.5-Math-7B-Instruct-AWQ-INT4", device_map="auto")
37
  ```
38
 
39
- ## Bench
 
 
 
 
40
 
41
- Leaderboard score on [`drawais/needle-1M-bench-mvp`](https://huggingface.co/datasets/drawais/needle-1M-bench-mvp) coming after upload.
 
42
 
43
- ## License
 
44
 
45
- Apache 2.0 (inherits from base model).
 
 
1
  ---
2
  license: apache-2.0
3
+ license_link: https://www.apache.org/licenses/LICENSE-2.0
4
  base_model: Qwen/Qwen2.5-Math-7B-Instruct
5
  tags:
6
  - quantized
7
  - 4-bit
8
  - int4
9
+ - awq
 
10
  language:
11
  - en
12
+ library_name: transformers
13
  pipeline_tag: text-generation
14
  ---
15
 
16
  # Qwen2.5-Math-7B-Instruct-AWQ-INT4
17
 
18
+ INT4 weight-only quantization of [`Qwen/Qwen2.5-Math-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct).
19
 
20
+ Qwen 2.5 Math 7B-Instruct in INT4. About 5 GB on disk. Runs on an 8 GB consumer GPU.
21
 
22
+ | Property | Value |
23
  |---|---|
24
+ | Base model | [Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct) |
25
+ | Quantization | INT4 weight-only |
26
+ | Approx. on-disk size | ~5.6 GB |
27
+ | License | Apache License, Version 2.0 |
28
+ | Languages | English |
29
 
30
+ ## Load (vLLM)
31
 
32
  ```bash
33
+ vllm serve drawais/Qwen2.5-Math-7B-Instruct-AWQ-INT4 \
34
+ --max-model-len 32768 \
35
+ --gpu-memory-utilization 0.94
36
  ```
37
 
38
  ```python
39
+ from vllm import LLM, SamplingParams
40
+ llm = LLM(model="drawais/Qwen2.5-Math-7B-Instruct-AWQ-INT4", max_model_len=32768)
41
+ print(llm.generate(["Hello!"], SamplingParams(max_tokens=128))[0].outputs[0].text)
42
  ```
43
 
44
+ ## Footprint
45
+
46
+ ~5.6 GB on disk. Recommended VRAM: enough headroom for KV cache.
47
+
48
+ ## License & attribution
49
 
50
+ This artifact is a derivative work of [`Qwen/Qwen2.5-Math-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct),
51
+ released by its original authors under the **Apache License, Version 2.0**.
52
 
53
+ This artifact is distributed under the same license. The full license text is
54
+ included in [`LICENSE`](LICENSE), and required attribution is in [`NOTICE`](NOTICE).
55
 
56
+ License text: https://www.apache.org/licenses/LICENSE-2.0
57
+ Source model: https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct
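Note on the new Footprint line: "enough headroom for KV cache" leaves the amount unstated. A rough estimate can be sketched as below. Only `num_key_value_heads = 4` comes from the `config.json` in this commit. The layer count (28), head dimension (128), and FP16 cache are assumptions about the base architecture and should be checked against the full config before relying on the number.

```python
# Rough KV-cache size estimate for a decoder-only transformer.
# ASSUMPTIONS (not stated in this commit): 28 layers, head_dim 128,
# FP16 cache (2 bytes per element). Only num_kv_heads=4 is taken from
# the config.json shown in this diff.

def kv_cache_bytes(context_len, num_layers=28, num_kv_heads=4,
                   head_dim=128, bytes_per_elem=2):
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers the separate key and value tensors in every layer.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem
    return per_token * context_len


if __name__ == "__main__":
    size = kv_cache_bytes(32768)
    print(f"{size / 2**30:.2f} GiB for one 32768-token sequence")
```

Under these assumptions a single 32768-token sequence needs about 1.75 GiB of cache on top of the weights. Serving several sequences concurrently scales that figure linearly.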
config.json CHANGED
@@ -48,12 +48,41 @@
48
  "num_key_value_heads": 4,
49
  "pad_token_id": null,
50
  "quantization_config": {
51
- "bits": 4,
52
- "group_size": 128,
53
- "modules_to_not_convert": null,
54
- "quant_method": "awq",
55
- "version": "gemm",
56
- "zero_point": true
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
57
  },
58
  "rms_norm_eps": 1e-06,
59
  "rope_parameters": {
@@ -62,8 +91,8 @@
62
  },
63
  "sliding_window": null,
64
  "tie_word_embeddings": false,
65
- "transformers_version": "5.6.2",
66
- "use_cache": false,
67
  "use_sliding_window": false,
68
  "vocab_size": 152064
69
- }
 
48
  "num_key_value_heads": 4,
49
  "pad_token_id": null,
50
  "quantization_config": {
51
+ "config_groups": {
52
+ "group_0": {
53
+ "format": "pack-quantized",
54
+ "input_activations": null,
55
+ "output_activations": null,
56
+ "targets": [
57
+ "Linear"
58
+ ],
59
+ "weights": {
60
+ "actorder": null,
61
+ "block_structure": null,
62
+ "dynamic": false,
63
+ "group_size": 128,
64
+ "num_bits": 4,
65
+ "observer": "memoryless_minmax",
66
+ "observer_kwargs": {},
67
+ "scale_dtype": null,
68
+ "strategy": "group",
69
+ "symmetric": true,
70
+ "type": "int",
71
+ "zp_dtype": null
72
+ }
73
+ }
74
+ },
75
+ "format": "pack-quantized",
76
+ "global_compression_ratio": null,
77
+ "ignore": [
78
+ "lm_head"
79
+ ],
80
+ "kv_cache_scheme": null,
81
+ "quant_method": "compressed-tensors",
82
+ "quantization_status": "compressed",
83
+ "sparsity_config": {},
84
+ "transform_config": {},
85
+ "version": "0.15.1.a20260428"
86
  },
87
  "rms_norm_eps": 1e-06,
88
  "rope_parameters": {
 
91
  },
92
  "sliding_window": null,
93
  "tie_word_embeddings": false,
94
+ "transformers_version": "5.8.0.dev0",
95
+ "use_cache": true,
96
  "use_sliding_window": false,
97
  "vocab_size": 152064
98
+ }
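Note on the new `quantization_config`: the weight scheme is 4-bit integer, symmetric, with one scale per group of 128 weights (`"strategy": "group"`). The sketch below illustrates that idea in plain Python. The scale rule used here (largest absolute value divided by 7) is a common textbook convention and an assumption. The library that wrote this checkpoint may compute scales differently, and the packed on-disk layout is not modelled.

```python
# Symmetric, group-wise INT4 weight quantization in plain Python.
# ASSUMPTION: scale = max(|w|) / 7 per group is an illustrative convention,
# not necessarily the rule used to produce this checkpoint.

def quantize_group(weights, num_bits=4):
    """Quantize one group of floats to signed integers with a shared scale."""
    qmax = 2 ** (num_bits - 1) - 1      # 7 for INT4
    qmin = -(2 ** (num_bits - 1))       # -8 for INT4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Symmetric scheme: no zero point, values are rounded and clamped.
    q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return q, scale


def quantize_row(row, group_size=128, num_bits=4):
    """Split a weight row into groups and quantize each independently."""
    groups = [row[i:i + group_size] for i in range(0, len(row), group_size)]
    return [quantize_group(g, num_bits) for g in groups]


def dequantize_group(q, scale):
    """Recover approximate float weights from integers and their scale."""
    return [v * scale for v in q]


if __name__ == "__main__":
    row = [((i % 17) - 8) / 100.0 for i in range(256)]  # toy weights
    packed = quantize_row(row)
    print(len(packed), "groups of up to 128 weights")
```

A smaller group size gives each scale fewer weights to cover, which usually lowers rounding error at the cost of storing more scales.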
generation_config.json CHANGED
@@ -1,10 +1,10 @@
  {
    "bos_token_id": 151643,
-   "do_sample": true,
+   "do_sample": false,
    "eos_token_id": [
      151645,
      151643
    ],
    "pad_token_id": 151643,
-   "transformers_version": "5.6.2"
+   "transformers_version": "5.8.0.dev0"
  }
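Note on `do_sample`: flipping it from `true` to `false` changes the default decoding from sampling to deterministic selection of the most likely token (greedy decoding, when no beam search is configured). The toy function below sketches the difference for a single step. `pick_token` is a hypothetical helper for illustration, not part of any library.

```python
import random


def pick_token(probs, do_sample, rng=random):
    """Return a token index: sampled if do_sample, else the most likely one."""
    if do_sample:
        # Draw one index in proportion to its probability.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Greedy: always the highest-probability index, so output is repeatable.
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    probs = [0.1, 0.7, 0.2]
    print("greedy :", pick_token(probs, do_sample=False))
    print("sampled:", pick_token(probs, do_sample=True))
```

For a math model, deterministic decoding makes answers reproducible across runs, which is a plausible reason for the change, though the commit does not state one.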
model.safetensors ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b8f72874477a9319bc7bc805fa0c5c8fa795ee810e5e852495fe782a1f3279f4
3
+ size 5545344064
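Note on the pointer file: `model.safetensors` is stored through Git LFS, so the diff shows only a three-line pointer. The sketch below parses that pointer and converts the byte count, which makes it easy to compare against the sizes quoted in the README. The pointer text is copied from this commit. `parse_lfs_pointer` is a hypothetical helper name.

```python
# Parse a Git LFS pointer file and report the payload size.
# The pointer text is copied from this commit's model.safetensors entry.

POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b8f72874477a9319bc7bc805fa0c5c8fa795ee810e5e852495fe782a1f3279f4
size 5545344064
"""


def parse_lfs_pointer(text):
    """Return the pointer's key/value pairs as a dict (size as int)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER)
    print(f"{info['size'] / 1e9:.2f} GB, {info['size'] / 2**30:.2f} GiB")
```

The weights come to about 5.55 GB (5.16 GiB), consistent with the "~5.6 GB" row in the README table.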
recipe.yaml ADDED
@@ -0,0 +1,23 @@
+ default_stage:
+   default_modifiers:
+     AWQModifier:
+       mappings:
+       - smooth_layer: re:.*input_layernorm$
+         balance_layers: ['re:.*q_proj$', 're:.*k_proj$', 're:.*v_proj$']
+         activation_hook_target: null
+       - smooth_layer: re:.*v_proj$
+         balance_layers: ['re:.*o_proj$']
+         activation_hook_target: null
+       - smooth_layer: re:.*post_attention_layernorm$
+         balance_layers: ['re:.*gate_proj$', 're:.*up_proj$']
+         activation_hook_target: null
+       - smooth_layer: re:.*up_proj$
+         balance_layers: ['re:.*down_proj$']
+         activation_hook_target: null
+       duo_scaling: true
+       n_grid: 20
+     QuantizationModifier:
+       targets: [Linear]
+       ignore: [lm_head, 're:.*embed.*', 're:.*router.*', 're:.*\.gate$']
+       scheme: W4A16
+       bypass_divisibility_checks: false
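Note on the `ignore` list: it mixes a plain name (`lm_head`) with `re:`-prefixed regular expressions. The sketch below shows one plausible reading of those entries, assuming that `re:` patterns are matched from the start of the full module name and that plain entries must equal the name exactly. The quantization library's real matching rules may differ (for example, it may also match on class names), and the module names used here are hypothetical examples in the usual Hugging Face naming style.

```python
import re

# ASSUMPTION: "re:" entries are regexes matched against the full module name;
# other entries must equal the name exactly. Real library rules may differ.
IGNORE = ["lm_head", r"re:.*embed.*", r"re:.*router.*", r"re:.*\.gate$"]


def is_ignored(module_name, patterns=IGNORE):
    """Return True if the module would be skipped by the ignore list."""
    for pattern in patterns:
        if pattern.startswith("re:"):
            if re.match(pattern[3:], module_name):
                return True
        elif pattern == module_name:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical module names, for illustration only.
    for name in ["lm_head", "model.embed_tokens",
                 "model.layers.0.mlp.gate_proj", "model.layers.0.mlp.gate"]:
        print(name, "->", "skip" if is_ignored(name) else "quantize")
```

Under this reading, `re:.*\.gate$` skips a module named exactly `...gate` but still quantizes `gate_proj`. The `router` and `.gate` patterns look aimed at mixture-of-experts routing layers and would simply match nothing in a model without them.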
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
- size 11421892
+ oid sha256:2f55e63353d3d978b390d346bae531be8b83bc9532c0be500d62b7253aa4c595
+ size 11421991