stevenkuang commited on
Commit
ac8c1bb
·
verified ·
1 Parent(s): 5bceb89

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +30 -68
README.md CHANGED
@@ -83,6 +83,33 @@ For more experimental results and analysis, please refer to our [report](./HY_MT
83
  ---
84
 
85
  ## Inference and Deployment
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
86
  ### transformers
87
 
88
  transformers>=5.6.0
@@ -91,7 +118,7 @@ transformers>=5.6.0
91
  from transformers import AutoModelForCausalLM, AutoTokenizer
92
  import torch
93
 
94
- model_path = "tencent/Hy-MT2-30B-A3B"
95
 
96
  # Load tokenizer
97
  tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
@@ -134,7 +161,7 @@ uv pip install --editable . --torch-backend=auto
134
  Start the vLLM server:
135
 
136
  ```bash
137
- vllm serve tencent/Hy-MT2-30B-A3B --tensor-parallel-size 1
138
  ```
139
 
140
  ### sglang
@@ -151,74 +178,9 @@ pip3 install -e "python"
151
  Launch SGLang server:
152
 
153
  ```bash
154
- python3 -m sglang.launch_server --model tencent/Hy-MT2-30B-A3B --tp 1
155
- ```
156
-
157
- ### llama_cpp
158
- **❕❕ This gguf depends on our STQ kernel, which is released at [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836).**
159
-
160
- #### Clone llama.cpp
161
-
162
- ```bash
163
- git clone https://github.com/ggml-org/llama.cpp.git
164
- ```
165
-
166
- #### Enter the llama.cpp folder
167
-
168
- ```bash
169
- cd llama.cpp
170
- ```
171
-
172
- #### Build llama.cpp
173
-
174
- ```bash
175
- cmake -B build
176
- cmake --build build --config Release
177
- ```
178
-
179
- #### Run a completion example
180
-
181
- ```bash
182
- ./build/bin/llama-completion \
183
- --model model.gguf \
184
- -p "Translate the following segment into Chinese, without additional explanation:Hello" \
185
- --jinja \
186
- -ngl 0 \
187
- -n 64 -st
188
  ```
189
 
190
- #### Run the llama.cpp benchmark
191
-
192
- ```bash
193
- ./build/bin/llama-bench -m model_zoo/model.gguf -ngl 0
194
- ```
195
-
196
-
197
- For 1.8B and 7B, we recommend using the following parameters for inference. Note that our models do not have a default system_prompt.
198
-
199
- ```json
200
-
201
- {
202
- "temperature": 0.7,
203
- "top_p": 0.6,
204
- "top_k": 20,
205
- "repetition_penalty": 1.05,
206
- "max_tokens": 4096
207
- }
208
- ```
209
-
210
- For 30B-A3B, we recommend using the following parameters for inference. Note that our models do not have a default system_prompt.
211
-
212
- ```json
213
-
214
- {
215
- "temperature": 0.7,
216
- "top_p": 1.0,
217
- "top_k": -1,
218
- "repetition_penalty": 1.0,
219
- "max_tokens": 4096
220
- }
221
- ```
222
 
223
  ## Model Training
224
  Hy-MT2 provides a complete model training pipeline, supporting both full-parameter fine-tuning and LoRA fine-tuning, as well as multiple DeepSpeed ZeRO configurations and LLaMA-Factory integration.
 
83
  ---
84
 
85
  ## Inference and Deployment
86
+
87
+ For 1.8B and 7B, we recommend using the following parameters for inference. Note that our models do not have a default system_prompt.
88
+
89
+ ```json
90
+
91
+ {
92
+ "temperature": 0.7,
93
+ "top_p": 0.6,
94
+ "top_k": 20,
95
+ "repetition_penalty": 1.05,
96
+ "max_tokens": 4096
97
+ }
98
+ ```
99
+
100
+ For 30B-A3B, we recommend using the following parameters for inference. Note that our models do not have a default system_prompt.
101
+
102
+ ```json
103
+
104
+ {
105
+ "temperature": 0.7,
106
+ "top_p": 1.0,
107
+ "top_k": -1,
108
+ "repetition_penalty": 1.0,
109
+ "max_tokens": 4096
110
+ }
111
+ ```
112
+
113
  ### transformers
114
 
115
  transformers>=5.6.0
 
118
  from transformers import AutoModelForCausalLM, AutoTokenizer
119
  import torch
120
 
121
+ model_path = "tencent/Hy-MT2-1.8B"
122
 
123
  # Load tokenizer
124
  tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
 
161
  Start the vLLM server:
162
 
163
  ```bash
164
+ vllm serve tencent/Hy-MT2-1.8B --tensor-parallel-size 1
165
  ```
166
 
167
  ### sglang
 
178
  Launch SGLang server:
179
 
180
  ```bash
181
+ python3 -m sglang.launch_server --model tencent/Hy-MT2-1.8B --tp 1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
182
  ```
183
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
184
 
185
  ## Model Training
186
  Hy-MT2 provides a complete model training pipeline, supporting both full-parameter fine-tuning and LoRA fine-tuning, as well as multiple DeepSpeed ZeRO configurations and LLaMA-Factory integration.