stevenkuang commited on
Commit
446567a
·
verified ·
1 Parent(s): 1fa8bee

Update README_CN.md

Browse files
Files changed (1) hide show
  1. README_CN.md +30 -70
README_CN.md CHANGED
@@ -82,6 +82,33 @@ Hy-MT2 是一款面向真实复杂场景的“快思考”多语言翻译模型
82
  ---
83
 
84
  ## 推理和部署
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
85
  ### transformers
86
 
87
  transformers>=5.6.0
@@ -90,7 +117,7 @@ transformers>=5.6.0
90
  from transformers import AutoModelForCausalLM, AutoTokenizer
91
  import torch
92
 
93
- model_path = "tencent/Hy-MT2-30B-A3B"
94
 
95
  # Load tokenizer
96
  tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
@@ -133,7 +160,7 @@ uv pip install --editable . --torch-backend=auto
133
  Start the vLLM server:
134
 
135
  ```bash
136
- vllm serve tencent/Hy-MT2-30B-A3B --tensor-parallel-size 1
137
  ```
138
 
139
  ### sglang
@@ -150,74 +177,7 @@ pip3 install -e "python"
150
  Launch SGLang server:
151
 
152
  ```bash
153
- python3 -m sglang.launch_server --model tencent/Hy-MT2-30B-A3B --tp 1
154
- ```
155
-
156
-
157
- ### llama_cpp
158
- **❕❕ This gguf depends on our STQ kernel, which is released at [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836).**
159
-
160
- #### Clone llama.cpp
161
-
162
- ```bash
163
- git clone https://github.com/ggml-org/llama.cpp.git
164
- ```
165
-
166
- #### Enter the llama.cpp folder
167
-
168
- ```bash
169
- cd llama.cpp
170
- ```
171
-
172
- #### Build llama.cpp
173
-
174
- ```bash
175
- cmake -B build
176
- cmake --build build --config Release
177
- ```
178
-
179
- #### Run a completion example
180
-
181
- ```bash
182
- ./build/bin/llama-completion \
183
- --model model.gguf \
184
- -p "Translate the following segment into Chinese, without additional explanation:Hello" \
185
- --jinja \
186
- -ngl 0 \
187
- -n 64 -st
188
- ```
189
-
190
- #### Run the llama.cpp benchmark
191
-
192
- ```bash
193
- ./build/bin/llama-bench -m model_zoo/model.gguf -ngl 0
194
- ```
195
-
196
-
197
- 对于1.8B和7B,我们推荐使用下面这组参数进行推理。注意,我们的模型没有默认 system_prompt。
198
-
199
- ```json
200
-
201
- {
202
- "temperature": 0.7,
203
- "top_p": 0.6,
204
- "top_k": 20,
205
- "repetition_penalty": 1.05,
206
- "max_tokens": 4096
207
- }
208
- ```
209
-
210
- 对于30B-A3B,我们推荐使用下面这组参数进行推理。注意,我们的模型没有默认 system_prompt。
211
-
212
- ```json
213
-
214
- {
215
- "temperature": 0.7,
216
- "top_p": 1.0,
217
- "top_k": -1,
218
- "repetition_penalty": 1.0,
219
- "max_tokens": 4096
220
- }
221
  ```
222
 
223
 
 
82
  ---
83
 
84
  ## 推理和部署
85
+
86
+ 对于1.8B和7B,我们推荐使用下面这组参数进行推理。注意,我们的模型没有默认 system_prompt。
87
+
88
+ ```json
89
+
90
+ {
91
+ "temperature": 0.7,
92
+ "top_p": 0.6,
93
+ "top_k": 20,
94
+ "repetition_penalty": 1.05,
95
+ "max_tokens": 4096
96
+ }
97
+ ```
98
+
99
+ 对于30B-A3B,我们推荐使用下面这组参数进行推理。注意,我们的模型没有默认 system_prompt。
100
+
101
+ ```json
102
+
103
+ {
104
+ "temperature": 0.7,
105
+ "top_p": 1.0,
106
+ "top_k": -1,
107
+ "repetition_penalty": 1.0,
108
+ "max_tokens": 4096
109
+ }
110
+ ```
111
+
112
  ### transformers
113
 
114
  transformers>=5.6.0
 
117
  from transformers import AutoModelForCausalLM, AutoTokenizer
118
  import torch
119
 
120
+ model_path = "tencent/Hy-MT2-1.8B"
121
 
122
  # Load tokenizer
123
  tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
 
160
  Start the vLLM server:
161
 
162
  ```bash
163
+ vllm serve tencent/Hy-MT2-1.8B --tensor-parallel-size 1
164
  ```
165
 
166
  ### sglang
 
177
  Launch SGLang server:
178
 
179
  ```bash
180
+ python3 -m sglang.launch_server --model tencent/Hy-MT2-1.8B --tp 1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
181
  ```
182
 
183