tencent
/

Hy-MT2-1.8B

hunyuan_v1_dense

text-generation

Model card Files Files and versions

stevenkuang commited on 2 days ago

Commit

446567a

·

verified ·

1 Parent(s): 1fa8bee

Update README_CN.md

Files changed (1) hide show

README_CN.md +30 -70

README_CN.md CHANGED Viewed

@@ -82,6 +82,33 @@ Hy-MT2 是一款面向真实复杂场景的“快思考”多语言翻译模型
 ---
 ## 推理和部署
 ### transformers
 transformers>=5.6.0
@@ -90,7 +117,7 @@ transformers>=5.6.0
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
-model_path = "tencent/Hy-MT2-30B-A3B"
 # Load tokenizer
 tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
@@ -133,7 +160,7 @@ uv pip install --editable . --torch-backend=auto
 Start the vLLM server:
 ```bash
-vllm serve tencent/Hy-MT2-30B-A3B --tensor-parallel-size 1
 ```
 ### sglang
@@ -150,74 +177,7 @@ pip3 install -e "python"
 Launch SGLang server:
 ```bash
-python3 -m sglang.launch_server --model tencent/Hy-MT2-30B-A3B --tp 1
-```
-### llama_cpp
-**❕❕ This gguf depends on our STQ kernel, which is released at [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836).**
-#### Clone llama.cpp
-```bash
-git clone https://github.com/ggml-org/llama.cpp.git
-```
-#### Enter the llama.cpp folder
-```bash
-cd llama.cpp
-```
-#### Build llama.cpp
-```bash
-cmake -B build
-cmake --build build --config Release
-```
-#### Run a completion example
-```bash
-./build/bin/llama-completion \
-  --model model.gguf  \
-  -p "Translate the following segment into Chinese, without additional explanation：Hello" \
-  --jinja \
-  -ngl 0 \
-  -n 64 -st
-```
-#### Run the llama.cpp benchmark
-```bash
-./build/bin/llama-bench -m model_zoo/model.gguf  -ngl 0
-```
-对于1.8B和7B，我们推荐使用下面这组参数进行推理。注意，我们的模型没有默认 system_prompt。
-```json
-{
-  "temperature": 0.7,
-  "top_p": 0.6,
-  "top_k": 20,
-  "repetition_penalty": 1.05,
-  "max_tokens": 4096
-}
-```
-对于30B-A3B，我们推荐使用下面这组参数进行推理。注意，我们的模型没有默认 system_prompt。
-```json
-{
-  "temperature": 0.7,
-  "top_p": 1.0,
-  "top_k": -1,
-  "repetition_penalty": 1.0,
-  "max_tokens": 4096
-}
 ```

 ---
 ## 推理和部署
+对于1.8B和7B，我们推荐使用下面这组参数进行推理。注意，我们的模型没有默认 system_prompt。
+```json
+{
+  "temperature": 0.7,
+  "top_p": 0.6,
+  "top_k": 20,
+  "repetition_penalty": 1.05,
+  "max_tokens": 4096
+}
+```
+对于30B-A3B，我们推荐使用下面这组参数进行推理。注意，我们的模型没有默认 system_prompt。
+```json
+{
+  "temperature": 0.7,
+  "top_p": 1.0,
+  "top_k": -1,
+  "repetition_penalty": 1.0,
+  "max_tokens": 4096
+}
+```
 ### transformers
 transformers>=5.6.0
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
+model_path = "tencent/Hy-MT2-1.8B"
 # Load tokenizer
 tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
 Start the vLLM server:
 ```bash
+vllm serve tencent/Hy-MT2-1.8B --tensor-parallel-size 1
 ```
 ### sglang
 Launch SGLang server:
 ```bash
+python3 -m sglang.launch_server --model tencent/Hy-MT2-1.8B --tp 1
 ```