tencent/Hy-MT2-7B

@@ -91,7 +91,7 @@ transformers>=5.6.0
 from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
-model_path = "tencent/Hy-MT2-30B-A3B"
+model_path = "tencent/Hy-MT2-7B"
 # Load tokenizer
 tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
@@ -134,7 +134,7 @@ uv pip install --editable . --torch-backend=auto
 Start the vLLM server:
 ```bash
-vllm serve tencent/Hy-MT2-30B-A3B --tensor-parallel-size 1
+vllm serve tencent/Hy-MT2-7B --tensor-parallel-size 1
 ```
 ### sglang
@@ -151,7 +151,7 @@ pip3 install -e "python"
 Launch SGLang server:
 ```bash
-python3 -m sglang.launch_server --model tencent/Hy-MT2-30B-A3B --tp 1
+python3 -m sglang.launch_server --model tencent/Hy-MT2-7B --tp 1
 ```
 ### llama_cpp
@@ -207,18 +207,6 @@ For 1.8B and 7B, we recommend using the following parameters for inference. Note
 }
 ```
-For 30B-A3B, we recommend using the following parameters for inference. Note that our models do not have a default system_prompt.
-```json
-{
-  "temperature": 0.7,
-  "top_p": 1.0,
-  "top_k": -1,
-  "repetition_penalty": 1.0,
-  "max_tokens": 4096
-}
-```
 ## Model Training
 Hy-MT2 provides a complete model training pipeline, supporting both full-parameter fine-tuning and LoRA fine-tuning, as well as multiple DeepSpeed ZeRO configurations and LLaMA-Factory integration.