tencent/Hy-MT2-1.8B-FP8

hunyuan_v1_dense

compressed-tensors


Update README.md

Commit 355487d (verified) · 1 Parent(s): 1be4dbc · stevenkuang committed 2 days ago

Files changed (1)

README.md +39 -0

@@ -181,6 +181,45 @@ Launch SGLang server:
 python3 -m sglang.launch_server --model tencent/Hy-MT2-1.8B-FP8 --tp 1
 ```
+### llama.cpp
+**❕❕ This GGUF model depends on our STQ kernel, which is released in [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836).**
+#### Clone llama.cpp
+```bash
+git clone https://github.com/ggml-org/llama.cpp.git
+```
+#### Enter the llama.cpp folder
+```bash
+cd llama.cpp
+```
+#### Build llama.cpp
+```bash
+cmake -B build
+cmake --build build --config Release
+```
+#### Run a completion example
+```bash
+./build/bin/llama-completion \
+  --model model.gguf \
+  -p "Translate the following segment into Chinese, without additional explanation：Hello" \
+  --jinja \
+  -ngl 0 \
+  -n 64 -st
+```
+#### Run the llama.cpp benchmark
+```bash
+./build/bin/llama-bench -m model_zoo/model.gguf -ngl 0
+```
 ## Model Training
 Hy-MT2 provides a complete model training pipeline, supporting both full-parameter fine-tuning and LoRA fine-tuning, as well as multiple DeepSpeed ZeRO configurations and LLaMA-Factory integration.
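
Note on the llama.cpp steps added in this commit: they clone the default branch, while the added warning says the GGUF depends on the STQ kernel from PR #22836. If that PR has not been merged yet, the PR branch may need to be checked out before building. A minimal sketch, assuming the remote is named `origin`; the helper names `pr_refspec` and `checkout_pr` and the local branch name `stq-kernel` are hypothetical and not part of the README, while `pull/<N>/head` is the ref GitHub exposes for pull requests:

```bash
#!/usr/bin/env bash
# Sketch: fetch a GitHub pull request into a local branch before building.
# Assumptions (not from the README): the remote is named "origin", and the
# default local branch name "stq-kernel" is an arbitrary, hypothetical choice.

# Print the refspec that maps a PR's head commit to a local branch.
pr_refspec() {
  local pr_number="$1"
  local branch="$2"
  printf 'pull/%s/head:%s\n' "$pr_number" "$branch"
}

# Fetch the PR and switch to it. Run from inside the llama.cpp checkout.
checkout_pr() {
  local pr_number="$1"
  local branch="${2:-stq-kernel}"
  git fetch origin "$(pr_refspec "$pr_number" "$branch")" && git checkout "$branch"
}

# Usage, from inside the llama.cpp folder and before the cmake steps:
#   checkout_pr 22836
```

If the PR has already been merged, the default branch should contain the kernel and this step can be skipped.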