Update README.md
README.md
CHANGED
|
@@ -181,6 +181,45 @@ Launch SGLang server:
|
|
| 181 |
python3 -m sglang.launch_server --model tencent/Hy-MT2-1.8B-FP8 --tp 1
|
| 182 |
```
|
| 183 |
|
| 184 |
|
| 185 |
## Model Training
|
| 186 |
Hy-MT2 provides a complete model training pipeline, supporting both full-parameter fine-tuning and LoRA fine-tuning, as well as multiple DeepSpeed ZeRO configurations and LLaMA-Factory integration.
|
| 181 |
python3 -m sglang.launch_server --model tencent/Hy-MT2-1.8B-FP8 --tp 1
|
| 182 |
```
|
| 183 |
|
| 184 |
+
### llama.cpp
|
| 185 |
+
**❕❕ This GGUF model depends on our STQ kernel, which is available in [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836).**
|
| 186 |
+
|
| 187 |
+
#### Clone llama.cpp
|
| 188 |
+
|
| 189 |
+
```bash
|
| 190 |
+
git clone https://github.com/ggml-org/llama.cpp.git
|
| 191 |
+
```
|
| 192 |
+
|
| 193 |
+
#### Enter the llama.cpp folder
|
| 194 |
+
|
| 195 |
+
```bash
|
| 196 |
+
cd llama.cpp
|
| 197 |
+
```
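
Because the STQ kernel is provided through a pull request, a plain clone of the default branch may not include it until that PR is merged. The following is a minimal sketch of one way to check out the PR locally before building. It assumes the `origin` remote points at `ggml-org/llama.cpp` and relies on GitHub publishing each pull request under `pull/<number>/head`. The helper names `pr_refspec` and `checkout_pr` and the branch name `stq-kernel` are hypothetical, chosen only for illustration.

```bash
# Sketch: fetch a GitHub pull request into a local branch and switch to it.
# Assumptions: run inside the llama.cpp checkout, and "origin" is the
# ggml-org/llama.cpp remote. Helper and branch names are hypothetical.

# Build the fetch refspec "pull/<number>/head:<local-branch>".
pr_refspec() {
  printf 'pull/%s/head:%s\n' "$1" "$2"
}

# Fetch the pull request into <local-branch>, then check it out.
checkout_pr() {
  git fetch origin "$(pr_refspec "$1" "$2")" && git checkout "$2"
}

# Example (needs network access):
#   checkout_pr 22836 stq-kernel
```

Once the PR has been merged upstream, this step should no longer be needed and the default branch can be built directly.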
|
| 198 |
+
|
| 199 |
+
#### Build llama.cpp
|
| 200 |
+
|
| 201 |
+
```bash
|
| 202 |
+
cmake -B build
|
| 203 |
+
cmake --build build --config Release
|
| 204 |
+
```
|
| 205 |
+
|
| 206 |
+
#### Run a completion example
|
| 207 |
+
|
| 208 |
+
```bash
|
| 209 |
+
./build/bin/llama-completion \
|
| 210 |
+
--model model.gguf \
|
| 211 |
+
-p "Translate the following segment into Chinese, without additional explanation:Hello" \
|
| 212 |
+
--jinja \
|
| 213 |
+
-ngl 0 \
|
| 214 |
+
-n 64 -st
|
| 215 |
+
```
|
| 216 |
+
|
| 217 |
+
#### Run the llama.cpp benchmark
|
| 218 |
+
|
| 219 |
+
```bash
|
| 220 |
+
./build/bin/llama-bench -m model_zoo/model.gguf -ngl 0
|
| 221 |
+
```
|
| 222 |
+
|
| 223 |
|
| 224 |
## Model Training
|
| 225 |
Hy-MT2 provides a complete model training pipeline, supporting both full-parameter fine-tuning and LoRA fine-tuning, as well as multiple DeepSpeed ZeRO configurations and LLaMA-Factory integration.
|