GGUF
conversational
woodchen7 committed
Commit de4bf56 · verified · 1 Parent(s): 6af0443

Update README.md

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -23,6 +23,7 @@
 
 ## Model Introduction
 
+Hy-MT2-1.8B-2Bit-GGUF produced by AngelSlim, For more detailed information, please refer to [[AngelSlim]](https://github.com/Tencent/AngelSlim).
 
 Hy-MT2 is a family of “fast-thinking” multilingual translation models designed for complex real-world scenarios. It includes three model sizes: 1.8B, 7B, and 30B-A3B (MoE), all of which support translation among 33 languages and effectively follow translation instructions in multiple languages.
 For on-device deployment, AngelSlim 1.25-bit extreme quantization reduces the storage requirement of the 1.8B model to only 440 MB and improves inference speed by 1.5x.
@@ -155,7 +156,7 @@ python3 -m sglang.launch_server --model tencent/Hy-MT2-30B-A3B --tp 1
 ```
 
 ### llama_cpp
-**❕❕ This gguf depends on our STQ kernel, which is released at [PR #22836](https://github.com/ggml-org/llama.cpp/pull/22836).**
+**❕❕ This gguf depends on our STQ kernel, which is released at [PR #19357](https://github.com/ggml-org/llama.cpp/pull/19357).**
 
 #### Clone llama.cpp
 