Mirror worker 5
- encoding/tests/test_output_4.txt +29 -0
- generation_config.json +9 -0
- inference/README.md +26 -0
- inference/convert.py +168 -0
- model-00005-of-00046.safetensors +3 -0
- model-00013-of-00046.safetensors +3 -0
- model-00021-of-00046.safetensors +3 -0
- model-00029-of-00046.safetensors +3 -0
- model-00037-of-00046.safetensors +3 -0
- model-00045-of-00046.safetensors +3 -0
encoding/tests/test_output_4.txt
ADDED
@@ -0,0 +1,29 @@
+<|begin▁of▁sentence|>该助手为DeepSeek-V3,由深度求索公司创造。
+今天是2025年10月17日,星期五。<|latest_reminder|>2024-11-15,上海市,App,中文<|User|>热海大滚锅是世界著名温泉吗<|Assistant|></think>关于热海大滚锅是否是世界著名温泉,可以这样总结:它在**中国乃至全球的地热奇观中占有重要地位**,但“世界著名”的称号更侧重于它作为**独特的地质现象和旅游景点**,而非普遍的温泉疗养体验。
+
+为了让你快速了解,我整理了一个简要的表格:
+
+| 维度 | 热海大滚锅的地位与特点 |
+| :--- | :--- |
+| **地质奇观** | **世界罕见**的高温地热系统,泉眼多、水温高、形态多样。 |
+| **国内声誉** | **中国三大地热区之一**,**国家5A级旅游景区**,自古闻名(徐霞客曾游历并记载)。 |
+| **国际知名度** | 在一些旅游平台被提及为“世界六大温泉”之一,但此说法流传不广,其国际声誉更多建立在地质独特性上。 |
+| **核心体验** | **观赏地热奇观**(如97℃沸腾的“大滚锅”)、**体验温泉煮鸡蛋**。 |
+
+### 💡 游玩攻略与温馨提示
+
+如果你计划前往热海大滚锅,这里有一些实用信息供你参考:
+
+- **门票与开放时间**:
+  - **门票**:景区门票约为**50元/人**。如果选择包含温泉沐浴的套餐,价格会更高,例如约**288元**。
+  - **开放时间**:景区一般**08:00-18:00**开放,但具体时间可能变动,建议提前核实。
+
+- **特色体验**:
+  - **温泉煮鸡蛋**:这几乎是必试项目。可以在景区门口购买用草绳串起的生鸡蛋(约5-8元/串),然后到“大滚锅”旁的指定区域蒸煮,几分钟便可熟食,趣味十足。
+  - **金汤足浴**:可以直接用从“大滚锅”流出的温泉水泡脚,缓解旅途疲劳。
+
+- **注意事项**:
+  - **安全第一**:“大滚锅”水温极高,务必遵守游览规则,在指定区域内观赏,切勿随意触碰泉水。
+  - **规划行程**:建议为热海景区预留**3-4小时**的游览时间。景区内步道不走回头路,出入口有观光车接送。
+
+希望这些信息能帮助你更好地了解热海大滚锅。如果你对腾冲的其他景点或者行程规划有更多疑问,我很乐意提供进一步的信息。<|end▁of▁sentence|><|User|>世界著名温泉有哪些<|Assistant|></think><|action|>Search<|end▁of▁sentence|>
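The fixture above is the rendered token stream for a two-turn conversation, control tokens included. A minimal sketch of how such a fixture might be consumed by an encoding test, assuming a hypothetical `render_conversation` helper that is not part of this commit:

```python
# Minimal sketch of a test around a fixture like test_output_4.txt.
# render_conversation is a hypothetical helper (not in this commit); the
# fixture path and control-token names come from the file above.
from pathlib import Path

FIXTURE = Path("encoding/tests/test_output_4.txt")

def test_render_matches_fixture():
    expected = FIXTURE.read_text(encoding="utf-8")
    # The fixture embeds control tokens verbatim, so a byte-for-byte
    # comparison is the natural check.
    assert expected.startswith("<|begin▁of▁sentence|>")
    assert expected.rstrip("\n").endswith("<|end▁of▁sentence|>")
    # actual = render_conversation(messages)  # hypothetical
    # assert actual == expected
```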
generation_config.json
ADDED
@@ -0,0 +1,9 @@
+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 1,
+  "do_sample": true,
+  "temperature": 1.0,
+  "top_p": 1.0,
+  "transformers_version": "4.46.3"
+}
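These are stock sampling defaults (`do_sample` with temperature and top-p both at 1.0). When the checkpoint directory is loaded through `transformers`, they are picked up automatically; a minimal sketch, with the checkpoint path as a placeholder:

```python
# Minimal sketch: reading the generation defaults above via transformers.
# "path/to/checkpoint" is a placeholder for the mirrored model directory.
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("path/to/checkpoint")
assert gen_cfg.do_sample and gen_cfg.temperature == 1.0 and gen_cfg.top_p == 1.0
print(gen_cfg.bos_token_id, gen_cfg.eos_token_id)  # 0 1
```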
inference/README.md
ADDED
@@ -0,0 +1,26 @@
+# Inference code for DeepSeek models
+
+First, convert the Hugging Face model weight files to the format used by this project.
+```bash
+export EXPERTS=256
+export MP=4
+export CONFIG=config.json
+python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
+```
+
+Then chat with the DeepSeek model at will!
+```bash
+torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
+```
+
+Or run batch inference from a file.
+```bash
+torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}
+```
+
+Or run multi-node inference.
+```bash
+torchrun --nnodes ${NODES} --nproc-per-node $((MP / NODES)) --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}
+```
+
+If you want to use fp8, remove `"expert_dtype": "fp4"` from `config.json` and pass `--expert-dtype fp8` to `convert.py`.
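To make the fp8 switch in the last line concrete, a minimal sketch that edits `config.json` and reruns the converter; all paths and the expert/parallelism values are placeholders taken from the example above:

```python
# Minimal sketch of the README's fp8 switch: drop "expert_dtype": "fp4"
# from config.json, then rerun convert.py with --expert-dtype fp8.
# Paths are placeholders.
import json
import subprocess

with open("config.json") as f:
    cfg = json.load(f)
cfg.pop("expert_dtype", None)  # removes "expert_dtype": "fp4" if present
with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)

subprocess.run(
    ["python", "convert.py",
     "--hf-ckpt-path", "path/to/hf_ckpt",
     "--save-path", "path/to/converted",
     "--n-experts", "256",
     "--model-parallel", "4",
     "--expert-dtype", "fp8"],
    check=True,
)
```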
inference/convert.py
ADDED
@@ -0,0 +1,168 @@
+import os
+import shutil
+from argparse import ArgumentParser
+from glob import glob
+from tqdm import tqdm, trange
+
+import torch
+from safetensors.torch import safe_open, save_file
+
+
+FP4_TABLE = torch.tensor([
+    0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
+    0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0
+], dtype=torch.float32)
+
+
+def cast_e2m1fn_to_e4m3fn(x: torch.Tensor, scale: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
+    """
+    Casts a tensor from e2m1fn to e4m3fn losslessly.
+    """
+    assert x.dtype == torch.int8
+    assert x.ndim == 2
+    out_dim, in_dim = x.size()
+    in_dim *= 2
+    fp8_block_size = 128
+    fp4_block_size = 32
+    assert in_dim % fp8_block_size == 0 and out_dim % fp8_block_size == 0
+    assert scale.size(0) == out_dim and scale.size(1) == in_dim // fp4_block_size
+
+    x = x.view(torch.uint8)
+    low = x & 0x0F
+    high = (x >> 4) & 0x0F
+    x = torch.stack([FP4_TABLE[low.long()], FP4_TABLE[high.long()]], dim=-1).flatten(-2)  # interleave nibbles -> (out_dim, in_dim)
+
+    # max_fp4 (6.0) * MAX_OFFSET must fit in e4m3fn (max 448)
+    # 6.0 * 2^6 = 384 < 448; 6.0 * 2^7 = 768 > 448; so MAX_OFFSET_BITS = 6
+    MAX_OFFSET_BITS = 6
+
+    bOut = out_dim // fp8_block_size
+    bIn = in_dim // fp8_block_size
+    # bOut, bIn, 128, 128
+    x = x.view(bOut, fp8_block_size, bIn, fp8_block_size).transpose(1, 2)
+    # bOut, bIn, 128*4
+    scale = scale.float().view(bOut, fp8_block_size, bIn, -1).transpose(1, 2).flatten(2)
+    # bOut, bIn, 1
+    scale_max_offset_bits = scale.amax(dim=-1, keepdim=True) / (2**MAX_OFFSET_BITS)
+    # bOut, bIn, 128*4
+    offset = scale / scale_max_offset_bits
+    # bOut, bIn, 128, 128
+    offset = offset.unflatten(-1, (fp8_block_size, -1)).repeat_interleave(fp4_block_size, dim=-1)
+    x = (x * offset).transpose(1, 2).reshape(out_dim, in_dim)
+    return x.to(torch.float8_e4m3fn), scale_max_offset_bits.squeeze(-1).to(torch.float8_e8m0fnu)
+
+
+mapping = {
+    "embed_tokens": ("embed", 0),
+    "input_layernorm": ("attn_norm", None),
+    "post_attention_layernorm": ("ffn_norm", None),
+    "q_proj": ("wq", 0),
+    "q_a_proj": ("wq_a", None),
+    "q_a_layernorm": ("q_norm", None),
+    "q_b_proj": ("wq_b", 0),
+    "kv_a_proj_with_mqa": ("wkv_a", None),
+    "kv_a_layernorm": ("kv_norm", None),
+    "kv_b_proj": ("wkv_b", 0),
+    "o_proj": ("wo", 1),
+    "gate_proj": ("w1", 0),
+    "down_proj": ("w2", 1),
+    "up_proj": ("w3", 0),
+    "lm_head": ("head", 0),
+
+    "embed": ("embed", 0),
+    "wq_b": ("wq_b", 0),
+    "wo_a": ("wo_a", 0),
+    "wo_b": ("wo_b", 1),
+    "head": ("head", 0),
+    "attn_sink": ("attn_sink", 0),
+    "weights_proj": ("weights_proj", 0),
+}
+
+
+def main(hf_ckpt_path, save_path, n_experts, mp, expert_dtype):
+    """
+    Converts and saves model checkpoint files into a specified format.
+
+    Args:
+        hf_ckpt_path (str): Path to the directory containing the input checkpoint files.
+        save_path (str): Path to the directory where the converted checkpoint files will be saved.
+        n_experts (int): Total number of experts in the model.
+        mp (int): Model parallelism factor.
+        expert_dtype (str | None): Target dtype for expert weights, "fp8" or "fp4".
+    Returns:
+        None
+    """
+    torch.set_num_threads(8)
+    n_local_experts = n_experts // mp
+    state_dicts = [{} for _ in range(mp)]
+
+    for file_path in tqdm(glob(os.path.join(hf_ckpt_path, "*.safetensors"))):
+        with safe_open(file_path, framework="pt", device="cpu") as f:
+            for name in f.keys():
+                param: torch.Tensor = f.get_tensor(name)
+                if name.startswith("model."):
+                    name = name[len("model."):]
+                if name.startswith("mtp.") and ("emb" in name or name.endswith("head.weight")):
+                    continue
+                name = name.replace("self_attn", "attn")
+                name = name.replace("mlp", "ffn")
+                name = name.replace("weight_scale_inv", "scale")
+                name = name.replace("e_score_correction_bias", "bias")
+                if any(x in name for x in ["hc", "attn_sink", "tie2eid", "ape"]):  # without .weight
+                    key = name.split(".")[-1]
+                else:
+                    key = name.split(".")[-2]
+                if key in mapping:
+                    new_key, dim = mapping[key]
+                else:
+                    new_key, dim = key, None
+                name = name.replace(key, new_key)
+                for i in range(mp):
+                    new_param = param
+                    if "experts" in name and "shared_experts" not in name:
+                        idx = int(name.split(".")[-3])
+                        if idx < i * n_local_experts or idx >= (i + 1) * n_local_experts:
+                            continue
+                    elif dim is not None:
+                        assert param.size(dim) % mp == 0, f"Dimension {dim} must be divisible by {mp}"
+                        shard_size = param.size(dim) // mp
+                        new_param = param.narrow(dim, i * shard_size, shard_size).contiguous()
+                    state_dicts[i][name] = new_param
+
+    os.makedirs(save_path, exist_ok=True)
+
+    for i in trange(mp):
+        names = list(state_dicts[i].keys())
+        for name in names:
+            if name.endswith("wo_a.weight"):
+                weight = state_dicts[i][name]
+                scale = state_dicts[i].pop(name.replace("weight", "scale"))
+                weight = weight.unflatten(0, (-1, 128)).unflatten(-1, (-1, 128)).float() * scale[:, None, :, None].float()
+                state_dicts[i][name] = weight.flatten(2, 3).flatten(0, 1).bfloat16()
+            elif "experts" in name and state_dicts[i][name].dtype == torch.int8:
+                if expert_dtype == "fp8":
+                    scale_name = name.replace("weight", "scale")
+                    weight = state_dicts[i].pop(name)
+                    scale = state_dicts[i].pop(scale_name)
+                    state_dicts[i][name], state_dicts[i][scale_name] = cast_e2m1fn_to_e4m3fn(weight, scale)
+                else:
+                    state_dicts[i][name] = state_dicts[i][name].view(torch.float4_e2m1fn_x2)
+        save_file(state_dicts[i], os.path.join(save_path, f"model{i}-mp{mp}.safetensors"))
+
+    for file in ["tokenizer.json", "tokenizer_config.json"]:
+        old_file_path = os.path.join(hf_ckpt_path, file)
+        new_file_path = os.path.join(save_path, file)
+        if os.path.exists(old_file_path):
+            shutil.copyfile(old_file_path, new_file_path)
+
+
+if __name__ == "__main__":
+    parser = ArgumentParser()
+    parser.add_argument("--hf-ckpt-path", type=str, required=True)
+    parser.add_argument("--save-path", type=str, required=True)
+    parser.add_argument("--n-experts", type=int, required=True)
+    parser.add_argument("--model-parallel", type=int, required=True)
+    parser.add_argument("--expert-dtype", type=str, choices=["fp8", "fp4"], required=False, default=None)
+    args = parser.parse_args()
+    assert args.n_experts % args.model_parallel == 0, "Number of experts must be divisible by model parallelism"
+    main(args.hf_ckpt_path, args.save_path, args.n_experts, args.model_parallel, args.expert_dtype)
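To make the nibble decode and the `MAX_OFFSET_BITS` bound in `cast_e2m1fn_to_e4m3fn` concrete, a small self-contained sketch; it assumes only the `FP4_TABLE` layout above (low nibble stored first, codes 8-15 being the negated values):

```python
# Small sketch of the e2m1 nibble decode used by cast_e2m1fn_to_e4m3fn.
import torch

FP4_TABLE = torch.tensor([
    0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
    0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0,
], dtype=torch.float32)

packed = torch.tensor([[0x72]], dtype=torch.uint8)  # low nibble 0x2, high nibble 0x7
low = packed & 0x0F            # code 2 -> 1.0
high = (packed >> 4) & 0x0F    # code 7 -> 6.0
decoded = torch.stack([FP4_TABLE[low.long()], FP4_TABLE[high.long()]], dim=-1).flatten(-2)
print(decoded)  # tensor([[1., 6.]])

# The MAX_OFFSET_BITS bound from the comment above: the largest e2m1
# magnitude (6.0) scaled by 2**6 stays within e4m3fn's max of 448.
assert 6.0 * 2**6 <= 448 and 6.0 * 2**7 > 448
```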
model-00005-of-00046.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9fda158bc636215aea4f6834821c81f59eea3733223c874ab66b9f3d6740c4c1
+size 3568768976
model-00013-of-00046.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:47c5e416b60b9bef9e9005cdad9c991a306ab2dd25a95e1994dda30bd4011905
+size 3568770544
model-00021-of-00046.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f270bf4d0f0067165020baf3c11264a177182918c1ebeec21d2bf33166b44592
+size 3568770544
model-00029-of-00046.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d10bf34c789f9294d2cc50b695d259dc1d0d5b2303105329be370eb55f0fd882
+size 3568770544
model-00037-of-00046.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:93d68bcfc36fdf239f901653c0e96c5d45d8fce4f5be633bbbf93cc75067ec5d
+size 3568770544
model-00045-of-00046.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9a0fd242134e9ebe4e6993a7631692944838e4fdf20067b3219caa48eab68045
+size 1059332516
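The six `.safetensors` entries are Git LFS pointers, not the weights themselves; each records the sha256 `oid` and byte `size` of the real shard. A minimal sketch for verifying a downloaded shard against its pointer, using the first entry's values; the local path is a placeholder:

```python
# Minimal sketch: verifying a downloaded shard against its LFS pointer.
# Filename, oid, and size are taken from the pointer entries above.
import hashlib
import os

path = "model-00005-of-00046.safetensors"
expected_oid = "9fda158bc636215aea4f6834821c81f59eea3733223c874ab66b9f3d6740c4c1"
expected_size = 3568768976

assert os.path.getsize(path) == expected_size
h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        h.update(chunk)
assert h.hexdigest() == expected_oid
print("shard OK")
```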