danielhanchen committed
Commit 05766f7 · verified · 1 parent: 6c58e49

Mirror worker 5
encoding/tests/test_output_4.txt ADDED
@@ -0,0 +1,29 @@
+ <|begin▁of▁sentence|>该助手为DeepSeek-V3,由深度求索公司创造。
+ 今天是2025年10月17日,星期五。<|latest_reminder|>2024-11-15,上海市,App,中文<|User|>热海大滚锅是世界著名温泉吗<|Assistant|></think>关于热海大滚锅是否是世界著名温泉,可以这样总结:它在**中国乃至全球的地热奇观中占有重要地位**,但“世界著名”的称号更侧重于它作为**独特的地质现象和旅游景点**,而非普遍的温泉疗养体验。
+
+ 为了让你快速了解,我整理了一个简要的表格:
+
+ | 维度 | 热海大滚锅的地位与特点 |
+ | :--- | :--- |
+ | **地质奇观** | **世界罕见**的高温地热系统,泉眼多、水温高、形态多样。 |
+ | **国内声誉** | **中国三大地热区之一**,**国家5A级旅游景区**,自古闻名(徐霞客曾游历并记载)。 |
+ | **国际知名度** | 在一些旅游平台被提及为“世界六大温泉”之一,但此说法流传不广,其国际声誉更多建立在地质独特性上。 |
+ | **核心体验** | **观赏地热奇观**(如97℃沸腾的“大滚锅”)、**体验温泉煮鸡蛋**。 |
+
+ ### 💡 游玩攻略与温馨提示
+
+ 如果你计划前往热海大滚锅,这里有一些实用信息供你参考:
+
+ - **门票与开放时间**:
+ - **门票**:景区门票约为**50元/人**。如果选择包含温泉沐浴的套餐,价格会更高,例如约**288元**。
+ - **开放时间**:景区一般**08:00-18:00**开放,但具体时间可能变动,建议提前核实。
+
+ - **特色体验**:
+ - **温泉煮鸡蛋**:这几乎是必试项目。可以在景区门口购买用草绳串起的生鸡蛋(约5-8元/串),然后到“大滚锅”旁的指定区域蒸煮,几分钟便可熟食,趣味十足。
+ - **金汤足浴**:可以直接用从“大滚锅”流出的温泉水泡脚,缓解旅途疲劳。
+
+ - **注意事项**:
+ - **安全第一**:“大滚锅”水温极高,务必遵守游览规则,在指定区域内观赏,切勿随意触碰泉水。
+ - **规划行程**:建议为热海景区预留**3-4小时**的游览时间。景区内步道不走回头路,出入口有观光车接送。
+
+ 希望这些信息能帮助你更好地了解热海大滚锅。如果你对腾冲的其他景点或者行程规划有更多疑问,我很乐意提供进一步的信息。<|end▁of▁sentence|><|User|>世界著名温泉有哪些<|Assistant|></think><|action|>Search<|end▁of▁sentence|>
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": 1,
+   "do_sample": true,
+   "temperature": 1.0,
+   "top_p": 1.0,
+   "transformers_version": "4.46.3"
+ }
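The config above sets `do_sample` with neutral `temperature`/`top_p`, i.e. sampling from the unmodified distribution. A minimal sketch of reading these fields back with the standard library (the literal below copies the file contents shown in the diff):

```python
import json

# generation_config.json contents as added in this commit.
raw = """
{
  "_from_model_config": true,
  "bos_token_id": 0,
  "eos_token_id": 1,
  "do_sample": true,
  "temperature": 1.0,
  "top_p": 1.0,
  "transformers_version": "4.46.3"
}
"""

cfg = json.loads(raw)
# temperature=1.0 and top_p=1.0 leave the token distribution untouched.
assert cfg["do_sample"] and cfg["temperature"] == 1.0 and cfg["top_p"] == 1.0
print(cfg["bos_token_id"], cfg["eos_token_id"])  # 0 1
```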
inference/README.md ADDED
@@ -0,0 +1,26 @@
+ # Inference code for DeepSeek models
+
+ First, convert the Hugging Face model weight files to the format used by this project.
+ ```bash
+ export EXPERTS=256
+ export MP=4
+ export CONFIG=config.json
+ python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
+ ```
+
+ Then chat with the DeepSeek model interactively:
+ ```bash
+ torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
+ ```
+
+ Or run batch inference from a file:
+ ```bash
+ torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}
+ ```
+
+ Or run multi-node inference:
+ ```bash
+ torchrun --nnodes ${NODES} --nproc-per-node $((MP / NODES)) --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}
+ ```
+
+ If you want to use fp8, remove `"expert_dtype": "fp4"` from `config.json` and pass `--expert-dtype fp8` to `convert.py`.
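With `EXPERTS=256` and `MP=4` above, `convert.py` shards the experts contiguously: rank `i` keeps experts in `[i * n_local, (i + 1) * n_local)` where `n_local = n_experts // mp`. A minimal sketch of that mapping (the helper name `expert_owner` is illustrative, not part of the repo):

```python
def expert_owner(idx: int, n_experts: int, mp: int) -> int:
    # Contiguous sharding as in convert.py: each model-parallel rank
    # holds n_experts // mp consecutive experts.
    n_local_experts = n_experts // mp
    return idx // n_local_experts

# EXPERTS=256, MP=4 -> 64 experts per rank.
print(expert_owner(0, 256, 4), expert_owner(63, 256, 4))    # 0 0
print(expert_owner(64, 256, 4), expert_owner(255, 256, 4))  # 1 3
```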
inference/convert.py ADDED
@@ -0,0 +1,169 @@
+ import os
+ import shutil
+ from argparse import ArgumentParser
+ from glob import glob
+ from tqdm import tqdm, trange
+
+ import torch
+ from safetensors.torch import safe_open, save_file
+
+
+ FP4_TABLE = torch.tensor([
+     0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
+     0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0
+ ], dtype=torch.float32)
+
+
+ def cast_e2m1fn_to_e4m3fn(x: torch.Tensor, scale: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
+     """
+     Casts a tensor from e2m1fn to e4m3fn losslessly.
+     """
+     assert x.dtype == torch.int8
+     assert x.ndim == 2
+     out_dim, in_dim = x.size()
+     in_dim *= 2
+     fp8_block_size = 128
+     fp4_block_size = 32
+     assert in_dim % fp8_block_size == 0 and out_dim % fp8_block_size == 0
+     assert scale.size(0) == out_dim and scale.size(1) == in_dim // fp4_block_size
+
+     x = x.view(torch.uint8)
+     low = x & 0x0F
+     high = (x >> 4) & 0x0F
+     x = torch.stack([FP4_TABLE[low.long()], FP4_TABLE[high.long()]], dim=-1).flatten(2)
+
+     # max_fp4 (6.0) * MAX_OFFSET must fit in e4m3fn (max 448)
+     # 6.0 * 2^6 = 384 < 448; 6.0 * 2^7 = 768 > 448; so MAX_OFFSET_BITS = 6
+     MAX_OFFSET_BITS = 6
+
+     bOut = out_dim // fp8_block_size
+     bIn = in_dim // fp8_block_size
+     # bOut, bIn, 128, 128
+     x = x.view(bOut, fp8_block_size, bIn, fp8_block_size).transpose(1, 2)
+     # bOut, bIn, 128*4
+     scale = scale.float().view(bOut, fp8_block_size, bIn, -1).transpose(1, 2).flatten(2)
+     # bOut, bIn, 1
+     scale_max_offset_bits = scale.amax(dim=-1, keepdim=True) / (2**MAX_OFFSET_BITS)
+     # bOut, bIn, 128*4
+     offset = scale / scale_max_offset_bits
+     # bOut, bIn, 128, 128
+     offset = offset.unflatten(-1, (fp8_block_size, -1)).repeat_interleave(fp4_block_size, dim=-1)
+     x = (x * offset).transpose(1, 2).reshape(out_dim, in_dim)
+     return x.to(torch.float8_e4m3fn), scale_max_offset_bits.squeeze(-1).to(torch.float8_e8m0fnu)
+
+
+ mapping = {
+     "embed_tokens": ("embed", 0),
+     "input_layernorm": ("attn_norm", None),
+     "post_attention_layernorm": ("ffn_norm", None),
+     "q_proj": ("wq", 0),
+     "q_a_proj": ("wq_a", None),
+     "q_a_layernorm": ("q_norm", None),
+     "q_b_proj": ("wq_b", 0),
+     "kv_a_proj_with_mqa": ("wkv_a", None),
+     "kv_a_layernorm": ("kv_norm", None),
+     "kv_b_proj": ("wkv_b", 0),
+     "o_proj": ("wo", 1),
+     "gate_proj": ("w1", 0),
+     "down_proj": ("w2", 1),
+     "up_proj": ("w3", 0),
+     "lm_head": ("head", 0),
+
+     "embed": ("embed", 0),
+     "wq_b": ("wq_b", 0),
+     "wo_a": ("wo_a", 0),
+     "wo_b": ("wo_b", 1),
+     "head": ("head", 0),
+     "attn_sink": ("attn_sink", 0),
+     "weights_proj": ("weights_proj", 0),
+ }
+
+
+ def main(hf_ckpt_path, save_path, n_experts, mp, expert_dtype):
+     """
+     Converts and saves model checkpoint files into a specified format.
+
+     Args:
+         hf_ckpt_path (str): Path to the directory containing the input checkpoint files.
+         save_path (str): Path to the directory where the converted checkpoint files will be saved.
+         n_experts (int): Total number of experts in the model.
+         mp (int): Model parallelism factor.
+         expert_dtype (str | None): Target dtype for expert weights; "fp8" upcasts them, otherwise they stay packed fp4.
+
+     Returns:
+         None
+     """
+     torch.set_num_threads(8)
+     n_local_experts = n_experts // mp
+     state_dicts = [{} for _ in range(mp)]
+
+     for file_path in tqdm(glob(os.path.join(hf_ckpt_path, "*.safetensors"))):
+         with safe_open(file_path, framework="pt", device="cpu") as f:
+             for name in f.keys():
+                 param: torch.Tensor = f.get_tensor(name)
+                 if name.startswith("model."):
+                     name = name[len("model."):]
+                 if name.startswith("mtp.") and ("emb" in name or name.endswith("head.weight")):
+                     continue
+                 name = name.replace("self_attn", "attn")
+                 name = name.replace("mlp", "ffn")
+                 name = name.replace("weight_scale_inv", "scale")
+                 name = name.replace("e_score_correction_bias", "bias")
+                 if any(x in name for x in ["hc", "attn_sink", "tie2eid", "ape"]):  # without .weight
+                     key = name.split(".")[-1]
+                 else:
+                     key = name.split(".")[-2]
+                 if key in mapping:
+                     new_key, dim = mapping[key]
+                 else:
+                     new_key, dim = key, None
+                 name = name.replace(key, new_key)
+                 for i in range(mp):
+                     new_param = param
+                     if "experts" in name and "shared_experts" not in name:
+                         idx = int(name.split(".")[-3])
+                         if idx < i * n_local_experts or idx >= (i + 1) * n_local_experts:
+                             continue
+                     elif dim is not None:
+                         assert param.size(dim) % mp == 0, f"Dimension {dim} must be divisible by {mp}"
+                         shard_size = param.size(dim) // mp
+                         new_param = param.narrow(dim, i * shard_size, shard_size).contiguous()
+                     state_dicts[i][name] = new_param
+
+     os.makedirs(save_path, exist_ok=True)
+
+     for i in trange(mp):
+         names = list(state_dicts[i].keys())
+         for name in names:
+             if name.endswith("wo_a.weight"):
+                 weight = state_dicts[i][name]
+                 scale = state_dicts[i].pop(name.replace("weight", "scale"))
+                 weight = weight.unflatten(0, (-1, 128)).unflatten(-1, (-1, 128)).float() * scale[:, None, :, None].float()
+                 state_dicts[i][name] = weight.flatten(2, 3).flatten(0, 1).bfloat16()
+             elif "experts" in name and state_dicts[i][name].dtype == torch.int8:
+                 if expert_dtype == "fp8":
+                     scale_name = name.replace("weight", "scale")
+                     weight = state_dicts[i].pop(name)
+                     scale = state_dicts[i].pop(scale_name)
+                     state_dicts[i][name], state_dicts[i][scale_name] = cast_e2m1fn_to_e4m3fn(weight, scale)
+                 else:
+                     state_dicts[i][name] = state_dicts[i][name].view(torch.float4_e2m1fn_x2)
+         save_file(state_dicts[i], os.path.join(save_path, f"model{i}-mp{mp}.safetensors"))
+
+     for file in ["tokenizer.json", "tokenizer_config.json"]:
+         old_file_path = os.path.join(hf_ckpt_path, file)
+         new_file_path = os.path.join(save_path, file)
+         if os.path.exists(old_file_path):
+             shutil.copyfile(old_file_path, new_file_path)
+
+
+ if __name__ == "__main__":
+     parser = ArgumentParser()
+     parser.add_argument("--hf-ckpt-path", type=str, required=True)
+     parser.add_argument("--save-path", type=str, required=True)
+     parser.add_argument("--n-experts", type=int, required=True)
+     parser.add_argument("--model-parallel", type=int, required=True)
+     parser.add_argument("--expert-dtype", type=str, choices=["fp8", "fp4"], required=False, default=None)
+     args = parser.parse_args()
+     assert args.n_experts % args.model_parallel == 0, "Number of experts must be divisible by model parallelism"
+     main(args.hf_ckpt_path, args.save_path, args.n_experts, args.model_parallel, args.expert_dtype)
model-00005-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9fda158bc636215aea4f6834821c81f59eea3733223c874ab66b9f3d6740c4c1
+ size 3568768976
model-00013-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:47c5e416b60b9bef9e9005cdad9c991a306ab2dd25a95e1994dda30bd4011905
+ size 3568770544
model-00021-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f270bf4d0f0067165020baf3c11264a177182918c1ebeec21d2bf33166b44592
+ size 3568770544
model-00029-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d10bf34c789f9294d2cc50b695d259dc1d0d5b2303105329be370eb55f0fd882
+ size 3568770544
model-00037-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93d68bcfc36fdf239f901653c0e96c5d45d8fce4f5be633bbbf93cc75067ec5d
+ size 3568770544
model-00045-of-00046.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9a0fd242134e9ebe4e6993a7631692944838e4fdf20067b3219caa48eab68045
+ size 1059332516
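The weight shards above are stored as git-LFS pointer files rather than raw bytes: three `key value` lines giving the spec version, the SHA-256 object id, and the blob size. A minimal sketch of parsing that format (the helper name is illustrative; the example pointer is the last shard above):

```python
def parse_lfs_pointer(text: str) -> dict:
    # Git-LFS pointer files are newline-separated "key value" pairs.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:9a0fd242134e9ebe4e6993a7631692944838e4fdf20067b3219caa48eab68045
size 1059332516"""

info = parse_lfs_pointer(pointer)
print(info["size"])  # 1059332516
print(info["oid"][:7])  # sha256:
```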