---
license: mit
base_model:
- deepseek-ai/DeepSeek-V4-Pro
---
# Lightweight DeepSeek-V4-Pro (6 Hidden Layers, Reduced Dimensions)

This project is built on the official [**DeepSeek-V4-Pro**](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro) model architecture. It is a **6-layer version** of DeepSeek-V4-Pro with randomly initialized weights and several dimensions scaled down.

[English README](./README.md)

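## Purpose
Provide researchers with a **lightweight implementation** that is easy to **study and run locally on limited hardware**.

The original **DeepSeek-V4-Pro** needs substantial GPU resources and runs on the **vLLM/SGLang** frameworks with custom kernels written in **TileLang**, which makes it hard to deploy on ordinary hardware.
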
This model differs from the original **DeepSeek-V4-Pro** as follows:
```json
{
  "hidden_size": 500,            // original: 7168
  "moe_intermediate_size": 300,  // original: 3072
  "n_routed_experts": 32,        // original: 384
  "num_hidden_layers": 6         // original: 61
}
```
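
To put these overrides in perspective, the short sketch below computes how many times smaller each setting is than the original. The dictionary names and the `shrink_factors` helper are illustrative and not part of the model repository; the numbers are copied from the config above.

```python
# Values copied from the config diff above; the names are illustrative only.
TINY = {
    "hidden_size": 500,
    "moe_intermediate_size": 300,
    "n_routed_experts": 32,
    "num_hidden_layers": 6,
}
ORIGINAL = {
    "hidden_size": 7168,
    "moe_intermediate_size": 3072,
    "n_routed_experts": 384,
    "num_hidden_layers": 61,
}


def shrink_factors(tiny: dict, original: dict) -> dict:
    """Return original/tiny per key, i.e. how many times smaller the tiny value is."""
    return {key: original[key] / tiny[key] for key in tiny}


if __name__ == "__main__":
    for key, factor in shrink_factors(TINY, ORIGINAL).items():
        print(f"{key}: {ORIGINAL[key]} -> {TINY[key]} ({factor:.1f}x smaller)")
```

Each of the four settings comes out roughly an order of magnitude smaller, which is what makes the model practical to load on a single consumer GPU or on CPU.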

## Usage example

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'silence09/DeepSeek-V4-Pro-Tiny'
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

prompt = "Who are you?"
prompt_tokens = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
# Greedy decoding. The weights are randomly initialized, so the output is not meaningful text.
generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)
# generate() returns the prompt followed by the continuation; keep only the new tokens.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(prompt_tokens, generated_ids)
]
completion = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(completion)
```
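
The list comprehension near the end of the example is needed because `generate` on a decoder-only model returns the prompt tokens followed by the newly generated ones. Below is a minimal sketch of that slicing step on plain Python lists; the helper name `strip_prompt` and the token IDs are made up for illustration.

```python
from typing import List, Sequence


def strip_prompt(
    prompt_ids: Sequence[Sequence[int]],
    output_ids: Sequence[Sequence[int]],
) -> List[List[int]]:
    """Drop the leading prompt tokens from each generated sequence."""
    return [
        list(out[len(prompt):])
        for prompt, out in zip(prompt_ids, output_ids)
    ]


if __name__ == "__main__":
    prompts = [[11, 12, 13]]            # made-up token IDs for one prompt
    outputs = [[11, 12, 13, 7, 8, 9]]   # the prompt followed by three new tokens
    print(strip_prompt(prompts, outputs))
```

For a single prompt, as in the example above, the prompt length is simply the number of input tokens, so the slice leaves only the continuation.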

**Note:** DeepSeek-V4-Pro requires the latest `transformers`, installed from source:
```bash
pip install git+https://github.com/huggingface/transformers
```
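
A quick way to see which `transformers` build is active is to read the installed version string. The sketch below assumes that builds from the main branch carry a `.dev` marker in their version (for example `X.Y.Z.dev0`). That is a heuristic only and does not guarantee the build includes DeepSeek-V4-Pro support. The helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def looks_like_source_build(version_string: Optional[str]) -> bool:
    """Heuristic: development builds usually carry a '.dev' marker in the version."""
    return version_string is not None and ".dev" in version_string


if __name__ == "__main__":
    found = installed_version("transformers")
    print("transformers version:", found)
    print("looks like a source build:", looks_like_source_build(found))
```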

## More information
The creation script is in [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_v4_pro_tiny.py). It follows the approach used for [DeepSeek-R1-Small-2layers](https://huggingface.co/silence09/DeepSeek-R1-Small-2layers).