---
license: mit
base_model:
- deepseek-ai/DeepSeek-V4-Pro
---
# Lightweight DeepSeek-V4-Pro (6 Hidden Layers with Smaller Dimensions)

This project was created using the official **DeepSeek-V4-Pro** model architecture from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro). It implements a **6-layer version** of DeepSeek-V4-Pro with randomly initialized weights and smaller dimensions.

[Chinese documentation](./README_cn.md)

## Purpose

These weights provide a **lightweight** implementation for researchers who want to **study the model architecture and quickly run it locally**.

The original **DeepSeek-V4-Pro** model requires significant GPU resources and runs on inference frameworks such as **vLLM**/**SGLang** with custom kernels written in **TileLang**, making it difficult to deploy on standard hardware.

The configuration differences between this model and the original **DeepSeek-V4-Pro** are shown below:
```json
{
	"hidden_size": 500,              // Original: 7168
	"moe_intermediate_size": 300,    // Original: 3072
	"n_routed_experts": 32,          // Original: 384
	"num_hidden_layers": 6           // Original: 61
}
```
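
To double-check these values yourself, you can load the published config and print the relevant fields (a quick sketch; the attribute names are the same keys shown above):

```python
from transformers import AutoConfig

# Load the tiny model's config from the Hub and inspect the shrunken dimensions.
config = AutoConfig.from_pretrained('silence09/DeepSeek-V4-Pro-Tiny', trust_remote_code=True)
print(config.hidden_size)            # 500
print(config.moe_intermediate_size)  # 300
print(config.n_routed_experts)       # 32
print(config.num_hidden_layers)      # 6
```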

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    'silence09/DeepSeek-V4-Pro-Tiny',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
).to(device)
tokenizer = AutoTokenizer.from_pretrained('silence09/DeepSeek-V4-Pro-Tiny', trust_remote_code=True)

prompt = "Who are you?"
prompt_tokens = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)
# Strip the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(prompt_tokens, generated_ids)
]
completion = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(completion)
```
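
Because the weights are randomly initialized, the generated text will not be meaningful; the model is intended for inspecting the architecture and testing inference pipelines, not for producing useful completions.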

**Note:** DeepSeek-V4-Pro requires the latest `transformers` library, installed from source:
```bash
pip install git+https://github.com/huggingface/transformers
```
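
If you are unsure which version you have, you can check it from Python (source builds typically report a `.dev0` suffix):

```python
import transformers

# A dev version string (e.g. ending in ".dev0") indicates a source install.
print(transformers.__version__)
```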

## More Info
This model was created using the Python script available at [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_v4_pro_tiny.py), following the same approach as [DeepSeek-R1-Small-2layers](https://huggingface.co/silence09/DeepSeek-R1-Small-2layers).
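
The general recipe looks like the sketch below (a simplification, not the actual script: it shrinks the original config using the values listed above, then builds the model from the config so the weights come out randomly initialized):

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Start from the original architecture's config and shrink its dimensions.
config = AutoConfig.from_pretrained('deepseek-ai/DeepSeek-V4-Pro', trust_remote_code=True)
config.hidden_size = 500
config.moe_intermediate_size = 300
config.n_routed_experts = 32
config.num_hidden_layers = 6

# from_config (unlike from_pretrained) instantiates the model with randomly
# initialized weights, so no large checkpoint download is needed.
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
model.save_pretrained('DeepSeek-V4-Pro-Tiny')
```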