---
license: mit
base_model:
- deepseek-ai/DeepSeek-V4-Pro
---

# LightWeight DeepSeek-V4-Pro (6 Hidden Layers Version with Smaller Dimensions)
This project was created using the official DeepSeek-V4-Pro model architecture from Hugging Face. It implements a 6-layer version of DeepSeek-V4-Pro with smaller hidden dimensions and randomly initialized weights.
## Purpose
These weights provide a lightweight implementation for researchers who want to study the model architecture and run it locally with minimal setup. The original DeepSeek-V4-Pro model requires significant GPU resources and runs on frameworks such as vLLM/SGLang with custom kernels written in TileLang, making it difficult to deploy on standard hardware.
The difference between this model and the original DeepSeek-V4-Pro is shown below:

```jsonc
{
  "hidden_size": 500,            // Original: 7168
  "moe_intermediate_size": 300,  // Original: 3072
  "n_routed_experts": 32,        // Original: 384
  "num_hidden_layers": 6         // Original: 61
}
```
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tiny model and tokenizer (remote code is required for the
# custom DeepSeek architecture).
model = AutoModelForCausalLM.from_pretrained(
    'silence09/DeepSeek-V4-Pro-Tiny',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
).to(device)
tokenizer = AutoTokenizer.from_pretrained('silence09/DeepSeek-V4-Pro-Tiny', trust_remote_code=True)

# Tokenize a prompt and generate greedily.
prompt = "Who are you?"
prompt_tokens = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)

# Strip the prompt tokens from each sequence before decoding.
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(prompt_tokens, generated_ids)
]
completion = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(completion)
```
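Because the weights are randomly initialized, the generated text will be meaningless; the model is useful for exercising the architecture and inference pipeline, not for producing sensible output. As a quick sanity check, the reduced architecture can be confirmed against the config values listed above:

```python
# Verify the reduced architecture matches the config shown earlier.
print(model.config.num_hidden_layers)  # expected: 6
print(model.config.hidden_size)        # expected: 500
print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")
```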
**Note:** DeepSeek-V4-Pro requires the latest `transformers` library, installed from source:

```bash
pip install git+https://github.com/huggingface/transformers
```
## More Info
This model was created using the Python script available at this repository, based on the same approach used by DeepSeek-R1-Small-2layers.