---
license: mit
base_model:
- deepseek-ai/DeepSeek-V4-Pro
---

# Lightweight DeepSeek-V4-Pro (6 Hidden Layers Version with Smaller Dimensions)

This model uses the official **DeepSeek-V4-Pro** architecture from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro). It implements a **6-layer version** of DeepSeek-V4-Pro with smaller dimensions and randomly initialized weights.

[中文说明 (Chinese README)](./README_cn.md)

## Purpose

These weights provide a **lightweight** implementation for researchers who want to **study the model architecture and run it locally with minimal setup**. The original **DeepSeek-V4-Pro** model requires substantial GPU resources and is served through frameworks such as **vLLM/SGLang** with custom kernels written in **TileLang**, which makes it difficult to deploy on standard hardware.

The main configuration differences between this model and the original **DeepSeek-V4-Pro** are:

```jsonc
{
  "hidden_size": 500,            // Original: 7168
  "moe_intermediate_size": 300,  // Original: 3072
  "n_routed_experts": 32,        // Original: 384
  "num_hidden_layers": 6         // Original: 61
}
```

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Use a GPU if one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    'silence09/DeepSeek-V4-Pro-Tiny',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
).to(device)
tokenizer = AutoTokenizer.from_pretrained('silence09/DeepSeek-V4-Pro-Tiny', trust_remote_code=True)

prompt = "Who are you?"
# Tokenize the prompt and move the input IDs to the model's device.
prompt_tokens = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Greedy decoding (do_sample=False) for up to 100 new tokens.
generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)

# Drop the prompt tokens so that only the newly generated tokens are decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(prompt_tokens, generated_ids)
]

completion = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(completion)
```

Because the weights are randomly initialized, the printed completion is meaningless text; the script is only useful for checking that loading and generation run end to end.

**Note:** DeepSeek-V4-Pro support requires installing the latest `transformers` library from source:

```bash
pip install git+https://github.com/huggingface/transformers
```

## More Info

This model was created with the Python script available in [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_v4_pro_tiny.py), using the same approach as [DeepSeek-R1-Small-2layers](https://huggingface.co/silence09/DeepSeek-R1-Small-2layers).
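To put the configuration differences listed under Purpose in perspective, the following back-of-the-envelope sketch counts the routed-expert weights in a single MoE layer for both configurations. It assumes, as in DeepSeek-V3, that each routed expert is a bias-free gated FFN with three `hidden_size` × `moe_intermediate_size` matrices (gate, up, down); that DeepSeek-V4-Pro keeps this layout is an assumption, not something this card states. The count ignores attention, shared experts, embeddings, and any dense layers, and `routed_expert_params` is a hypothetical helper written only for this illustration.

```python
# Rough size comparison of the routed-expert weights in ONE MoE layer.
# Assumption (not stated in this README): each routed expert is a gated FFN
# with three bias-free matrices (gate, up, down), each holding
# hidden_size * moe_intermediate_size weights, as in DeepSeek-V3.


def routed_expert_params(hidden_size: int, moe_intermediate_size: int, n_routed_experts: int) -> int:
    """Hypothetical helper: weight count of all routed experts in a single layer."""
    per_expert = 3 * hidden_size * moe_intermediate_size  # gate + up + down projections
    return per_expert * n_routed_experts


original = routed_expert_params(hidden_size=7168, moe_intermediate_size=3072, n_routed_experts=384)
tiny = routed_expert_params(hidden_size=500, moe_intermediate_size=300, n_routed_experts=32)

print(f"original: {original:,} weights per layer")  # 25,367,150,592
print(f"tiny:     {tiny:,} weights per layer")      # 14,400,000
print(f"ratio:    ~{original / tiny:.0f}x")         # ~1762x
```

Under these assumptions, the routed experts in one layer shrink by roughly three orders of magnitude, before the reduction from 61 to 6 layers is even taken into account.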