	---
	license: mit
	base_model:
	- deepseek-ai/DeepSeek-V4-Pro
	---
	# Lightweight DeepSeek-V4-Pro (6 Hidden Layers, Smaller Dimensions)

	This project uses the official DeepSeek-V4-Pro model architecture from [Hugging Face](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro) to build a 6-layer version of DeepSeek-V4-Pro with smaller dimensions and randomly initialized weights.

	[中文说明](./README_cn.md)

	## Purpose

	These weights provide a lightweight implementation for researchers who want to study the model architecture and run it quickly on local hardware.

	The original DeepSeek-V4-Pro model requires significant GPU resources and runs on frameworks such as vLLM and SGLang with custom kernels written in TileLang, which makes it difficult to deploy on standard hardware.

	The difference between this model and the original DeepSeek-V4-Pro is shown below:
	```json
	{
	  "hidden_size": 500,            // Original: 7168
	  "moe_intermediate_size": 300,  // Original: 3072
	  "n_routed_experts": 32,        // Original: 384
	  "num_hidden_layers": 6         // Original: 61
	}
	```
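
To put the four overrides in perspective, the short sketch below computes the scale-down factor for each field from the values listed above. It also estimates the weight count of a single routed expert, assuming each expert is a gated MLP with three projection matrices of shape `hidden_size` x `moe_intermediate_size` and no biases. That expert layout is an assumption made for illustration and is not stated in this card.

```python
# Values copied from the comparison above: (tiny, original).
CONFIG_DIFF = {
    "hidden_size": (500, 7168),
    "moe_intermediate_size": (300, 3072),
    "n_routed_experts": (32, 384),
    "num_hidden_layers": (6, 61),
}


def scale_factors(diff):
    """Return how many times smaller each tiny value is than the original."""
    return {name: original / tiny for name, (tiny, original) in diff.items()}


def expert_weights(hidden_size, moe_intermediate_size):
    """Rough weight count of one routed expert.

    ASSUMPTION: a gated MLP with gate, up, and down projections, each
    hidden_size x moe_intermediate_size, and no biases.
    """
    return 3 * hidden_size * moe_intermediate_size


factors = scale_factors(CONFIG_DIFF)
tiny_expert = expert_weights(500, 300)
original_expert = expert_weights(7168, 3072)

for name, factor in factors.items():
    print(f"{name}: {factor:.2f}x smaller")
print(f"weights per expert (assumed layout): {tiny_expert:,} vs {original_expert:,}")
```

Under that assumption, the per-expert estimate shrinks from roughly 66 million weights to 450 thousand, before accounting for the smaller number of experts and layers.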

	## Usage

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "silence09/DeepSeek-V4-Pro-Tiny"
	device = "cuda" if torch.cuda.is_available() else "cpu"

	# trust_remote_code=True lets transformers run any custom model code shipped in the repo.
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype=torch.bfloat16,
	    trust_remote_code=True,
	).to(device)
	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

	prompt = "Who are you?"
	prompt_tokens = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

	# Greedy decoding (do_sample=False). The weights are random, so the output is not meaningful text.
	generated_ids = model.generate(prompt_tokens, max_new_tokens=100, do_sample=False)

	# Drop the prompt tokens so that only the newly generated tokens are decoded.
	generated_ids = [
	    output_ids[len(input_ids):]
	    for input_ids, output_ids in zip(prompt_tokens, generated_ids)
	]
	completion = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
	print(completion)
	```

	Note: DeepSeek-V4-Pro support requires the latest `transformers` library, installed from source:
	```bash
	pip install git+https://github.com/huggingface/transformers
	```
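
One way to confirm which build is installed is to read the package version. The sketch below assumes that a source install of `transformers` reports a version string containing `.dev` (for example `X.Y.Z.dev0`), as development builds conventionally do. The helper names `is_dev_build` and `installed_version` are hypothetical and not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version


def is_dev_build(version_string):
    """Hypothetical helper: True if the version looks like a development build.

    ASSUMPTION: source installs carry a ".dev" marker in their version string.
    """
    return ".dev" in version_string


def installed_version(package="transformers"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("transformers is not installed")
elif is_dev_build(found):
    print(f"transformers {found} looks like a source (development) build")
else:
    print(f"transformers {found} looks like a release build")
```

This is only a heuristic: a sufficiently recent release build may also include the required model support.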

	## More Info
	These weights were created with the Python script available in [this repository](https://github.com/silencelamb/naked_llama/blob/main/hf_example/create_deepseek_v4_pro_tiny.py), following the same approach used for [DeepSeek-R1-Small-2layers](https://huggingface.co/silence09/DeepSeek-R1-Small-2layers).
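
The linked script is not reproduced here. As a rough sketch of the config-shrinking step only, the example below applies the four overrides from this card to a base configuration held as a plain dictionary. The helper name `shrink_config` and the stand-in base dictionary are hypothetical; in practice the base values would come from the original model's configuration, and a model with random weights could then be instantiated from the result, for example via `AutoModelForCausalLM.from_config` in `transformers`.

```python
import json

# Overrides listed in this model card.
TINY_OVERRIDES = {
    "hidden_size": 500,
    "moe_intermediate_size": 300,
    "n_routed_experts": 32,
    "num_hidden_layers": 6,
}


def shrink_config(base_config, overrides):
    """Hypothetical helper: return a copy of base_config with overrides applied.

    Unknown keys are rejected so that a misspelled field name fails loudly
    instead of silently adding a new entry.
    """
    unknown = set(overrides) - set(base_config)
    if unknown:
        raise KeyError(f"not in base config: {sorted(unknown)}")
    return {**base_config, **overrides}


# Stand-in base config holding only the four fields this card mentions
# (original values taken from the comparison in this card).
base = {
    "hidden_size": 7168,
    "moe_intermediate_size": 3072,
    "n_routed_experts": 384,
    "num_hidden_layers": 61,
}

tiny = shrink_config(base, TINY_OVERRIDES)
print(json.dumps(tiny, indent=2))
```

Returning a new dictionary rather than mutating the base keeps the original values available for comparison.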