	---
	language: en
	tags:
	- mlx
	library_name: mlx
	pipeline_tag: text-generation
	---

	# mlx-community/DeepSeek-V4-Flash-2bit-DQ

	Made possible by [Lambda.ai](https://huggingface.co/lambda) ❤️

	DeepSeek-V4-Flash-2bit-DQ uses a dynamic mixed-precision quantization policy. Most routed MoE expert weights are packed to 2-bit, while sensitive layers and projections are kept at higher precision (4-bit, 6-bit, or 8-bit). Because the routed experts account for most of the weights, this keeps memory use much lower than the baseline 4-bit checkpoint.
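
As a rough illustration of how a mixed-precision policy like this can be expressed, here is a minimal sketch in plain Python. The weight path patterns, the bit width assigned to each group, and the toy parameter counts are all assumptions made for the example; the card does not list the actual rules used for this checkpoint. The helper names `choose_bits` and `average_bits` are hypothetical.

```python
# Hypothetical sketch: the path patterns and bit widths below are illustrative
# assumptions, not the policy actually used for this checkpoint.

def choose_bits(path: str) -> int:
    """Return an assumed quantization bit width for the weight at `path`."""
    if "embed" in path or "lm_head" in path:
        return 8  # assumed: embeddings and output head kept at 8-bit
    if "self_attn" in path:
        return 6  # assumed: attention projections kept at 6-bit
    if "shared_experts" in path:
        return 4  # assumed: shared experts kept at 4-bit
    if "experts" in path:
        return 2  # routed MoE experts packed to 2-bit
    return 4      # assumed default for everything else


def average_bits(param_counts: dict[str, int]) -> float:
    """Parameter-weighted mean bit width over {path: number_of_weights}."""
    total = sum(param_counts.values())
    return sum(choose_bits(p) * n for p, n in param_counts.items()) / total


# Toy, made-up parameter counts in which routed experts dominate.
toy = {
    "model.embed_tokens": 1_000,
    "model.layers.0.self_attn.q_proj": 1_000,
    "model.layers.0.mlp.shared_experts.up_proj": 1_000,
    "model.layers.0.mlp.experts.up_proj": 17_000,
}

print(average_bits(toy))
```

The point of the toy numbers is only that when routed experts hold most of the weights, the average stays close to 2 bits even though other layers use 4 to 8 bits. The calculation ignores the per-group scales and biases that quantized formats also store.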

	## Use with mlx

	```bash
	pip install mlx-lm
	```

	```python
	from mlx_lm import load, generate

	model, tokenizer = load("mlx-community/DeepSeek-V4-Flash-2bit-DQ")

	prompt = "hello"

	# Wrap the prompt in the model's chat template when one is defined.
	if tokenizer.chat_template is not None:
	    messages = [{"role": "user", "content": prompt}]
	    prompt = tokenizer.apply_chat_template(
	        messages, add_generation_prompt=True, return_dict=False,
	    )

	response = generate(model, tokenizer, prompt=prompt, verbose=True)
	```