How to use from SGLang

Use Docker images
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "yujiepan/deepseek-v4-tiny-random" \
--host 0.0.0.0 \
--port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "yujiepan/deepseek-v4-tiny-random",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
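Because the endpoint is OpenAI-compatible, the same request can be made from Python with the openai client instead of curl. A minimal sketch, assuming the server above is reachable on localhost:30000 and was launched without an API key (the placeholder key only satisfies the client constructor):

```python
from openai import OpenAI

# Point the client at the local SGLang server (OpenAI-compatible API).
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="yujiepan/deepseek-v4-tiny-random",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```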
Tiny random version of DeepSeek-V4. In development; might not work!

Serve with vLLM:
vllm serve yujiepan/deepseek-v4-tiny-random \
--trust-remote-code \
--block-size 256 \
--kv-cache-dtype fp8 \
--data-parallel-size 1 \
--max-model-len 12000 \
--gpu-memory-utilization 0.5 \
--max-num-seqs 512 \
--max-num-batched-tokens 512 \
--no-enable-flashinfer-autotune \
--compilation-config '{"mode": 0, "cudagraph_mode": "FULL_DECODE_ONLY"}' \
--tokenizer-mode deepseek_v4 \
--tool-call-parser deepseek_v4 \
--enable-auto-tool-choice \
--reasoning-parser deepseek_v4 \
--speculative_config '{"method":"mtp","num_speculative_tokens":1}'
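Once `vllm serve` is up, it also exposes an OpenAI-compatible API. A minimal sketch of a chat request, assuming vLLM's default listen address of http://localhost:8000 and no API key; since this checkpoint is randomly initialized, expect meaningless output:

```python
from openai import OpenAI

# Talk to the vLLM server (default port 8000) via its OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="yujiepan/deepseek-v4-tiny-random",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    max_tokens=64,
    temperature=0.5,
)
print(response.choices[0].message.content)
```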
Model tree for yujiepan/deepseek-v4-tiny-random

Base model: deepseek-ai/DeepSeek-V4-Pro
Install SGLang from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "yujiepan/deepseek-v4-tiny-random" \
--host 0.0.0.0 \
--port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "yujiepan/deepseek-v4-tiny-random",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
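For quick experiments without an HTTP server, SGLang also provides an offline Engine API. A rough sketch, assuming a recent SGLang release (exact sampling-parameter names such as max_new_tokens may differ between versions); again, outputs are meaningless because the weights are random:

```python
import sglang as sgl

# Load the tiny random checkpoint into an in-process SGLang engine.
llm = sgl.Engine(model_path="yujiepan/deepseek-v4-tiny-random")

# Generate from a single prompt with a small sampling budget.
outputs = llm.generate(
    ["Once upon a time,"],
    {"temperature": 0.5, "max_new_tokens": 64},
)
print(outputs[0]["text"])

# Release GPU resources held by the engine.
llm.shutdown()
```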